linux-nvgpu

mirror of git://nv-tegra.nvidia.com/linux-nvgpu.git synced 2025-12-22 17:36:20 +03:00

Author	SHA1	Message	Date
Lakshmanan M	c041ad5b4b	gpu: nvgpu: split nvgpu power on sequence into 2 stages 1) nvgpu poweron sequence split into two stages: - nvgpu_early_init() - Initializes the sub units which are required to be initialized before the grgmr init. For creating dev node, grmgr init and its dependency unit needs to move to early stage of GPU power on. After successful nvgpu_early_init() sequence, NvGpu can indetify the number of MIG instance required for each physical GPU. - nvgpu_finalize_poweron() - Initializes the sub units which can be initialized at the later stage of GPU power on sequence. - grmgr init depends on the following HAL sub units, * device - To get the device caps. * priv_ring - To get the gpc count and other MIG config programming. * fb - MIG config programming. * ltc - MIG config programming. * bios, bus, ecc and clk - dependent module of priv_ring/fb/ltc. 2) g->ops.xve.reset_gpu() should be called before GPU sub unit initialization. Hence, added g->ops.xve.reset_gpu() HAL in the early stage of dGPU power on sequence. 3) Increased xve_reset timeout from 100ms to 200ms. 4) Added nvgpu_assert() for gpc_count, gpc_mask and max_veid_count_per_tsg for identify the GPU boot device probe failure during nvgpu_init_gr_manager(). JIRA NVGPU-6633 Change-Id: I5d43bf711198e6b3f8eebcec3027ba17c15fc692 Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2521894 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-04-29 14:23:48 -07:00
Lakshmanan M	3f8c562004	gpu: nvgpu: Add nvgpu_early_poweron() support 1) NvGpu dev node needs to be created in gpu power on early stage to avoid latency introduced by udevd. For creating dev node, device and grmgr init needs to move to early stage of GPU power on. After grmgr init, NvGpu can identify the number of MIG instance required for each physical GPU. For that, added a new API nvgpu_early_poweron() to handle early init which is required for before dev node creation. 2) Removed fifo dependency in nvgpu_init_gr_manager() 3) Used get_max_subctx_count() directly to query the veid/subctx count. JIRA NVGPU-6633 Change-Id: Ib9d7c3e184c71237b0da9305515ccd8ceda1d5ad Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2517173 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-04-22 15:00:54 -07:00
Seshendra Gadagottu	21e1328ea1	gpu: nvgpu: add fb gops for set_atomic_mode Separated set_atomic_mode functionality from init_fs_state/enable_nvlink and created new fb gops for set_atomic_mode. In gpu init sequence, set_atomic_mode is called after acr_construct_execute to take care of design changes required for nvgpu-next architectures. Updated fb_gv11b_init_test to use set_atomic_mode gops along with init_fs_state. Bug 3268664 Change-Id: I1ab9eb21cc4cce77f3325c4e8821a75b6e85fba2 Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2508095 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-04-22 14:58:36 -07:00
absalam	3ec369d60a	gpu: nvgpu: Disable Clock Arbitor for TU104 This patch is to disable the clock arbitor for TU104. TU104 is not a POR for Drive 6.0 so disabling it to easy migration of clk arb for GA100. As a first step all the NVRM Clock tests will be skipped by setting NVGPU_SUPPORT_CLOCK_CONTROLS to false for TU104. Then clk arbitor will be rewritten for GA100 and enabled back. This patch implements by adding a new flag NVGPU_CLK_ARB_ENABLED which holds the status of clk arbitor for each platform and disables them for TU104 Bug 200699763 Change-Id: I51cd5c7821bdc0b48080c17a70735925b278ddf5 Signed-off-by: absalam <absalam@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2515086 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-04-20 07:47:38 -07:00
Lili Sang	3f0ea98b73	gpu: nvgpu: Add get_gr_context support for Linux. Implement the feature of retrieving gr context contents for all chips. Two IOCTLs, NVGPU_DBG_GPU_IOCTL_GET_GR_CONTEXT_SIZE and _GET_GR_CONTEXT, are added. Bug 3102903 Change-Id: If11006f4e294f190785a2c3159ca491b9f3b5187 Signed-off-by: Lili Sang <lilis@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2449183 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Chris Johnson <cwj@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:48 -06:00
Antony Clince Alex	c36752fe3d	gpu: nvgpu: sim: make ring buffer independent of PAGE_SIZE The simulator ring buffer DMA interface supports buffers of the following sizes: 4, 8, 12 and 16K. At present, it is configured to 4K and it happens to match with the kernel PAGE_SIZE, which is used to wrap back the GET/PUT pointers once 4K is reached. However, this is not always true; for instance, take 64K pages. Hence, replace PAGE_SIZE with SIM_BFR_SIZE. Introduce macro NVGPU_CPU_PAGE_SIZE which aliases to PAGE_SIZE and replace latter with former. Bug 200658101 Jira NVGPU-6018 Change-Id: I83cc62b87291734015c51f3e5a98173549e065de Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2420728 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Prateek sethi	223baa5883	gpu: nvgpu: add support for ACB SLCG on gv11b Register list for ACB SLCG is auto generated with scripts. Add HAL operations to enable/disable ACB clock gating. Bug 200647909 Change-Id: I4be4c14cc072fcccd91031a5a40321f5ff11f549 Signed-off-by: Prateek sethi <prsethi@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2420355 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Lakshmanan M	2ecb5feaad	gpu: nvgpu: Skip graphics CB programming for MIG Added logic to skip the following graphics CB allocation, map and programming sequence when MIG is enabled. Global CB: 1) NVGPU_GR_GLOBAL_CTX_CIRCULAR 2) NVGPU_GR_GLOBAL_CTX_PAGEPOOL 3) NVGPU_GR_GLOBAL_CTX_ATTRIBUTE 4) NVGPU_GR_GLOBAL_CTX_CIRCULAR_VPR 5) NVGPU_GR_GLOBAL_CTX_PAGEPOOL_VPR 6) NVGPU_GR_GLOBAL_CTX_ATTRIBUTE_VPR 7) NVGPU_GR_GLOBAL_CTX_RTV_CIRCULAR_BUFFER CTX CB: 1) NVGPU_GR_CTX_CIRCULAR_VA 2) NVGPU_GR_CTX_PAGEPOOL_VA 3) NVGPU_GR_CTX_ATTRIBUTE_VA 4) NVGPU_GR_CTX_RTV_CIRCULAR_BUFFER_VA JIRA NVGPU-5650 Change-Id: I38c2859ce57ad76c58a772fdf9f589f2106149af Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2423450 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Debarshi Dutta	38ce6fa717	gpu: nvgpu: change unnamed structs to named structs Following changes are made in this patch. 1) Change unnamed structs within gpu_ops to named structs with the prefix gops_. 2) Each named struct gops_ are moved into a separate gops specific file under include/nvgpu/gops/ 3) struct gpu_ops is moved into a separate file include/nvgpu/gpu_ops.h and all other dependent struct gops_ are included in this header. 4) Direct references to include/nvgpu/gops are removed from files as its enough to include gk20a.h. Change-Id: Ieb22cb853be567e3bef14f5f8a04674eebd902ea Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2398776 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Deepak Nibade	a2809088eb	gpu: nvgpu: remove unnecessary hal gops.gr.gr_enable_hw() gops.gr.gr_enable_hw() is a common function and not referred on vGPU. Remove HAL pointer and directly use nvgpu_gr_enable_hw() instead. Jira NVGPU-5648 Change-Id: Id031024ed01f9d890cffb5902cc433800810b219 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2403548 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Lakshmanan M <lm@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Deepak Nibade	8cccb49bd2	gpu: nvgpu: collapse nvgpu_gr_prepare_sw into nvgpu_gr_alloc common.gr unit exports a separate API nvgpu_gr_prepare_sw to initialize some SW pieces required for nvgpu_gr_enable_hw(). A separate API is really unnecessary since same initialization can be performed in nvgpu_gr_alloc(). Remove nvgpu_gr_prepare_sw() and HAL gops.gr.gr_prepare_sw(). Initialize falcon and interrupt structures in loop from nvgpu_gr_alloc(). Move nvgpu_netlist_init_ctx_vars() from nvgpu_gr_prepare_sw() to common init path since netlist parsing need not be done from common.gr unit. It just needs to happen before nvgpu_gr_enable_hw(). Also, trigger nvgpu_gr_free() from gr_remove_support() instead of OS specific paths. Also remove nvgpu_gr_free() calls from probe error paths since nvgpu_gr_alloc is no longer called in probe path. Move interrupt and falcon data structure free calls to nvgpu_gr_free(). Also remove corresponding unit testing code that tests nvgpu_gr_prepare_sw() specifically. Update some unit tests to initialize ecc counters and netlist. Disable some unit tests that fail for reasons unknown. Jira NVGPU-5648 Change-Id: I82ec8160f76530bc40e0c11a9f26ba1c8f9cf643 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2400166 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	fba96fdc09	gpu: nvgpu: Replace nvgpu_engine_info with nvgpu_device Delete the struct nvgpu_engine_info as it's essentially identical to struct nvgpu_device. Duplicating data structures is not ideal as it's terribly confusing what does what. Update all uses of nvgpu_engine_info to use struct nvgpu_device. This is often a fairly straight forward replacement. Couple of places though where things got interesting: - The enum_type that engine_info uses is defined in engines.h and has a bit of SW abstraction - in particular the GRCE type. The only place this seemed to be actually relevant (the IOCTL providing device info to userspace) the GRCE engines can be worked out by comparing runlist ID. - Addition of masks based on intr_id and reset_id; those can be computed easily enough using BIT32() but this is an area that could be improved on. This reaches into a lot of extraneous code that traverses the fifo active engines list and dramtically simplifies this. Now, instead of having to go through a table of engine IDs that point to the list of all host engines, the active engine list is just a list of pointers to valid engines. It's now trivial to do a for-all-active-engines type loop. This could even be turned into a generic macro or otherwise abstracted in the future. JIRA NVGPU-5421 Change-Id: I3a810deb55a7dd8c09836fd2dae85d3e28eb23cf Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319895 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Lakshmanan M	48f1da4dde	gpu: nvgpu: Add bundle skip sequence in MIG mode In MIG mode, 2D, 3D, I2M and ZBC classes are not supported by GR engine. So skip those bundle programming sequence in MIG mode. JIRA NVGPU-5648 Change-Id: I7ac28a40367e19a3e31e63f3e25991c0ed4d2d8b Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2397912 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Tejal Kudav	71b005c1ef	gpu: nvgpu: Enter Quiesce if GPU drops off the bus Currently, we reboot the entire system using kernel_restart() if the GPU registers become inaccessible due to GPU disappearing from the bus. GPU hitting high temperatures is one of the reasons we might end up in above scenario. Replace kernel_restart() with quiesce call as a more graceful way of notifying about GPU's unavailability. While entering quiesce state, make sure we do not trigger any register accesses which are bound to fail in this case. Bug 2919899 Change-Id: Ia9d413e04c7d205752414ff3e892f055c4363cce Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2398801 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Vedashree Vidwans	ae25924393	gpu: nvgpu: print enabled_flags after poweron GPU enabled_flags indicate features supported by nvgpu. Add nvgpu_print_enabled() to print GPU enabled_flags. Print flag value after poweron complete to help during debug. Add verbose function to print flag name and status if gpu_dbg_info is set. JIRA NVGPU-5838 Change-Id: I3b0ddb8c6872f4f3b6101050da087ff553c16f84 Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2383531 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Deepak Nibade	010f818596	gpu: nvgpu: initialize gr struct in poweron path struct nvgpu_gr is right now initialized during probe and from OS specific code. To support multiple instances of graphics engine, nvgpu needs to initialize nvgpu_gr after number of engine instances have been enumerated in poweron path. Hence move nvgpu_gr_alloc() to poweron path and after gr manager has been initialized. Some of the members of nvgpu_gr are initialized in probe path and they too are in OS specific code. Move them to common code in nvgpu_gr_alloc() Add field fecs_feature_override_ecc_val to struct gk20a to store the override flag read from device tree. This flag is later copied to nvgpu_gr in poweron path. Update tpc_pg_mask_store() to check for g->gr being NULL before accessing golden image pointer. Update tpc_fs_mask_store() to return error if g->gr is not initialized. This path needs nvgpu_gr struct initialized. Also fix the incorrect NULL pointer check in tpc_fs_mask_store() which breaks the write path to this sysfs. Jira NVGPU-5648 Change-Id: Ifa2f66f3663dc2f7c8891cb03b25e997e148ab06 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2397259 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Lakshmanan M <lm@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Lakshmanan M	2a6fcec078	gpu: nvgpu: add gr manager ops-2 and mig infra-2 This CL covers the code changes related to following support, - Enabled gr manager ops. - Added gr manager init/remove support. - Refactor in gpu instance config infra. - Refactor in gr syspipe gpcs config infra. JIRA NVGPU-5645 JIRA NVGPU-5646 Change-Id: Ib2fab2796d76fe105fc5a08f2c5f9bfa36317f7c Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2393550 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Deepak Nibade	08308bc936	gpu: nvgpu: rework pm resource reservation system Current PM resource reservation system is limited to HWPM resources only. And reservation tracking is done using boolean variables. New upcoming profiler support requires reservation for all the PM resources like SMPC and PMA stream. Using boolean variables is not scalable and confusing. Plus the variables have to be replicated on gpu server in case of virtualization. Remove flag tracking mechanism and use list based approach to track all PM reservations. Also, current HALs are defined on debugger object. Implement new HALs in new pm_reservation object since it is really an independent functionality. Add new source file common/profiler/pm_reservation.c which implements functions to reserve/release resources and to check if any resource is reserved or not. Add common/vgpu/pm_reservation_vgpu.c for vGPU which simply forwards the request to gpu server. Define new HAL object gops.pm_reservation and assign above functions to below respective HALs : g->ops.pm_reservation.acquire() g->ops.pm_reservation.release() g->ops.pm_reservation.release_all_per_vmid() Last HAL above is only used for gpu server cleanup of guest OS. Add below new common profiler functions that act as APIs to reserve/ release resources for rest of the units in nvgpu. nvgpu_profiler_pm_resource_reserve() nvgpu_profiler_pm_resource_release() Initialize the meta data required for reservtion system in nvgpu_pm_reservation_init() and call it during nvgpu_finalize_poweron. Clean up the meta data before releasing struct gk20a. Delete below HALs : g->ops.debugger.check_and_set_global_reservation() g->ops.debugger.check_and_set_context_reservation() g->ops.debugger.release_profiler_reservation() Bug 2510974 Jira NVGPU-5360 Change-Id: I4d9f89c58c791b3b2e63099a8a603462e5319222 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2367224 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
lm	83cb8be984	nvgpu: linux: uapi: Add MIG new caps 1) In MIG mode, 2D, 3D, I2M and ZBC classes are not supported by GR engine. NvGpu shall expose the HWCaps through "struct nvgpu_gpu_characteristics". 2) NvGpu shall expose the following MIG related new caps through "struct nvgpu_gpu_characteristics". * mig_enabled - Flag to indicate whether MIG is enabled/disabled. * gpu_instance_id - GPU instaces Id. * gr_instance_id - graphics execution unit id. * gr_sys_pipe_id - Sys pipe id of GR engine. 3) populate num_ppc_per_gpc - Pixel Processing cluster per GPC 4) populate max_veid_count_per_tsg - Maximum veid count per TSG 5) populate num_sub_partition_per_fbpa - Sub partition per FBPA. JIRA NVGPU-5762 Change-Id: I06b5bcd3f568eb0b9c78c8fc6ce155b39aaeaba5 Signed-off-by: lm <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2352100 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	8c5972ac7f	gpu: nvgpu: Move device de-init call Move the device de-init call to when the gk20a struct is being freed; the device list can live for as long as the gk20a struct does. This will be a problem later, since the current location causes the device structs to get freed and allocatoed over and over. That'll cause gross corruption in the FIFO code when the engine_info struct is replaced with pointers to the device structs. JIRA NVGPU-5421 Change-Id: If4e08ea88dbcae7acd599e3fad29f72ece63b8e0 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2361269 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	319520ff57	gpu: nvgpu: Add a new device manager unit This adds a new device management unit in the common code responsible for facilitating the parsing of the GPU top device list and providing that info to other units in nvgpu. The basic idea is to read this list once from HW and store it in a set of lists corresponding to each device type (graphics, LCE, etc). Many of the HALs in top can be deleted and instead implemented using common code parsing the SW representation. Every time the driver queries the device list it does so using a device type and instance ID. This is common code. The HAL is responsible for populating the device list in such a way that the driver can query it in a chip agnostic manner. Also delete some of the unit tests for functions that no longer exist. This code will require new unit tests in time; those should be quite simple to write once unit testing is needed. JIRA NVGPU-5421 Change-Id: Ie41cd255404b90ae0376098a2d6e9f9abdd3f5ea Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319649 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Tejal Kudav	4dcfbc19de	gpu: nvgpu: Trigger quiesce on spurious FBPA intr In Bug 200588835, the spurious FBPA interrupts are seen on couple of boards. These interrupts were found to be EDC (Error detection and Correction) interrupts which are triggered due to ECC errors. The EDC registers are not exposed to the driver, so the interrupt status register cannot be cleared; resulting in interrupt storm. Also, it was concluded that only bad HW can cause this failure scenario. So, in the ISR for FBPA interrupts, get the GPU into quiesce state as we don't expect the GPU to be in usable state post such unrecoverable errors. Adapt the quiesce code for Linux build too. 1. On Linux, we cannot exit the nvgpu process after quiesce like we do on QNX. So, add nvgpu_disable_irqs() call to quiesce implementation which is done as part of process exit handler on QNX. Masking interrupts which is already done as part of quiesce would be sufficient in most cases, but to be fail-safe disable_irqs too. 3. Also, the IOCTL code looks at g->sw_ready, hence add nvgpu_start_gpu_idle() to set g->sw_ready to false along with setting NVGPU_DRIVER_IS_DYING = true. We expect the nvgpu_sw_quiesce() call to finish before quiesce thread wakes up from 50ms sleep. Hence, critical step like nvgpu_start_gpu_idle() is added to nvgpu_sw_quiesce(), whereas the somewhat redundant disable IRQs call is added to quiesce thread. nvgpu_fifo_quiesce() was called twice by mistake; remove one of the them. Bug 2919899 Bug 200588835 Change-Id: I9beec688c2e1c0d8dfc1327ddf122684576f8684 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2354537 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Vedashree Vidwans	fc5b45ea83	gpu: nvgpu: move init_ltc_support sequence Currently, ltc fs_state is initialized during ltc init support. However, ltc cbc_param and cbc_param2 registers do not seem to be providing correct data if ltc.init_fs_state is called before fb.init_fs_state. - Create fb.init_fb_support hal to initialize fb. - Trigger init_fb_support before init_ltc_support. Bug 2969956 Bug 2957808 JIRA NVGPU-4666 Change-Id: I54d697d27b9d9c6318c4ef459d215b6f82cd5571 Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2345673 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
tkudav	957b19092f	gpu: nvgpu: Enable Quiesce on all builds Make Recovery and quiesce co-exist to support quiesce state on unrecoverrable errors. Currently, the quiesce code is wrapped under ifndef CONFIG_NVGPU_RECOVERY. Isolate the quiesce code from recovery config, thereby enabling it on all builds. On Linux, the hung_task checker(check_hung_uninterruptible_tasks() in kernel/hung_task.c) complains that quiesce thread is stuck for more than 120 seconds. INFO: task sw-quiesce:1068 blocked for more than 120 seconds. The wait time of more than 120 seconds is expected as quiesce thread will wait until quiesce call is triggered on fatal unrecoverable errors. However, the INFO print upsets the kernel_warning_test(KWT) on Linux builds. To fix the failing KWT, change the quiesce task to interruptible instead of uninterruptible as checker only looks at uninterruptible tasks. Bug 2919899 JIRA NVGPU-5479 Change-Id: Ibd1023506859d8371998b785e881ace52cb5f030 Signed-off-by: tkudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2342774 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Sami Kiminki	23cda4f4a9	gpu: nvgpu: add PDI for TU104 (Linux) Add reporting for the per-device identifier (PDI) in the Linux GPU characteristics. Implement PDI read for TU104. Bug 2957580 Signed-off-by: Sami Kiminki <skiminki@nvidia.com> Change-Id: I6ac0e4f74378564d82955b431d4c1fd6c0daeb13 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2346933 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Lakshmanan M <lm@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Konsta Hölttä	dd2fb50a1a	gpu: nvgpu: require deferred cleanup for aggressive sync destroy Aggressive sync destroy is used on some platforms where the amount of syncpoints is limited. It can cause sync objects to get allocated and freed in the submit path and when jobs are cleaned up, so require deferred cleanup. Allocations do not belong to job tracking in a deterministic submit path. Although this has been technically allowed before, deterministic channels have likely not been a priority on those old platforms with aggressive sync destroy set. Update virtualized gp10b platform data to match on a gp10b-vgpu compat string instead of gk20a-vgpu. gk20a (Tegra T124) hasn't been supported for a long time. Delete the aggressive sync destroy field from this platform. It's got enough syncpoints to not dynamically allocate them; having this property set for gp10b-vgpu has likely been a mistake. This is not a completely pure cherry-pick: also extend the gpu characteristics to not advertise full deterministic submit support when aggressive sync destroy is off. This platform flag cannot be adjusted by the user unlike many other flags. Jira NVGPU-4548 Change-Id: I283f546d48b79ac94b943d88e5dce55710858330 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2322042 (cherry picked from commit b1ba2b997b2174e365bcb0782ef3e67260ff9e57) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2328411 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	4f80c6b8a9	gpu: nvgpu: add channel_user_syncpt Refactor user managed syncpoints out of the channel sync infrastructure that deals with jobs submitted via the kernel api. The user syncpt only needs to expose the id and gpu address of the reserved syncpoint. None of the rest (fences, priv cmdbufs) is needed for that, so it hasn't been ideal to couple with the user-allocated syncpts. With user syncpts now provided by channel_user_syncpt, remove the user_managed flag from the kernel sync api. This allows moving all the kernel submit sync code to be conditionally compiled in only when needed, and separates the user sync functionality in a more clear way from the rest with a minimal API. [this is squashed with commit 5111caea601a (gpu: nvgpu: guard user syncpt with nvhost config) from https://git-master.nvidia.com/r/c/linux-nvgpu/+/2325009] Jira NVGPU-4548 Change-Id: I99259fc9cbd30bbd478ed86acffcce12768502d3 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2321768 (cherry picked from commit 1095ad353f5f1cf7ca180d0701bc02a607404f5e) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319629 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	d0ffb335dc	gpu: nvgpu: move nvgpu_has_syncpoints nvgpu_has_syncpoints is more general than a channel synchronization related, so move it to nvhost.c from channel_sync.c. Move the declaration from gk20a.h to nvhost.h. As the debugfs knob is Linux related, move it from struct gk20a to struct nvgpu_os_linux. Jira NVGPU-4548 Change-Id: I4236086744993c3daac042f164de30939c01ee77 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2318814 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Philip Elcan	20a4080be0	gpu: nvgpu: quiesce: stop thread gracefully Previously, nvgpu_sw_quiesce_remove_support() stopped the quiesce thread abruptly with nvgpu_thread_stop(), which could mean the thread was killed while still waiting on the cond. Then when the cond was destroyed, there may be an error since the underlying implementation may think there is still a thread waiting (such as the Posix implementation). Change nvgpu_sw_quiesce_remove_support() to use nvgpu_thread_stop_graceful() and signal the cond in the callback after the thread is marked to be stopped. The quiesce thread will then wake up from the cond wait and see the thread should stop. JIRA NVGPU-4987 Change-Id: I29322d7867acc33a91092016c540e00bb1ae945a Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2306024 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Vaibhav Kachore	bbb63c0a8c	gpu: nvgpu: remove "trace/events/gk20a.h" from QNX build - "include/trace/events/gk20a.h" file was having GPL2 license (which should not used for QNX code). This file was used for compiling linux userspace driver("libnvgpu-drv.so") and was used for unit testing on QNX. - This patch removes stubs in "include/trace/events/gk20a.h" file. (which were used for linux userspace driver.) - For QNX driver, "nvgpu_rmos/trace/events/gk20a.h" was used. This patch moves that file to "include/nvgpu/posix/trace_gk20a.h" and does relevant license change. This same file will be used for linux userspace driver. - This patch also creates a new file "include/nvgpu/trace.h" which selects proper trace file depending on the config. Bug 2802414 Change-Id: Icdfb251e5698073f986753a969e804161af3ecc5 Signed-off-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2286388 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Thomas Fleury	e257f96911	gpu: nvgpu: use cond signal for SW quiesce nvgpu_cond_broadcast error code is not checked in nvgpu_sw_quiesce, which causes a Coverity violation. Use nvgpu_cond_signal instead, since only one thread needs to be woken up. Jira NVGPU-4512 Change-Id: I4f6c3956f792487ba9c1eed09db09fd86ac56ffe Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2286056 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Philip Elcan <pelcan@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
ajesh	1041167668	gpu: nvgpu: remove usage of __must_check Remove the usage of __must_check compiler directive. Also rename __user as nvgpu_user and make the required changes for linux and posix builds. Jira NVGPU-4903 Change-Id: If4a18761cca84eb12e0babc0d528666673fca9e8 Signed-off-by: ajesh <akv@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2283404 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:10:29 -06:00
Thomas Fleury	d5833d1b8e	gpu: nvgpu: add BUG callbacks to SW quiesce After initializing support for SW quiesce, register callback to be invoked in case of BUG(). The callback will invoke nvgpu_sw_quiesce with "g" parameter. Jira NVGPU-4512 Change-Id: Id6bd73268d832e003cf66534bd0cbaa4b1f32a6c Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2283011 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Thomas Fleury	e0a6000456	gpu: nvgpu: update SW quiesce Update SW quiesce as follows: - After waking up sw_quiesce_thread, nvgpu_sw_quiesce masks interrupts, then disables and preempts runlists without lock. There could be still a concurrent thread that would re-enable the runlist by accident. This is very unlikely and would mean we are not in mission mode anyway. - In sw_quiesce_thread, wait NVGPU_SW_QUIESCE_TIMEOUT_MS, to leave some time for interrupt handler to set error notifier (in case of HW error interrupt). Then disable and preempt runlists, and set error notifier for remaining channels before exiting the process. Also modified nvgpu_can_busy to return false in case SW quiesce is pending. This will make subsequent devctl to fail. Jira NVGPU-4512 Change-Id: I36dd554485f3b9b08f740f352f737ac4baa28746 Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2266389 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Scott Long	5ee9a446b5	gpu: nvgpu: misra 12.1 fixes MISRA Advisory Rule states that the precedence of operators within expressions should be made explicit. This change removes the Advisory Rule 12.1 violations from various common units. Jira NVGPU-3178 Change-Id: I4b77238afdb929c81320efa93ac105f9e69af9cd Signed-off-by: Scott Long <scottl@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2277480 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	f3421645b2	gpu: nvgpu: compile out fb and ramin non-fusa code fbpa related functions are not supported on igpu safety. Don't compile them if CONFIG_NVGPU_DGPU is not set. Also compile out fb and ramin hals that are dgpu specific. Update the tests for the same. JIRA NVGPU-4529 Change-Id: I1cd976c3bd17707c0d174a62cf753590512c3a37 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2265402 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	fba516ffae	gpu: nvgpu: enable PMU ECC interrupt early PMU IRQs were not enabled assuming entire functionality for LS PMU. Debugging early init issues of PMU falcon ECC errors triggered during nvgpu power-on will be cumbersome if interrupts are not enabled early. FMEA analysis of the nvgpu init path also requires this interrupt be enabled earlier. Hence, Enable the PMU ECC IRQ early during nvgpu_finalize_poweron. pmu_enable_irq is updated to enable interrupts differently for safety and non-safety. PMU interrupts disabling is moved out of nvgpu_pmu_destroy to nvgpu_prepare_poweroff. Prepared new wrapper API nvgpu_pmu_enable_irq. PMU ECC init and isr mutex init is moved to the beginning of nvgpu_pmu_early_init as for safety, ls pmu code path is disabled. Fixed the pmu_early_init dependent and mc interrupt related unit tests. Update the doxygen for changed functions. JIRA NVGPU-4439 Change-Id: I1a1e792d2ad2cc7a926c8c1456d4d0d6d1f14d1a Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2251732 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	13b02091bb	gpu: nvgpu: init fbpa ecc before initializing fbpa hw fbpa ecc counters need to be allocated before enabling the fbpa irqs. Bug 200572453 Change-Id: Ifdf31f342bf86cd905bf57dbee654ac5483ee777 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2263979 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kadamati	42ccc21c62	gpu: nvgpu: fix static violations in common * Updated types and added error checks * Modified GR condition for ctxsw disable count CERT-C error check was added to detect error on integer overflow But below logic couldn't detect first overflow, so updated condition INT_MAX < gr->ctxsw_disable_count --> it became true after overflow So, we didn't detected in first overflow and lead to assert on enable JIRA NVGPU-3400 Change-Id: I6b0265a464f8f19efa7b0761612c6e9ffb3bd2bd Signed-off-by: Sagar Kadamati <skadamati@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2206282 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Lakshmanan M	a52ee77837	gpu: nvgpu: Add SM diversity gpu characteristic flag To achieve permanent fault coverage, the CTAs launched by each kernel in the mission and redundant contexts must execute on different hardware resources. This feature requires a change in software to make it possible to modify the virtual SM id to TPC mapping across mission and redundant contexts. This CL adds only SM diversity flags which are exposed to its clients through ioctl/devctl interfaces. Actual virtual SM id to TPC mapping implementation will be part of upcoming patch sets. Added NvGpu CFLAGS to identify the safety build "CONFIG_NVGPU_BUILD_CONFIGURATION_IS_SAFETY" JIRA NVGPU-4133 Change-Id: I5a18256780e6726e399e39c1c8d155d2ef07d7bd Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2250461 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	a8866825d2	gpu: nvgpu: fix the doxygen comments due to ECC and MC refactoring changes nvgpu_mc_log_pending_intrs is debugging related function hence compile out that and related functionality under CONFIG_NVGPU_NON_FUSA. nvgpu_mc_intr_enable is applicable for older chips hence compile out under CONFIG_NVGPU_NON_FUSA and CONFIG_NVGPU_HAL_NON_FUSA. Update BUS, CE, ECC, FIFO, MC, PRIV_RING, GR, LTC, FB, PMU units' doxygen comments based on recent ECC and MC refactoring. JIRA NVGPU-4439 Change-Id: I337318683d6311b9c2b5748f2fb07dff29a6584f Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2252853 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Philip Elcan	b8c25a5a55	gpu: nvgpu: unit: init: add quiesce testing Add testing of quiesce functionality to init unit test. JIRA NVGPU-3981 Change-Id: Idc64179bc8d532bea385e705d96fb4b376d15cd9 Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2247154 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Divya Singhatwaria	84a24c9593	gpu: nvgpu: Remove TPC powergate from safety build - Remove non-safe TPC powergate feature from the safety build by introducing a new flag: CONFIG_NVGPU_TPC_POWERGATE - Move nvgpu_init_power_gate_gr() under same compile time flag. and move HAL function gr_gv11b_powergate_tpc() to tpc_gv11b.c - Also, remove the negative test scenario and usage of tpc_powergate from unit tests JIRA NVGPU-4149 Change-Id: If489482401e94de499e472b16b1bc091b00992e6 Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2242323 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	a8c9c800cd	gpu: nvgpu: reorganization of MC interrupts control Previously, unit interrupt enabling/disabling and corresponding MC level interrupt enabling/disabling was not done at the same time. With this change, stall and nonstall interrupt for units are programmed at MC level along with individual unit interrupts. Kept access to MC interrupt registers through mc.intr_lock spinlock. For doing this separated CE and GR interrupt mask functions. mc.intr_enable is only used when there is global interrupt control to be set. Removed mc_gp10b.c as mc_gp10b_intr_enable is now removed. Removed following functions - mc_gv100_intr_enable, mc_gv11b_intr_enable & intr_tu104_enable. Removed intr_pmu_unit_config as we can use the generic unit interrupt control function. JIRA NVGPU-4336 Change-Id: Ibd296d4a60fda6ba930f18f518ee56ab3f9dacad Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2196178 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	daf5475f50	gpu: nvgpu: split ecc support per GPU HW unit To enable ecc interrupts early during nvgpu_finalize_poweron, ecc support has to be enabled early. ecc support was being initialized together for GR, LTC, PMU, FB units late in the poweron sequence. Move the ecc init for each unit to respective unit's init functions. And separate out the hal ecc functions from GR ecc unit to respective hal units. JIRA NVGPU-4336 Change-Id: I2c42fb6ba3192dece00be61411c64a56ce16740a Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2239153 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Philip Elcan	8dd18e6f5e	gpu: nvgpu: init: reduce CCM for nvgpu_finalize_poweron Change how nvgpu_finalize_poweron() detects and reports units that do not need initializing. This reduces the Code Complexity of this function. This update reduces the TCC metric from 12 to 9. JIRA NVGPU-4327 Change-Id: I95a5d60bc90fe09358ed47a54eca700bd51d688f Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2235339 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Philip Elcan	06a8fd2ecb	gpu: nvgpu: init: reduce CCM for nvgpu_prepare_poweroff Change how nvgpu_prepare_poweroff() handles multiple errors from the unit poweroff functions. Previously, the last error was returned. It doesn't really matter much which error is returned, just return an error. This update reduces the TCC metric from 11 to 8. JIRA NVGPU-4327 Change-Id: Ic84f7e25eef2657c3d11881f221c26c9b09bed27 Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2235338 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	b26acdeb87	gpu: nvgpu: move mc_boot_0 function to hals and rename to get_chip_details This function gets the GPU chip architecture, implementation and revision information by reading the MC boot register, hence it is more suited to be located in HAL files. test_check_gpu_state is now being run after test_hal_init as the gops.mc needs to be initialized for test_check_gpu_state subtest. JIRA NVGPU-2524 Change-Id: I85355af11d3505a9eb4f10a3fe4e6d9b56285047 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2226018 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	2edf3db10a	gpu: nvgpu: move mc gpu_ops out of gk20a.h and add doxygen comments for HALs gk20a.h will include gops_mc.h to contain the mc ops definitions. Add doxygen comments for the HAL functions that are called directly. Also move mc_gp10b_intr_pmu_unit_config to non-fusa HAL file. JIRA NVGPU-2524 Change-Id: I4f326332d7842211b004b372d79fac9fe6ed40e7 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2226017 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Thomas Fleury	d22fc21a5c	gpu: nvgpu: sync with sw quiesce thread on init Sometimes nvgpu_sw_quiesce_thread does not get a chance to run before common.init unit tests complete. When it finally gets scheduled, related gk20a context is invalid and it crashes in the pthread wrapper. Make sure nvgpu_sw_quiesce_thread has started in nvgpu_sw_quiesce_init_support, to avoid such issues. Bug 2732985 Change-Id: I0a46f271a926b5f0c203b465f69556b2b46d96c5 Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2222560 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00

1 2 3

150 Commits