linux-nvgpu

mirror of git://nv-tegra.nvidia.com/linux-nvgpu.git synced 2025-12-24 02:22:34 +03:00

Author	SHA1	Message	Date
Kishan	5adf709506	gpu: nvgpu: Enable GPCCS debug data logging. Currently in case of any fecs error, we only dump fecs ctxsw fw related registers, mailboxes and trace registers. With this change, we want to ensure we dump gpccs register space as well. This will help in debugging ctxsw related failures. JIRA NVGPU-9560 Bug 3907163 Change-Id: I61e25883da4455ea1412ca70c5fc3377d9a786a3 Signed-off-by: Kishan <kpalankar@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2850402 Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>	2023-02-06 16:38:32 -08:00
Divya Singhatwaria	464de27507	async cmd resp for gv11b - When DISALLOW cmd is sent from driver to PMU the actual completion of the disallow will be acknowledged by PMU via a PG EVENT: ASYNC_CMD_RESP. - Disallow needs a delayed ACK from PMU in order to disable the ELPG. - If ELPG is already engaged, the DISALLOW cmd will trigger ELPG exit and then transition to PMU_PG_STATE_DISALLOW. - After this whole process is completed, PMU will send DISALLOW_ACK through ASYNC_CMD_RESP msg. - After disallow command is sent from the driver, NvGPU driver waits/polls for disallow command ack. This is sent immediately by msg framework of PMU. - Then, the driver will poll/wait for ASYNC_CMD_RESP event which is the delayed DISALLOW ACK. - The driver captures the ASYNC_CMD_RESP sent from PMU. - set disallow_state to ELPG_OFF. - If the driver does not wait/poll for this delayed disallow ack from PMU, it can result in errors as PMU is still processing DISALLOW cmd but the driver progressed further. Bug 3580271 Change-Id: I332180c05b6a398107f065d54e9718b7038fb1b2 Signed-off-by: Divya <dsinghatwari@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2689500 (cherry picked from commit `fb019bf43a`) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2694312 Reviewed-by: Sagar Kamble <skamble@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-04 15:09:38 -07:00
Sagar Kamble	5fb06d03ca	gpu: nvgpu: stop ELPG init thread during unload ELPG initialization thread creation can fail when the process is killed. That leads to driver resume failure. That thread was stopped on suspend and re-created on resume. To avoid the issue above, don't stop the ELPG thread in suspend and let the first created thread handle the ELPG state transitions always. And stop the ELPG thread during unload. bug 3345977 bug 200685277 Change-Id: I8952edf8d1664ed258f238e265002e716d1bf5c2 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2573763 (cherry picked from commit `f4571194b0`) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2574436 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-08-12 05:39:59 -07:00
Sagar Kamble	bb8bf1c76c	gpu: nvgpu: fix tsg unbind failure paths nvgpu_tsg_unbind_channel_common failure handling missed channel.clear & nvgpu_tsg_set_mmu_debug_mode calls. Bug 200711183 Change-Id: I19fd53be55db9df725b7cf467b2673e4cd29deb5 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2521972 (cherry picked from commit `89ec2afbd4`) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2524251 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-05-04 14:40:56 -07:00
Sagar Kamble	13fc430775	gpu: nvgpu: retry tsg unbind if NEXT is set The NEXT bit can remain set for the channel if timeslice expires before scheduler clears it. Due to this nvgpu fails TSG unbind and in turn nvrm_gpu fails channel close. In this case, checking the channel hw state after some time can help see NEXT bit cleared by scheduler. Reenable the tsg and return -EAGAIN to nvrm_gpu for it to retry again. Bug 3144960 Bug 200520811 Change-Id: I35f417f02270e371a4e632986b73a00f8a4f921a Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2468391 (cherry picked from commit `cf287a4ef5`) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2479106 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-03-19 14:39:39 -07:00
Divya Singhatwaria	9170f2b77c	gpu: nvgpu: remove ZBC save/restore by PMU - ZBC save/restore registers are removed in GP10B PMU ucode. - These registers are saved/restored from CTXSW ucode during ELPG entry/exit. - Accessing the ZBC registers will cause PMU EXTERR error. - To resolve this, ZBC functionality is removed from GP10B feature list in PMU ucode. - From NvGPU driver, set NVGPU_PMU_ZBC_SAVE bit to false for GP10B - Updated the GP10B PMU app version for the ucode: https://git-master.nvidia.com/r/c/tegra/kernel-firmware-t18x/+/2476260 P4 CL link related to this PMU ucode change: https://p4sw-swarm.nvidia.com/changes/29594520 Bug 3233071 Bug 200696431 Change-Id: If3f1707b79699e7e2e65367418b25ac71b09cf0b Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2476259 Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-03-17 09:54:54 -07:00
Nitin Kumbhar	7882f15ff6	gpu: nvgpu: fix possible buffer overflow issue As sprintf() is used to populate pool_name[20], it can overflow for larger u32 values (u32 max decimal number chars are 10) i.e. 20 < strlen("semaphore_pool-") i.e. 15 + 10. Fix this overflow by removing pool_name as it's not used. Bug 2626446 Bug 3273414 Change-Id: I4e0a222a2cd34dcd09e69294bc46e2242abb04bb Signed-off-by: Nitin Kumbhar <nkumbhar@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2205356 (cherry picked from commit baa86cf134ee6753beabfa974a10faffc5775ee8) Signed-off-by: ByungKuk Seo <bseo@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2496976 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Harsh Sinha <hsinha@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-03-12 07:54:52 -08:00
Sumit Gupta	535e9b1dd7	gpu: nvgpu: fix mutex wrong acquire Wrong acquire/release sequence. DEBUG_LOCKS_WARN_ON(rt_mutex_owner(lock) != current) .... CPU: 4 PID: 5404 Comm: cyclictest.sh Not tainted 4.9.201-rt134-tegra #1 Hardware name: Jetson-AGX (DT) .... Call trace: [<ffffff800810e4f8>] debug_rt_mutex_unlock+0x58/0x68 [<ffffff8008f34d0c>] rt_mutex_unlock+0x4c/0xb0 [<ffffff8008f36ea8>] _mutex_unlock+0x20/0x2c [<ffffff8000f69d80>] nvgpu_cg_elcg_set_elcg_enabled+0x78/0xf0 [nvgpu] [<ffffff8000f7bd44>] nvgpu_intr_nonstall_cb+0x21bc/0x22f0 [nvgpu] [<ffffff800875b304>] dev_attr_store+0x44/0x60 [<ffffff80082dca44>] sysfs_kf_write+0x5c/0x78 [<ffffff80082dbd28>] kernfs_fop_write+0xc0/0x1d8 [<ffffff8008245b60>] __vfs_write+0x48/0x128 [<ffffff8008246b3c>] vfs_write+0xac/0x1b8 [<ffffff800824808c>] SyS_write+0x5c/0xc8 Bug 3227296 Suggested-by: Bibek Basu <bbasu@nvidia.com> Signed-off-by: Sumit Gupta <sumitg@nvidia.com> Change-Id: I932a23700539422c07de045dde516c52dd8348cf Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2472903 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Sagar Kamble <skamble@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: Bibek Basu <bbasu@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-02-19 10:39:57 -08:00
prsethi	8cb168632b	gpu: nvgpu: add support for ACB SLCG on gv11b Register list for ACB SLCG is auto generated with scripts. Add HAL operations to enable/disable ACB clock gating. Cherry-pick/manually port from dev-main Bug 200647909 Change-Id: I4be4c14cc072fcccd91031a5a40321f5ff11f549 Signed-off-by: Prateek sethi <prsethi@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2420355 (cherry picked from commit c7c04d3a28c2eb0edc8e015dd0130fa50d3496c7) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2434464 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-by: Peter Daifuku <pdaifuku@nvidia.com> Reviewed-by: Phoenix Jung <pjung@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-10-27 09:24:56 -07:00
Peter Daifuku	5a948ccca9	gpu: nvgpu: limit PD cache to < pgsize for linux For Linux, limit the use of the cache to entries less than the page size, to avoid potential problems with running out of CMA memory when allocating large, contiguous slabs, as would be required for non-iommuable chips. Also, in nvgpu_pd_cache_do_free(), zero out entries only if iommu is in use and PTE entries use the cache (since it's the prefetch of invalid PTEs by iommu that needs to be avoided). Bug 3093183 Bug 3100907 Change-Id: I363031db32e11bc705810a7e87fc9e9ac1dc00bd Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2422039 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: Satish Arora <satisha@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-10-06 10:10:02 -07:00
Konsta Hölttä	cd134bb198	gpu: nvgpu: delete priv cmd buf size warnings Running out of priv cmd buffer allocation capacity is typically a recoverable "error" caused by extra pressure wrt. allocation sizes based on number of inflight jobs chosen by userspace. These conditions return -EAGAIN and further retries will succeed as long as the channel advances with submitted jobs. Remove the unnecessary debug spew. Bug 200641803 Bug 200651329 Change-Id: I4dfc38cfc3eb10d57ac11c1b7164c3d84f9034d3 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2388799 (cherry picked from commit 29ad324f8226ed3326f5de9117b9115a15cdd032) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2410069 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-09-22 19:08:37 -07:00
Peter Daifuku	4f66942afa	nvgpu: fix resource leaks when cleaning up In gk20a_free_channel, destroy notifier_wq and semaphore_wq In __nvgpu_vm_remove, destroy the update_gmmu_lock mutex Bug 200647668 Change-Id: Icbb4e626c0fa9fa2dcf1430b3112b51829b00e4f Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2414820 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Shashank Singh <shashsingh@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Satish Arora <satisha@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-09-18 07:09:09 -07:00
Peter Daifuku	036e000a17	nvgpu: add PD cache support for page-sized PTEs Large buffers being mapped to GMMU end up needing many pages for the PTE tables. Allocating these pages one by one can end up being a performance bottleneck, particularly in the virtualized case. Add support for page-sized PTEs to the existing PD cache: - define NVGPU_PD_CACHE_SIZE, the allocation size for a new slab for the PD cache, effectively set to 64K bytes - Use the PD cache for any allocation < NVGPU_PD_CACHE_SIZE - When freeing up cached entries, avoid prefetch errors by invalidating the entry (memset to 0) Bug 3093183 Bug 3100907 Change-Id: I2302a1dfeb056b9461159121bbae1be70524a357 Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2401783 Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Satish Arora <satisha@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-09-15 02:38:45 -07:00
Debarshi Dutta	c46d6fbc5b	gpu: nvgpu: Discard coherency check on gmmu With MSS Nvlink set for force snoop, check for the coherency flag in gmmu attribute and setting pte aperture to coherent type based on that checking is not relevant. coherent variable removed from nvgpu_gmmu_attrs struct. Bug 200473147 Bug 3057980 Change-Id: Idf76cac901ef7c70faa2c4f7f11a046d94b9466a Signed-off-by: Vinod G <vinodg@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2013212 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> (cherry-picked from `4e17690975` in rel-32) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2387272 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: Aayush Rajoria <arajoria@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-08-07 09:11:01 -07:00
Debarshi Dutta	570f03764f	gpu: nvgpu: Remove force coherency Remove the code that set default aperture mask as coherent. MSS nvlink is set for force snoop, so default aperture mask is set as non-coherent. Bug 200473147 Bug 3057980 Change-Id: Ia8f826b8414826d2642f9c35c14ffba1cd0b9353 Signed-off-by: Vinod G <vinodg@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2011966 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> (cherry-picked from `aec64d8f8b` in dev-main) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2387271 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: Aayush Rajoria <arajoria@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-08-07 09:10:56 -07:00
Peter Daifuku	1b161b6c7a	gpu: nvgpu: fix value leaked in log The timeout message of nvgpu_timeout_expired_msg() leaks a stack value (%llx) in error log on timeout. As the format expects 1 argument and none is given, fix this by specifying the required argument. Manual port of https://git-master.nvidia.com/r/c/linux-nvgpu/+/2205423 Bug 2780861 Bug 3051385 Change-Id: Ic223e4b79bde718108826f095740b10b54a5e84d Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2366452 (cherry picked from commit 372837506af77e2c5b8489ee2123292778abe75d) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2370285 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Sungwook Kim <sungwookk@nvidia.com> Reviewed-by: Rahul Jain (SW-TEGRA) <rahuljain@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-07-08 16:09:04 -07:00
Ranjanikar Nikhil Prabhakarrao	f56874aec2	gpu: nvgpu: add speculative barrier Data can be speculatively stored and code flow can be hijacked. To mitigate this problem, insert a speculation barrier. Bug 200447167 Change-Id: Ia865ff2add8b30de49aa970715625b13e8f71c08 Signed-off-by: Ranjanikar Nikhil Prabhakarrao <rprabhakarra@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/1972221 (cherry picked from commit `f0762ed483`) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/1996052 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: Deepak Nibade <dnibade@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-06-30 10:07:26 -07:00
dmitry pervushin	aa6d4d08d4	strncpy: it should depend on size of 1st argument There is no point to depend on strlen of second argument, otherwise it could be a simple strcpy. Instead, let's make it depending on sizeof(destination) ...and make sure that result is NUL-terminated, too Bug 2973859 Change-Id: Ifc941fab07e503b7b980696950d65b8bb10bf4ff Signed-off-by: dmitry pervushin <dpervushin@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2342281 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Phoenix Jung <pjung@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-05-13 12:40:51 -07:00
ddutta	bb2c8ef511	gpu: nvgpu: decrease refcount when sync-unmap fails When nvgpu_vm_unmap_sync fails, nvgpu_unmap_sync currently bails out without decreasing the buffer refcount. This prevents from releasing the buffer, in case a deferred job completes after the timeout (which was observed 2 times during overnight stress tests). This also means that the fixed address is not re-useable. Throw out a warning when nvgpu_vm_unmap_sync fails, but proceed with decreasing refcount. Bug 200578193 Change-Id: Ie0cc7caa7d12ca0a3b42123a5f7a28bda72dabbc Signed-off-by: ddutta <ddutta@nvidia.com> (cherry picked from commit `a433f26d5b` in dev-main) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2291352 Tested-by: Naveen Kumar S <nkumars@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-03-05 04:54:42 -08:00
ddutta	fbad02d5e0	gpu: nvgpu: remove blcg_enable/disable blcg is always enabled by default and there is no need for disabling this during gr init or gr reset. Bug 2866010 Change-Id: Iaf17b7fdf05ad04fe435e1a1fda758deedc6484c Signed-off-by: ddutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2303114 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Sagar Kamble <skamble@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-02-28 07:42:46 -08:00
Peter Daifuku	ea14973b14	gpu: nvgpu: vgpu: fix tsg_unbind in recovery case When unbinding a channel from a tsg when virtual, vgpu_tsg_unbind_channel would return an error if unbinding the channel on the guest side failed, and did so before notifying the RM server of the unbind. Later on in the recovery process, the guest OS would remove the channel from the TSG's list, but this would leave the RM server with an out-of-date channel list. Fix this by making the tsg_unbind_channel HAL optional and implemented only for vgpu: the vgpu version now just notifies the RM server so that it can clean up its version of the TSG; if vgpu, always call the tsg_unbind_channel HAL whether or not the local unbind succeeded. Minimal port from dev-main of https://git-master.nvidia.com/r/c/linux-nvgpu/+/2084029 Bug 2766920 Bug 200587845 Change-Id: I75bddf3a28ac20bf4fb7510ff64097a32c7eec3f Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2287774 (cherry picked from commit 471c72c1efcc4fe6d547f556edf7773827fd2674) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2289928 Reviewed-by: Thomas Steinle <tsteinle@nvidia.com> Reviewed-by: Satish Arora <satisha@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-02-22 04:24:36 -08:00
Debarshi Dutta	e45e7b5cf8	gpu: nvgpu: move cg_enable after pmu_init is complete This patch helps resolve the boot time failures happening with pmu_exterr for porg. cg_enable can race with the pmu_init thread, so cg_enable is moved after the pmu init thread to avoid that race. Bug 200565050 Change-Id: I2192053eff8767847ea012ca20b3607d2f6cd26f Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2239959 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Sagar Kamble <skamble@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-02-19 10:41:27 -08:00
Thomas Fleury	e41fd09031	gpu: nvgpu: use refcnt for ch mmu_debug_mode Replaced ch->mmu_debug_mode_enabled with ch->mmu_debug_mode_refcnt. If channel is enabled multiple times by userspace, then ref count is updated accordingly. There is an expectation that enable/disable calls are balanced for setting channel's mmu debug mode. When unbinding the channel, decrease refcnt for the channel until it reaches 0. Also, removed tsg parameter from nvgpu_tsg_set_mmu_debug_mode as it can be retrieved from ch. Bug 2515097 Bug 2713590 Change-Id: If334e374a55bd14ae219edbfd3b1fce5ff25c226 Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2184702 (cherry picked from commit `f422aee393`) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2208772 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Kajetan Dutka <kdutka@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Winnie Hsu <whsu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: Kajetan Dutka <kdutka@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-01-29 23:42:46 -08:00
Thomas Fleury	e0587aaf4d	gpu: nvgpu: set FB/HSMMU debug mode Set NV_PFB_HSMMU_PRI_MMU_DEBUG_CTRL and NV_PFB_PRI_MMU_DEBUG_CTRL in addition to NV_PGRAPH_PRI_GPCS_MMU_DEBUG_CTRL, in NVGPU_DBG_GPU_IOCTL_SET_CTX_MMU_DEBUG_MODE Bug 2515097 Bug 2713590 Change-Id: I1763b43e79fac3edb68a35980683d58bfa89519f Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2115785 (cherry picked from commit `8057514a9f`) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2208771 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Kajetan Dutka <kdutka@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Winnie Hsu <whsu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: Kajetan Dutka <kdutka@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-01-29 23:42:34 -08:00
Thomas Fleury	9e328ed6b8	gpu: nvgpu: add refcounting for MMU debug mode GPC MMU debug mode should be set if at least one channel in the TSG has requested it. Add refcounting for MMU debug mode, to make sure debug mode is disabled only when no channel in the TSG is using it. Bug 2515097 Bug 2713590 Change-Id: Ic5530f93523a9ec2cd3bfebc97adf7b7000531e0 Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2123017 (cherry picked from commit `a1248d87fe`) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2208769 Reviewed-by: Kajetan Dutka <kdutka@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Winnie Hsu <whsu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: Kajetan Dutka <kdutka@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-01-29 23:42:10 -08:00
Vinod G	7a26ad57a7	gpu: nvgpu: enable platform atomic feature Support following changes related to platform atomic feature NV_PFB_PRI_MMU_CTRL_ATOMIC_CAPABILITY_MODE to RMW MODE NV_PFB_PRI_MMU_CTRL_ATOMIC_CAPABILITY_SYS_NCOH_MODE to L2 NV_PFB_HSHUB_NUM_ACTIVE_LTCS_HUB_SYS_ATOMIC_MODE to USE_RMW NV_PFB_FBHUB_NUM_ACTIVE_LTCS_HUB_SYS_ATOMIC_MODE to USE_RMW NV_PFB_FBHUB_NUM_ACTIVE_LTCS_HUB_SYS_NCOH_ATOMIC_MODE to USE_READ In gv11b, FBHUB_NUM_ACTIVE_LTCS register has read only privilege, so atomic mode register bits cannot be updated from kernel code. atomic capability and atomic_sys_ncoh_mode bits are copied from fb mmu_ctrl to gpcs_mmu_ctrl register. new tu104 hal for fb_enable_nvlink function. bug 200580236 Change-Id: Ia78986c1c56795c6efad20f4ba42700ef1c2c1ad Signed-off-by: Vinod G <vinodg@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2013481 (cherry picked from commit `251e3eaa80`) Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2274932 GVS: Gerrit_Virtual_Submit Tested-by: Sreeniketh H <sh@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-01-08 08:35:39 -08:00
Vinod G	dacb06f464	gpu: nvgpu: add platform atomic support Add new variable in nvgpu_as_map_buffer_ex_args for app to specify the platform atomic support for the page. When platform atomic attribute flag is set, pte memory aperture is set to be coherent type. renamed nvgpu_aperture_mask_coh -> nvgpu_aperture_mask_raw function. bug 200580236 Change-Id: I18266724dafdc8dfd96a0711f23cf08e23682afc Signed-off-by: Vinod G <vinodg@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2012679 (cherry picked from commit `9e0a9004b7`) Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2274914 Reviewed-by: Deepak Nibade <dnibade@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: Sreeniketh H <sh@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-01-08 08:35:30 -08:00
Peter Daifuku	264691e69d	gpu: nvgpu: re-enable elpg after golden img init Typically, the PMU init thread will finish up long before the golden context image has been initialized, which means that ELPG hasn't truly been enabled at that point. Create a new function, nvgpu_pmu_reenable_pg(), which checks if elpg had been enabled (non-zero refcnt), and if so, disables then re-enables it. Call this function from gk20a_alloc_obj_ctx() after the golden context image has been initialized to ensure that elpg is truly enabled. Manually ported from dev-main Bug 200543218 Change-Id: I0e7c4f64434c5e356829581950edce61cc88882a Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2245768 (cherry picked from commit 077b6712b5a40340ece818416002ac8431dc4138) Reviewed-on: https://git-master.nvidia.com/r/2250091 GVS: Gerrit_Virtual_Submit Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-11-29 01:23:45 -08:00
Deepak Nibade	0ffc5fa5e4	gpu: nvgpu: add clock gating support for HSHUB Add BLCG and SLCG clock gating support for HSHUB unit on gv11b Register list for BLCG and SLCG is auto generated with scripts. Add HAL operations to enable/disable HSHUB clock gating Re-generate gv11b reglist so that all the manually commented registers are automatically deleted. Some of the unicast registers are also deleted. We already have corresponding broadcast registers present. Cherry-pick/manually port from dev-main Bug 2526212 Change-Id: I2654f158daa802bcf992e103ed4a44675aa5fd4d Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2150199 (cherry picked from commit `e34b6f76d3`) Reviewed-on: https://git-master.nvidia.com/r/2224708 Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: Peter Daifuku <pdaifuku@nvidia.com> Reviewed-by: Luis Dib <ldib@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-11-04 06:10:39 -08:00
Peter Daifuku	56f8e5b878	gpu: nvgpu: channel_setup_bind: must be bound to TSG In nvgpu_channel_setup_bind, return an error if the channel isn't bound to a TSG, as future operations rely on being bound. Bug 200543218 Change-Id: If33b01b8176c7488445c23080ad9d11f341bff43 Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2215160 Reviewed-by: Thomas Fleury <tfleury@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Luis Dib <ldib@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-10-14 19:09:43 -07:00
Abhiroop Kaginalkar	99700222a5	gpu: nvgpu: Fix PMU destroy sequence A call to exit the PMU state machine/kthread must be prioritized over any other state change. It was possible to set the state as PMU_STATE_EXIT, signal the kthread and overwrite the state before the kthread has had the chance to exit its loop. This may lead to a "lost" signal, resulting in indefinite wait during the destroy sequence. Faulting sequence: 1. pmu_state = PMU_STATE_EXIT in nvgpu_pmu_destroy() 2. cond_signal() 3. pmu_state = PMU_STATE_LOADING_PG_BUF 4. PMU kthread wakes up 5. PMU kthread processes PMU_STATE_LOADING_PG_BUF 6. PMU kthread sleeps 7. nvgpu_pmu_destroy() waits indefinitely This patch adds a sticky flag to indicate PMU_STATE_EXIT, irrespective of any subsequent changes to pmu_state. The PMU PG init kthread may wait on a call to NVGPU_COND_WAIT_INTERRUPTIBLE, which requires a corresponding call to nvgpu_cond_signal_interruptible() as the core kernel code requires this task mask to wake-up an interruptible task. Bug 2658750 Bug 200532122 Change-Id: I61beae80673486f83bf60c703a8af88b066a1c36 Signed-off-by: Abhiroop Kaginalkar <akaginalkar@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2177112 (cherry picked from commit afa49fb073a324c49a820e142aaaf80e4656dcc6) Reviewed-on: https://git-master.nvidia.com/r/2190733 Tested-by: Divya Singhatwaria <dsinghatwari@nvidia.com> Reviewed-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-09-23 09:25:58 -07:00
Leon Yu	d601ff5159	nvgpu: don't report max load when counter overflow This is to prevent GPU (and thus EMC) frequency from being boosted from time to time when system is completely idle. It's caused by max GPU load being incorrectly reported by perfmon. When the issue happens, it can be observed that max load is reported but busy_cycles read from PMU is actually zero. Even though busy and total cycles returned by PMU may not be completely accurate when counter overflows, the counters accumulated so far still have some value that we shouldn't ignore. OTOH, returning max load could be the least accurate approximation in such cases. So let's just clear the interrupt status and let rest of the code handle the exception cases. Bug 200545546 Change-Id: I6882ae265029e881f5417fb2b82005b0112b0fda Signed-off-by: Leon Yu <leoyu@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2180771 Reviewed-by: Peng Liu <pengliu@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Mubushir Rahman <mubushirr@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-08-30 01:25:01 -07:00
Vedashree Vidwans	84f48df530	gpu: nvgpu: use vpr resize API This patch adds nvgpu API in linux and qnx to query vpr resize. The new API nvgpu_is_vpr_resize_enabled() is used in nvgpu_submit_channel_gpfifo(). Previously, if non-deterministic channel has timeout disabled and GPU cannot railgate on some platform, then channel doesn't power ref count and results in video freeze. This requires non-deterministic channel job tracking to be enabled if vpr resize is supported or if GPU can railgate. Bug 200532122 Change-Id: Icfbff6253762b195b2f5955749343974b1a7a269 Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2167082 Reviewed-on: https://git-master.nvidia.com/r/2180581 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-08-30 01:24:52 -07:00
Vedashree Vidwans	6500ce7581	gpu: nvgpu: fix race for channel sync read/write CTS test dEQP-VK.api.object_management.max_concurrent.device_group crashes with invalid userspace memory access. Currently, nvgpu_submit_prepare_syncs() races with gk20a_channel_clean_up_jobs() and this race condition is exposed when aggressive_sync_destroy_thresh is set to non-zero value. nvgpu_submit_prepare_syncs() gets ref for c->sync to submit job and releases channel sync_lock immediately. Meanwhile, gk20a_channel_worker_process() triggers gk20a_channel_clean_up_jobs(), which destroys ref'd c->sync pointer. Channel sync is deleted by gk20a_channel_clean_up_jobs() only if aggressive_sync_destroy_thresh is non-zero. So, gk20a_channel_clean_up_jobs() and nvgpu_submit_prepare_syncs() will race only in this scenario. Hence, if aggressive_sync_destroy_thresh value is non-zero, this patch protects channel's sync pointer by holding channel sync_lock during complete execution of nvgpu_submit_prepare_syncs(). Bug 2613870 Change-Id: I6f3d48aff361d1cb38c30d2ce5de276d0c55fb6f Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2180550 Reviewed-by: Seema Khowala <seemaj@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-08-27 20:12:03 -07:00
Konsta Holtta	8b484c0b53	gpu: nvgpu: support usermode submit buffers Import userd and gpfifo buffers from userspace if provided via NVGPU_IOCTL_CHANNEL_ALLOC_GPFIFO_EX. Also supply the work submit token (i.e., the hw channel id) to userspace. To keep the buffers alive, store their dmabuf and attachment/sgt handles in nvgpu_channel_linux. Our nvgpu_mem doesn't provide such data for buffers that are mainly in kernel use. The buffers are freed via a new API in the os_channel interface. Fix a bug in gk20a_channel_free_usermode_buffers: also unmap the usermode gpfifo buffer. Bug 200145225 Bug 200541476 Change-Id: I8416af7085c91b044ac8ccd9faa38e2a6d0c3946 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/1795821 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> (cherry picked from commit `99b1c6dcdf` in dev-main) Reviewed-on: https://git-master.nvidia.com/r/2170603 GVS: Gerrit_Virtual_Submit Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-08-15 00:58:54 -07:00
Debarshi Dutta	58ee7561f7	gpu: nvgpu: Add CHANNEL_SETUP_BIND IOCTL For a long time now, the ALLOC_GPFIFO_EX channel IOCTL has done much more than just gpfifo allocation, and its signature does not match support that's needed soon. Add a new one called SETUP_BIND to hopefully cover our future needs and deprecate ALLOC_GPFIFO_EX. Change nvgpu internals to match this new naming as well. Bug 200145225 Bug 200541476 Change-Id: I766f9283a064e140656f6004b2b766db70bd6cad Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/1835186 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> (cherry-picked from `e0c8a16c8d` in dev-main) Reviewed-on: https://git-master.nvidia.com/r/2169882 GVS: Gerrit_Virtual_Submit Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-08-15 00:57:45 -07:00
Seema Khowala	e5c8bbb391	gpu: nvgpu: set channel to serviceable after it is bound to tsg Channel's unserviceable status should be set to false only after channel is bound to tsg. Bug 200460037 Change-Id: I24976c673b3b08cc652d2c203b9fc1f3aaed403f Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2135923 Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Sagar Kamble <skamble@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-06-14 23:59:39 -07:00
Kary Jin	fea9e05454	gpu: nvgpu: add check for "vm->num_user_mapped_buffers" The "nvgpu_big_zalloc()" call fails if the passed-in argument "vm->num_user_mapped_buffers" is zero. The returned value is 16, which bypasses the NULL check and then causes a panic. This patch adds a check on "vm->num_user_mapped_buffers" to avoid passing zero to "nvgpu_big_zalloc()". Bug 2603292 Change-Id: I399eecf72a288e13992730651a34a6cea1ef56d1 Signed-off-by: Kary Jin <karyj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2123499 GVS: Gerrit_Virtual_Submit Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Daniel Fu <danifu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-05-30 22:17:11 -07:00
Shih-hsin Li	af95d14bb0	gpu: nvgpu: fix synchronization in nvgpu_vm_map The mapping early returned from nvgpu_vm_map might already be unmapped during channel clean up. Increase refcount of an already mapped buffer inside the scope of update_gmmu_lock mutex to avoid this race. Bug 200494150 Change-Id: I66d9272e42c40cd3aae7ba3bb8106ec37691bf8e Signed-off-by: Shih-hsin Li <seasonl@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2114163 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Vinayak Pane <vpane@nvidia.com> Reviewed-by: Daniel Fu <danifu@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-05-09 20:43:17 -07:00
Debarshi Dutta	6509bb49da	gpu: nvgpu: protect recovery with engines_reset_mutex Rename gr_reset_mutex to engines_reset_mutex and acquire it before initiating recovery. Recovery running in parallel with engine reset is not recommended. On hitting engine reset, h/w drops the ctxsw_status to INVALID in fifo_engine_status register. Also while the engine is held in reset h/w passes busy/idle straight through. fifo_engine_status registers are correct in that there is no context switch outstanding as the CTXSW is aborted when reset is asserted. Use deferred_reset_mutex to protect deferred_reset_pending variable If deferred_reset_pending is true then acquire engines_reset_mutex and call gk20a_fifo_deferred_reset. gk20a_fifo_deferred_reset would also check the value of deferred_reset_pending before initiating reset process Bug 2092051 Bug 2429295 Bug 2484211 Bug 1890287 Change-Id: I47de669a6203e0b2e9a8237ec4e4747339b9837c Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2022373 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> (cherry-picked from `cb91bf1e13` in dev-main) Reviewed-on: https://git-master.nvidia.com/r/2024901 GVS: Gerrit_Virtual_Submit Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-05-09 14:42:33 -07:00
Debarshi Dutta	4d8ad643d6	gpu: nvgpu: wait for gr.initialized before changing cg/pg set gr.initialized to false in the beginning of gk20a_gr_reset() and set it to true at the end of successful execution of gk20a_gr_reset. Use gk20a_gr_wait_initialized() to enable/disable cg/pg functions to make sure engine is out of reset and initialized. Bug 2092051 Bug 2429295 Bug 2484211 Bug 1890287 Change-Id: Ic7b0b71382c6d852a625c603dad8609c43b7f20f Signed-off-by: Seema Khowala <seemaj@nvidia.com> Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> (cherry-picked from `7e2f124fd1` in dev-kernel) Reviewed-on: https://git-master.nvidia.com/r/2111038 GVS: Gerrit_Virtual_Submit Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-05-09 14:42:14 -07:00
Debarshi Dutta	c81cc032c4	gpu: nvgpu: add cg and pg function Add new power/clock gating functions that can be called by other units. New clock_gating functions will reside in cg.c under common/power_features/cg unit. New power gating functions will reside in pg.c under common/power_features/pg unit. Use nvgpu_pg_elpg_disable and nvgpu_pg_elpg_enable to disable/enable elpg and also in gr_gk20a_elpg_protected macro to access gr registers. Add cg_pg_lock to make elpg_enabled, elcg_enabled, blcg_enabled and slcg_enabled thread safe. JIRA NVGPU-2014 Change-Id: I00d124c2ee16242c9a3ef82e7620fbb7f1297aff Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2025493 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> (cherry-picked from `c905858565` in dev-kernel) Reviewed-on: https://git-master.nvidia.com/r/2108406 GVS: Gerrit_Virtual_Submit Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-05-09 14:41:30 -07:00
Peng Liu	3a11883f7f	gpu: nvgpu: using pmu counters for load estimate PMU counters #0 and #4 are used to count total cycles and busy cycles. These counts are used by podgov to estimate GPU load. PMU idle intr status register is used to monitor overflow. Overflow rarely occurs because frequency governor reads and resets the counters at a high cadence. When overflow occurs, 100% work load is reported to frequency governor. Bug 1963732 Change-Id: I046480ebde162e6eda24577932b96cfd91b77c69 Signed-off-by: Peng Liu <pengliu@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/1939547 (cherry picked from commit `34df003519`) Reviewed-on: https://git-master.nvidia.com/r/1979495 Reviewed-by: Aaron Tian <atian@nvidia.com> Tested-by: Aaron Tian <atian@nvidia.com> Reviewed-by: Rajkumar Kasirajan <rkasirajan@nvidia.com> Tested-by: Rajkumar Kasirajan <rkasirajan@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-04-01 15:27:17 -07:00
Seema Khowala	e00804594b	gpu: nvgpu: remove gk20a_is_channel_marked_as_tsg Use tsg_gk20a_from_ch to get tsg pointer for tsgid of a channel. For invalid tsgid, tsg pointer will be NULL Bug 2092051 Bug 2429295 Bug 2484211 Change-Id: I82cd6a2dc5fab4acb147202af667ca97a2842a73 Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2006722 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> (cherry picked from commit `13f37f9c70` in dev-kernel) Reviewed-on: https://git-master.nvidia.com/r/2025507 GVS: Gerrit_Virtual_Submit Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-03-18 11:30:16 -07:00
Seema Khowala	c9d4df288d	gpu: nvgpu: remove code for ch not bound to tsg - Remove handling for channels that are no more bound to tsg as channel could be referenceable but no more part of a tsg - Use tsg_gk20a_from_ch to get pointer to tsg for a given channel - Clear unhandled gr interrupts Bug 2429295 JIRA NVGPU-1580 Change-Id: I9da43a2bc9a0282c793b9f301eaf8e8604f91d70 Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/1972492 (cherry picked from commit `013ca60edd` in dev-kernel) Reviewed-on: https://git-master.nvidia.com/r/2018262 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Debarshi Dutta <ddutta@nvidia.com> Tested-by: Debarshi Dutta <ddutta@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-02-22 18:59:18 -08:00
Seema Khowala	465aff5f0d	gpu: nvgpu: do not use raw spinlock for ch->timeout.lock With PREEMPT_RT kernel, regular spinlocks are mapped onto sleeping spinlocks (rt_mutex locks), and raw spinlocks retain their behaviour. Schedule while atomic can occur in gk20a_channel_timeout_start, as it acquires ch->timeout.lock raw spinlock, and then calls functions that acquire ch->ch_timedout_lock regular spinlock. Bug 200484795 Change-Id: Iacc63195d8ee6a2d571c998da1b4b5d396f49439 Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2004100 (cherry picked from commit `aacc33bb47` in dev-kernel) Reviewed-on: https://git-master.nvidia.com/r/2017923 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Debarshi Dutta <ddutta@nvidia.com> Tested-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-02-18 06:02:00 -08:00
Konsta Holtta	5e440e63d6	gpu: nvgpu: abstract out timeout rewinding The channel timeout ends up in a strange state during timeout handling for a brief moment; it can become stopped and started again, and the timeout lock is released in the middle. Add a more explicit rewind function to reset the timeout to start if it's active. The active check allows to use this from gk20a_channel_timeout_restart_all_channels(), so that's also modified. Also replace the return statements with more readable control flow in gk20a_channel_timeout_handler(). Bug 200484795 Change-Id: Ia7d67242dfc149ace1f4f841a837e90b6c985308 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/1989327 Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> (cherry picked from commit `8979a97af3` in dev-kernel) Reviewed-on: https://git-master.nvidia.com/r/2017922 Reviewed-by: Debarshi Dutta <ddutta@nvidia.com> Tested-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-02-18 06:01:57 -08:00
Seema Khowala	cbf6394482	gpu: nvgpu: check ch_timedout for poll/restart poll_timeouts and timeout_restart_all_channels should only handle channels that have not been recovered/aborted. Check ch_timedout status of the channel to make sure channel is still alive to be used. A channel reference could still be available even if it is recovered but not closed. Bug 2404865 Change-Id: I016c8b9952ef1d4c349c2a2a2ca55cb81326d380 Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/1929339 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> (cherry picked from commit `def687d4df` in rel-32) Reviewed-on: https://git-master.nvidia.com/r/2016995 GVS: Gerrit_Virtual_Submit Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-02-13 13:19:45 -08:00
Seema Khowala	f78918fd6c	gpu: nvgpu: do not suspend/resume recovered channel Already torn down channels should not be suspended or resumed. A channel reference could still be available even if it is recovered but not closed. Use ch_timedout status to check if channel is already recovered/aborted. Bug 2404865 Change-Id: I718eab6032ee94a9322da7a239a978b388de2b01 Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/1929338 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> (cherry picked from commit `88cff206ae` in dev-kernel) Reviewed-on: https://git-master.nvidia.com/r/2016994 GVS: Gerrit_Virtual_Submit Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-02-13 13:19:41 -08:00
Seema Khowala	220860d043	gpu: nvgpu: rename has_timedout and make it thread safe Currently has_timedout variable is protected by wmb at places where it is being set and there is no corresponding rmb whenever has_timedout variable is read. This is prone to errors for concurrent execution. This change is supposed to fix this issue. Rename has_timedout variable of channel struct to ch_timedout. Also to avoid rmb every time ch_timedout is read, ch_timedout_spinlock is added to protect ch_timedout variable for taking care of concurrent execution. Bug 2404865 Bug 2092051 Change-Id: I0bee9f50af0a48720aa8b54cbc3af97ef9f6df00 Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/1930935 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> (cherry picked from commit `1f54ea09e3` in dev-kernel) Reviewed-on: https://git-master.nvidia.com/r/2016975 GVS: Gerrit_Virtual_Submit Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Bibek Basu <bbasu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-02-13 13:19:37 -08:00

964 Commits