Commit Graph

964 Commits

Author SHA1 Message Date
Kishan
5adf709506 gpu: nvgpu: Enable GPCCS debug data logging.
Currently in case of any fecs error, we only dump fecs
cxtsw fw related registers, mailboxes and trace registers.
With this change, we want to ensure we dump gpccs register
space as well. This will help in debugging ctxsw related
failures

JIRA NVGPU-9560
Bug 3907163 

Change-Id: I61e25883da4455ea1412ca70c5fc3377d9a786a3
Signed-off-by: Kishan <kpalankar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2850402
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2023-02-06 16:38:32 -08:00
Divya Singhatwaria
464de27507 async cmd resp for gv11b
- When DISALLOW cmd is sent from driver to PMU the actual
  completion of the disallow will be acknowledged by PMU
  via a PG EVENT: ASYNC_CMD_RESP.
- Disallow needs a delayed ACK from PMU in order to disable
  the ELPG.
- If ELPG is already engaged, the DISALLOW cmd will trigger
  ELPG exit and then transition to PMU_PG_STATE_DISALLOW.
- After this whole process is completed, PMU will send
  DISALLOW_ACK through ASYNC_CMD_RESP msg.
- After disallow command is sent from the driver, NvGPU driver
  waits/polls for disallow command ack. This is sent immediately
  by msg framework of PMU.
- Then, the driver will poll/wait for ASYNC_CMD_RESP event which
  is the delayed DISALLOW ACK.
- The driver captures the ASYNC_CMD_RESP sent from PMU.
- set disallow_state to ELPG_OFF.
- If the driver does not wait/poll for this delayed disallow
  ack from PMU, it can result in erros  as PMU is still
  processing DISALLOW cmd but the driver progressed further.

Bug 3580271

Change-Id: I332180c05b6a398107f065d54e9718b7038fb1b2
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2689500
(cherry picked from commit fb019bf43a)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2694312
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-05-04 15:09:38 -07:00
Sagar Kamble
5fb06d03ca gpu: nvgpu: stop ELPG init thread during unload
ELPG initialization thread creation can fail when the process is killed.
That leads to driver resume failure.

That thread was stopped on suspend and re-created on resume. To avoid
the issue above, don't stop the ELPG thread in suspend and let the
first created thread handle the ELPG state transitions always.
And stop the ELPG thread during unload.

bug 3345977
bug 200685277

Change-Id: I8952edf8d1664ed258f238e265002e716d1bf5c2
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2573763
(cherry picked from commit f4571194b0)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2574436
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-08-12 05:39:59 -07:00
Sagar Kamble
bb8bf1c76c gpu: nvgpu: fix tsg unbind failure paths
nvgpu_tsg_unbind_channel_common failure handling missed
channel.clear & nvgpu_tsg_set_mmu_debug_mode calls.

Bug 200711183

Change-Id: I19fd53be55db9df725b7cf467b2673e4cd29deb5
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2521972
(cherry picked from commit 89ec2afbd4)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2524251
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-05-04 14:40:56 -07:00
Sagar Kamble
13fc430775 gpu: nvgpu: retry tsg unbind if NEXT is set
The NEXT bit can remain set for the channel if timeslice expires before
scheduler clears it. Due to this nvgpu fails TSG unbind and in turn
nvrm_gpu fails channel close. In this case, checking the channel hw
state after some time can help see NEXT bit cleared by scheduler.

Reenable the tsg and return -EAGAIN to nvrm_gpu for it to retry again.

Bug 3144960
Bug 200520811

Change-Id: I35f417f02270e371a4e632986b73a00f8a4f921a
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2468391
(cherry picked from commit cf287a4ef5)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2479106
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-03-19 14:39:39 -07:00
Divya Singhatwaria
9170f2b77c gpu: nvgpu: remove ZBC save/restore by PMU
- ZBC save/restore registers are removed in GP10B PMU ucode.
- These registers are saved/restored from CTXSW ucode during
  ELPG entry/exit.
- Accessing the ZBC registers will cause PMU EXTERR error.
- To resolve this, ZBC functionality is removed from GP10B
  feature list in PMU ucode.
- From NvGPU driver, set NVGPU_PMU_ZBC_SAVE bit to false
  for GP10B
- Updated the GP10B PMU app version for the ucode:
  https://git-master.nvidia.com/r/c/tegra/kernel-firmware-t18x/+/2476260

P4 CL link related to this PMU ucode change:
https://p4sw-swarm.nvidia.com/changes/29594520

Bug 3233071
Bug 200696431

Change-Id: If3f1707b79699e7e2e65367418b25ac71b09cf0b
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2476259
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-03-17 09:54:54 -07:00
Nitin Kumbhar
7882f15ff6 gpu: nvgpu: fix possible buffer overflow issue
As sprintf() is used to populate pool_name[20], it can overflow
for larger u32 values (u32 max decimal number chars are 10) i.e.
20 < strlen("semaphore_pool-") i.e. 15 + 10.

Fix this overflow by removing pool_name as it's not used.

Bug 2626446
Bug 3273414

Change-Id: I4e0a222a2cd34dcd09e69294bc46e2242abb04bb
Signed-off-by: Nitin Kumbhar <nkumbhar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2205356
(cherry picked from commit baa86cf134ee6753beabfa974a10faffc5775ee8)
Signed-off-by: ByungKuk Seo <bseo@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2496976
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Harsh Sinha <hsinha@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-03-12 07:54:52 -08:00
Sumit Gupta
535e9b1dd7 gpu: nvgpu: fix mutex wrong acquire
Wrong acquire/release sequence.

 DEBUG_LOCKS_WARN_ON(rt_mutex_owner(lock) != current)
 ....
 CPU: 4 PID: 5404 Comm: cyclictest.sh Not tainted 4.9.201-rt134-tegra #1
 Hardware name: Jetson-AGX (DT)
 ....
 Call trace:
 [<ffffff800810e4f8>] debug_rt_mutex_unlock+0x58/0x68
 [<ffffff8008f34d0c>] rt_mutex_unlock+0x4c/0xb0
 [<ffffff8008f36ea8>] _mutex_unlock+0x20/0x2c
 [<ffffff8000f69d80>] nvgpu_cg_elcg_set_elcg_enabled+0x78/0xf0 [nvgpu]
 [<ffffff8000f7bd44>] nvgpu_intr_nonstall_cb+0x21bc/0x22f0 [nvgpu]
 [<ffffff800875b304>] dev_attr_store+0x44/0x60
 [<ffffff80082dca44>] sysfs_kf_write+0x5c/0x78
 [<ffffff80082dbd28>] kernfs_fop_write+0xc0/0x1d8
 [<ffffff8008245b60>] __vfs_write+0x48/0x128
 [<ffffff8008246b3c>] vfs_write+0xac/0x1b8
 [<ffffff800824808c>] SyS_write+0x5c/0xc8

Bug 3227296

Suggested-by: Bibek Basu <bbasu@nvidia.com>
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Change-Id: I932a23700539422c07de045dde516c52dd8348cf
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2472903
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Bibek Basu <bbasu@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-02-19 10:39:57 -08:00
prsethi
8cb168632b gpu: nvgpu: add support for ACB SLCG on gv11b
Register list for ACB SLCG is auto generated with scripts.
Add HAL operations to enable/disable ACB clock gating.

Cherry-pick/manually port from dev-main

Bug 200647909

Change-Id: I4be4c14cc072fcccd91031a5a40321f5ff11f549
Signed-off-by: Prateek sethi <prsethi@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2420355
(cherry picked from commit c7c04d3a28c2eb0edc8e015dd0130fa50d3496c7)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2434464
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-by: Phoenix Jung <pjung@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-10-27 09:24:56 -07:00
Peter Daifuku
5a948ccca9 gpu: nvgpu: limit PD cache to < pgsize for linux
For Linux, limit the use of the cache to entries less than the page size, to
avoid potential problems with running out of CMA memory when allocating large,
contiguous slabs, as would be required for non-iommmuable chips.

Also, in nvgpu_pd_cache_do_free(), zero out entries only if iommu is in use
and PTE entries use the cache (since it's the prefetch of invalid PTEs by
iommu that needs to be avoided).

Bug 3093183
Bug 3100907

Change-Id: I363031db32e11bc705810a7e87fc9e9ac1dc00bd
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2422039
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Satish Arora <satisha@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-10-06 10:10:02 -07:00
Konsta Hölttä
cd134bb198 gpu: nvgpu: delete priv cmd buf size warnings
Running out of priv cmd buffer allocation capacity is typically a
recoverable "error" caused by extra pressure wrt. allocation sizes based
on number of inflight jobs chosen by userspace. These conditions return
-EAGAIN and further retries will succeed as long as the channel advances
with submitted jobs. Remove the unnecessary debug spew.

Bug 200641803
Bug 200651329

Change-Id: I4dfc38cfc3eb10d57ac11c1b7164c3d84f9034d3
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2388799
(cherry picked from commit 29ad324f8226ed3326f5de9117b9115a15cdd032)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2410069
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-09-22 19:08:37 -07:00
Peter Daifuku
4f66942afa nvgpu: fix resource leaks when cleaning up
In gk20a_free_channel, destroy notifier_wq and
semaphore_wq

In __nvgpu_vm_remove, destroy the update_gmmu_lock mutex

Bug 200647668

Change-Id: Icbb4e626c0fa9fa2dcf1430b3112b51829b00e4f
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2414820
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Satish Arora <satisha@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-09-18 07:09:09 -07:00
Peter Daifuku
036e000a17 nvgpu: add PD cache support for page-sized PTEs
Large buffers being mapped to GMMU end up needing many
pages for the PTE tables. Allocating these pages one
by one can end up being a performance bottleneck, particularly
in the virtualized case.

Add support for page-sized PTEs to the existing PD cache:

- define NVGPU_PD_CACHE_SIZE, the allocation size for a new slab
  for the PD cache, effectively set to 64K bytes
- Use the PD cache for any allocation < NVGPU_PD_CACHE_SIZE
- When freeing up cached entries, avoid prefetch errors by
  invalidating the entry (memset to 0)

Bug 3093183
Bug 3100907

Change-Id: I2302a1dfeb056b9461159121bbae1be70524a357
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2401783
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Satish Arora <satisha@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-09-15 02:38:45 -07:00
Debarshi Dutta
c46d6fbc5b gpu: nvgpu: Discard coherency check on gmmu
With MSS Nvlink set for force snoop, check for the coherency flag in
gmmu attribute and setting pte aperture to coherent type based on that
checking is not relevant.

coherent variable removed from nvgpu_gmmu_attrs struct.

Bug 200473147
Bug 3057980

Change-Id: Idf76cac901ef7c70faa2c4f7f11a046d94b9466a
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2013212
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from 4e17690975
in rel-32)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2387272
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Aayush Rajoria <arajoria@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-08-07 09:11:01 -07:00
Debarshi Dutta
570f03764f gpu: nvgpu: Remove force coherency
Remove the code that set default aperture mask as coherent.
MSS nvlink is set for force snoop, so default aperture mask is set as
non-coherent.

Bug 200473147
Bug 3057980

Change-Id: Ia8f826b8414826d2642f9c35c14ffba1cd0b9353
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2011966
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from aec64d8f8b
in dev-main)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2387271
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Aayush Rajoria <arajoria@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-08-07 09:10:56 -07:00
Peter Daifuku
1b161b6c7a gpu: nvgpu: fix value leaked in log
The timeout message of nvgpu_timeout_expired_msg() leaks
a stack value (%llx) in error log on timeout. As the format
expects 1 argument and none is given, fix this by specifying
the required argument.

Manual port of https://git-master.nvidia.com/r/c/linux-nvgpu/+/2205423

Bug 2780861
Bug 3051385

Change-Id: Ic223e4b79bde718108826f095740b10b54a5e84d
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2366452
(cherry picked from commit 372837506af77e2c5b8489ee2123292778abe75d)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2370285
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sungwook Kim <sungwookk@nvidia.com>
Reviewed-by: Rahul Jain (SW-TEGRA) <rahuljain@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-07-08 16:09:04 -07:00
Ranjanikar Nikhil Prabhakarrao
f56874aec2 gpu: nvgpu: add speculative barrier
Data can be speculativerly stored and
code flow can be hijacked.

To mitigate this problem insert a
speculation barrier.

Bug 200447167

Change-Id: Ia865ff2add8b30de49aa970715625b13e8f71c08
Signed-off-by: Ranjanikar Nikhil Prabhakarrao <rprabhakarra@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1972221
(cherry picked from commit f0762ed483)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/1996052
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Deepak Nibade <dnibade@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-06-30 10:07:26 -07:00
dmitry pervushin
aa6d4d08d4 strncpy: it should depend on size of 1st argument
There is no point to depend on strlen of second argument,
otherwise it could be a simple strcpy. Instead, let's make
it depending on sizeof(destination)

...and make sure that result is NUL-terminated, too

Bug 2973859

Change-Id: Ifc941fab07e503b7b980696950d65b8bb10bf4ff
Signed-off-by: dmitry pervushin <dpervushin@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2342281
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Phoenix Jung <pjung@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-05-13 12:40:51 -07:00
ddutta
bb2c8ef511 gpu: nvgpu: decrease refcount when sync-unmap fails
When nvgpu_vm_unmap_sync fails, nvgpu_unmap_sync currently bails
out without decreasing the buffer refcount. This prevents from
releasing the buffer, in case a deferred job completes after the
timeout (which was observed 2 times during overnight
stress tests). This also means that the fixed address is not
re-useable.

Throw out a warning when nvgpu_vm_unmap_sync fails, but proceed
with decreasing refcount.

Bug 200578193

Change-Id: Ie0cc7caa7d12ca0a3b42123a5f7a28bda72dabbc
Signed-off-by: ddutta <ddutta@nvidia.com>
(cherry picked from commit a433f26d5b
in dev-main)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2291352
Tested-by: Naveen Kumar S <nkumars@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-03-05 04:54:42 -08:00
ddutta
fbad02d5e0 gpu: nvgpu: remove blcg_enable/disable
blcg is always enabled by default and there is no need for disabling
this during gr init or gr reset.

Bug 2866010

Change-Id: Iaf17b7fdf05ad04fe435e1a1fda758deedc6484c
Signed-off-by: ddutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2303114
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-02-28 07:42:46 -08:00
Peter Daifuku
ea14973b14 gpu: nvgpu: vgpu: fix tsg_unbind in recovery case
When unbinding a channel from a tsg when virtual, vgpu_tsg_unbind_channel
would return an error if unbinding the channel on the guest side failed,
and did so before notifying the RM server of the unbind.

Later on in the recovery process, the guest OS would remove the channel from the
TSG's list, but this would leave the RM server with an out-of-date channel list.

Fix this by making the tsg_unbind_channel HAL optional and implemented only for vgpu:
the vgpu version now just notifies the RM server so that it can clean up its version
of the TSG; if vgpu, always call the tsg_unbind_channel HAL whether or not
the local unbind succeeded.

Minimal port from dev-main of https://git-master.nvidia.com/r/c/linux-nvgpu/+/2084029

Bug 2766920
Bug 200587845

Change-Id: I75bddf3a28ac20bf4fb7510ff64097a32c7eec3f
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2287774
(cherry picked from commit 471c72c1efcc4fe6d547f556edf7773827fd2674)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2289928
Reviewed-by: Thomas Steinle <tsteinle@nvidia.com>
Reviewed-by: Satish Arora <satisha@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-02-22 04:24:36 -08:00
Debarshi Dutta
e45e7b5cf8 gpu: nvgpu: move cg_enable after pmu_init is complete
This patch help resolve the boot time failures happening with
pmu_exterr for porg. cg_enable can race with pmu_init thread,
cg_enable is moved post pmu init thread to avoid the above race.

Bug 200565050

Change-Id: I2192053eff8767847ea012ca20b3607d2f6cd26f
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2239959
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-02-19 10:41:27 -08:00
Thomas Fleury
e41fd09031 gpu: nvgpu: use refcnt for ch mmu_debug_mode
Replaced ch->mmu_debug_mode_enabled with ch->mmu_debug_mode_refcnt.
If channel is enabled multiple times by userspace, then ref count is
updated accordingly. There is an expectation that enable/disable
calls are balanced for setting channel's mmu debug mode.
When unbinding the channel, decrease refcnt for the channel until it
reaches 0.
Also, removed tsg parameter from nvgpu_tsg_set_mmu_debug_mode as it
can be retrieved from ch.

Bug 2515097
Bug 2713590

Change-Id: If334e374a55bd14ae219edbfd3b1fce5ff25c226
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2184702
(cherry picked from commit f422aee393)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2208772
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Kajetan Dutka <kdutka@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Winnie Hsu <whsu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Kajetan Dutka <kdutka@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-01-29 23:42:46 -08:00
Thomas Fleury
e0587aaf4d gpu: nvgpu: set FB/HSMMU debug mode
Set NV_PFB_HSMMU_PRI_MMU_DEBUG_CTRL and NV_PFB_PRI_MMU_DEBUG_CTRL
in addition to NV_PGRAPH_PRI_GPCS_MMU_DEBUG_CTRL, in
NVGPU_DBG_GPU_IOCTL_SET_CTX_MMU_DEBUG_MODE

Bug 2515097
Bug 2713590

Change-Id: I1763b43e79fac3edb68a35980683d58bfa89519f
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2115785
(cherry picked from commit 8057514a9f)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2208771
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Kajetan Dutka <kdutka@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Winnie Hsu <whsu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Kajetan Dutka <kdutka@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-01-29 23:42:34 -08:00
Thomas Fleury
9e328ed6b8 gpu: nvgpu: add refcounting for MMU debug mode
GPC MMU debug mode should be set if at least one channel
in the TSG has requested it. Add refcounting for MMU debug
mode, to make sure debug mode is disabled only when no
channel in the TSG is using it.

Bug 2515097
Bug 2713590

Change-Id: Ic5530f93523a9ec2cd3bfebc97adf7b7000531e0
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2123017
(cherry picked from commit a1248d87fe)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2208769
Reviewed-by: Kajetan Dutka <kdutka@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Winnie Hsu <whsu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Kajetan Dutka <kdutka@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-01-29 23:42:10 -08:00
Vinod G
7a26ad57a7 gpu: nvgpu: enable platform atomic feature
Support following changes related to platform atomic feature
NV_PFB_PRI_MMU_CTRL_ATOMIC_CAPABILITY_MODE to RMW MODE
NV_PFB_PRI_MMU_CTRL_ATOMIC_CAPABILITY_SYS_NCOH_MODE to L2
NV_PFB_HSHUB_NUM_ACTIVE_LTCS_HUB_SYS_ATOMIC_MODE to USE_RMW
NV_PFB_FBHUB_NUM_ACTIVE_LTCS_HUB_SYS_ATOMIC_MODE to USE_RMW
NV_PFB_FBHUB_NUM_ACTIVE_LTCS_HUB_SYS_NCOH_ATOMIC_MODE to USE_READ

In gv11b, FBHUB_NUM_ACTIVE_LTCS register has read only privilege,
so atomic mode register bits cannot be updated from kernel code.

atomic capability and atomic_sys_ncoh_mode bits are copied from
fb mmu_ctrl to gpcs_mmu_ctrl register.

new tu104 hal for fb_enable_nvlink function.

bug 200580236

Change-Id: Ia78986c1c56795c6efad20f4ba42700ef1c2c1ad
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2013481
(cherry picked from commit 251e3eaa80)
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2274932
GVS: Gerrit_Virtual_Submit
Tested-by: Sreeniketh H <sh@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-01-08 08:35:39 -08:00
Vinod G
dacb06f464 gpu: nvgpu: add platform atomic support
Add new variable in nvgpu_as_map_buffer_ex_args for app
to specify the platform atomic support for the page.
When platform atomic attribute flag is set, pte memory
aperture is set to be coherent type.

renamed nvgpu_aperture_mask_coh -> nvgpu_aperture_mask_raw
function.

bug 200580236

Change-Id: I18266724dafdc8dfd96a0711f23cf08e23682afc
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2012679
(cherry picked from commit 9e0a9004b7)
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2274914
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Sreeniketh H <sh@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-01-08 08:35:30 -08:00
Peter Daifuku
264691e69d gpu: nvgpu: re-enable elpg after golden img init
Typically, the PMU init thread will finish up long
before the golden context image has been initialized,
which means that ELPG hasn't truly been enabled at that
point.

Create a new function, nvgpu_pmu_reenable_pg(), which
checks if elpg had been enabled (non-zero refcnt), and
if so, disables then re-enables it.

Call this function from gk20a_alloc_obj_ctx() after
the golden context image has been initialized to ensure
that elpg is truly enabled.

Manually ported from dev-main

Bug 200543218

Change-Id: I0e7c4f64434c5e356829581950edce61cc88882a
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2245768
(cherry picked from commit 077b6712b5a40340ece818416002ac8431dc4138)
Reviewed-on: https://git-master.nvidia.com/r/2250091
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-11-29 01:23:45 -08:00
Deepak Nibade
0ffc5fa5e4 gpu: nvgpu: add clock gating support for HSHUB
Add BLCG and SLCG clock gating support for HSHUB unit on gv11b

Register list for BLCG and SLCG is auto generated with scripts.
Add HAL operations to enable/disable HSHUB clock gating

Re-generate gv11b reglist so that all the manually commented registers
are automatically deleted. Some of the unicast registers are also
deleted. We already have corresponding broadcast registers present.

Cherry-pick/manually port from dev-main

Bug 2526212

Change-Id: I2654f158daa802bcf992e103ed4a44675aa5fd4d
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2150199
(cherry picked from commit e34b6f76d3)
Reviewed-on: https://git-master.nvidia.com/r/2224708
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-by: Luis Dib <ldib@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-11-04 06:10:39 -08:00
Peter Daifuku
56f8e5b878 gpu: nvgpu: channel_setup_bind: must be bound to TSG
In nvgpu_channel_setup_bind, return an error if the
channel isn't bound to a TSG, as future operations
rely on being bound.

Bug 200543218

Change-Id: If33b01b8176c7488445c23080ad9d11f341bff43
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2215160
Reviewed-by: Thomas Fleury <tfleury@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Luis Dib <ldib@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-10-14 19:09:43 -07:00
Abhiroop Kaginalkar
99700222a5 gpu: nvgpu: Fix PMU destroy sequence
A call to exit the PMU state machine/kthread must
be prioritized over any other state change.
It was possible to set the state as PMU_STATE_EXIT,
signal the kthread and overwrite the state before
the kthread has had the chance to exit its loop.
This may lead to a "lost" signal, resulting in
indefinite wait during the destroy sequence.

Faulting sequence:
1. pmu_state = PMU_STATE_EXIT in nvgpu_pmu_destroy()
2. cond_signal()
3. pmu_state = PMU_STATE_LOADING_PG_BUF
4. PMU kthread wakes up
5. PMU kthread processes PMU_STATE_LOADING_PG_BUF
6. PMU kthread sleeps
7. nvgpu_pmu_destroy() waits indefinitely

This patch adds a sticky flag to indicate PMU_STATE_EXIT,
irrespective of any subsequent changes to pmu_state.

The PMU PG init kthread may wait on a call to
NVGPU_COND_WAIT_INTERRUPTIBLE, which requires a
corresponding call to nvgpu_cond_signal_interruptible()
as the core kernel code requires this task mask to
wake-up an interruptible task.

Bug 2658750
Bug 200532122

Change-Id: I61beae80673486f83bf60c703a8af88b066a1c36
Signed-off-by: Abhiroop Kaginalkar <akaginalkar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2177112
(cherry picked from commit afa49fb073a324c49a820e142aaaf80e4656dcc6)
Reviewed-on: https://git-master.nvidia.com/r/2190733
Tested-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-09-23 09:25:58 -07:00
Leon Yu
d601ff5159 nvgpu: don't report max load when counter overflow
This is to prevent GPU (and thus EMC) frequency from being boosted
from time to time when system is completely idle. It's caused by max
GPU load being incorrectly reported by perfmon. When the issue
happens, it can be observed that max load is reported but busy_cycles
read from PMU is actually zero.

Even though busy and total cycles returned by PMU may not be
completely accurate when counter overflows, the counters
accumulated so far still have some value that we shouldn't ignore.
OTOH, returning max load could be the least accurate approximation in
such cases. So let's just clear the interrupt status and let rest of
the code handle the exception cases.

Bug 200545546

Change-Id: I6882ae265029e881f5417fb2b82005b0112b0fda
Signed-off-by: Leon Yu <leoyu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2180771
Reviewed-by: Peng Liu <pengliu@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Mubushir Rahman <mubushirr@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-30 01:25:01 -07:00
Vedashree Vidwans
84f48df530 gpu: nvgpu: use vpr resize API
This patch adds nvgpu API in linux and qnx to query vpr resize.
The new API nvgpu_is_vpr_resize_enabled() is used in
nvgpu_submit_channel_gpfifo().
Previously, if non-deterministic channel has timeout disabled and
GPU cannot railgate on some platform, then channel doesn't power ref
count and results in video freeze. This requires non-determinstic
channel job tracking to be enabled if vpr resize is supported or if GPU
can railgate.

Bug 200532122

Change-Id: Icfbff6253762b195b2f5955749343974b1a7a269
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2167082
Reviewed-on: https://git-master.nvidia.com/r/2180581
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-30 01:24:52 -07:00
Vedashree Vidwans
6500ce7581 gpu: nvgpu: fix race for channel sync read/write
CTS test dEQP-VK.api.object_management.max_concurrent.device_group
crashes with invalid userspace memory access.
Currently, nvgpu_submit_prepare_syncs() races with
gk20a_channel_clean_up_jobs() and this race condition is exposed when
aggressive_sync_destroy_thresh is set to non-zero value.
nvgpu_submit_prepare_syncs() gets ref for c->sync to submit job and
releases channel sync_lock immediately. Meanwhile,
gk20a_channel_worker_process() triggers gk20a_channel_clean_up_jobs(),
which destroys ref'd c->sync pointer.
Channel sync is deleted by gk20a_channel_clean_up_jobs() only if
aggressive_sync_destroy_thresh is non-zero.
So, gk20a_channel_clean_up_jobs() and nvgpu_submit_prepare_syncs() will
race only in this scenario.
Hence, if aggressive_sync_destroy_thresh value is non-zero, this patch
protects channel's sync pointer by holding channel sync_lock
during complete execution of nvgpu_submit_prepare_syncs().

Bug 2613870

Change-Id: I6f3d48aff361d1cb38c30d2ce5de276d0c55fb6f
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2180550
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-27 20:12:03 -07:00
Konsta Holtta
8b484c0b53 gpu: nvgpu: support usermode submit buffers
Import userd and gpfifo buffers from userspace if provided via
NVGPU_IOCTL_CHANNEL_ALLOC_GPFIFO_EX. Also supply the work submit token
(i.e., the hw channel id) to userspace.

To keep the buffers alive, store their dmabuf and attachment/sgt handles
in nvgpu_channel_linux. Our nvgpu_mem doesn't provide such data for
buffers that are mainly in kernel use. The buffers are freed via a new
API in the os_channel interface.

Fix a bug in gk20a_channel_free_usermode_buffers: also unmap the
usermode gpfifo buffer.

Bug 200145225
Bug 200541476

Change-Id: I8416af7085c91b044ac8ccd9faa38e2a6d0c3946
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1795821
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 99b1c6dcdf
in dev-main)
Reviewed-on: https://git-master.nvidia.com/r/2170603
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-15 00:58:54 -07:00
Debarshi Dutta
58ee7561f7 gpu: nvgpu: Add CHANNEL_SETUP_BIND IOCTL
For a long time now, the ALLOC_GPFIFO_EX channel IOCTL has done much
more than just gpfifo allocation, and its signature does not match
support that's needed soon. Add a new one called SETUP_BIND to hopefully
cover our future needs and deprecate ALLOC_GPFIFO_EX.

Change nvgpu internals to match this new naming as well.

Bug 200145225
Bug 200541476

Change-Id: I766f9283a064e140656f6004b2b766db70bd6cad
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1835186
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from e0c8a16c8d
in dev-main)
Reviewed-on: https://git-master.nvidia.com/r/2169882
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-15 00:57:45 -07:00
Seema Khowala
e5c8bbb391 gpu: nvgpu: set channel to serviceable after it is bound to tsg
Channel's unserviceable status should to set to false only
after channel is bound to tsg.

Bug 200460037

Change-Id: I24976c673b3b08cc652d2c203b9fc1f3aaed403f
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2135923
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-06-14 23:59:39 -07:00
Kary Jin
fea9e05454 gpu: nvgpu: add check for "vm->num_user_mapped_buffers"
The "nvgpu_big_zalloc()" will be failed if the passed-in argument
"vm->num_user_mapped_buffers" is zero. The returned value is 16
which will bypass the NULL-check and then causes the panic.

This patch adds a check on the "vm->num_user_mapped_buffers" to
avoid the zero is passed-in the "nvgpu_big_zalloc()".

Bug 2603292

Change-Id: I399eecf72a288e13992730651a34a6cea1ef56d1
Signed-off-by: Kary Jin <karyj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2123499
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Daniel Fu <danifu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-30 22:17:11 -07:00
Shih-hsin Li
af95d14bb0 gpu: nvgpu: fix synchronization in nvgpu_vm_map
The mapping early returned from nvgpu_vm_map might already
be unmapped during channel clean up. Increase refcount of
an already mapped buffer inside the scope of update_gmmu_lock
mutex to avoid this race.

Bug 200494150

Change-Id: I66d9272e42c40cd3aae7ba3bb8106ec37691bf8e
Signed-off-by: Shih-hsin Li <seasonl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2114163
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Vinayak Pane <vpane@nvidia.com>
Reviewed-by: Daniel Fu <danifu@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-09 20:43:17 -07:00
Debarshi Dutta
6509bb49da gpu: nvgpu: protect recovery with engines_reset_mutex
Rename gr_reset_mutex to engines_reset_mutex and acquire it
before initiating recovery. Recovery running in parallel with
engine reset is not recommended.

On hitting engine reset, h/w drops the ctxsw_status to INVALID in
fifo_engine_status register. Also while the engine is held in reset
h/w passes busy/idle straight through. fifo_engine_status registers
are correct in that there is no context switch outstanding
as the CTXSW is aborted when reset is asserted.

Use deferred_reset_mutex to protect deferred_reset_pending variable
If deferred_reset_pending is true then acquire engines_reset_mutex
and call gk20a_fifo_deferred_reset.
gk20a_fifo_deferred_reset would also check the value of
deferred_reset_pending before initiating reset process

Bug 2092051
Bug 2429295
Bug 2484211
Bug 1890287

Change-Id: I47de669a6203e0b2e9a8237ec4e4747339b9837c
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2022373
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from cb91bf1e13
in dev-main)
Reviewed-on: https://git-master.nvidia.com/r/2024901
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-09 14:42:33 -07:00
Debarshi Dutta
4d8ad643d6 gpu: nvgpu: wait for gr.initialized before changing cg/pg
set gr.initialized to false in the beginning of gk20a_gr_reset() and
set it to true at the end of successful execution of gk20a_gr_reset.

Use gk20a_gr_wait_initialized() to enable/disable cg/pg
functions to make sure engine is out of reset and initialized.

Bug 2092051
Bug 2429295
Bug 2484211
Bug 1890287

Change-Id: Ic7b0b71382c6d852a625c603dad8609c43b7f20f
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from 7e2f124fd1 in
dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2111038
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-09 14:42:14 -07:00
Debarshi Dutta
c81cc032c4 gpu: nvgpu: add cg and pg function
Add new power/clock gating functions that can be called by
other units.

New clock_gating functions will reside in cg.c under
common/power_features/cg unit.

New power gating functions will reside in pg.c under
common/power_features/pg unit.

Use nvgpu_pg_elpg_disable and nvgpu_pg_elpg_enable to disable/enable
elpg and also in gr_gk20a_elpg_protected macro to access gr registers.

Add cg_pg_lock to make elpg_enabled, elcg_enabled, blcg_enabled
and slcg_enabled thread safe.

JIRA NVGPU-2014

Change-Id: I00d124c2ee16242c9a3ef82e7620fbb7f1297aff
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2025493
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from c905858565 in
dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2108406
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-09 14:41:30 -07:00
Peng Liu
3a11883f7f gpu: nvgpu: using pmu counters for load estimate
PMU counters #0 and #4 are used to count total cycles and busy cycles.
These counts are used by podgov to estimate GPU load.

PMU idle intr status register is used to monitor overflow. Overflow
rarely occurs because frequency governor reads and resets the counters
at a high cadence. When overflow occurs, 100% work load is reported to
frequency governor.

Bug 1963732

Change-Id: I046480ebde162e6eda24577932b96cfd91b77c69
Signed-off-by: Peng Liu <pengliu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1939547
(cherry picked from commit 34df003519)
Reviewed-on: https://git-master.nvidia.com/r/1979495
Reviewed-by: Aaron Tian <atian@nvidia.com>
Tested-by: Aaron Tian <atian@nvidia.com>
Reviewed-by: Rajkumar Kasirajan <rkasirajan@nvidia.com>
Tested-by: Rajkumar Kasirajan <rkasirajan@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-01 15:27:17 -07:00
Seema Khowala
e00804594b gpu: nvgpu: remove gk20a_is_channel_marked_as_tsg
Use tsg_gk20a_from_ch to get tsg pointer for tsgid of a channel. For
invalid tsgid, tsg pointer will be NULL

Bug 2092051
Bug 2429295
Bug 2484211

Change-Id: I82cd6a2dc5fab4acb147202af667ca97a2842a73
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2006722
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 13f37f9c70
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2025507
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-03-18 11:30:16 -07:00
Seema Khowala
c9d4df288d gpu: nvgpu: remove code for ch not bound to tsg
- Remove handling for channels that are no more bound to tsg
  as channel could be referenceable but no more part of a tsg
- Use tsg_gk20a_from_ch to get pointer to tsg for a given channel
- Clear unhandled gr interrupts

Bug 2429295
JIRA NVGPU-1580

Change-Id: I9da43a2bc9a0282c793b9f301eaf8e8604f91d70
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1972492
(cherry picked from commit 013ca60edd
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2018262
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Tested-by: Debarshi Dutta <ddutta@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-22 18:59:18 -08:00
Seema Khowala
465aff5f0d gpu: nvgpu: do not use raw spinlock for ch->timeout.lock
With PREEMPT_RT kernel, regular spinlocks are mapped onto sleeping
spinlocks (rt_mutex locks), and raw spinlocks retain their behaviour.

Schedule while atomic can occur in gk20a_channel_timeout_start,
as it acquires ch->timeout.lock raw spinlock, and then calls
functions that acquire ch->ch_timedout_lock regular spinlock.

Bug 200484795

Change-Id: Iacc63195d8ee6a2d571c998da1b4b5d396f49439
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2004100
(cherry picked from commit aacc33bb47
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2017923
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Tested-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-18 06:02:00 -08:00
Konsta Holtta
5e440e63d6 gpu: nvgpu: abstract out timeout rewinding
The channel timeout ends up in a strange state during timeout handling
for a brief moment; it can become stopped and started again, and the
timeout lock is released in the middle. Add a more explicit rewind
function to reset the timeout to start if it's active. The active check
allows to use this from gk20a_channel_timeout_restart_all_channels(), so
that's also modified.

Also replace the return statements with more readable control flow in
gk20a_channel_timeout_handler().

Bug 200484795

Change-Id: Ia7d67242dfc149ace1f4f841a837e90b6c985308
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1989327
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
(cherry picked from commit 8979a97af3
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2017922
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Tested-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-18 06:01:57 -08:00
Seema Khowala
cbf6394482 gpu: nvgpu: check ch_timedout for poll/restart
poll_timeouts and timeout_restart_all_channels should
only handle channels that have not been recovered/aborted.
Check ch_timedout status of the channel to make sure
channel is still alive to be used. A channel reference
could still be available even if it is recovered but not
closed.

Bug 2404865

Change-Id: I016c8b9952ef1d4c349c2a2a2ca55cb81326d380
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1929339
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit def687d4df
in rel-32)
Reviewed-on: https://git-master.nvidia.com/r/2016995
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-13 13:19:45 -08:00
Seema Khowala
f78918fd6c gpu: nvgpu: do not suspend/resume recovered channel
Already torn down channels should not be suspended or
resumed. A channel reference could still be available
even if it is recovered but not closed. Use ch_timedout
status to check if channel is already recovered/aborted.

Bug 2404865

Change-Id: I718eab6032ee94a9322da7a239a978b388de2b01
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1929338
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 88cff206ae
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2016994
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-13 13:19:41 -08:00
Seema Khowala
220860d043 gpu: nvgpu: rename has_timedout and make it thread safe
Currently has_timedout variable is protected by wmb at places
where it is being set and there is no correspoding rmb whenever
has_timedout variable is read. This is prone to errors for
concurrent execution. This change is supposed to fix this issue.
Rename has_timedout variable of channel struct to ch_timedout.
Also to avoid rmb every time ch_timedout is read,
ch_timedout_spinlock is added to protect ch_timedout
variable for taking care of concurrent execution.

Bug 2404865
Bug 2092051

Change-Id: I0bee9f50af0a48720aa8b54cbc3af97ef9f6df00
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1930935
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 1f54ea09e3
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2016975
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-13 13:19:37 -08:00