Commit Graph

4798 Commits

Author SHA1 Message Date
Sagar Kamble
3b414dbf07 gpu: nvgpu: wait for engines to go idle before suspend
Wait for pbdma and engine to go idle so that the tasks get completed before
suspending.

Updated the logic in gk20a_wait_engine_idle to consider the ctxsw status.
And updated PBDMA idle logic to check the pbdma status and the pb/gp
get/put pointers.

Bug 3789519
Bug 3832838

Change-Id: Ifd105bbb305eaf358423281b192f67d782d773a4
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2870162
Reviewed-by: Martin Radev <mradev@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2023-08-18 07:45:15 -07:00
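
The PBDMA idle check described in commit 3b414dbf07 can be illustrated with a small userspace sketch. This is not the driver's code: the struct, the field names, and the simulated register reader are all hypothetical, and a real driver would read hardware registers and delay between polls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one PBDMA; field names are illustrative only. */
struct pbdma_snapshot {
    bool     chan_busy;        /* status reports a loaded or switching channel */
    uint64_t pb_get, pb_put;   /* pushbuffer get/put pointers */
    uint32_t gp_get, gp_put;   /* GPFIFO get/put pointers */
};

/* Idle only if the status is clear and both get pointers caught up with put. */
static bool pbdma_is_idle(const struct pbdma_snapshot *s)
{
    return !s->chan_busy && s->pb_get == s->pb_put && s->gp_get == s->gp_put;
}

typedef void (*pbdma_read_fn)(void *ctx, struct pbdma_snapshot *out);

/* Poll at most max_polls times; returns 0 once idle, -1 on timeout. */
static int wait_pbdma_idle(pbdma_read_fn read_state, void *ctx, unsigned max_polls)
{
    struct pbdma_snapshot s;

    assert(read_state != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        read_state(ctx, &s);
        if (pbdma_is_idle(&s))
            return 0;
        /* A real driver would sleep or delay here before polling again. */
    }
    return -1;
}

/* Simulated hardware: each read retires one GPFIFO entry, then goes idle. */
static void sim_read(void *ctx, struct pbdma_snapshot *out)
{
    struct pbdma_snapshot *hw = ctx;

    if (hw->gp_get != hw->gp_put) {
        hw->gp_get++;
    } else {
        hw->pb_get = hw->pb_put;
        hw->chan_busy = false;
    }
    *out = *hw;
}
```

Checking the pointers in addition to the status bit means a PBDMA that has pending but unfetched work is not mistaken for an idle one.
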
Sagar Kamble
5a2ed4df76 gpu: nvgpu: update poweroff sequence
To support concurrent UMD kickoff and railgate (due to VPR resize), it
is necessary that nvgpu immediately prevents further work submission once
it is known that the GPU needs to be idled. So nvgpu needs to unmap the
usermode region as early as possible in the suspend sequence.

Otherwise, engines will not go idle before poweroff. Hence move the
call to nvgpu_hide_usermode_for_poweroff to the beginning of
gk20a_pm_prepare_poweroff.

Also during suspend we ensure that the channels are preempted cleanly.
IRQs should be kept enabled until after channels are suspended as the
stalling IRQ can block the preemption. Hence moved the IRQ disable
post channel_suspend call.

gk20a_prepare_poweroff unconditionally sets power_on to false. Hence
there is no need to reenable IRQs, resume scale in the failure path
of gk20a_pm_prepare_poweroff as those will be done during call to
gk20a_pm_finalize_poweron.

Bug 3789519

Change-Id: I03064e7e636252a8f3d8fe9c8c05629ce2ba5fba
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2853584
Reviewed-by: Martin Radev <mradev@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2023-08-18 07:45:09 -07:00
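
The ordering that commit 5a2ed4df76 describes (hide the usermode region first, suspend channels next, disable IRQs only afterwards) can be modelled as below. This is a sketch under assumptions: the step names and the trace structure are invented for illustration and do not mirror the driver's functions.

```c
#include <assert.h>
#include <stddef.h>

enum step {
    STEP_HIDE_USERMODE,      /* unmap the usermode (doorbell) region */
    STEP_SUSPEND_CHANNELS,   /* preempt channels while IRQs are still on */
    STEP_DISABLE_IRQS,       /* safe only after channels are suspended */
    STEP_POWER_DOWN
};

#define MAX_STEPS 8

struct trace {
    enum step steps[MAX_STEPS];
    size_t n;
};

static void record(struct trace *t, enum step s)
{
    assert(t->n < MAX_STEPS);
    t->steps[t->n++] = s;
}

/* Returns 0 on success, or the (simulated) channel suspend error. */
static int prepare_poweroff(struct trace *t, int channel_suspend_err)
{
    record(t, STEP_HIDE_USERMODE);
    record(t, STEP_SUSPEND_CHANNELS);
    if (channel_suspend_err != 0) {
        /* No rollback here: per the commit message, re-enabling is left
         * to the next power-on path. */
        return channel_suspend_err;
    }
    record(t, STEP_DISABLE_IRQS);
    record(t, STEP_POWER_DOWN);
    return 0;
}
```
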
Debarshi Dutta
ac2dfc554f gpu: nvgpu: address VPR resize for deterministic channels
UMD is unable to restrict railgate for deterministic channels
when NVGPU_CAN_RAILGATE is set to false. This causes issues
with VPR resize as there is no means of preventing an active
VPR resize in progress.

Add a fault handler for usermode region. The fault handler's purpose is
to intercept UMD accesses into the doorbell region when a GPU reset
is in progress. GPU reset could be triggered by VPR resize. During a
reset, the corresponding PTEs for the usermode region are zapped. The
fault handler tries to have a read access to g->deterministic_busy
and blocks till the reset is finished. A VPR resize is guaranteed
to be mutually exclusive due to use of the g->deterministic_busy
RW semaphore.

Bug 3789519

Change-Id: Ie046ee9be8d9b5d4019359c60a4578097b8d55a3
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2802185
Reviewed-by: Martin Radev <mradev@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2023-08-18 07:45:03 -07:00
Jake Park
96c85f2f2e gpu: nvgpu: replace %p with %px for cache name
%p prints "(ptrval)" instead of a hexadecimal value until the kernel has
gathered enough entropy. During early boot this can produce an invalid
cache name such as "nvgpu-cache-0x        (ptrval)-128-1", and such a
cache name can cause kmem_cache_create() to fail.
To avoid the invalid cache name, replace %p with %px for the cache name.

Bug 4100509

Change-Id: Iae0ae9cf1a30ec91aeddddaafda9e7376fc80796
Signed-off-by: Jake Park <jakep@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2929270
Reviewed-by: Dmitry Pervushin <dpervushin@nvidia.com>
Reviewed-by: Kwangwoo Lee <kwangwool@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2023-07-05 06:08:57 -07:00
Sagar Kamble
6dfacee682 gpu: nvgpu: rd coalesce WAR applies pre-Volta
The WAR to disable rd coalescing for lg, su and tex units is applicable
only before Volta (i.e. Maxwell and Pascal). Hence set the hal to
NULL for gv100 and gv11b.

Bug 3881919

Change-Id: Iab5dd8caf6539f0bb3cc4987f2b5f114db4c2c20
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2864093
Reviewed-by: Ramalingam C <ramalingamc@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
tegra-l4t-r32.7.4
2023-03-01 03:38:28 -08:00
Sagar Kamble
b2c8827c65 gpu: nvgpu: fix tex rd coalesce disable logic
NETLIST_REGIONID_SW_CTX_LOAD writes update gr_gpcs_tpcs_tex_m_dbg2_r to
default value that keeps rd coalesce enabled for LG & SU.

Disable rd coalesce for tex, lg and su after NETLIST_REGIONID_SW_CTX_LOAD
writes during gr init and golden ctx init for it to take effect.

For gr sw method handling, don't update the tex rd coalesce on interrupt
with offset *_SET_RD_COALESCE as we want to keep rd coalescing disabled.

Bug 3881919

Change-Id: Ie7e6616d48f84547ce3380bfa395910b7995c05b
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2857141
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2023-02-17 00:23:47 -08:00
Kishan
5adf709506 gpu: nvgpu: Enable GPCCS debug data logging.
Currently, in case of any fecs error, we only dump fecs
ctxsw fw related registers, mailboxes and trace registers.
With this change, we want to ensure we dump the gpccs register
space as well. This will help in debugging ctxsw related
failures.

JIRA NVGPU-9560
Bug 3907163

Change-Id: I61e25883da4455ea1412ca70c5fc3377d9a786a3
Signed-off-by: Kishan <kpalankar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2850402
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2023-02-06 16:38:32 -08:00
Sagar Kamble
49a6676ef6 gpu: nvgpu: remove unnecessary devfreq limit checks
gk20a_scale_target is called through the target member of a
devfreq_profile. It is only called from devfreq's update_devfreq
function or through governor_passive. governor_passive is not used for
nvgpu.

Since update_devfreq already enforces the devfreq limits,
gk20a_scale_target can be simplified by only checking pm_qos
limits and also only if GK20A_PM_QOS is enabled.

This also resolves a race between creating devfreq sysfs
files and setting 'l->devfreq' in gk20a_scale_init that can
lead to accessing a NULL pointer by writing to the sysfs files.

Example:
Unable to handle kernel NULL pointer dereference at virtual address 00000430
<snip>
Call trace:
[<000000006aa50d89>] gk20a_scale_target+0x5c/0x120 [nvgpu]
[<00000000e5a63f7c>] update_devfreq+0xec/0x22c
[<0000000014a13c8a>] max_freq_store+0xa8/0xfc
[<0000000072139393>] dev_attr_store+0x48/0x60
[<000000008ec280df>] sysfs_kf_write+0x60/0x70
[<0000000038427ed5>] kernfs_fop_write+0xc4/0x1e0
[<00000000c0b74aa9>] __vfs_write+0x60/0x14c
[<0000000078fcebb4>] vfs_write+0xb0/0x1b4
[<000000007720da30>] SyS_write+0x74/0xf0
[<0000000067443e2c>] __sys_trace_return+0x0/0x4

Bug 3910155

Change-Id: I7193cc5ea85454acf0890b3ca8d1c3526ca8517e
Signed-off-by: Ken Chang <kenc@nvidia.com>
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2828219
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
2023-02-06 10:24:37 -08:00
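
The simplification in commit 49a6676ef6 amounts to clamping the requested frequency against the pm_qos limits only, and only when pm_qos support is enabled. A minimal sketch, assuming a runtime flag in place of the GK20A_PM_QOS build option and hypothetical parameter names:

```c
#include <assert.h>

/* Clamp target to [qos_min, qos_max] only when pm_qos is enabled.
 * The devfreq limits are assumed to have been applied by the caller. */
static unsigned long clamp_to_pm_qos(unsigned long target,
                                     unsigned long qos_min,
                                     unsigned long qos_max,
                                     int pm_qos_enabled)
{
    if (!pm_qos_enabled)
        return target;

    assert(qos_min <= qos_max);
    if (target < qos_min)
        return qos_min;
    if (target > qos_max)
        return qos_max;
    return target;
}
```
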
Debarshi Dutta
b432b5f41a gpu: nvgpu: update dma_mask based on H/W compatibility
To be able to access the full physical memory range, gpu's dma_mask
needs to be set to the max value of H/W compatible range.

For example, in order to support the range from 2GB to 66GB, GV11B's
dma_mask needs to be at least 37 bits. Set GV11B's dma_mask to 38 bits.
This value is supported by H/W.

Bug 3656729

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: Icfff3c36a8c9cf074a254fa773c42e18020ae5de
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2723640
(cherry picked from commit 1bf9309f17)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2724565
Reviewed-by: Brad Griffis <bgriffis@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Tested-by: Brad Griffis <bgriffis@nvidia.com>
GVS: Gerrit_Virtual_Submit
tegra-l4t-r32.7.3
2022-06-14 15:40:25 -07:00
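
The arithmetic behind the 37-bit requirement in commit b432b5f41a can be checked with a short sketch. The macro below has the same shape as the Linux DMA_BIT_MASK(n) macro but is given a different name here, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GIB (1ULL << 30)

/* Low n bits set; same shape as the kernel's DMA_BIT_MASK(n). */
#define SKETCH_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Number of address bits needed to represent the highest byte address. */
static unsigned addr_bits_needed(uint64_t highest_addr)
{
    unsigned bits = 0;

    while (highest_addr != 0) {
        bits++;
        highest_addr >>= 1;
    }
    return bits;
}

/* True if every address up to highest_addr fits in a mask of 'bits' bits. */
static bool fits_mask(uint64_t highest_addr, unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    return highest_addr <= SKETCH_DMA_BIT_MASK(bits);
}
```

The last byte below 66GB is 2^36 + 2^31 - 1, which has bit 36 set, so 36 bits are not enough and 37 is the minimum; the 38-bit mask chosen by the commit covers it with room to spare.
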
Divya Singhatwaria
464de27507 async cmd resp for gv11b
- When DISALLOW cmd is sent from driver to PMU the actual
  completion of the disallow will be acknowledged by PMU
  via a PG EVENT: ASYNC_CMD_RESP.
- Disallow needs a delayed ACK from PMU in order to disable
  the ELPG.
- If ELPG is already engaged, the DISALLOW cmd will trigger
  ELPG exit and then transition to PMU_PG_STATE_DISALLOW.
- After this whole process is completed, PMU will send
  DISALLOW_ACK through ASYNC_CMD_RESP msg.
- After disallow command is sent from the driver, NvGPU driver
  waits/polls for disallow command ack. This is sent immediately
  by msg framework of PMU.
- Then, the driver will poll/wait for ASYNC_CMD_RESP event which
  is the delayed DISALLOW ACK.
- The driver captures the ASYNC_CMD_RESP sent from PMU.
- set disallow_state to ELPG_OFF.
- If the driver does not wait/poll for this delayed disallow
  ack from PMU, it can result in errors as PMU is still
  processing DISALLOW cmd but the driver progressed further.

Bug 3580271

Change-Id: I332180c05b6a398107f065d54e9718b7038fb1b2
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2689500
(cherry picked from commit fb019bf43a)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2694312
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-05-04 15:09:38 -07:00
Debarshi Dutta
46b43d2b24 gpu: nvgpu: add support for disabling l3 via DT
On volta the GPU determines whether to do L3 allocation for a mapping by
checking bit 36 of the physical address. So if a mapping should allocate lines
in the L3 this bit must be set.

However, the physical addresses for 64GB of RAM use bit 36,
resulting in a conflict. Thus, add support for disabling l3
for SKUs having 64GB of physical memory.

Bug 3486025

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: Ic540e754274cf1d9e6625493962699d21509e540
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2661548
Reviewed-by: Brad Griffis <bgriffis@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Tested-by: Brad Griffis <bgriffis@nvidia.com>
GVS: Gerrit_Virtual_Submit
tegra-l4t-r32.7.1
2022-02-02 12:10:51 -08:00
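
The conflict described in commit 46b43d2b24 can be shown with a sketch that tests whether a RAM range contains any address with bit 36 set. The helper name is hypothetical, and the 2GB RAM base used in the test values is an assumption taken from the 2GB to 66GB range mentioned in the dma_mask commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GIB (1ULL << 30)
#define L3_ALLOC_BIT 36u   /* per the commit message, bit 36 selects L3 allocation */

/* True if any address in [base, base + size) has bit 36 set. */
static bool ram_uses_l3_bit(uint64_t base, uint64_t size)
{
    uint64_t last;

    if (size == 0)
        return false;
    last = base + size - 1;
    assert(last >= base);   /* the range must not wrap around */

    /* Crossing a 2^36 boundary means both bit values occur in the range. */
    if ((base >> L3_ALLOC_BIT) != (last >> L3_ALLOC_BIT))
        return true;
    return ((base >> L3_ALLOC_BIT) & 1u) != 0;
}
```

With 64GB of RAM starting at 2GB the top of memory lies above 2^36, so the bit is a genuine address bit and cannot also serve as the L3 allocation hint.
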
Vikas Siddhabhaktula
de418f6ef6 nvgpu: fix incorrect mem_desc_count
-   Fix incorrect mem_desc_count increment in the case of failure
-   Increment it only when there is a success

Bug 3399680

Change-Id: I8c04e4859422fb86367113c58ce3e34cab952b63
Signed-off-by: Vikas Siddhabhaktula <vsiddhabhakt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2618229
Reviewed-by: Thomas Steinle <tsteinle@nvidia.com>
Reviewed-by: Phoenix Jung <pjung@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-11-25 07:24:41 -08:00
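
The fix in commit de418f6ef6 follows a common pattern: bump a counter only after the operation it counts has succeeded. A minimal sketch with invented types and a hypothetical mapping helper:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DESCS 4

struct mem_desc { int handle; };

struct request {
    struct mem_desc descs[MAX_DESCS];
    size_t mem_desc_count;
};

/* Hypothetical mapper: rejects negative handles. */
static int map_one(int handle, struct mem_desc *out)
{
    if (handle < 0)
        return -1;
    out->handle = handle;
    return 0;
}

static int add_mem_desc(struct request *r, int handle)
{
    int err;

    assert(r != NULL);
    if (r->mem_desc_count >= MAX_DESCS)
        return -1;

    err = map_one(handle, &r->descs[r->mem_desc_count]);
    if (err != 0)
        return err;          /* count is left untouched on failure */

    r->mem_desc_count++;     /* increment only after success */
    return 0;
}
```
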
Alvin Park
4d17d8b104 gpu: nvgpu: add check for is_railgated
When reading the '/sys/kernel/debug/gpu.0/railgate_residency'
debugfs node, a NULL pointer access can occur if the
is_railgated function is not assigned.
Add a check for is_railgated before calling the function pointer.

Bug 200773027

Change-Id: I914b5b0aa48ccb15affe79510b696ebc91129f67
Signed-off-by: Aditya Gupta <adigupta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2596320
(cherry picked from commit e649029c7bed3c7afbd454d7e94f9173377f4c64)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2614156
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Rohit Upadhyay <rupadhyay@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-10-21 07:10:24 -07:00
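
The guard added by commit 4d17d8b104 can be sketched as follows. The struct layout is hypothetical, and treating a missing callback as "not railgated" is an assumption made for this example, not something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ops table with an optional callback. */
struct platform_ops {
    bool (*is_railgated)(void *dev);
};

/* Example callback used for demonstration. */
static bool always_gated(void *dev)
{
    (void)dev;
    return true;
}

/* Call the hook only if the platform provides one. */
static bool query_railgated(const struct platform_ops *ops, void *dev)
{
    assert(ops != NULL);
    if (ops->is_railgated == NULL)
        return false;        /* assumed default when the hook is absent */
    return ops->is_railgated(dev);
}
```
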
smadhavan
725a5eaa80 nvgpu: gpu: adds support for ACR dbg/prod.
ACR ucode is encrypted using different keys for prod/dbg boards.
This change adds a check to select ACR ucode based on board type.

ACR ucode binaries are also renamed with "nv_" prefix to conform
to release naming conventions.

Bug 2672836

Change-Id: I48818f018f903c0d03642c12485d60e392121eb6
Signed-off-by: smadhavan <smadhavan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2492587
(cherry picked from commit 5dacead521)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2597878
Reviewed-by: Mayur Poojary <mpoojary@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Mayur Poojary <mpoojary@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-10-11 12:56:53 -07:00
mpoojary
3af391e862 gpu: nvgpu: adds support for ACR dbg/prod.
ACR ucode is encrypted using different keys for prod/dbg boards.
This change adds a check to select ACR ucode based on board type.
Note: This support is added for t18x. In the subsequent CL, support
for T210 will be added and since ACR binaries are different for
gp10b and gm20b, a new ACR init function is created for gp10b to
accept new ACR prod/dbg binaries.

Bug 2672836

dev-main reference patch:
https://git-master.nvidia.com/r/c/linux-nvgpu/+/2471590

Change-Id: Ib0a01bce4f3a3187aa15a669649f8510c88dfd0a
Signed-off-by: mpoojary <mpoojary@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2601970
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-10-07 06:56:10 -07:00
smadhavan
d87030e730 nvgpu: gpu: adds support for ACR dbg/prod.
ACR ucode is encrypted using different keys for prod/dbg boards.
This change adds a check to select ACR ucode based on board type.
Note: This support is added only for t19x.

Bug 2350733
Bug 2672832
Bug 2672836
Bug 2674821
JIRA NVGPU-4001

(cherry picked from commit c19a0f0c26ab94f6bbf4380ab93e458b88589c82)

Change-Id: I2febc2cbe869c06bca0adebd7723b0d6fc1d4b23
Signed-off-by: smadhavan <smadhavan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2483968
Tested-by: Amulya Yarlagadda <ayarlagadda@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Amulya Yarlagadda <ayarlagadda@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-10-05 19:10:26 -07:00
Thomas Steinle
db1393886b drivers: nvgpu: Print chid instead of ch->chid
During VM guest reset GPU interrupts were triggered after the
corresponding channel has been disabled. This leads to a
NULL-Ptr access in fecs error handling. This change removes the
access to invalid ch->chid.

Bug 3362082

Change-Id: I2d51a62ec47a07ae7ea90394fec76d3c3a8d186c
Signed-off-by: Thomas Steinle <tsteinle@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2590478
(cherry picked from commit b1b04915bfad7060479624d5ec85894c6bac3ba6)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2596201
Reviewed-by: Phoenix Jung <pjung@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-17 14:09:55 -07:00
Sagar Kamble
5fb06d03ca gpu: nvgpu: stop ELPG init thread during unload
ELPG initialization thread creation can fail when the process is killed.
That leads to driver resume failure.

That thread was stopped on suspend and re-created on resume. To avoid
the issue above, don't stop the ELPG thread in suspend and let the
first created thread handle the ELPG state transitions always.
And stop the ELPG thread during unload.

bug 3345977
bug 200685277

Change-Id: I8952edf8d1664ed258f238e265002e716d1bf5c2
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2573763
(cherry picked from commit f4571194b0)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2574436
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-08-12 05:39:59 -07:00
Sagar Kamble
ce8548ec05 gpu: nvgpu: fix clk_arb completion file private data access race
clk_arb completion file descriptor can get closed immediately after
poll finishes in the work item gp10b_clk_arb_run_arbiter_cb. In
that case, the refcount for nvgpu_clk_dev can become zero in
the work item and can lead to invalid access while removing
nvgpu_clk_dev from the lists.

Remove nvgpu_clk_dev from the list before dropping the reference to
it.

Also, delete the nvgpu_clk_dev in completion file release handler
within the session and requests spinlocks to avoid race with
gp10b_clk_arb_run_arbiter_cb using it.

bug 200757277

Change-Id: I054eee547f2a6fa633d7ef55df216ec36647a826
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2569522
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-08-03 11:39:44 -07:00
Konsta Hölttä
2c441a83d4 gpu: nvgpu: keep usermode region flags on railgate
When the gpu is railgated, the usermode region mappings must be cleared.
This is already done with zap_vma_ptes() but as an extra measure the vm
flags are also zeroed. That is an oversight, so delete that code; in
particular the VM_DONTCOPY flag is important so that the mapping does
not follow fork, as the design does not allow that.

Bug 200726443

Change-Id: I84ed4e38b7de1f0c8cbf4cca6276abfa2409ac3b
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2538481
(cherry picked from commit e44ece25ba)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2548631
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
tegra-l4t-r32.6.1
2021-06-24 08:09:34 -07:00
Konsta Hölttä
511344da85 gpu: nvgpu: avoid faulty elpg protection
Don't store the return value of elpg re-enable if disable fails; this
could make the local status value zero again, causing the elpg-protected
call to be executed with elpg still enabled and elpg re-enabled twice.

Commit c905858565 ("gpu: nvgpu: add cg and pg function") introduced
this bug; failure of re-enabling after a failed disable might be another
problem (and it's not clear why this is done in the first place) which
isn't propagated to the caller, but that would belong to another patch.

Bug 200565050

Change-Id: I7cf7a0887ae59e85bf0c56c38aaaadfefd16cc1c
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2541859
(cherry picked from commit 4b3591aafb)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2543030
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-06-15 00:24:16 -07:00
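
The bug fixed by commit 511344da85 is easy to reproduce in miniature: if the result of the re-enable overwrites the status of the failed disable, the status can become zero again and the protected call runs anyway. The sketch below keeps the two results apart. All names are hypothetical stand-ins, with 0 meaning success.

```c
#include <assert.h>
#include <stddef.h>

struct elpg_sim {
    int disable_err;       /* value the simulated disable returns */
    int enable_err;        /* value the simulated enable returns */
    int enable_calls;
    int protected_calls;
};

static int sim_disable(struct elpg_sim *s) { return s->disable_err; }
static int sim_enable(struct elpg_sim *s) { s->enable_calls++; return s->enable_err; }
static int sim_protected(struct elpg_sim *s) { s->protected_calls++; return 0; }

static int elpg_protected_call(struct elpg_sim *s)
{
    int status, enable_err;

    assert(s != NULL);
    status = sim_disable(s);
    if (status != 0) {
        /* Re-enable, but keep the disable error: storing the re-enable
         * result in status is what could reset it to zero. */
        (void)sim_enable(s);
        return status;
    }

    status = sim_protected(s);
    enable_err = sim_enable(s);
    return status != 0 ? status : enable_err;
}
```
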
Sagar Kamble
12e89c21de gpu: nvgpu: fix the usermode mappings deadlock during railgate and munmap
The following locking sequence leads to a deadlock:

1. gk20a_pm_prepare_poweroff (alter_usermode_mappings):
   ctrl_privs_lock -> mmap_lock
2. __do_munmap (usermode_vma_close):
   mmap_lock -> ctrl_privs_lock

This lock contention can be resolved by releasing ctrl_privs_lock and
retrying the usermode mapping alteration after a while, which allows
munmap to proceed.

Below is the kernel panic log with deadlock.

[] INFO: task kworker/1:1:116 blocked for more than 120 seconds.
[]       Tainted: G        W         5.10.17-tegra #1
[] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[] task:kworker/1:1     state:D stack:    0 pid:  116 ppid:     2 flags:0x00000028
[] Workqueue: pm pm_runtime_work
[] Call trace:
[]  __switch_to+0x104/0x160
[]  __schedule+0x3d4/0x900
[]  schedule+0x74/0x100
[]  rwsem_down_write_slowpath+0x250/0x4b0
[]  down_write+0x6c/0x80
[]  alter_usermode_mappings+0xb4/0x160 [nvgpu]
[]  nvgpu_hide_usermode_for_poweroff+0x24/0x30 [nvgpu]
[]  gk20a_pm_prepare_poweroff+0xe8/0x140 [nvgpu]
[]  gk20a_pm_runtime_suspend+0x78/0xf0 [nvgpu]
[]  pm_generic_runtime_suspend+0x3c/0x60
[]  genpd_runtime_suspend+0xb0/0x2c0
[]  __rpm_callback+0x90/0x150
[]  rpm_callback+0x34/0xa0
[]  rpm_suspend+0xe0/0x5e0
[]  pm_runtime_work+0xbc/0xc0
[]  process_one_work+0x1c0/0x4a0
[]  worker_thread+0x11c/0x430
[]  kthread+0x148/0x170
[]  ret_from_fork+0x10/0x18

[] INFO: task nvrm_gpu_tests:1273 blocked for more than 121 seconds.
[]       Tainted: G        W         5.10.17-tegra #1
[] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[] task:nvrm_gpu_tests  state:D stack:    0 pid: 1273 ppid:  1245 flags:0x00000000
[] Call trace:
[]  __switch_to+0x104/0x160
[]  __schedule+0x3d4/0x900
[]  schedule+0x74/0x100
[]  schedule_preempt_disabled+0x28/0x40
[]  __mutex_lock.isra.0+0x184/0x5c0
[]  __mutex_lock_slowpath+0x24/0x30
[]  mutex_lock+0x5c/0x70
[]  usermode_vma_close+0x30/0x50 [nvgpu]
[]  remove_vma+0x34/0x60
[]  __do_munmap+0x1f4/0x4a0
[]  __vm_munmap+0x74/0xd0
[]  __arm64_sys_munmap+0x3c/0x50
[]  el0_svc_common.constprop.0+0x7c/0x1a0
[]  do_el0_svc+0x34/0xa0
[]  el0_svc+0x1c/0x30
[]  el0_sync_handler+0xa8/0xb0
[]  el0_sync+0x160/0x180
[] ---[ end Kernel panic - not syncing: hung_task: blocked tasks ]---

Bug 200703921

Change-Id: Ie7f017c92f20061d3bf891079f7fc7fe390f7cf7
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2533853
(cherry picked from commit 1dd3e0761c)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2540111
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-06-07 06:40:08 -07:00
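
The back-off approach from commit 12e89c21de can be modelled single-threaded with try-locks. This is only an illustration of the lock-ordering idea: the lock type is a toy flag, not a kernel mutex or rwsem, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock for demonstration: just a flag, no real blocking. */
struct sim_lock { bool held; };

static bool sim_trylock(struct sim_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void sim_unlock(struct sim_lock *l)
{
    assert(l->held);   /* unlocking a lock that is not held is a bug */
    l->held = false;
}

/* One attempt: returns 0 on success, -1 if the caller should retry later. */
static int alter_mappings_once(struct sim_lock *ctrl_privs_lock,
                               struct sim_lock *mmap_lock)
{
    if (!sim_trylock(ctrl_privs_lock))
        return -1;

    if (!sim_trylock(mmap_lock)) {
        /* Back off instead of blocking, so that a concurrent munmap
         * holding mmap_lock can take ctrl_privs_lock and finish. */
        sim_unlock(ctrl_privs_lock);
        return -1;
    }

    /* ... the usermode PTEs would be zapped or restored here ... */

    sim_unlock(mmap_lock);
    sim_unlock(ctrl_privs_lock);
    return 0;
}
```

Because the railgate path never waits on mmap_lock while holding ctrl_privs_lock, the circular wait between the two paths cannot form.
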
Deepak Nibade
cbad9503a7 gpu: nvgpu: set file private data before installing fd
Make sure file->private_data is set before installing file into file
descriptor with fd_install().

Bug 200724607
Bug 200725718

Change-Id: I03e79a3f8981f959ab5f75f442911253d166aa87
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2520465
(cherry picked from commit c78efae5e7)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2535099
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Harsh Sinha <hsinha@nvidia.com>
Reviewed-by: Thomas Steinle <tsteinle@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Byungkuk Seo <bseo@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-06-04 00:08:57 -07:00
Debarshi Dutta
34993e4f7b gpu: nvgpu: Add ECC Support for GV11B in Linux
Implement nvgpu plumbing to allow reporting ECC errors (corrected
and uncorrected) to a L1SS service (if one exists).

This patch includes the following

1) Added code that submits ECC error reports via the Interrupt context
directly to a L1SS service in linux OS.

2) Added support for enabling/disabling the error reports via L1SS's
registration/deregistration API. Nvgpu simply invokes an empty function
until the registration is successful.

3) Added Spinlock to correctly handle concurrency for accessing the
correct Ops for submitting requests.

4) Adds error reporting for a subset of interrupts that can be verified
via external ECC injection logic. A subsequent patch will add the
API for rest of the interrupts.

5) In case of critical (uncorrected) errors, change nvgpu's state to
quiesce state.

Jira L4T-1187
Bug 200700400

Change-Id: Id31f70531fba355e94e72c4f9762593e7667a11c
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2530411
Tested-by: Bibek Basu <bbasu@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-05-28 12:10:24 -07:00
Sagar Kamble
5f88598b9e gpu: nvgpu: use deferred_fault_engines for resetting engines during unbind
Engine reset is skipped if the channel is disassociated from the tsg.
During unbind, the tsg is disassociated before calling deferred
engine reset. Hence deferred resets do not actually take
effect.

The engines to be reset are already recorded in the variable
deferred_fault_engines. Use it.

Bug 200711183

Change-Id: I0c2bdcad1770e0ccd001c208a9ac0cf499a374e1
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2521974
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
(cherry picked from commit 668bd75c1a)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2524252
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-05-04 14:41:07 -07:00
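
Using a saved engine mask, as commit 5f88598b9e does with deferred_fault_engines, typically means walking the set bits and resetting each engine. A sketch under assumptions: the mask is taken to be a 32-bit bitmask indexed by engine id, and the callback signature is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*engine_reset_fn)(unsigned engine_id, void *ctx);

/* Calls reset() for every bit set in the mask; returns how many were reset. */
static unsigned reset_deferred_engines(uint32_t deferred_fault_engines,
                                       engine_reset_fn reset, void *ctx)
{
    unsigned count = 0;

    assert(reset != NULL);
    for (unsigned id = 0; id < 32; id++) {
        if (deferred_fault_engines & (UINT32_C(1) << id)) {
            reset(id, ctx);
            count++;
        }
    }
    return count;
}

/* Demonstration callback: records which engines were reset. */
static void record_reset(unsigned engine_id, void *ctx)
{
    *(uint32_t *)ctx |= UINT32_C(1) << engine_id;
}
```
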
Sagar Kamble
bb8bf1c76c gpu: nvgpu: fix tsg unbind failure paths
nvgpu_tsg_unbind_channel_common failure handling missed
channel.clear & nvgpu_tsg_set_mmu_debug_mode calls.

Bug 200711183

Change-Id: I19fd53be55db9df725b7cf467b2673e4cd29deb5
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2521972
(cherry picked from commit 89ec2afbd4)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2524251
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-05-04 14:40:56 -07:00
Sagar Kamble
0d088ad70c gpu: nvgpu: wait for stalling interrupts to complete during TSG unbind preempt
Some of the engine stalling interrupts can block the context save off
the engine if not handled during fifo.preempt_tsg. They need to be
handled while polling for engine ctxsw status.

Bug 200711183
Bug 200726848

Change-Id: Ie45d76d9d1d8be3ffb842670843507f2d9aea6d0
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2521971
(cherry picked from commit I7418a9e0354013b81fbefd8c0cab5068404fc44e)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2523938
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-05-04 14:40:45 -07:00
Sagar Kamble
00c3d98acb gpu: nvgpu: create timed wait functions for stall and nonstall interrupts completion
In order to process stalling interrupts during TSG unbind, we need an API
to wait for the stalling interrupts to complete within a certain duration.

Prepare these APIs for stalling and non-stalling interrupts.

Bug 200711183
Bug 200726848

Change-Id: I634738249ade64224326b356d6244ad4299f1baf
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2521970
(cherry picked from commit I0b7a64c0f3761bbd0ca0843aea28a591ed23739f)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2523937
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-05-04 14:40:40 -07:00
Sagar Kamble
13fc430775 gpu: nvgpu: retry tsg unbind if NEXT is set
The NEXT bit can remain set for the channel if timeslice expires before
scheduler clears it. Due to this nvgpu fails TSG unbind and in turn
nvrm_gpu fails channel close. In this case, checking the channel hw
state after some time can help see NEXT bit cleared by scheduler.

Reenable the tsg and return -EAGAIN to nvrm_gpu so that it can retry.

Bug 3144960
Bug 200520811

Change-Id: I35f417f02270e371a4e632986b73a00f8a4f921a
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2468391
(cherry picked from commit cf287a4ef5)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2479106
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-03-19 14:39:39 -07:00
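
From the caller's side, the -EAGAIN contract introduced by commit 13fc430775 suggests a bounded retry loop. The sketch below simulates it; the channel model and the function names are hypothetical, and nvrm_gpu's actual retry policy is not described in the commit message.

```c
#include <assert.h>
#include <errno.h>

/* Simulated channel: the NEXT bit clears after this many checks. */
struct sim_channel { int next_polls_left; };

/* One unbind attempt: -EAGAIN while NEXT is still set, 0 once it clears. */
static int try_unbind(struct sim_channel *ch)
{
    if (ch->next_polls_left > 0) {
        ch->next_polls_left--;
        return -EAGAIN;
    }
    return 0;
}

/* Retry while the driver reports -EAGAIN, up to max_attempts times. */
static int unbind_with_retry(struct sim_channel *ch, int max_attempts)
{
    int err = -EAGAIN;

    assert(max_attempts > 0);
    for (int i = 0; i < max_attempts && err == -EAGAIN; i++) {
        err = try_unbind(ch);
        /* A real caller would sleep briefly between attempts. */
    }
    return err;
}
```
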
Divya Singhatwaria
9170f2b77c gpu: nvgpu: remove ZBC save/restore by PMU
- ZBC save/restore registers are removed in GP10B PMU ucode.
- These registers are saved/restored from CTXSW ucode during
  ELPG entry/exit.
- Accessing the ZBC registers will cause PMU EXTERR error.
- To resolve this, ZBC functionality is removed from GP10B
  feature list in PMU ucode.
- From NvGPU driver, set NVGPU_PMU_ZBC_SAVE bit to false
  for GP10B
- Updated the GP10B PMU app version for the ucode:
  https://git-master.nvidia.com/r/c/tegra/kernel-firmware-t18x/+/2476260

P4 CL link related to this PMU ucode change:
https://p4sw-swarm.nvidia.com/changes/29594520

Bug 3233071
Bug 200696431

Change-Id: If3f1707b79699e7e2e65367418b25ac71b09cf0b
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2476259
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-03-17 09:54:54 -07:00
Nitin Kumbhar
7882f15ff6 gpu: nvgpu: fix possible buffer overflow issue
As sprintf() is used to populate pool_name[20], it can overflow
for larger u32 values (a u32 needs up to 10 decimal characters), i.e.
20 < strlen("semaphore_pool-") + 10 = 15 + 10 = 25.

Fix this overflow by removing pool_name as it's not used.

Bug 2626446
Bug 3273414

Change-Id: I4e0a222a2cd34dcd09e69294bc46e2242abb04bb
Signed-off-by: Nitin Kumbhar <nkumbhar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2205356
(cherry picked from commit baa86cf134ee6753beabfa974a10faffc5775ee8)
Signed-off-by: ByungKuk Seo <bseo@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2496976
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Harsh Sinha <hsinha@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-03-12 07:54:52 -08:00
Thomas Steinle
cc717e3145 drivers: gk20a: Add gr.ops NULL-ptr check
This fix adds NULL-ptr checks for some of the user-accessible
ioctls.

Bug 3240771
Bug 200696704

Change-Id: Ibe7f75b31b2521a530883253a93ba832f010dc80
Signed-off-by: Thomas Steinle <tsteinle@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2483635
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Phoenix Jung <pjung@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-02-25 06:11:02 -08:00
Sumit Gupta
535e9b1dd7 gpu: nvgpu: fix mutex wrong acquire
Wrong acquire/release sequence.

 DEBUG_LOCKS_WARN_ON(rt_mutex_owner(lock) != current)
 ....
 CPU: 4 PID: 5404 Comm: cyclictest.sh Not tainted 4.9.201-rt134-tegra #1
 Hardware name: Jetson-AGX (DT)
 ....
 Call trace:
 [<ffffff800810e4f8>] debug_rt_mutex_unlock+0x58/0x68
 [<ffffff8008f34d0c>] rt_mutex_unlock+0x4c/0xb0
 [<ffffff8008f36ea8>] _mutex_unlock+0x20/0x2c
 [<ffffff8000f69d80>] nvgpu_cg_elcg_set_elcg_enabled+0x78/0xf0 [nvgpu]
 [<ffffff8000f7bd44>] nvgpu_intr_nonstall_cb+0x21bc/0x22f0 [nvgpu]
 [<ffffff800875b304>] dev_attr_store+0x44/0x60
 [<ffffff80082dca44>] sysfs_kf_write+0x5c/0x78
 [<ffffff80082dbd28>] kernfs_fop_write+0xc0/0x1d8
 [<ffffff8008245b60>] __vfs_write+0x48/0x128
 [<ffffff8008246b3c>] vfs_write+0xac/0x1b8
 [<ffffff800824808c>] SyS_write+0x5c/0xc8

Bug 3227296

Suggested-by: Bibek Basu <bbasu@nvidia.com>
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Change-Id: I932a23700539422c07de045dde516c52dd8348cf
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2472903
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Bibek Basu <bbasu@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-02-19 10:39:57 -08:00
Alvin Park
438215b056 gpu: nvgpu: add check for is_railgated
When reading the '/sys/kernel/debug/gpu.0/railgate_residency'
debugfs node, a NULL pointer access can occur if the
is_railgated function is not assigned.
Add check for is_railgated before calling the function pointer.
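
A minimal sketch of the guard described above, written as plain userspace C. Only the is_railgated name comes from the commit message. The structure, the helper names, and the choice to report "not railgated" when the hook is missing are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a platform ops structure. */
struct platform_ops {
    bool (*is_railgated)(void); /* optional hook; may be left unassigned */
};

/* Report the railgate state, treating a missing hook as "not railgated". */
static bool query_railgated(const struct platform_ops *ops)
{
    if (ops == NULL || ops->is_railgated == NULL)
        return false;
    return ops->is_railgated();
}

/* Toy hook used only to exercise the assigned case. */
static bool always_railgated(void)
{
    return true;
}
```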

Bug 200682233

Change-Id: I4a03d4e19b04d02815b792d7d967d4a1d5f42c35
Signed-off-by: Alvin Park <apark@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2459751
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Hardik T Shah <hardikts@nvidia.com>
Reviewed-by: Phoenix Jung <pjung@nvidia.com>
Reviewed-by: Jay Kumar Bajaj <jbajaj@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-16 09:09:13 -08:00
Sagar Kamble
7bf2833f34 gpu: nvgpu: do tsg unbind hw state check only for multi-channel TSG
The host scheduler might be confused if more than one channel is present
in the TSG and one of the unbound channels has NEXT set.

This is not so much of an issue if there is a single channel in the TSG.
So don't fail unbind in that case. ctx_reload and engine_faulted check
can also be skipped for single channel TSG.

Bug 3144960

Change-Id: I85eb9025ea53706ce8fda6d9b4bcf6a15a300d17
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2442970
(cherry picked from commit ad4624aae3f109fc3c8c03653cb691e09f086930)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2445445
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
tegra-l4t-r32.5.1 tegra-l4t-r32.5
2020-11-23 08:40:17 -08:00
prsethi
8cb168632b gpu: nvgpu: add support for ACB SLCG on gv11b
Register list for ACB SLCG is auto generated with scripts.
Add HAL operations to enable/disable ACB clock gating.

Cherry-pick/manually port from dev-main

Bug 200647909

Change-Id: I4be4c14cc072fcccd91031a5a40321f5ff11f549
Signed-off-by: Prateek sethi <prsethi@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2420355
(cherry picked from commit c7c04d3a28c2eb0edc8e015dd0130fa50d3496c7)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2434464
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-by: Phoenix Jung <pjung@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-10-27 09:24:56 -07:00
Peter Daifuku
5a948ccca9 gpu: nvgpu: limit PD cache to < pgsize for linux
For Linux, limit the use of the cache to entries less than the page size, to
avoid potential problems with running out of CMA memory when allocating large,
contiguous slabs, as would be required for non-iommuable chips.

Also, in nvgpu_pd_cache_do_free(), zero out entries only if iommu is in use
and PTE entries use the cache (since it's the prefetch of invalid PTEs by
iommu that needs to be avoided).

Bug 3093183
Bug 3100907

Change-Id: I363031db32e11bc705810a7e87fc9e9ac1dc00bd
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2422039
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Satish Arora <satisha@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-10-06 10:10:02 -07:00
Konsta Hölttä
cd134bb198 gpu: nvgpu: delete priv cmd buf size warnings
Running out of priv cmd buffer allocation capacity is typically a
recoverable "error" caused by extra pressure wrt. allocation sizes based
on number of inflight jobs chosen by userspace. These conditions return
-EAGAIN and further retries will succeed as long as the channel advances
with submitted jobs. Remove the unnecessary debug spew.
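
The retry behaviour described above can be sketched from the caller's side. Only the -EAGAIN return code comes from the commit message. The callback type, the retry bound, and the toy submit function are hypothetical names introduced for illustration.

```c
#include <errno.h>

/* Hypothetical submit callback: returns 0 on success or a negative errno. */
typedef int (*submit_fn)(void *ctx);

/* Retry while the submit reports -EAGAIN, for at most max_tries attempts. */
static int submit_with_retry(submit_fn submit, void *ctx, int max_tries)
{
    int err = -EAGAIN;

    for (int i = 0; i < max_tries && err == -EAGAIN; i++)
        err = submit(ctx);
    return err;
}

/* Toy submit: fails with -EAGAIN until the pending-job counter drains to 0. */
static int toy_submit(void *ctx)
{
    int *pending = ctx;

    if (*pending > 0) {
        (*pending)--;
        return -EAGAIN;
    }
    return 0;
}
```

Because the condition is expected and recoverable, the caller only needs the return code and a bounded retry, not a log message on every attempt.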

Bug 200641803
Bug 200651329

Change-Id: I4dfc38cfc3eb10d57ac11c1b7164c3d84f9034d3
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2388799
(cherry picked from commit 29ad324f8226ed3326f5de9117b9115a15cdd032)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2410069
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-09-22 19:08:37 -07:00
Peter Daifuku
4f66942afa nvgpu: fix resource leaks when cleaning up
In gk20a_free_channel, destroy notifier_wq and
semaphore_wq

In __nvgpu_vm_remove, destroy the update_gmmu_lock mutex

Bug 200647668

Change-Id: Icbb4e626c0fa9fa2dcf1430b3112b51829b00e4f
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2414820
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Satish Arora <satisha@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-09-18 07:09:09 -07:00
Peter Daifuku
036e000a17 nvgpu: add PD cache support for page-sized PTEs
Large buffers being mapped to GMMU end up needing many
pages for the PTE tables. Allocating these pages one
by one can end up being a performance bottleneck, particularly
in the virtualized case.

Add support for page-sized PTEs to the existing PD cache:

- define NVGPU_PD_CACHE_SIZE, the allocation size for a new slab
  for the PD cache, effectively set to 64K bytes
- Use the PD cache for any allocation < NVGPU_PD_CACHE_SIZE
- When freeing up cached entries, avoid prefetch errors by
  invalidating the entry (memset to 0)
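
The three points above can be sketched in userspace C. NVGPU_PD_CACHE_SIZE and its 64K value come from the commit message. The two helper names are hypothetical and do not reflect the driver's actual functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NVGPU_PD_CACHE_SIZE (64u * 1024u) /* slab size stated in the commit message */

/* Hypothetical predicate: serve the allocation from the PD cache if it is small enough. */
static bool pd_alloc_uses_cache(size_t bytes)
{
    return bytes < NVGPU_PD_CACHE_SIZE;
}

/* Hypothetical free-path step: zero the entry so stale PTEs cannot be prefetched. */
static void pd_cache_invalidate_entry(void *entry, size_t bytes)
{
    memset(entry, 0, bytes);
}
```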

Bug 3093183
Bug 3100907

Change-Id: I2302a1dfeb056b9461159121bbae1be70524a357
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2401783
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Satish Arora <satisha@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-09-15 02:38:45 -07:00
Sagar Kamble
1c34f50227 gpu: nvgpu: remove the root cap check in ctxsw device open
The device node permission for the ctxsw should be set to "root:debug"
instead.

Bug 2823941

Change-Id: I523fdd298b70cac82c0a8d853f3e241a80a2ebf5
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2372943
(cherry picked from commit 692eafdd03af2f7ab4164732f878d2699867ac63)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2392715
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-08-14 21:39:30 -07:00
Sagar Kamble
363b183756 gpu: nvgpu: advertise RESCHEDULE_RUNLIST capability only for realtime processes
The change below added a capability check in the ioctl. nvgpu is advertising
the support for RESCHEDULE_RUNLIST for all processes even though it
fails the ioctl for non-realtime processes.

Clear the ioctl flag for RESCHEDULE_RUNLIST for non-realtime processes.

commit 838ba0a14d ("gpu: nvgpu: check capability for reschedule runlist submit flag")
Author: David Li <davli@nvidia.com>
Date:   Tue Sep 12 18:37:00 2017 -0700

    NVGPU_SUBMIT_GPFIFO_FLAGS_RESCHEDULE_RUNLIST is only used by realtime
    priority EGL context, which checks for CAP_SYS_NICE during context
    creation in userspace, so it wasn't secure against unprivileged program
    spoofing submit ioctl with this flag to stall GPU progress of others.
    This flag does increase duration of submit by approx 16us,
    mostly due to register accesses and PMU FIFO mutex.
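
A minimal sketch of the flag masking this change describes. The bit values and names below are hypothetical, since the real nvgpu flag definitions are not shown in this log, and the realtime check is reduced to a boolean parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits; not the driver's actual values. */
#define CAP_FLAG_RESCHEDULE_RUNLIST (1ull << 0)
#define CAP_FLAG_OTHER              (1ull << 1)

/* Advertise RESCHEDULE_RUNLIST only to callers that would pass the ioctl check. */
static uint64_t advertised_flags(uint64_t hw_flags, bool caller_is_realtime)
{
    if (!caller_is_realtime)
        hw_flags &= ~CAP_FLAG_RESCHEDULE_RUNLIST;
    return hw_flags;
}
```

Masking at query time keeps the advertised capabilities consistent with what the submit ioctl will accept for the same process.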

Bug 2823941

Change-Id: Iecee3989e5af035264b1ed5c1aa9a8576dd90883
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2372957
(cherry picked from commit 864213ae55b009b0a026ac380b26276332f79177)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2392714
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-08-14 21:39:25 -07:00
Sagar Kamble
1bb8314ca7 gpu: nvgpu: remove cap checks from fifo_sched & ctxsw_ring debugfs open
Debugfs can be mounted with root-only permissions hence remove the extra
cap checks in the debugfs open calls for fifo_sched & ctxsw_ring.

Bug 2823941

Change-Id: I41668a887635f34897886b872ad435b183b85959
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2372982
(cherry picked from commit f34037a09f5996762c69bb4ce86751ed7df24ee7)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2392713
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-08-14 21:39:20 -07:00
Debarshi Dutta
c46d6fbc5b gpu: nvgpu: Discard coherency check on gmmu
With MSS Nvlink set to force snoop, checking the coherency flag in the
gmmu attributes and setting the pte aperture to the coherent type based
on that check is no longer relevant.

The coherent variable is removed from the nvgpu_gmmu_attrs struct.

Bug 200473147
Bug 3057980

Change-Id: Idf76cac901ef7c70faa2c4f7f11a046d94b9466a
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2013212
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from 4e17690975
in rel-32)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2387272
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Aayush Rajoria <arajoria@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-08-07 09:11:01 -07:00
Debarshi Dutta
570f03764f gpu: nvgpu: Remove force coherency
Remove the code that set default aperture mask as coherent.
MSS nvlink is set for force snoop, so default aperture mask is set as
non-coherent.

Bug 200473147
Bug 3057980

Change-Id: Ia8f826b8414826d2642f9c35c14ffba1cd0b9353
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2011966
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from aec64d8f8b
in dev-main)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2387271
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Aayush Rajoria <arajoria@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-08-07 09:10:56 -07:00
Thomas Fleury
ea40ac7e86 gpu: nvgpu: remove channel cycle stats ioctls
Cycle stats and cycle stats snapshot ioctls have been moved to
debug node. Removing channel ioctls.

Bug 2660206
Bug 220464613

Change-Id: I3aecdf4a8310eeb38de2de5ac076048891afe436
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2030992
(cherry picked from commit f20424ea6a)
Signed-off-by: Gagan Grover <ggrover@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2092020
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Phoenix Jung <pjung@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Peter Daifuku <pdaifuku@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-07-27 14:55:13 -07:00
Thomas Fleury
5ecc45b5e7 gpu: nvgpu: add cycle stats to debugger node
Add NVGPU_DBG_GPU_IOCTL_CYCLE_STATS to debugger node, to
install/uninstall a buffer for cycle stats.

Add NVGPU_DBG_GPU_IOCTL_CYCLE_STATS_SNAPSHOT to debugger
node, to attach/flush/detach a buffer for Mode-E streamout.

Those ioctls will apply to the first channel in the debug session.

Bug 2660206
Bug 200464613

Change-Id: I0b96d9a07c016690140292fa5886fda545697ee6
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2002060
(cherry picked from commit 90b0bf98ac)
Signed-off-by: Gagan Grover <ggrover@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2092008
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Phoenix Jung <pjung@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Peter Daifuku <pdaifuku@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-07-27 14:55:01 -07:00
Deepak Nibade
e878686302 gpu: nvgpu: wait ACK for FECS watchdog timeout
On Volta, nvgpu needs to wait for an explicit ACK from CTXSW while
setting the FECS watchdog timeout.

This is a manual port of the fixes 4d7e5026e38528b88a4a168eca9a8b180475b368
and ad89436b03428a42e43042b6a849c15843fdebc4 on dev-main, since a clean
cherry-pick is not possible due to large file and structure differences.

Bug 200603566

Change-Id: Icba69998ab45eee5fdf2a29e1ac1067589301be6
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2371708
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-07-14 02:40:09 -07:00
Peter Daifuku
1b161b6c7a gpu: nvgpu: fix value leaked in log
The timeout message of nvgpu_timeout_expired_msg() leaks
a stack value (%llx) in error log on timeout. As the format
expects 1 argument and none is given, fix this by specifying
the required argument.
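
The class of bug described above can be illustrated with a small userspace sketch. Only the %llx conversion comes from the commit message. The message text, the helper name, and the argument's meaning are hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical formatter: the %llx conversion consumes exactly one
 * unsigned long long argument. Omitting it is undefined behaviour and in
 * practice prints whatever value sits in the argument register or stack slot.
 */
static int format_timeout_msg(char *buf, size_t len, unsigned long long value)
{
    return snprintf(buf, len, "timeout at 0x%llx", value);
}
```

Compilers can flag the missing argument at build time when format checking is enabled (for example GCC's -Wformat).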

Manual port of https://git-master.nvidia.com/r/c/linux-nvgpu/+/2205423

Bug 2780861
Bug 3051385

Change-Id: Ic223e4b79bde718108826f095740b10b54a5e84d
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2366452
(cherry picked from commit 372837506af77e2c5b8489ee2123292778abe75d)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2370285
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sungwook Kim <sungwookk@nvidia.com>
Reviewed-by: Rahul Jain (SW-TEGRA) <rahuljain@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-07-08 16:09:04 -07:00
Deepak Nibade
3d4a70a4b9 gpu: nvgpu: delete unused file
File tu104/gr_tu104.c was added with commit f56874aec2 owing to
incorrect conflict resolution. Delete it.

Bug 200447167

Change-Id: I24648a3130bc76731c888328d5742229f6f6c928
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2371610
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-07-07 10:43:40 -07:00