Writing same value to railgate_enable_store should be treated as nop
and made successfully. Doing so is not only an optimization for the
operation but also convention that users expect for "settings". This
change is primary for fixing a peculiar situation in the driver:
root@localhost:/sys/devices/17000000.gp10b# cat railgate_enable
0
root@localhost:/sys/devices/17000000.gp10b# echo 0 > railgate_enable
bash: echo: write error: Invalid argument
Attempt to disable railgating on a platform where railgating isn't
supported shouldn't be treated as 'invalid'. It's disabled after all.
Bug 200562094
Change-Id: I3c04934bdbaf337c33d7de9cac6d53c96b4dacae
Signed-off-by: Leon Yu <leoyu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2225476
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
When called with timeout=0, NVGPU_COND_WAIT_INTERRUPTIBLE macro
ignores the return code from wait_event_interruptible. As a result
we do not detect when the call is interrupted, and the calling
process hangs.
Use wait_event_interruptible return code in case of infinite timeout.
Bug 200384829
Bug 200543218
Change-Id: I930f0d08c73a3b91ab20a6c8faaf633a3d7aee4d
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1982242
(cherry picked from commit 78c513790a)
Reviewed-on: https://git-master.nvidia.com/r/2215159
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Tested-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-by: Satish Arora <satisha@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
A call to exit the PMU state machine/kthread must
be prioritized over any other state change.
It was possible to set the state as PMU_STATE_EXIT,
signal the kthread and overwrite the state before
the kthread has had the chance to exit its loop.
This may lead to a "lost" signal, resulting in
indefinite wait during the destroy sequence.
Faulting sequence:
1. pmu_state = PMU_STATE_EXIT in nvgpu_pmu_destroy()
2. cond_signal()
3. pmu_state = PMU_STATE_LOADING_PG_BUF
4. PMU kthread wakes up
5. PMU kthread processes PMU_STATE_LOADING_PG_BUF
6. PMU kthread sleeps
7. nvgpu_pmu_destroy() waits indefinitely
This patch adds a sticky flag to indicate PMU_STATE_EXIT,
irrespective of any subsequent changes to pmu_state.
The PMU PG init kthread may wait on a call to
NVGPU_COND_WAIT_INTERRUPTIBLE, which requires a
corresponding call to nvgpu_cond_signal_interruptible()
as the core kernel code requires this task mask to
wake-up an interruptible task.
Bug 2658750
Bug 200532122
Change-Id: I61beae80673486f83bf60c703a8af88b066a1c36
Signed-off-by: Abhiroop Kaginalkar <akaginalkar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2177112
(cherry picked from commit afa49fb073a324c49a820e142aaaf80e4656dcc6)
Reviewed-on: https://git-master.nvidia.com/r/2190733
Tested-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
gk20a_fecs_trace_poll() right now calls gk20a_fecs_trace_ring_read()
to read the trace ring buffer written by FECS
gk20a_fecs_trace_ring_read() returns number of trace entries written
to local buffer if successful, otherwise returns error
In case there is really an invalid entry, gk20a_fecs_trace_poll()
will just stop reading more entries, write current read pointer to
h/w and return
When gk20a_fecs_trace_poll() is called next time, we again read that
invalid entry, and again skip it, and again return
This keeps happening, and we never move on to read new entries
Fix this by always continuing to read next entry irrespective
of current entry is valid or not
gk20a_fecs_trace_poll() now just prints a debug message instead of
breaking the loop
Bug 200491708
Bug 200542611
Reviewed-on: https://git-master.nvidia.com/r/2020167
(cherry picked from commit decbbf3504)
Change-Id: If8b3c8af63ce662a41ada93a6986fa149e34f664
Signed-off-by: seshendra <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2190151
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Winnie Hsu <whsu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- This patch fixes enable/disable fecs trace logic.
- Added enable_lock and enable_count to handle multiple
enable/disable of fecs trace logic.
- If user does trace disable twice, enable_count will become
negative and when user tries to re-enable it, fecs trace
will not be enabled.
Bug 2672760
Bug 200542611
Change-Id: Ic7d4883b899f01dcf43058d0e7c9d1223a716c9b
Signed-off-by: seshendra <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2189371
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Winnie Hsu <whsu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- To enable FECS trace support, nvgpu should set the MSB
of the read pointer (MAILBOX1).
- The ucode will check if the feature is enabled/disabled
before writing a record into the circular buffer. If the
feature is disabled, it will not write the record.
- If the feature is enabled and the buffer is not allocated,
HW will throw a page fault error.
Bug 2459186
Bug 200542611
Change-Id: I6f181643737d1cf1bda02077eaa714a3f4ef3d8c
Signed-off-by: seshendra <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2189250
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The current code reads the pbdma_status info after clearing the
interrupt. Other interrupts/sleep can cause enough delay between
clearing the interrupt and pbdma switching the channel leading to
invalid channel/tsg ID. Correct that by reading the pbdma_status info
register before clearing of the pbdma interrupt to correctly read the
context information before the pbdma can switch out the context.
Bug 200533450
Change-Id: Ic2f0682526e00d14ad58f0411472f34388183f2b
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2165047
(cherry-picked from 0ef96e4b1a
in dev-main)
Reviewed-on: https://git-master.nvidia.com/r/2188861
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This is to prevent GPU (and thus EMC) frequency from being boosted
from time to time when system is completely idle. It's caused by max
GPU load being incorrectly reported by perfmon. When the issue
happens, it can be observed that max load is reported but busy_cycles
read from PMU is actually zero.
Even though busy and total cycles returned by PMU may not be
completely accurate when counter overflows, the counters
accumulated so far still have some value that we shouldn't ignore.
OTOH, returning max load could be the least accurate approximation in
such cases. So let's just clear the interrupt status and let rest of
the code handle the exception cases.
Bug 200545546
Change-Id: I6882ae265029e881f5417fb2b82005b0112b0fda
Signed-off-by: Leon Yu <leoyu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2180771
Reviewed-by: Peng Liu <pengliu@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Mubushir Rahman <mubushirr@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This patch adds nvgpu API in linux and qnx to query vpr resize.
The new API nvgpu_is_vpr_resize_enabled() is used in
nvgpu_submit_channel_gpfifo().
Previously, if non-deterministic channel has timeout disabled and
GPU cannot railgate on some platform, then channel doesn't power ref
count and results in video freeze. This requires non-determinstic
channel job tracking to be enabled if vpr resize is supported or if GPU
can railgate.
Bug 200532122
Change-Id: Icfbff6253762b195b2f5955749343974b1a7a269
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2167082
Reviewed-on: https://git-master.nvidia.com/r/2180581
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
CTS test dEQP-VK.api.object_management.max_concurrent.device_group
crashes with invalid userspace memory access.
Currently, nvgpu_submit_prepare_syncs() races with
gk20a_channel_clean_up_jobs() and this race condition is exposed when
aggressive_sync_destroy_thresh is set to non-zero value.
nvgpu_submit_prepare_syncs() gets ref for c->sync to submit job and
releases channel sync_lock immediately. Meanwhile,
gk20a_channel_worker_process() triggers gk20a_channel_clean_up_jobs(),
which destroys ref'd c->sync pointer.
Channel sync is deleted by gk20a_channel_clean_up_jobs() only if
aggressive_sync_destroy_thresh is non-zero.
So, gk20a_channel_clean_up_jobs() and nvgpu_submit_prepare_syncs() will
race only in this scenario.
Hence, if aggressive_sync_destroy_thresh value is non-zero, this patch
protects channel's sync pointer by holding channel sync_lock
during complete execution of nvgpu_submit_prepare_syncs().
Bug 2613870
Change-Id: I6f3d48aff361d1cb38c30d2ce5de276d0c55fb6f
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2180550
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Import userd and gpfifo buffers from userspace if provided via
NVGPU_IOCTL_CHANNEL_ALLOC_GPFIFO_EX. Also supply the work submit token
(i.e., the hw channel id) to userspace.
To keep the buffers alive, store their dmabuf and attachment/sgt handles
in nvgpu_channel_linux. Our nvgpu_mem doesn't provide such data for
buffers that are mainly in kernel use. The buffers are freed via a new
API in the os_channel interface.
Fix a bug in gk20a_channel_free_usermode_buffers: also unmap the
usermode gpfifo buffer.
Bug 200145225
Bug 200541476
Change-Id: I8416af7085c91b044ac8ccd9faa38e2a6d0c3946
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1795821
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 99b1c6dcdf
in dev-main)
Reviewed-on: https://git-master.nvidia.com/r/2170603
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
For a long time now, the ALLOC_GPFIFO_EX channel IOCTL has done much
more than just gpfifo allocation, and its signature does not match
support that's needed soon. Add a new one called SETUP_BIND to hopefully
cover our future needs and deprecate ALLOC_GPFIFO_EX.
Change nvgpu internals to match this new naming as well.
Bug 200145225
Bug 200541476
Change-Id: I766f9283a064e140656f6004b2b766db70bd6cad
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1835186
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from e0c8a16c8d
in dev-main)
Reviewed-on: https://git-master.nvidia.com/r/2169882
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- In GV11B, read fuse_status_opt_tpc_gpc register
to read which TPCs are floorswept.
- The driver will also read sysfs node: tpc_pg_mask
- Based on these two values "can_tpc_powergate" will
be set to true or false and mask will be used to write to
fuse_ctrl_opt_tpc_gpc register to powergate the TPC.
- can_tpc_powergate = true indicates that the mask value
sent from userspace is valid and can be used to power gate
the desired TPC
- can_tpc_powergate = false indicates that the mask value
sent from userspace is not valid and cannot be used to
power gate the desired TPC.
Bug 200532639
Change-Id: Ib0806e4c96305a13b3574e8063ad8e16770aa7cd
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2159219
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
gk20a_mm_l2_flush flushes the L2 cache when "struct gk20a->power_on"
is true. But it doesn't acquire power lock when doing that, which
creates a race that runtime PM might suspend the GPU in the middle
of L2 flush. The FB flush looks having the same issue with L2 flushing.
This patch fixes that by calling pm_runtime_get_if_in_use at the
beginning of the ioctl. This API from PM does a compare and add to
the usage count. If the device was not in use, it simply returns
without incrementing the usage count as its unnecessary to wake up
the GPU(using e.g. a call to gk20a_busy()) as the caches are
flushed when the device would be resumed anyways.
Bug 2643951
Change-Id: I2417f7ca3223c722dcb4d9057d32a7e065b9e574
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2151532
GVS: Gerrit_Virtual_Submit
Reviewed-by: Mark Zhang <markz@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In some cases, we would get deadlock issue due to there are two locks
acquisition on common clk driver's lock and nvgpu driver's locks. At
the bug, inconsistent lock ordering problem will come with one thread
gets "nvgpu lock -> clk lock" and the other thread gets "clk lock ->
nvgpu lock".
Slove the latter path with one-time initializing clk_parent entry
and use cached data afterward.
Bug 2555115
Change-Id: I31c5c2728f406307e7cfd4e555f4db0c163234d8
Signed-off-by: Jeremy Ho <jeremyh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2146727
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Aleksandr Frid <afrid@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The "nvgpu_big_zalloc()" will be failed if the passed-in argument
"vm->num_user_mapped_buffers" is zero. The returned value is 16
which will bypass the NULL-check and then causes the panic.
This patch adds a check on the "vm->num_user_mapped_buffers" to
avoid the zero is passed-in the "nvgpu_big_zalloc()".
Bug 2603292
Change-Id: I399eecf72a288e13992730651a34a6cea1ef56d1
Signed-off-by: Kary Jin <karyj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2123499
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Daniel Fu <danifu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Register gk20a non-arch-specific functions for gm20b
gpu_ops.fecs_trace,
Register nvgpu_os_linux_ops.fecs_trace.init_debugfs
gp10b_fecs_trace_flush is now replaced by gm20b_fecs_trace_flush in
fecs_trace_gm20b.* and the fecs_trace_gp10b.* files are removed.
Bug 2052906
Change-Id: I245c91ae8e6015b87bafeb3ec023b98fe4c57501
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2115247
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Rename gr_reset_mutex to engines_reset_mutex and acquire it
before initiating recovery. Recovery running in parallel with
engine reset is not recommended.
On hitting engine reset, h/w drops the ctxsw_status to INVALID in
fifo_engine_status register. Also while the engine is held in reset
h/w passes busy/idle straight through. fifo_engine_status registers
are correct in that there is no context switch outstanding
as the CTXSW is aborted when reset is asserted.
Use deferred_reset_mutex to protect deferred_reset_pending variable
If deferred_reset_pending is true then acquire engines_reset_mutex
and call gk20a_fifo_deferred_reset.
gk20a_fifo_deferred_reset would also check the value of
deferred_reset_pending before initiating reset process
Bug 2092051
Bug 2429295
Bug 2484211
Bug 1890287
Change-Id: I47de669a6203e0b2e9a8237ec4e4747339b9837c
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2022373
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from cb91bf1e13
in dev-main)
Reviewed-on: https://git-master.nvidia.com/r/2024901
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
set gr.initialized to false in the beginning of gk20a_gr_reset() and
set it to true at the end of successful execution of gk20a_gr_reset.
Use gk20a_gr_wait_initialized() to enable/disable cg/pg
functions to make sure engine is out of reset and initialized.
Bug 2092051
Bug 2429295
Bug 2484211
Bug 1890287
Change-Id: Ic7b0b71382c6d852a625c603dad8609c43b7f20f
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from 7e2f124fd1 in
dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2111038
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
if fecs is sent stop_ctxsw method, elpg entry/exit cannot happen
and may timeout. It could manifest as different error signatures
depending on when stop_ctxsw fecs method gets sent with respect
to pmu elpg sequence. It could come as pmu halt or abort or
maybe ext error too.
If ctxsw failed to disable, do not read engine info and just abort tsg.
Bug 2092051
Bug 2429295
Bug 2484211
Bug 1890287
Change-Id: I5f3ba07663bcafd3f0083d44c603420b0ccf6945
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2014914
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2018156
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add new power/clock gating functions that can be called by
other units.
New clock_gating functions will reside in cg.c under
common/power_features/cg unit.
New power gating functions will reside in pg.c under
common/power_features/pg unit.
Use nvgpu_pg_elpg_disable and nvgpu_pg_elpg_enable to disable/enable
elpg and also in gr_gk20a_elpg_protected macro to access gr registers.
Add cg_pg_lock to make elpg_enabled, elcg_enabled, blcg_enabled
and slcg_enabled thread safe.
JIRA NVGPU-2014
Change-Id: I00d124c2ee16242c9a3ef82e7620fbb7f1297aff
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2025493
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from c905858565 in
dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2108406
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
For handle_sched_error, change err to info print for failing eng
id returned as -1 i.e. FIFO_INVAL_ENGINE_ID as no engine is found
busy doing ctxsw. May be ctxsw already finished for the context
for which ctxsw timeout intr was triggered.
Possible Causes:
a)
On hitting engine reset, h/w drops the ctxsw_status to INVALID in
fifo_engine_status register. Also while the engine is held in reset
h/w passes busy/idle straight through. fifo_engine_status registers
are correct in that there is no context switch outstanding
as the CTXSW is aborted when reset is asserted.
This is just a side effect of how gv100 and earlier versions of
ctxsw_timeout behave.
With gv10b and later, h/w snaps the context at the point of error
so that s/w can see the tsg_id which caused the HW timeout.
b)
If engines are not busy and ctxsw state is valid then intr occurred
in the past and if the ctxsw state has moved on to VALID from LOAD
or SAVE, it means that whatever timed out eventually finished
anyways. The problem with this is that s/w cannot conclude which
context caused the problem as maybe more switches occurred before
intr is handled.
Bug 2092051
Bug 2429295
Bug 2484211
Bug 1890287
Change-Id: Ia79bee6e860fb179ee39024c963671d4f8245227
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2030866
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from d27f875d2c
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2076126
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Any recovery that goes through gk20a_fifo_recover path e.g. gr error,
mmu fault or any recovery that involves engine recovery as well, will
still dump the full debug dump. This change will just avoid dumping debug
dump for force reset channels and pbdma intr if they do not involve
engine recovery. For FIFO_ERROR_IDLE_TIMEOUT error notifiers that
involves tsg recovery only, debug_dump will happen only if
timeout_debug_dump is set. timeout_debug_dump by default is set to true
but can be changed using NVGPU_IOCTL_CHANNEL_SET_TIMEOUT_EX.
Bug 2092051
Change-Id: Ibbf3cd2c44c586d9deb9e61ffbf37945b8d9e428
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2033068
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 5222d0ff4f
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2076117
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In gr_gp10b_set_cilp_preempt_pending() we already extract TSG pointer
by calling tsg_gk20a_from_ch() which safely returns correct TSG or
NULL in error case
But before calling g->ops.fifo.post_event_id() we again extract TSG
by directly accessing g->fifo.tsg array, and this could result in
getting invalid TSG pointer
Fix this by removing direct TSG extraction through g->fifo.tsg
Bug 2444819
Jira NVGPU-1601
Change-Id: I9d49b5309c74e162828e7cb7d97556aae939a07c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1984954
(cherry picked from commit dcd3778b5e)
Reviewed-on: https://git-master.nvidia.com/r/2077313
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>