NETLIST_REGIONID_SW_CTX_LOAD writes update gr_gpcs_tpcs_tex_m_dbg2_r to
default value that keeps rd coalesce enabled for LG & SU.
Disable rd coalesce for tex, lg and su after NETLIST_REGIONID_SW_CTX_LOAD
writes during gr init and golden ctx init for it to take effect.
For gr sw method handling, don't update the tex rd coalesce on interrupt
with offset *_SET_RD_COALESCE as we want to keep rd coalescing disabled.
Bug 3881919
Change-Id: Ie7e6616d48f84547ce3380bfa395910b7995c05b
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2857141
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Currently in case of any fecs error, we only dump fecs
cxtsw fw related registers, mailboxes and trace registers.
With this change, we want to ensure we dump gpccs register
space as well. This will help in debugging ctxsw related
failures
JIRA NVGPU-9560
Bug 3907163
Change-Id: I61e25883da4455ea1412ca70c5fc3377d9a786a3
Signed-off-by: Kishan <kpalankar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2850402
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
On Volta, nvgpu needs to wait for explicit ACK from CTXSW while
setting FECS watchdog timeoout
This is manual port of the fixes 4d7e5026e38528b88a4a168eca9a8b180475b368
and ad89436b03428a42e43042b6a849c15843fdebc4 on dev-main since clean
cherry-pick is not possible due to huge file and structure differences.
Bug 200603566
Change-Id: Icba69998ab45eee5fdf2a29e1ac1067589301be6
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2371708
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Typically, the PMU init thread will finish up long
before the golden context image has been initialized,
which means that ELPG hasn't truly been enabled at that
point.
Create a new function, nvgpu_pmu_reenable_pg(), which
checks if elpg had been enabled (non-zero refcnt), and
if so, disables then re-enables it.
Call this function from gk20a_alloc_obj_ctx() after
the golden context image has been initialized to ensure
that elpg is truly enabled.
Manually ported from dev-main
Bug 200543218
Change-Id: I0e7c4f64434c5e356829581950edce61cc88882a
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2245768
(cherry picked from commit 077b6712b5a40340ece818416002ac8431dc4138)
Reviewed-on: https://git-master.nvidia.com/r/2250091
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
set gr.initialized to false in the beginning of gk20a_gr_reset() and
set it to true at the end of successful execution of gk20a_gr_reset.
Use gk20a_gr_wait_initialized() to enable/disable cg/pg
functions to make sure engine is out of reset and initialized.
Bug 2092051
Bug 2429295
Bug 2484211
Bug 1890287
Change-Id: Ic7b0b71382c6d852a625c603dad8609c43b7f20f
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from 7e2f124fd1 in
dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2111038
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
if fecs is sent stop_ctxsw method, elpg entry/exit cannot happen
and may timeout. It could manifest as different error signatures
depending on when stop_ctxsw fecs method gets sent with respect
to pmu elpg sequence. It could come as pmu halt or abort or
maybe ext error too.
If ctxsw failed to disable, do not read engine info and just abort tsg.
Bug 2092051
Bug 2429295
Bug 2484211
Bug 1890287
Change-Id: I5f3ba07663bcafd3f0083d44c603420b0ccf6945
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2014914
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2018156
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add new power/clock gating functions that can be called by
other units.
New clock_gating functions will reside in cg.c under
common/power_features/cg unit.
New power gating functions will reside in pg.c under
common/power_features/pg unit.
Use nvgpu_pg_elpg_disable and nvgpu_pg_elpg_enable to disable/enable
elpg and also in gr_gk20a_elpg_protected macro to access gr registers.
Add cg_pg_lock to make elpg_enabled, elcg_enabled, blcg_enabled
and slcg_enabled thread safe.
JIRA NVGPU-2014
Change-Id: I00d124c2ee16242c9a3ef82e7620fbb7f1297aff
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2025493
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from c905858565 in
dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2108406
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add gk20a_channel_from_id() to retrieve a channel, given a raw channel
ID, with a reference taken (or NULL if the channel was dead). This makes
it harder to mistakenly use a channel that's dead and thus uncovers bugs
sooner. Convert code to use the new lookup when applicable; work remains
to convert complex uses where a ref should have been taken but hasn't.
The channel ID is also validated against FIFO_INVAL_CHANNEL_ID; NULL is
returned for such IDs. This is often useful and does not hurt when
unnecessary.
However, this does not prevent the case where a channel would be closed
and reopened again when someone would hold a stale channel number. In
all such conditions the caller should hold a reference already.
The only conditions where a channel can be safely looked up by an id and
used without taking a ref are when initializing or deinitializing the
list of channels.
Jira NVGPU-1460
Change-Id: I0a30968d17c1e0784d315a676bbe69c03a73481c
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1955400
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 7df3d58750
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2008515
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
After fecs watchdog gets triggered, system will not
do anything useful as it cannot context switch. Dumping
falcon stats will help debug the issue since s/w is
not triggering recovery.
Bug 2113657
Change-Id: I03ccd5ad7c03daac0581775dc615174cc0e77328
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1812720
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Debug page was allocated and programmed to HUB MMU in GR code. This
introduces a dependency from GR to FB and is anyway the wrong place.
Move the code to allocate memory to generic MM code, and the code
to program the addresses to FB.
Change-Id: Ib6d3c96efde6794cf5e8cd4c908525c85b57c233
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1801423
Reviewed-by: svc-misra-checker <svc-misra-checker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
NVGPU_DBG_GPU_IOCTL_REG_OPS currently doesn't return if the ctx was
resident in engine or not.
Regops are broken down into batches of 128 and each batch is executed
together. Since there only 32 bits were available in IOCTL args, returning
is ctx was resident isn't possible for all batches.
Hence return if the ctx was resident for the first batch.
Bug 200445575
Change-Id: Iff950be25893de0afadd523d4ea04842a8ddf2af
Signed-off-by: Anup Mahindre <amahindre@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1812975
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
MISRA Rule-17.7 requires the return value of all functions to be used.
Fix is either to use the return value or change the function to return
void. This patch contains fix for calls to nvgpu_readl.
JIRA NVGPU-677
Change-Id: I432197cca67a10281dfe407aa9ce2dd8120030f0
Signed-off-by: Nicolas Benech <nbenech@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1807528
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
MISRA Rule-15.6 requires that all if-else blocks be enclosed in braces,
including single statement blocks. Fix errors due to single statement
if blocks without braces by introducing the braces.
JIRA NVGPU-671
Change-Id: Ie4bd8bffdafe6321e35394558dc9559f9c2d05c2
Signed-off-by: Srirangan <smadhavan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1797689
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
MISRA Rule-17.7 requires the return value of all functions to be used.
Fix is either to use the return value or change the function to return
void. This patch contains fix for calls to nvgpu_mutex_init and
improves related error handling.
JIRA NVGPU-677
Change-Id: I609fa138520cc7ccfdd5aa0e7fd28c8ca0b3a21c
Signed-off-by: Nicolas Benech <nbenech@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1805598
Reviewed-by: svc-misra-checker <svc-misra-checker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This patch corrects a handful of MISRA 10.1 violations involving
illegal arithmetic operations (e.g. bitwise OR) on boolean values:
* fix to status handling in regops validation code
* fix to debugger event handling in gr code
* fix to entries_left tracking in runlist construct code
* fix to verbose channel dumping and reset tracking in fifo code
JIRA NVGPU-650
Change-Id: I3c3d9123b5a0e08fc936d0e63d51de99fc310ade
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1810957
Reviewed-by: svc-misra-checker <svc-misra-checker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The graphics subctx header object is nothing but memory. Drop the
dependency to gr header file in the channel header file and substitute
struct nvgpu_mem for struct ctx_header_desc.
Jira NVGPU-967
Change-Id: Ic3976391016c42d2ada4aac3e0851a1222244ce9
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1807370
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The gr_gk20a_add_zbc() routine returns a signed error
(errno) status value.
Current callers of this function use a bitwise OR to collect
the returned error status values to generate a single value
to return.
Bitwise OR on signed status values is flagged as a
violation of MISRA Rule 10.1 (not to mention that in this
case it potentially results in a garbage return value).
To eliminate such violations this change modifies the
following routines to fail immediately on the first error
from a call to gr_gk20a_add_zbc():
* gr_gk20a_load_zbc_default_table()
* gr_gv11b_load_stencil_default_tbl()
JIRA NVGPU-650
Change-Id: If733c1bb0e05943ff5d0355de729133c89233583
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1805501
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In nvgpu repository, we have multiple accesses to methods in
pmu_gk20a.h which have register accesses. Instead of directly invoking
these methods, these are now called via HALs. Some common methods such
as pmu_wait_message_cond which donot have any register accesses
are moved to pmu_ipc.c and the method declarations are moved
to pmu.h. Also, changed gm20b_pmu_dbg to
nvgpu_dbg_pmu all across the code base. This would remove all
indirect dependencies via gk20a.h into pmu_gk20a.h. As a result
pmu_gk20a.h is now removed from gk20a.h
JIRA-597
Change-Id: Id54b2684ca39362fda7626238c3116cd49e92080
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1804283
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add READ_SM_ERROR IOCTL support to TSG level.
Moved the struct to save the sm_error details
from gr to tsg as the sm_error support is context
based, not global.
Also corrected MISRA 21.1 error in header file.
nvgpu_dbg_gpu_ioctl_write_single_sm_error_state and
nvgpu_dbg_gpu_ioctl_read_single_sm_error_state
functions are modified to use the tsg struct
nvgpu_tsg_sm_error_state.
Bug 200412642
Change-Id: I9e334b059078a4bb0e360b945444cc4bf1cc56ec
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1794856
Reviewed-by: svc-misra-checker <svc-misra-checker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Fix MISRA rule 10.1 violations involving gk20a_nonstall_ops
enums by replacing them with with corresponding #defines.
Because these values can be used in expressions that require
unsigned values (e.g. bitwise OR) we cannot use enums.
The g->ce2.isr_nonstall() function was previously returning an
int that was a combination of gk20a_nonstall_ops enum bits which
led to the violations.
JIRA NVGPU-650
Change-Id: I6210aacec8829b3c8d339c5fe3db2f3069c67406
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1796242
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Fix MISRA rule 10.1 violations involving need_reset var
in gk20a_gr_isr().
Changed type to bool and set it to true any time one of
the pending condition checks returns non-zero.
JIRA NVGPU-650
Change-Id: I2f87b68d455345080f7b4c68cacf515e074c671a
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1793633
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In the current code, gk20a.h includes io.h which gets directly included
in a lot of other files. io.h contains methods which uses a struct
gk20a as a parameter leading to a circular dependency between io.h
and gk20a.h. This can be mitigated by removing io.h from gk20a.h as
part of larger effort to moving gk20a.h to nvgpu/gk20a.h
JIRA NVGPU-597
Change-Id: I93e504fa9371b88152737b342a75580c65e8f712
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1787316
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>