Currently in case of any fecs error, we only dump fecs
cxtsw fw related registers, mailboxes and trace registers.
With this change, we want to ensure we dump gpccs register
space as well. This will help in debugging ctxsw related
failures
JIRA NVGPU-9560
Bug 3907163
Change-Id: I61e25883da4455ea1412ca70c5fc3377d9a786a3
Signed-off-by: Kishan <kpalankar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2850402
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
- When DISALLOW cmd is sent from driver to PMU the actual
completion of the disallow will be acknowledged by PMU
via a PG EVENT: ASYNC_CMD_RESP.
- Disallow needs a delayed ACK from PMU in order to disable
the ELPG.
- If ELPG is already engaged, the DISALLOW cmd will trigger
ELPG exit and then transition to PMU_PG_STATE_DISALLOW.
- After this whole process is completed, PMU will send
DISALLOW_ACK through ASYNC_CMD_RESP msg.
- After disallow command is sent from the driver, NvGPU driver
waits/polls for disallow command ack. This is sent immediately
by msg framework of PMU.
- Then, the driver will poll/wait for ASYNC_CMD_RESP event which
is the delayed DISALLOW ACK.
- The driver captures the ASYNC_CMD_RESP sent from PMU.
- set disallow_state to ELPG_OFF.
- If the driver does not wait/poll for this delayed disallow
ack from PMU, it can result in erros as PMU is still
processing DISALLOW cmd but the driver progressed further.
Bug 3580271
Change-Id: I332180c05b6a398107f065d54e9718b7038fb1b2
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2689500
(cherry picked from commit fb019bf43a)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2694312
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Typically, the PMU init thread will finish up long
before the golden context image has been initialized,
which means that ELPG hasn't truly been enabled at that
point.
Create a new function, nvgpu_pmu_reenable_pg(), which
checks if elpg had been enabled (non-zero refcnt), and
if so, disables then re-enables it.
Call this function from gk20a_alloc_obj_ctx() after
the golden context image has been initialized to ensure
that elpg is truly enabled.
Manually ported from dev-main
Bug 200543218
Change-Id: I0e7c4f64434c5e356829581950edce61cc88882a
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2245768
(cherry picked from commit 077b6712b5a40340ece818416002ac8431dc4138)
Reviewed-on: https://git-master.nvidia.com/r/2250091
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
A call to exit the PMU state machine/kthread must
be prioritized over any other state change.
It was possible to set the state as PMU_STATE_EXIT,
signal the kthread and overwrite the state before
the kthread has had the chance to exit its loop.
This may lead to a "lost" signal, resulting in
indefinite wait during the destroy sequence.
Faulting sequence:
1. pmu_state = PMU_STATE_EXIT in nvgpu_pmu_destroy()
2. cond_signal()
3. pmu_state = PMU_STATE_LOADING_PG_BUF
4. PMU kthread wakes up
5. PMU kthread processes PMU_STATE_LOADING_PG_BUF
6. PMU kthread sleeps
7. nvgpu_pmu_destroy() waits indefinitely
This patch adds a sticky flag to indicate PMU_STATE_EXIT,
irrespective of any subsequent changes to pmu_state.
The PMU PG init kthread may wait on a call to
NVGPU_COND_WAIT_INTERRUPTIBLE, which requires a
corresponding call to nvgpu_cond_signal_interruptible()
as the core kernel code requires this task mask to
wake-up an interruptible task.
Bug 2658750
Bug 200532122
Change-Id: I61beae80673486f83bf60c703a8af88b066a1c36
Signed-off-by: Abhiroop Kaginalkar <akaginalkar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2177112
(cherry picked from commit afa49fb073a324c49a820e142aaaf80e4656dcc6)
Reviewed-on: https://git-master.nvidia.com/r/2190733
Tested-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This is to prevent GPU (and thus EMC) frequency from being boosted
from time to time when system is completely idle. It's caused by max
GPU load being incorrectly reported by perfmon. When the issue
happens, it can be observed that max load is reported but busy_cycles
read from PMU is actually zero.
Even though busy and total cycles returned by PMU may not be
completely accurate when counter overflows, the counters
accumulated so far still have some value that we shouldn't ignore.
OTOH, returning max load could be the least accurate approximation in
such cases. So let's just clear the interrupt status and let rest of
the code handle the exception cases.
Bug 200545546
Change-Id: I6882ae265029e881f5417fb2b82005b0112b0fda
Signed-off-by: Leon Yu <leoyu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2180771
Reviewed-by: Peng Liu <pengliu@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Mubushir Rahman <mubushirr@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add new power/clock gating functions that can be called by
other units.
New clock_gating functions will reside in cg.c under
common/power_features/cg unit.
New power gating functions will reside in pg.c under
common/power_features/pg unit.
Use nvgpu_pg_elpg_disable and nvgpu_pg_elpg_enable to disable/enable
elpg and also in gr_gk20a_elpg_protected macro to access gr registers.
Add cg_pg_lock to make elpg_enabled, elcg_enabled, blcg_enabled
and slcg_enabled thread safe.
JIRA NVGPU-2014
Change-Id: I00d124c2ee16242c9a3ef82e7620fbb7f1297aff
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2025493
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from c905858565 in
dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2108406
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
PMU ucode is updated to use acrlib from t19xbringup
branch.
We are seeing build issues due to incompatibility
with acrlib from tegra_acr branch.
CTX_DMA aperture to be used for loading LS falcons
needed update in the local acrlib.
Bug 2400729.
Change-Id: Iad00a332acfac307c389bde504893a87abaf7460
Signed-off-by: Deepak <dgoyal@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1849182
Reviewed-by: svc-misra-checker <svc-misra-checker@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
-Removed _dmem postfix to some functions which
can be common for DMEM & EMEM queue, and
made changes as needed.
-Defined flcn_queue_push_emem() &
flcn_queue_pop_emem() functions to
to read/write queue data to/from EMEM
-Defined flcn_queue_init_emem_queue()
function to assign EMEM specific functions
to support EMEM queue type.
-Defined QUEUE_TYPE_DMEM to support
DMEM based queue.
-Defined QUEUE_TYPE_EMEM to support
EMEM based queue.
-Modified nvgpu_flcn_queue_init() to call queue
type flcn_queue_init_dmem/emem_queue()
function to assign its ops.
JIRA NVGPU-1161
Change-Id: I06333fa318b7ca4137c977ad63f5a857e7b36cc8
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1841084
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Move all files under perf/* to pmu_perf/* since pmu_perf is logically
appropriate name for PMU's perf unit
Rename perf.c to pmu_perf.c
Also rename the HAL from gops.perf to gops.pmu_perf
Jira NVGPU-1102
Change-Id: I79e73b8b102ddf6b49783c2f38d861cd43b0b4c6
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1819301
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- Removed ACR support code from PMU module
- Deleted ACR related ops from pmu ops
- Deleted assigning of ACR related ops
using pmu ops during HAL init
-Removed code related to ACR bootstrap &
dependent code for all chips.
JIRA NVGPU-1147
Change-Id: I47a851a6b67a9aacde863685537c34566f97dc8d
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1817990
Reviewed-by: svc-misra-checker <svc-misra-checker@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In nvgpu repository, we have multiple accesses to methods in
pmu_gk20a.h which have register accesses. Instead of directly invoking
these methods, these are now called via HALs. Some common methods such
as pmu_wait_message_cond which donot have any register accesses
are moved to pmu_ipc.c and the method declarations are moved
to pmu.h. Also, changed gm20b_pmu_dbg to
nvgpu_dbg_pmu all across the code base. This would remove all
indirect dependencies via gk20a.h into pmu_gk20a.h. As a result
pmu_gk20a.h is now removed from gk20a.h
JIRA-597
Change-Id: Id54b2684ca39362fda7626238c3116cd49e92080
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1804283
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
MISRA Rule 10.4 only allows the usage of arithmetic operations on
operands of the same essential type category.
Adding "U" at the end of the integer literals to have same type of
operands when an arithmetic operation is performed.
This fix violations where an arithmetic operation is performed on
signed and unsigned int types.
Jira NVGPU-992
Change-Id: Iab512139a025e035ec82a9dd74245bcf1f3869fb
Signed-off-by: Sai Nikhil <snikhil@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1789425
Reviewed-by: svc-misra-checker <svc-misra-checker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Adeel Raza <araza@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
MISRA Rule-15.6 requires that all if-else blocks be enclosed in braces,
including single statement blocks. Fix errors due to single statement
if blocks without braces, introducing the braces.
JIRA NVGPU-671
Change-Id: I497fbdb07bb2ec5a404046f06db3c713b3859e8e
Signed-off-by: Srirangan <smadhavan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1799525
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This fixes PMU halt caused due to IMEM miss exception
when calling apCtrlEnable/apCtrlDisable.
IMEM miss exception occurs as overlay containing these
functions is not loaded in the PMU's IMEM. This version
loads the overlays before calling these functions.
Bug 2167968.
Change-Id: I37c75c59b1b545571d2bf94f07a7ecb3a814af54
Signed-off-by: Deepak Goyal <dgoyal@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1801250
Reviewed-by: svc-misra-checker <svc-misra-checker@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
-Renamed "struct pmu_queue" to "struct
nvgpu_falcon_queue" & moved to falcon.h
-Renamed pmu_queue_* functions to flcn_queue_* &
moved to new file falcon_queue.c
-Created ops for queue functions in struct
nvgpu_falcon_queue to support different queue
types like DMEM/FB-Q.
-Created ops in nvgpu_falcon_engine_dependency_ops
to add engine specific queue functionality & assigned
correct HAL functions in hal*.c file.
-Made changes in dependent functions as needed to replace
struct pmu_queue & calling queue functions using
nvgpu_falcon_queue data structure.
-Replaced input param "struct nvgpu_pmu *pmu" with
"struct gk20a *g" for pmu ops pmu_queue_head/pmu_queue_tail
& also for functions gk20a_pmu_queue_head()/
gk20a_pmu_queue_tail().
-Made changes in nvgpu_pmu_queue_init() to use nvgpu_falcon_queue
for PMU queue.
-Modified Makefile to include falcon_queue.o
-Modified Makefile.sources to include falcon_queue.c
Change-Id: I956328f6631b7154267fd5a29eaa1826190d99d1
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1776070
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In order to avoid the circular dependencies,
rearrange the static inline functions from
gk20a.h file.
Moved gk20a_gr_flush_channel_tlb function to
gr_gk20a.c and removed the #include gr_gk20a.h
from gk20a.h
Added a helper function utils.h to
move all generic static inline functions which
have no reference to gpu related structures.
ptimer related functions are moved to
ptimer.h
Implementations for as and pmu are moved to
corresponding files.
JIRA NVGPU-624
Change-Id: I4e956326e773ba037bf3a1696cc4c462085dbbe5
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1781941
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- Commit c61e21c868 fixed race codition in PMU
state transition.
- The race condition is such that PMU response(intr callback for messages) can
run faster than kthread posting commands to PMU and thus PMU message callback
may skip important pmu state change.
- Commit c61e21c868 introduced a fix where PMU
state change was only updated from callback while other places can only update
pmu_state variable
- However, this commit introduced a regression as follows:
- When PMU state is PMU_STATE_INIT_RECEIVED, we loop over every engine
supported by GPU --> If state = PMU_STATE_INIT_RECEIVED, change the state
to PMU_STATE_ELPG_BOOTING and init ELPG else If state != PMU_STATE_INIT_RECEIVED
throw an error saying "PMU INIT not received"
- Now, if GPU supports multiple engines, first engine will check that
pmu_state is PMU_STATE_INIT_RECEIVED and change it to PMU_STATE_ELPG_BOOTING
However, from second engine onwards, since state is already changed to
PMU_STATE_ELPG_BOOTING, all engines except first engine start throwing
error "PMU INIT not received"
- This patch fixes the issue by changing pmu state from
PMU_STATE_INIT_RECEIVED to PMU_STATE_ELPG_BOOTING only once.
Bug 200372838
JIRA EVLR-2164
Change-Id: Ic8c954d14acb1d6ec3adcbc4bcf4d4745542d9f0
Signed-off-by: Deepak Bhosale <dbhosale@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1769814
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: Deepak Goyal <dgoyal@nvidia.com>
Reviewed-by: Aparna Das <aparnad@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Remove Gr engine reset during ELPG entry.
Engine reset is causing clock gating logic to get reset thus
clock gating gets disabled during ELPG entry sequence.
It leads to higher power numbers observed at light graphics.
Removing GR reset during ELPG entry helped save power.
Bug 2180198
Change-Id: I957951eb93f9d044f4d9a908f2b56a4903dfbfad
Signed-off-by: Deepak Goyal <dgoyal@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1757695
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- Added vf change inject support for gv10x
- Updated clk_pmu_vf_inject() to fill required data
for pascal or volta vf change inject support
- Added new ctrl clk interface for gv10x clk domain list
- Added pmu interface for gv10x clk domain list &
vf change inject request
- Modified clk cmd, msg & RPC id's to match
with chips_a_23609936 branch
Bug 200399373
Change-Id: Ib9dc10073386f63bdfd92110c7ec3e09b1c484ce
Signed-off-by: Vaikundanathan S <vaikuns@nvidia.com>
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1700746
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
PMU ucode is updated to include LDIV slowdown factor in gr_init_param command.
- Defined a new version gr_init_param_v2.
- Updated the PMU FW version code.
- Set the LDIV slowdown factor to 0x1e by default.
- Added sysfs entry to program ldiv_slowdown factor at runtime.
Bug 200391931
Change-Id: Ic66049588c3b20e934faff3f29283f66c30303e4
Signed-off-by: Deepak Goyal <dgoyal@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1674208
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Mostly just including necessary includes to make sure that
global function declarations actually match their implementations.
Also work around pointer munging warning:
/build/ddpx/linux/kernel/nvgpu/drivers/gpu/nvgpu/common/pmu/pmu.c: In function 'nvgpu_pmu_process_init_msg':
/build/ddpx/linux/kernel/nvgpu/drivers/gpu/nvgpu/common/pmu/pmu.c:348:4: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
(*(u32 *)gid_data.signature == PMU_SHA1_GID_SIGNATURE);
Work around this warning by simply moving the type punning.
This code is certainly dangerous - it assumes the endianness
of the header data is the same as the machine this code is
running on. Apparently it works, though, so this ignores
the warning.
JIRA NVGPU-525
Change-Id: Id704bae7805440bebfad51c8c8365e6d2b7a39eb
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1692454
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
clk_vin data structures updated as new calibration type (v20) is added.
GP106 header does not have vin calibration type.
Assuming V10 if calibration type is not V20.
Add fuse calibration for V20 type.
Bug 200399373
Change-Id: I9449de1ecb0d0873f3bc16f46660f93fab5b9eac
Signed-off-by: Vaikundanathan S <vaikuns@nvidia.com>
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1687591
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>