Commit Graph

35 Commits

Author SHA1 Message Date
Debarshi Dutta
d8e8eb65d3 nvgpu: gpu: separate runlist submit from construction
This patch primary separates runlist modification from
runlist submits.

Instead of submitting the runlist(domain) immediately after
modification, a worker thread interface is now being used to
synchronously schedule runlist submits. If the runlist being
scheduled is currently active, the submit happens instantly,
otherwise, it will happen in the next iteration when the nvs
thread will schedule the domain. This external interface uses
a condition variable to wait for the completion of the
synchronous submits.

A pending_update variable is used to synchronize domain memory
swaps just before being submitted.

To facilitate faster scheduling via the NVS thread, nvgpu_dom
itself contains an array of rl_domain pointers. This can then
be used to select the appropriate rl_domain directly for scheduling
as against the earlier approach of maintaining nvs domains and rl
domains in sync everytime.

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I1725c7cf56407cca2e3d2589833d1c0b66a7ad7b
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2739795
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-07-13 16:36:19 -07:00
Divya
dcec7f184e gpu: nvgpu: disable elpg earlier in recovery path
When MMU fault happens, if the id_type = 1, that means
fault happened in TSG. So in that path we set the error
notifier and let userspace know about faulty channel.
During this, we check if debugger is attached or not by
reading gr_gpc0_tpc0_sm0_dbgr_control0_r() register.
During this time ELPG is enabled and this read causes
IDLE SNAP error for ELPG.

To resolve this, move CG/PG disable function call
early in fifo recover code path. This ensures that
ELPG is disabled early before any read happens for any
GR register.

Bug 3660592

Change-Id: Ie5d01b7ccf00167b58f260e9142aa5deb2a08be4
Signed-off-by: Divya <dsinghatwari@nvidia.com>
(cherry picked from commit f09e429f2d142c20529bedc05acf193805e1bb25)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2720655
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-06-01 06:41:57 -07:00
Sagar Kamble
11819380e8 gpu: nvgpu: remove invalid NULL check
mmufault will not be NULL when recovery is triggered of type
RC_TYPE_MMU_FAULT. NULL check for it at one place followed
by dereference is flagged as CERT issue.

Remove this invalid NULL check.

CID 17871
Bug 3512546

Change-Id: Ice8035f5df33c45ef0afb4c2a1395e0d5455652c
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2692544
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-04-08 00:00:36 -07:00
Konsta Hölttä
9be8fb80a2 gpu: nvgpu: make tsgs domain aware
Start transitioning from an assumption of a single runlist buffer to the
domain based approach where a TSG is a participant of a scheduling
domain that then owns has a runlist buffer used for hardware scheduling.

Concretely, move the concept of a runlist domain up to the users of the
runlist code. Modifications to a runlist need to specify which domain is
modified.

There is still only the default domain that is created at boot.

Jira NVGPU-6425

Change-Id: Id9a29cff35c94e0d7e195db382d643e16025282d
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2621213
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-11-11 20:39:42 -08:00
Konsta Hölttä
3cf796b787 gpu: nvgpu: move active bitmaps to domain
Move the active_channels and active_tsgs bitmaps from struct
nvgpu_runlist to struct nvgpu_runlist_domain. A TSG and its channels are
currently active as part of a runlist; in the future, a runlist may be
switched from multiple domains that each are a collection of TSGs.

The changes are still internal to the runlist code. Users of runlists
need no modifications.

Jira NVGPU-6425

Change-Id: I2d0e98e97f04b9716bc3f4890cf881735d0ab664
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2618387
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-11-03 20:55:08 -07:00
Alex Waterman
c55f7d624c gpu: nvgpu: Use runlist struct in construction APIs
Use a struct nvgpu_runlist pointer for the runlist update and
construction APIs.

This gets rid of the runlist ID being passed into the runlist
code for most of the normal APIs. Some recovery and suspect APIs
still use runlist ID masks since they may work with multiple
runlists at a time. These will be updated in the future.

Jira NVGPU-6425

Change-Id: Ib8d7a6aad0201af62267099cd993d130504478e8
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2470307
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-05-12 11:24:37 -07:00
Alex Waterman
5bf229dcd5 gpu: nvgpu: Rename runlist_id to id
Rename the runlist_id field in struct nvgpu_runlist to just id.
The runlist part is redundant given that this id is already in
'struct nvgpu_runlist'.

Change-Id: Ie2ea98f65d75e5e46430734bd7a7f6d6267c7577
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2470306
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-02-19 15:16:46 -08:00
Antony Clince Alex
52b33022e7 gpu: nvgpu: gv11b: skip setting error notifier during deferred reset
Skip setting error notifier on MMUFAULT when a debugger is connected and
debugging is enabled(mmu_debug_mode=on). At present, error notifier causes
the application to teardown the channels and debugger will not be able to
collect any data.

Bug 200632771

Change-Id: Idc4141990d4e2fb4714de9bfd31cfea5e7dcd52a
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2477253
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-02-01 13:18:09 -08:00
Alex Waterman
7a1c65c65d gpu: nvgpu: Rename dbg_rec() to rec_dbg()
Since this is backwards compared to other examples (pte_dbg, etc)
this makes more of the dbg helper macros consistent in syntax.

Change-Id: I98e30fd8967b7a86b3902878fecbe91440afa9b3
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2472520
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-01-22 07:04:54 -08:00
Alex Waterman
11d3785faf gpu: nvgpu: Rename struct nvgpu_runlist_info, fields in fifo
Rename struct nvgpu_runlist_info to struct nvgpu_runlist; the
info is not necessary. struct nvgpu_runlist is soon to be a
first class object among the nvgpu object model.

Also rename the fields runlist_info and active_runlist_info to
simply runlists and active_runlists respectively. Again the info
text is just not necessary and somewhat misleading. These structs
_are_ the runlist representations in SW; they are not merely
informational.

Also add an rl_dbg() macro to print debug info specific to
runlist management and some debug prints specifying the runlist
topology for the running chip.

Change-Id: Id9fcbdd1a7227cb5f8c75cca4abbff94fe048e49
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2470303
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-01-20 21:56:33 -08:00
Alex Waterman
7e99a68e34 gpu: nvgpu: Add basic recovery debugging messages
Add basic recovery messages that describe what's happening during
the recovery process. Hide this under a new recovery specific GPU
debug log flag. The logs look like:

[  276.000733] nvgpu: 17000000.gv11b                gv11b_fifo_recover:162  [DBG]  REC | Recovery starting
[  276.000737] nvgpu: 17000000.gv11b                gv11b_fifo_recover:163  [DBG]  REC |   ID      = 0
[  276.000741] nvgpu: 17000000.gv11b                gv11b_fifo_recover:164  [DBG]  REC |   id_type = TSG
[  276.000745] nvgpu: 17000000.gv11b                gv11b_fifo_recover:165  [DBG]  REC |   rc_type = MMU fault
[  276.000748] nvgpu: 17000000.gv11b                gv11b_fifo_recover:166  [DBG]  REC |   Engine bitmask: 0x0
[  276.000753] nvgpu: 17000000.gv11b                gv11b_fifo_recover:170  [DBG]  REC | Acquiring engines_reset_mutex
[  276.000756] nvgpu: 17000000.gv11b                gv11b_fifo_recover:174  [DBG]  REC | Acquiring runlist_lock for active runlists
[  276.000764] nvgpu: 17000000.gv11b                gv11b_fifo_recover:185  [DBG]  REC | Channels bound to this TSG:
[  276.000767] nvgpu: 17000000.gv11b                gv11b_fifo_recover:190  [DBG]  REC |   0 | chid 511
[  276.001098] nvgpu: 17000000.gv11b                gv11b_fifo_recover:222  [DBG]  REC | PBDMA   Bitmask: 0x1
[  276.001102] nvgpu: 17000000.gv11b                gv11b_fifo_recover:228  [DBG]  REC | Runlist Bitmask: 0x1
[  276.001106] nvgpu: 17000000.gv11b                gv11b_fifo_recover:240  [DBG]  REC | Disabling RL scheduler now
[  276.001126] nvgpu: 17000000.gv11b                gv11b_fifo_recover:246  [DBG]  REC | Disabling CG/PG now
[  276.189348] nvgpu: 17000000.gv11b                gv11b_fifo_recover:259  [DBG]  REC | Clearing PBDMA_FAULTED, ENG_FAULTED in CCSR register
[  276.191972] nvgpu: 17000000.gv11b                gv11b_fifo_recover:264  [DBG]  REC | Disabling TSG
[  276.191983] nvgpu: 17000000.gv11b                gv11b_fifo_recover:279  [DBG]  REC | Preempting runlists for RC
[  276.192001] nvgpu: 17000000.gv11b                gv11b_fifo_recover:288  [DBG]  REC | Polling for TSG to be off PBDMA
[  276.192012] nvgpu: 17000000.gv11b                gv11b_fifo_recover:296  [DBG]  REC |   Done!
[  276.192016] nvgpu: 17000000.gv11b                gv11b_fifo_recover:306  [DBG]  REC | Resetting relevant engines
[  276.192020] nvgpu: 17000000.gv11b                gv11b_fifo_recover:318  [DBG]  REC |   Engine bitmask for RL 0: 0xd
[  276.192024] nvgpu: 17000000.gv11b                gv11b_fifo_recover:323  [DBG]  REC |   > Restting engine: ID=0
[  276.209567] nvgpu: 17000000.gv11b                gv11b_fifo_recover:347  [DBG]  REC |     Done!
[  276.209572] nvgpu: 17000000.gv11b                gv11b_fifo_recover:323  [DBG]  REC |   > Restting engine: ID=2
[  276.214290] nvgpu: 17000000.gv11b                gv11b_fifo_recover:347  [DBG]  REC |     Done!
[  276.214295] nvgpu: 17000000.gv11b                gv11b_fifo_recover:323  [DBG]  REC |   > Restting engine: ID=3
[  276.224986] nvgpu: 17000000.gv11b                gv11b_fifo_recover:347  [DBG]  REC |     Done!
[  276.225013] nvgpu: 17000000.gv11b                gv11b_fifo_recover:377  [DBG]  REC | Re-enabling runlists
[  276.225034] nvgpu: 17000000.gv11b                gv11b_fifo_recover:383  [DBG]  REC | Re-enabling CG/PG
[  276.225134] nvgpu: 17000000.gv11b                gv11b_fifo_recover:394  [DBG]  REC | Releasing engines reset mutex

Note the "REC |" which lets one easily do:

  $ dmesg | grep "REC |"

To get a clear ubobstrructed view of the recovery progress in the dmesg
log.

JIRA NVGPU-5606

Change-Id: I183f2b5ac54edc60ee894a82111723e27aa5c46b
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2392991
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Alex Waterman
0f5818b89e gpu: nvgpu: Condition debug dump on recovery profiling
If recovery sequence profiling is enabled skip the debug dump that
happens during an MMU fault. This prevents the debug dump from
dominating the time spent by the recovery sequence. The debug dump
is severly limited in speed by the (lack of) UART bandwidth.

JIRA NVGPU-5606

Change-Id: Ifc7c326d33d9115d58b13c0fa42ec4bb7acb3075
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2382591
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
1bcdc306a0 gpu: nvgpu: Add gv11b recovery profiling
Add some basic profiling to the gv11b recovery sequence. This captures
the high level events. Subsequent patches start to dig into the
subsections in more detail.

JIRA NVGPU-5606

Change-Id: I488a448ca1cbf961651588e24685e2a5b4420c44
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2368302
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Tejal Kudav
ab2b0b5949 gpu: nvgpu: Set unserviceable flag early during RC
During recovery, we set ch->unserviceable at the end after we preempt
the TSG and reset the engines. It might be too late and user-space
might submit more work to the broken channel which is not desirable.
Move setting this unserviceable flag right at the start
of recovery sequence.
Another thread doing a submit can still read the unserviceable flag
just before it is set here, leaving that submit stuck if recovery
completes before the submit thread advances enough to set up a post
fence visible for other threads. This could be fixed with a big lock
or with a double check at the end of the submit code after the job
data has been made visible.
We still release the fences, semaphore and error notifier wait queues
at the end; so user-space would not trigger channel unbind while
channel is being recovered.

Also, change the handle_mmu_fault APIs to return void as the
debug_dump return value is not used in any of the caller APIs.

JIRA NVGPU-5843

Change-Id: Ib42c2816dd1dca542e4f630805411cab75fad90e
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2385256
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Tejal Kudav
881a6f35be gpu: nvgpu: Trigger quiesce on PBDMA preempt fail
During recovery, we preempt the faulty TSG from PBDMA and engines.
If the TSG preempt on PBDMA times out(timeout = 100ms), the PBDMA
might be hung state. We do not reset the HOST during recovery, so
stuck PBDMAs are unrecoverable.
Abort the recovery and trigger GPU to quiesce as there is no way
back.

Triggering Quiesce from recovery sequence should be fine as the only
redundant operation will be write to FIFO_RUNLIST_PREEMPT register.
The error notifiers will eventually be set by Quiesce thread.

Bug 2768005
JIRA NVGPU-4631

Change-Id: I914b9379aa8e48014e6ddace9abe47180a072863
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2368187
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
c6908922e5 gpu: nvgpu: move generic preempt hals to common
- Move fifo.preempt_runlists_for_rc and fifo.preempt_tsg hals to common
source file as nvgpu_fifo_preempt_runlists_for_rc and
nvgpu_fifo_preempt_tsg.

Jira NVGPU-4881

Change-Id: I31f7973276c075130d8a0ac684c6c99e35be6017
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2323866
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
44f12288ad gpu: nvgpu: add mc.reset_engine hal for nvgpu-next
Engine reset process has changed for nvgpu-next. Add mc.reset_engine
gops for nvgpu-next.
Modify engine reset functions to use mc.reset_engine hal.

Jira NVGPU-5145

Change-Id: I176800212042eaef71c8cbd4bc499805c5af0e60
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2312485
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Sagar Kamble
3444d729fd gpu: nvgpu: update compiling out cg changes
nvgpu_cg_pg_enable|disable functions are non-safe hence compile out
power_features.c. Corresponding functions from cg.c are also not
compiled. for e.g. nvgpu_cg_elcg_enable|disable, nvgpu_cg_blcg-
_mode_enable|disable, nvgpu_cg_slcg_gr_perf_ltc_load_enable|disable,
nvgpu_cg_elcg_set_elcg|blcg|slcg_enabled.
BLCG handling in nvgpu_cg_set_mode is non-safe hence compile it out
as well.

JIRA NVGPU-2175

Change-Id: I9940cc418d84eb30979dd50a2ed4a132473312fe
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2168957
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Thomas Fleury
9836420185 gpu: nvgpu: no engine reset when recovery is disabled
Compile out nvgpu_engine_reset and nvgpu_gr_reset when
CONFIG_NVGPU_RECOVERY is not defined.

Jira NVGPU-3886

Change-Id: I7ff67cf3680dfff2130e2a9e16d68b5a3f684bd4
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2175430
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-15 13:26:09 -07:00
Thomas Fleury
c7b41f106d gpu: nvgpu: add CONFIG_NVGPU_RECOVERY
Add CONFIG_NVGPU_RECOVERY in order to conditionally compile
recovery code. This code will be removed from safety build
when sw quiesce state is implemented, and negative tests are
disabled or modified such that they do not expect recovery
to happen.

Added static inline functions for recovery handlers, when
CONFIG_NVGPU_RECOVERY is not defined. These inline functions
can later be wired to the sw quiesce functions.

Also moved gv11b recovery code to non-fusa, as it will ultimately
be removed from safety build.

Jira NVGPU-3871

Change-Id: Ia705b059fab6120899c7e15082f2a0f51ff51dc9
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2166074
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-07 08:25:57 -07:00
Debarshi Dutta
69ef86e627 gpu: nvgpu: move safe code HAL files to fusa
This patch moves all the safe static and non-static functions as well
as its dependencies such as static declared structs into files with
_fusa.c extension. If the original file is left with no functions
remaining then the file is deleted.

Added changes in Makefile, Makefile.sources, nvgpu-hal-new.yaml for
compilation.

Jira NVGPU-3690

Change-Id: I81af67c308705faf8a681df63a6778e7de2076cf
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2146761
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-03 02:46:15 -07:00
Thomas Fleury
c2eb26436a gpu: nvgpu: Add doxygen documentation in runlist.h
Removed the following unused fields from runlist context:
- total_entries
- stopped
- support_tsg

Renamed:
- nvgpu_fifo_runlist_set_state -> nvgpu_runlist_set_state

Removed RUNLIST_INVALID_ID which was redundant with
NVGPU_INVALID_RUNLIST_ID.

Jira NVGPU-3594

Change-Id: I23d1abdf87b73bc0138816dab6659249f2602b9f
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2139520
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-06-24 17:36:29 -07:00
Sagar Kamble
3f08cf8a48 gpu: nvgpu: rename feature Make and C flags
Name the Make and C flag variables consistently wih syntax:
CONFIG_NVGPU_<feature name>

s/NVGPU_DEBUGGER/CONFIG_NVGPU_DEBUGGER
s/NVGPU_CYCLESTATS/CONFIG_NVGPU_CYCLESTATS
s/NVGPU_USERD/CONFIG_NVGPU_USERD
s/NVGPU_CHANNEL_WDT/CONFIG_NVGPU_CHANNEL_WDT
s/NVGPU_FEATURE_CE/CONFIG_NVGPU_CE
s/NVGPU_GRAPHICS/CONFIG_NVGPU_GRAPHICS
s/NVGPU_ENGINE/CONFIG_NVGPU_FIFO_ENGINE_ACTIVITY
s/NVGPU_FEATURE_CHANNEL_TSG_SCHED/CONFIG_NVGPU_CHANNEL_TSG_SCHED
s/NVGPU_FEATURE_CHANNEL_TSG_CONTROL/CONFIG_NVGPU_CHANNEL_TSG_CONTROL
s/NVGPU_FEATURE_ENGINE_QUEUE/CONFIG_NVGPU_ENGINE_QUEUE
s/GK20A_CTXSW_TRACE/CONFIG_NVGPU_FECS_TRACE
s/IGPU_VIRT_SUPPORT/CONFIG_NVGPU_IGPU_VIRT
s/CONFIG_TEGRA_NVLINK/CONFIG_NVGPU_NVLINK
s/NVGPU_DGPU_SUPPORT/CONFIG_NVGPU_DGPU
s/NVGPU_VPR/CONFIG_NVGPU_VPR
s/NVGPU_REPLAYABLE_FAULT/CONFIG_NVGPU_REPLAYABLE_FAULT
s/NVGPU_FEATURE_LS_PMU/CONFIG_NVGPU_LS_PMU
s/NVGPU_FEATURE_POWER_PG/CONFIG_NVGPU_POWER_PG

JIRA NVGPU-3624

Change-Id: I8b2492b085095fc6ee95926d8f8c3929702a1773
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2130290
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-06-11 09:46:24 -07:00
Deepak Nibade
649a2b57a8 gpu: nvgpu: add debugger flag for hal.gr.gr unit
Add NVGPU_DEBUGGER flag for common.hal.gr.gr unit and corresponding
hals.

Also add this flag for deferred reset functionality

Jira NVGPU-3506

Change-Id: Iee4fbc1305346bb4d779cd69e8fd5539cb07206b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2130149
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-06-06 16:28:44 -07:00
Mahantesh Kumbar
b691df5a02 gpu: nvgpu: compile out PMU members & headers for safety
-compile out nvgpu_pmu members which are not required for
safety buid & modified source as required to support same.
-compile out PMU headers include which are not required for
safety code
-Removed unnecessary PMU header includes from some files

JIRA NVGPU-3418

Change-Id: I5364b1b16c46637d229e82745dd2846cb6335a72
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2128228
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-06-06 06:55:58 -07:00
Mahantesh Kumbar
90aee0086f gpu: nvgpu: rename NVGPU_LS_PMU to NVGPU_FEATURE_LS_PMU
renamed NVGPU_LS_PMU to NVGPU_FEATURE_LS_PMU to follow
nvgpu naming standard
Compile out LS PMU files when PMU RTOS support is
disabled for safety build by setting NVGPU_LS_PMU
build flag to 0

JIRA NVGPU-3418

Change-Id: Ib09924ac25657e932723c10be573f2f701cb7bea
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2127794
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-30 19:27:14 -07:00
Mahantesh Kumbar
120defb7cb gpu: nvgpu: compile out PMU mutex code for safety
Compile out PMU mutex calls called from other unit when
PMU RTOS support is disabled for safety build by setting
NVGPU_LS_PMU build flag to 0

NVGPU JIRA-3418

Change-Id: I040a744d5102f7fd889d4e8ad6e94129eadb73dd
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2124698
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-30 19:25:42 -07:00
Mahantesh Kumbar
3d1169544f gpu: nvgpu: alloc space for PMU's struct nvgpu_pmu at runtime
Allocating space for struct nvgpu_pmu at run time as part of
nvgpu_pmu_early_init() stage and made required changes to
dependent fiels as needed.

JIRA NVGPU-1972

Change-Id: I2d1c86d713e533c256ba95b730aa2e9543a66438
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2110109
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-23 00:56:55 -07:00
Mahantesh Kumbar
0a64f6cb2d gpu: nvgpu: PMU pmu.c/h header include cleanup
Some headers are not required to include in pmu.c/h as
lot of PMU code restructure happened, so removed headers
which not required anymore.

JIRA NVGPU-1972

Change-Id: Iead7f049d167cdaaaf7c75c2a5e19ae7b068fe6b
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2110108
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-23 00:56:45 -07:00
Debarshi Dutta
17486ec1f6 gpu: nvgpu: rename tsg_gk20a and channel_gk20a structs
rename struct tsg_gk20a to struct nvgpu_tsg and rename struct
channel_gk20a to struct nvgpu_channel

Jira NVGPU-3248

Change-Id: I2a227347d249f9eea59223d82f09eae23dfc1306
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2112424
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-06 02:56:53 -07:00
Seema Khowala
cfb4ff0bfb gpu: nvgpu: rename struct fifo_gk20a
Rename
struct fifo_gk20a -> nvgpu_fifo

JIRA NVGPU-2012

Change-Id: Ifb5854592c88894ecd830da092ada27c7f05380d
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2109625
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Adeel Raza <araza@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-03 16:25:43 -07:00
Seema Khowala
170d7464d6 gpu: nvgpu: move fifo_gk20a.[ch] to hal/fifo
Move fifo_gk20a struct to fifo.h
Move fifo_gk20a.[ch] to hal/fifo

Add missing includes for fifo subunits.

JIRA NVGPU-2012

Change-Id: I8bf5402bd5a9f8ff9f6a818cee553b57e117f3bc
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2109012
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-02 23:40:42 -07:00
Seema Khowala
3392a72d1a gpu: nvgpu: move runlist related struct and defines
Move from fifo_gk20a.h to runlist.h
RUNLIST_DISABLED
RUNLIST_ENABLED
MAX_RUNLIST_BUFFERS
struct fifo_runlist_info_gk20a

Rename
fifo_runlist_info_gk20a -> nvgpu_runlist_info

JIRA NVGPU-2012

Change-Id: Ib7e3c9fbf77ac57f25e73be8ea64c45d4c3155ff
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2109008
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-02 23:39:42 -07:00
Thomas Fleury
258a6141fd gpu: nvgpu: rename runlist functions
Renamed:
- gk20a_runlist_reload -> nvgpu_runlist_reload
- gk20a_fifo_interleave_level_name -> nvgpu_runlist_interleave_level_name
- gk20a_runlist_update_for_channel -> nvgpu_runlist_update_for_channel
- nvgpu_fifo_lock_active_runlists -> nvgpu_runlist_lock_active_runlists
- nvgpu_fifo_unlock_active_runlists -> nvgpu_runlist_unlock_active_runlists
- nvgpu_fifo_get_runlists_mask -> nvgpu_runlist_get_runlists_mask
- nvgpu_fifo_unlock_runlists -> nvgpu_runlist_unlock_runlists
- gk20a_runlist_update -> nvgpu_runlist_update

Jira NVGPU-3198

Change-Id: Ifc5ad2aae546614667c174643ee07283d2716adc
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2108029
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-30 12:46:02 -07:00
Seema Khowala
60633ca551 gpu: nvgpu: move gv11b rc code to rc_gv11b.c
Move chip specific recovery code for volta onwards
architecture to hal/rc/rc_gv11b.c

Rename
fifo.teardown_ch_tsg -> fifo.recover
gk20a_runlist_update_locked -> nvgpu_runlist_update_locked

Remove
Unused h/w headers from fifo_gv11b.c

Use local variable f instead of g->fifo

JIRA NVGPU-1314

Change-Id: Ia535bbe4780e7241fdd911a8f577c6b98cf0fe53
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2102897
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-24 20:23:06 -07:00