Following changes are added here to simplify the overall
sequence.
1) Remove deferred update for runlists. NVS worker thread
shall submit the updated runlist.
2) Moved Runlist mem swap inside update itself. Protect
the swap() and hw_submit() path with a spinlock. This
is temporary till GSP.
3) Enable Control-Fifo mode from nvgpu driver.
Jira NVGPU-8609
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: Icc52e5d8ccec9d3653c9bc1cf40400fc01a08fde
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2757406
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
This patch primary separates runlist modification from
runlist submits.
Instead of submitting the runlist(domain) immediately after
modification, a worker thread interface is now being used to
synchronously schedule runlist submits. If the runlist being
scheduled is currently active, the submit happens instantly,
otherwise, it will happen in the next iteration when the nvs
thread will schedule the domain. This external interface uses
a condition variable to wait for the completion of the
synchronous submits.
A pending_update variable is used to synchronize domain memory
swaps just before being submitted.
To facilitate faster scheduling via the NVS thread, nvgpu_dom
itself contains an array of rl_domain pointers. This can then
be used to select the appropriate rl_domain directly for scheduling
as against the earlier approach of maintaining nvs domains and rl
domains in sync everytime.
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I1725c7cf56407cca2e3d2589833d1c0b66a7ad7b
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2739795
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
When MMU fault happens, if the id_type = 1, that means
fault happened in TSG. So in that path we set the error
notifier and let userspace know about faulty channel.
During this, we check if debugger is attached or not by
reading gr_gpc0_tpc0_sm0_dbgr_control0_r() register.
During this time ELPG is enabled and this read causes
IDLE SNAP error for ELPG.
To resolve this, move CG/PG disable function call
early in fifo recover code path. This ensures that
ELPG is disabled early before any read happens for any
GR register.
Bug 3660592
Change-Id: Ie5d01b7ccf00167b58f260e9142aa5deb2a08be4
Signed-off-by: Divya <dsinghatwari@nvidia.com>
(cherry picked from commit f09e429f2d142c20529bedc05acf193805e1bb25)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2720655
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
GVS: Gerrit_Virtual_Submit
Start transitioning from an assumption of a single runlist buffer to the
domain based approach where a TSG is a participant of a scheduling
domain that then owns has a runlist buffer used for hardware scheduling.
Concretely, move the concept of a runlist domain up to the users of the
runlist code. Modifications to a runlist need to specify which domain is
modified.
There is still only the default domain that is created at boot.
Jira NVGPU-6425
Change-Id: Id9a29cff35c94e0d7e195db382d643e16025282d
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2621213
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Move the active_channels and active_tsgs bitmaps from struct
nvgpu_runlist to struct nvgpu_runlist_domain. A TSG and its channels are
currently active as part of a runlist; in the future, a runlist may be
switched from multiple domains that each are a collection of TSGs.
The changes are still internal to the runlist code. Users of runlists
need no modifications.
Jira NVGPU-6425
Change-Id: I2d0e98e97f04b9716bc3f4890cf881735d0ab664
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2618387
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Skip setting error notifier on MMUFAULT when a debugger is connected and
debugging is enabled(mmu_debug_mode=on). At present, error notifier causes
the application to teardown the channels and debugger will not be able to
collect any data.
Bug 200632771
Change-Id: Idc4141990d4e2fb4714de9bfd31cfea5e7dcd52a
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2477253
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Rename struct nvgpu_runlist_info to struct nvgpu_runlist; the
info is not necessary. struct nvgpu_runlist is soon to be a
first class object among the nvgpu object model.
Also rename the fields runlist_info and active_runlist_info to
simply runlists and active_runlists respectively. Again the info
text is just not necessary and somewhat misleading. These structs
_are_ the runlist representations in SW; they are not merely
informational.
Also add an rl_dbg() macro to print debug info specific to
runlist management and some debug prints specifying the runlist
topology for the running chip.
Change-Id: Id9fcbdd1a7227cb5f8c75cca4abbff94fe048e49
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2470303
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
If recovery sequence profiling is enabled skip the debug dump that
happens during an MMU fault. This prevents the debug dump from
dominating the time spent by the recovery sequence. The debug dump
is severly limited in speed by the (lack of) UART bandwidth.
JIRA NVGPU-5606
Change-Id: Ifc7c326d33d9115d58b13c0fa42ec4bb7acb3075
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2382591
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
During recovery, we set ch->unserviceable at the end after we preempt
the TSG and reset the engines. It might be too late and user-space
might submit more work to the broken channel which is not desirable.
Move setting this unserviceable flag right at the start
of recovery sequence.
Another thread doing a submit can still read the unserviceable flag
just before it is set here, leaving that submit stuck if recovery
completes before the submit thread advances enough to set up a post
fence visible for other threads. This could be fixed with a big lock
or with a double check at the end of the submit code after the job
data has been made visible.
We still release the fences, semaphore and error notifier wait queues
at the end; so user-space would not trigger channel unbind while
channel is being recovered.
Also, change the handle_mmu_fault APIs to return void as the
debug_dump return value is not used in any of the caller APIs.
JIRA NVGPU-5843
Change-Id: Ib42c2816dd1dca542e4f630805411cab75fad90e
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2385256
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
During recovery, we preempt the faulty TSG from PBDMA and engines.
If the TSG preempt on PBDMA times out(timeout = 100ms), the PBDMA
might be hung state. We do not reset the HOST during recovery, so
stuck PBDMAs are unrecoverable.
Abort the recovery and trigger GPU to quiesce as there is no way
back.
Triggering Quiesce from recovery sequence should be fine as the only
redundant operation will be write to FIFO_RUNLIST_PREEMPT register.
The error notifiers will eventually be set by Quiesce thread.
Bug 2768005
JIRA NVGPU-4631
Change-Id: I914b9379aa8e48014e6ddace9abe47180a072863
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2368187
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
nvgpu_cg_pg_enable|disable functions are non-safe hence compile out
power_features.c. Corresponding functions from cg.c are also not
compiled. for e.g. nvgpu_cg_elcg_enable|disable, nvgpu_cg_blcg-
_mode_enable|disable, nvgpu_cg_slcg_gr_perf_ltc_load_enable|disable,
nvgpu_cg_elcg_set_elcg|blcg|slcg_enabled.
BLCG handling in nvgpu_cg_set_mode is non-safe hence compile it out
as well.
JIRA NVGPU-2175
Change-Id: I9940cc418d84eb30979dd50a2ed4a132473312fe
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2168957
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add CONFIG_NVGPU_RECOVERY in order to conditionally compile
recovery code. This code will be removed from safety build
when sw quiesce state is implemented, and negative tests are
disabled or modified such that they do not expect recovery
to happen.
Added static inline functions for recovery handlers, when
CONFIG_NVGPU_RECOVERY is not defined. These inline functions
can later be wired to the sw quiesce functions.
Also moved gv11b recovery code to non-fusa, as it will ultimately
be removed from safety build.
Jira NVGPU-3871
Change-Id: Ia705b059fab6120899c7e15082f2a0f51ff51dc9
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2166074
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This patch moves all the safe static and non-static functions as well
as its dependencies such as static declared structs into files with
_fusa.c extension. If the original file is left with no functions
remaining then the file is deleted.
Added changes in Makefile, Makefile.sources, nvgpu-hal-new.yaml for
compilation.
Jira NVGPU-3690
Change-Id: I81af67c308705faf8a681df63a6778e7de2076cf
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2146761
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
-compile out nvgpu_pmu members which are not required for
safety buid & modified source as required to support same.
-compile out PMU headers include which are not required for
safety code
-Removed unnecessary PMU header includes from some files
JIRA NVGPU-3418
Change-Id: I5364b1b16c46637d229e82745dd2846cb6335a72
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2128228
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
renamed NVGPU_LS_PMU to NVGPU_FEATURE_LS_PMU to follow
nvgpu naming standard
Compile out LS PMU files when PMU RTOS support is
disabled for safety build by setting NVGPU_LS_PMU
build flag to 0
JIRA NVGPU-3418
Change-Id: Ib09924ac25657e932723c10be573f2f701cb7bea
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2127794
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Move chip specific recovery code for volta onwards
architecture to hal/rc/rc_gv11b.c
Rename
fifo.teardown_ch_tsg -> fifo.recover
gk20a_runlist_update_locked -> nvgpu_runlist_update_locked
Remove
Unused h/w headers from fifo_gv11b.c
Use local variable f instead of g->fifo
JIRA NVGPU-1314
Change-Id: Ia535bbe4780e7241fdd911a8f577c6b98cf0fe53
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2102897
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>