The nvgpu_mem_rd*() functions were implemented per OS. They also used
nvgpu_pramin_access_batched() and implemented a large portion of the
PRAMIN access logic in OS-specific code.
Make the implementation of these functions generic. Move all PRAMIN
logic into the PRAMIN code itself and simplify the interface that
PRAMIN provides.
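As an illustration, the unified read path could look roughly like the
sketch below. This is a hypothetical outline assuming a chunked PRAMIN
window helper; the pramin_window_*() names and loop structure are
illustrative, not the exact nvgpu interface.

    /* Hypothetical sketch, not the actual nvgpu code: read a vidmem
     * buffer through successive BAR0 PRAMIN windows. */
    static void mem_rd_n_vidmem(struct gk20a *g, struct nvgpu_mem *mem,
                                u64 offset, u32 *dst, u64 size)
    {
            while (size != 0) {
                    /* map a PRAMIN window over the next chunk; the
                     * helper returns the bytes visible through it */
                    u64 n = min(size, pramin_window_map(g, mem, offset));
                    u64 i;

                    for (i = 0; i < n; i += sizeof(u32))
                            *dst++ = gk20a_readl(g,
                                    pramin_window_offset(offset + i));

                    pramin_window_unmap(g);
                    offset += n;
                    size -= n;
            }
    }

With the batching kept inside PRAMIN, each OS only needs a thin wrapper
around a generic loop like this one.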
Change-Id: I1acb9e8d7d424325dc73314d5738cb2c9ebf7692
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1753708
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
When requesting a gpc2clk frequency below the lowest V/f point, the
clock arbiter did not properly adjust the target value to the nearest
V/f point. This could lead to a lower than expected effective
frequency. Fix the logic to adjust the target to the nearest V/f point.
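Conceptually the fix is a clamp-and-round over the V/f table. The
following is a simplified sketch assuming an ascending frequency
table; it is not the actual arbiter code:

    /* Hypothetical sketch: snap a requested frequency onto the
     * nearest entry of an ascending V/f point table. */
    static u16 snap_to_vf_point(const u16 *vf_mhz, int num, u16 target)
    {
            int i;

            if (target <= vf_mhz[0])
                    return vf_mhz[0];       /* below the lowest point */

            for (i = 1; i < num; i++) {
                    if (target <= vf_mhz[i])
                            /* pick whichever neighbor is closer */
                            return (vf_mhz[i] - target <
                                    target - vf_mhz[i - 1]) ?
                                            vf_mhz[i] : vf_mhz[i - 1];
            }

            return vf_mhz[num - 1];         /* above the highest point */
    }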
Bug 200412996
Change-Id: I36c24b4c081931e2ac54da14d49e46fcb14503e3
(cherry picked from commit 7ed1f8fb39f76208922daa91d00905cdb96b2304)
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1763641
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The logic used for selecting frequencies from the achievable
frequencies of the GPU clock picks one frequency out of every set
of 8. This reduces the number of available frequencies when the number
of achievable frequencies is small. Change the implementation to
include all frequencies when the achievable frequency list is small.
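A rough sketch of the idea, with hypothetical names and an assumed
threshold of 8:

    /* Hypothetical sketch: keep every achievable frequency when the
     * list is small, otherwise keep one out of every 8 as before. */
    static int select_freqs(const unsigned long *avail, int num_avail,
                            unsigned long *out)
    {
            int i, num_out = 0;
            int step = (num_avail <= 8) ? 1 : 8;

            for (i = 0; i < num_avail; i += step)
                    out[num_out++] = avail[i];

            return num_out;
    }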
Bug 200381453
Change-Id: Ib280d7ccf9b75f88f6c7c6d2666f05e92a0343bd
Signed-off-by: Vishruth <vishruthj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1753289
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
nvgpu_mem_get_addr() returns a virtual or physical address depending on
the platform. But we need to use physical addresses explicitly to
configure PCI simulation support, since the simulator expects physical
addresses only.
Hence use nvgpu_mem_get_phys_addr() explicitly to configure the
msg/send/recv buffers needed for PCI simulation support.
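A before/after sketch of the substitution; the sim buffer name is
illustrative:

    /* before: may return an IOMMU virtual address on some platforms */
    u64 addr = nvgpu_mem_get_addr(g, &sim->msg_bfr);

    /* after: always the physical address, as the simulator expects */
    u64 addr = nvgpu_mem_get_phys_addr(g, &sim->msg_bfr);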
Jira NVGPUT-41
Change-Id: I6870feef35fe81d43189fa048dc2f7052926bcc4
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1756843
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Now that GR buffers always have a kernel mapping, remove the unnecessary
calls to nvgpu_mem_begin() and nvgpu_mem_end() on these buffers:
- global ctx buffer mem in gr
- gr ctx mem in a tsg
- patch ctx mem in a gr ctx
- pm ctx mem in a gr ctx
- ctx_header mem in a channel (subctx header)
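The removed pattern looks roughly like this (sketch; the surrounding
write is illustrative):

    /* before: every access was bracketed by a map/unmap pair */
    if (nvgpu_mem_begin(g, mem))
            return -ENOMEM;
    nvgpu_mem_wr(g, mem, offset, data);
    nvgpu_mem_end(g, mem);

    /* after: the buffer always has a kernel mapping */
    nvgpu_mem_wr(g, mem, offset, data);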
Change-Id: Id2a8ad108aef8db8b16dce5bae8003bbcd3b23e4
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1760599
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add support for `rfr' to send emails directly to the nvgpu-core mailing
list. Copying the email contents and then manually updating them is
annoying. This streamlines that process greatly!
Since most people write something extra past what the `rfr' script
generates, you can edit the message before it is sent. If you don't
specify a message explicitly with '--msg', an editor based on your
$EDITOR setting is opened with the contents of the email. Whatever you
edit in there will be sent verbatim. If for some reason you want
nothing to be added, there's the '--no-msg' option.
Change-Id: Icb0ccba936357cf5e3a93001a5c22aeacf906e11
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1595544
GVS: Gerrit_Virtual_Submit
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Tested-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
To finish OS unification of the submit path, move the
gk20a_submit_channel_gpfifo* functions to a file that is accessible
outside the Linux code as well.
Also change the prefix of the submit functions from gk20a_ to nvgpu_.
Jira NVGPU-705
Change-Id: I8ca355d1eb69771fb016c7a21fc7f102ca7967d7
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1760421
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Remove the GR engine reset during ELPG entry.
The engine reset caused the clock gating logic to be reset, so clock
gating was disabled during the ELPG entry sequence. This led to higher
power numbers being observed under light graphics workloads.
Removing the GR reset during ELPG entry helped save power.
Bug 2180198
Change-Id: I957951eb93f9d044f4d9a908f2b56a4903dfbfad
Signed-off-by: Deepak Goyal <dgoyal@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1757695
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add separate HALs in gv11b and gv100 for the init_gpc_mmu function.
In the gv11b HAL, RMW mode is enabled for GPU atomics by default.
In the gv100 HAL, the GPC atomic capability mode is set based on the
FB MMU capability: if the GPU is connected through NVLINK, the MMU
will be set to RMW mode, otherwise it will be in L2 mode.
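A condensed sketch of the gv100 decision; NVGPU_SUPPORT_NVLINK and the
register field helpers are assumptions following nvgpu naming
conventions, not verbatim code:

    /* Hypothetical sketch of the gv100 HAL choice. */
    u32 atomic_mode;

    if (nvgpu_is_enabled(g, NVGPU_SUPPORT_NVLINK))
            atomic_mode = gpc_mmu_atomic_capability_mode_rmw_f();
    else
            atomic_mode = gpc_mmu_atomic_capability_mode_l2_f();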
Bug 200390336
Change-Id: I224934f83d1762ec864ef8da7265dd01d86893a0
Signed-off-by: Ashish Srivastava <assrivastava@nvidia.com>
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1735137
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
membar.sys synchronizes with the whole system (GPU and CPU), while
membar.gl synchronizes within the GPU.
In gv11b, fb flush generates a membar.gl instead of a membar.sys, which
is an issue. To fix this issue, the following WAR is used:
1. Use the bar1 engine id and bind it to a particular pdb.
2. Then, instead of an fb_flush, issue a tlb invalidate of the bar1 pdb.
Allocation of the vm for the bar1 instance block and bar1 binding is
now done without checking for bar1 support. Only the bar1 register
mapping is done based on bar1 support being enabled.
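In C the WAR boils down to something like the sketch below; the
g->ops.fb.tlb_invalidate path follows nvgpu conventions, but the
wrapper itself is illustrative:

    /* Hypothetical sketch: achieve membar.sys semantics on gv11b by
     * invalidating the bar1 PDB instead of issuing an fb_flush. */
    static int gv11b_fb_flush_war(struct gk20a *g)
    {
            /* bar1 VM was bound to its instance block at init time */
            return g->ops.fb.tlb_invalidate(g, g->mm.bar1.vm->pdb.mem);
    }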
Bug 2112790
Change-Id: I76f43f1178a68f10823d48bc9da55d2bd686dd52
Signed-off-by: seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1750257
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
-Add support for aborting runlists. Aborting a runlist will abort
 all active TSGs, and all active channels within those TSGs, as
 sketched below
-Bare channels are no longer supported, so remove recovery support
 for bare channels. In case bare channels do exist, recovery will
 trigger a runlist abort
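A hypothetical sketch of the abort flow for one runlist; the field and
helper names approximate the nvgpu code of this era and are not
verbatim:

    /* Hypothetical sketch: abort every active TSG on a runlist,
     * which in turn aborts the TSG's active channels. */
    static void runlist_abort_tsgs(struct gk20a *g,
                                   struct fifo_runlist_info_gk20a *runlist)
    {
            unsigned long tsgid;

            for_each_set_bit(tsgid, runlist->active_tsgs,
                             g->fifo.num_channels) {
                    /* abort the TSG and its bound channels; no
                     * preempt here, the runlist is already torn down */
                    gk20a_fifo_abort_tsg(g, tsgid, false);
            }
    }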
Bug 2125776
Bug 2108544
Bug 2105322
Bug 2092051
Bug 2048824
Bug 2043838
Bug 2039587
Bug 2028993
Bug 2029245
Bug 2065990
Bug 1945121
Bug 200401707
Bug 200393631
Bug 200327596
Change-Id: I6bec8a0004508cf65ea128bf641a26bf4c2f236d
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1640567
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
-Even if a TSG preempt timed out, the teardown sequence would by
 design require s/w to issue another preempt.
 If recovery includes an ENGINE_RESET, then to avoid race conditions,
 use RUNLIST_PREEMPT to kick all work off and to cancel any context
 load which may be pending. This is also needed to make sure that no
 PBDMA serving the engine is loaded when the engine is reset
-Add max retries for pre-si platforms in the runlist preempt done
 polling loop; a sketch follows
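A hypothetical sketch of the bounded poll; the pending-bit helper and
the retry count are assumptions:

    /* Hypothetical sketch: bound the runlist-preempt-done poll with a
     * retry count, since timers are unreliable on pre-si platforms. */
    #define RUNLIST_PREEMPT_MAX_RETRIES 1000

    static int poll_runlist_preempt_done(struct gk20a *g, u32 runlists_mask)
    {
            int retries = RUNLIST_PREEMPT_MAX_RETRIES;

            do {
                    if (!runlist_preempt_pending(g, runlists_mask))
                            return 0;
                    nvgpu_udelay(10);
            } while (--retries != 0);

            return -ETIMEDOUT;
    }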
Bug 2125776
Bug 2108544
Bug 2105322
Bug 2092051
Bug 2048824
Bug 2043838
Bug 2039587
Bug 2028993
Bug 2029245
Bug 2065990
Bug 1945121
Bug 200401707
Bug 200393631
Bug 200327596
Change-Id: If9d1731fc17e7e7281b24a696ea0917cd269498c
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1709902
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
-The is_preempt_pending HAL does not need the timeout_rc_type input
 param: on Volta, reset_eng_bitmask is saved if the preempt times
 out, while on legacy chips recovery triggers an MMU fault and the
 MMU fault handler takes care of resetting the engines
-On Volta, no special input param is needed to differentiate between
 preempt polling in the normal scenario and preempt polling during
 recovery. The recovery path uses the preempt_ch_tsg HAL to issue the
 preempt; this HAL does not trigger recovery if the preempt times out
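The resulting HAL shape, as a before/after sketch (argument names are
approximations):

    /* before: caller had to say why it was polling */
    int (*is_preempt_pending)(struct gk20a *g, u32 id,
                              unsigned int id_type,
                              unsigned int timeout_rc_type);

    /* after: the recovery decision lives elsewhere */
    int (*is_preempt_pending)(struct gk20a *g, u32 id,
                              unsigned int id_type);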
Bug 2125776
Bug 2108544
Bug 2105322
Bug 2092051
Bug 2048824
Bug 2043838
Bug 2039587
Bug 2028993
Bug 2029245
Bug 2065990
Bug 1945121
Bug 200401707
Bug 200393631
Bug 200327596
Change-Id: Ie76a18ae0be880cfbeee615859a08179fb974fa8
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1709799
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
For silicon platforms, gk20a_get_gr_idle_timeout() returns
3000 ms, i.e. 3 sec. Currently this time is used for
polling each PBDMA, engine, and runlist, and it conflicts with the
channel timeout if a preempt fails.
Use a 1000 ms timeout for polling preempt completion when
timeouts are enabled, else use gk20a_get_gr_idle_timeout().
For non-si platforms, the polling loop depends on a
max number of retries.
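The selection reduces to a helper of this shape (sketch;
nvgpu_is_timeouts_enabled() is the usual query, the wrapper itself is
illustrative):

    /* Hypothetical sketch of choosing the preempt poll timeout. */
    static u32 preempt_poll_timeout_ms(struct gk20a *g)
    {
            /* 1000 ms when timeouts are enforced; otherwise fall
             * back to the 3000 ms GR idle timeout. */
            return nvgpu_is_timeouts_enabled(g) ?
                    1000U : gk20a_get_gr_idle_timeout(g);
    }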
Bug 2125776
Bug 2108544
Bug 2105322
Bug 2092051
Bug 2048824
Bug 2043838
Bug 2039587
Bug 2028993
Bug 2029245
Bug 2065990
Bug 1945121
Bug 200401707
Bug 200393631
Bug 200327596
Change-Id: Icdb69b7b7d17292f2b6a43f1d8e9d75ff545d0ae
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1739543
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
-While polling for engine preempt completion, reset the engine only
 if an engine stall intr is pending. Also stop polling for engine
 preempt completion if an engine intr is pending (sketched below)
-Add max retries for pre-si platforms in the PBDMA and engine
 preempt completion polling loops
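A hypothetical sketch of the per-iteration decision; the intr query
helpers are assumptions:

    /* Hypothetical sketch: one step of the engine preempt-done poll.
     * Returns true when polling should stop. */
    static bool eng_preempt_poll_step(struct gk20a *g, u32 act_eng_id,
                                      u32 *reset_eng_bitmask)
    {
            if (eng_stall_intr_pending(g, act_eng_id)) {
                    /* only a stalling intr warrants an engine reset */
                    *reset_eng_bitmask |= BIT(act_eng_id);
                    return true;
            }

            /* a non-stall intr pending: let the intr handler run */
            return eng_intr_pending(g, act_eng_id);
    }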
Bug 2125776
Bug 2108544
Bug 2105322
Bug 2092051
Bug 2048824
Bug 2043838
Bug 2039587
Bug 2028993
Bug 2029245
Bug 2065990
Bug 1945121
Bug 200401707
Bug 200393631
Bug 200327596
Change-Id: I66b07be9647f141bd03801f83e3cda797e88272f
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1694137
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- This patch sets the required FECS trace functions in the HALs.
- On gv100, the GPU VA needs to be programmed into the ucode to
  get the FECS trace. Hence, set the "NVGPU_FECS_TRACE_VA"
  characteristic to true.
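Setting the characteristic is a one-liner of this shape;
__nvgpu_set_enabled() is the standard flag setter, and the flag name
is taken from the commit text:

    /* Sketch: advertise that FECS trace needs a GPU VA on this chip. */
    __nvgpu_set_enabled(g, NVGPU_FECS_TRACE_VA, true);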
EVLR-2708
Change-Id: I41845a1db8452b8e3fa9d67ecc4cabfac503fd0b
Signed-off-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1747269
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
ACR allocates buffers from vidmem and explicitly passes the flag
NVGPU_DMA_NO_KERNEL_MAPPING. Vidmem buffers are never mapped to the
CPU, and dGPU ACR does not actually care, so switch to using
nvgpu_alloc_vid_at(). This removes another explicit dependency
on NVGPU_DMA_NO_KERNEL_MAPPING, which is going to be removed.
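The substitution has roughly this shape (before/after sketch; the
signatures are approximations, only the function and flag names come
from the commit text):

    /* before: vidmem alloc routed through the generic flags path */
    err = nvgpu_dma_alloc_flags_vid(g, NVGPU_DMA_NO_KERNEL_MAPPING,
                                    size, mem);

    /* after: dedicated vidmem allocator, no mapping flag needed */
    err = nvgpu_alloc_vid_at(g, size, mem, 0);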
Change-Id: I532d40c5ba9e71f07461c526bd2a43e1eb01a290
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1753711
Reviewed-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Update the Linux-specific code to match the MM API docs in the
previous patch. The user-passed page size is plumbed through
the Linux VM mapping calls but is ultimately ignored once the
core VM code is called. This will be handled in the next
patch.
This also adds some code to make the CDE page size picking
happen semi-intelligently. In many cases the CDE buffers can
be mapped with large pages.
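The "semi-intelligent" pick can be as simple as the sketch below,
assuming the buffer size and the VM's big page size are at hand (the
helper itself is illustrative):

    /* Hypothetical sketch: use big pages for CDE buffers only when
     * the buffer size is big-page aligned. */
    static u32 cde_pick_page_size(struct vm_gk20a *vm, u64 buf_size)
    {
            u64 big = vm->big_page_size;

            if (vm->big_pages && buf_size >= big &&
                (buf_size & (big - 1)) == 0ULL)
                    return vm->big_page_size;

            return SZ_4K;
    }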
Bug 2011640
Change-Id: I20e78e7d5a841e410864b474179e71da1c2482f4
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1740610
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Provide concrete documentation for the VM mapping API. This shall be
the gold standard!
One of the major goals of this API is to keep the design simple. There
are features that could be implemented in the kernel but were explicitly
chosen not to be; partial mappings, for example. The less work the
kernel does, the easier it is to verify against the API specifications.
Security follows as well: the simpler the API, the easier it is to
ensure there are no easily exploitable corner cases.
Bug 2011640
Change-Id: Id91956000fbe67f2309a1d46027aa7f5384dffd3
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1589617
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
When generating the PTE size for a given mapping, the code must
consider whether the GPU is behind an IOMMU. The presence and
usage of an IOMMU implies the buffers will appear contiguous
to the GPU. Without an IOMMU we cannot assume that and therefore
must use small pages, regardless of the size of the buffer to
be mapped.
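The rule boils down to a check of this shape; nvgpu_iommuable() is the
usual query, while the wrapper and alignment details approximate the
nvgpu code rather than reproduce it:

    /* Hypothetical sketch: big pages are only safe when an IOMMU
     * makes the buffer appear contiguous to the GPU. */
    static enum gmmu_pgsz_gk20a compute_pte_size(struct vm_gk20a *vm,
                                                 u64 base, u64 size)
    {
            struct gk20a *g = gk20a_from_vm(vm);
            u64 big = vm->big_page_size;

            if (!nvgpu_iommuable(g))
                    return gmmu_page_size_small;

            /* both the GPU VA and the size must be big-page aligned */
            if ((base & (big - 1)) == 0ULL && size >= big)
                    return gmmu_page_size_big;

            return gmmu_page_size_small;
    }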
Bug 2011640
Change-Id: I6c64cbcd8844a7ed855116754b795d949a3003af
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1697891
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>