For some reason the GPU does not like the mappings created by the
DMA API for coherent sysmem buffers. But a plain vmap() does seem
to work. To work around this, when we are using coherent sysmem,
force the NO_KERNEL_MAPPING flag to on and then make a vmap() in
the nvgpu DMA API wrapper. The rest of the driver will be none the
wiser but will work as expected.
This problem is not understood yet but it is being tracked in bug
2040115. Once this bug is understood this WAR should either be
determined as necessary or reverted with an appropriate fix.
Bug 2040115
JIRA EVLR-2333
Change-Id: Idae7a0c92441f0309df572ac18697af49bb6ff2b
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1657568
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
When using a coherent DMA API wee must make sure to program
any aperture fields with the coherent aperture setting. To
do this the nvgpu_aperture_mask() function was modified to
take a third aperture mask argument, a coherent setting, so
that code can use this function to generate coherent aperture
settings.
The aperture choice is some what tricky: the default version
of this function uses the state of the DMA API to determine
what aperture to use for SYSMEM: either coherent or
non-coherent internally. Thus a kernel user need only specify
the normal nvgpu_mem struct and the correct mask should be
chosen. Due to many uses of nvgpu_mem structs not created
directly from the DMA API wrapper it's easier to translate
SYSMEM to SYSMEM_COH after creation.
However, the GMMU mapping code, will encounter buffers from
userspace with difference coerency attributes than the DMA
API. Thus the __nvgpu_aperture_mask() really respects the
aperture setting passed in regardless of the DMA API state.
This aperture setting is pulled from NVGPU_VM_MAP_IO_COHERENT
since this is either passed in from userspace or set by the
kernel when using coherent DMA. The aperture field in attrs
is upgraded to coh if this flag is set.
This change also adds a coherent sysmem mask everywhere that
it can. There's a couple places that do not have a coherent
register field defined yet. These need to eventually be
defined and added.
Lastly the aperture mask code has been mvoed from the Linux
vm.c code to the general vm.c code since this function has
no Linux dependencies.
Note: depends on https://git-master.nvidia.com/r/1664536 for
new register fields.
JIRA EVLR-2333
Change-Id: I4b347911ecb7c511738563fe6c34d0e6aa380d71
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1655220
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Miscellaneous fixes for the sched code:
1. Make sure get_addr() on an SGL respects the use phys flag since
the runlist needs physical addresses when NVLINK is in use.
2. Ensure the runlist is contiguous. Since the runlist memory is
not virtually addressed the buffer must be physically contiguous.
3. Use all 64 bits of the runlist address in the runlist base addr
register (and related fields).
JIRA EVLR-2333
Change-Id: Id4fd5ba4665d3e35ff1d6ca78dea6b58894a9a9a
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1654667
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Thomas Fleury <tfleury@nvidia.com>
Tested-by: Thomas Fleury <tfleury@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This patch does a couple of things. First it renames
NVGPU_DMA_COHERENT to NVGPU_USE_COHERENT_SYSMEM since the former
is somewhat ambiguous in meaning. The latter clearly states what
must happen: nvgpu needs to treat sysmem as coherent. This flag
does simply follow the state of the DMA API but there's no reason
to expect a casual reader of the code to know that when the DMA
API is coherent nvgpu must treat sysmem as coherent.
One thing to note though: when the dGPU is using PCIe and the
PCIe controller is coherent, it doesn't actually matter what we
do. However, we use this flag for determining how to make CPU
mappings in nvgpu_mem_begin() so this flag is still relevant for
the CPU side of things.
Next this patch adds a check in the core kernel GMMU mapping
routine to make sure that when the NVGPU_USE_COHERENT_SYSMEM flag
is set that the IO coherent flag is passed into the mapping code.
This is the primary fix that made NVLINK start working.
Finally the setting of the USE_COHERENT_SYSMEM flag and the
NVGPU_SUPPORT_IO_COHERENCE flag were set both for PCIe and for
iGPUs. The iGPU also must correctly match it's CPU mappings and
GPU mappings for proper operation.
JIRA EVLR-2333
Change-Id: Icd5f07167c9f48a0a2e8493e34c9cc6238e56907
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1654519
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add support for user fence updates i.e. increments added by user space
in pushbuffer directly
Add a submit IOCTL flag NVGPU_SUBMIT_GPFIFO_FLAGS_USER_FENCE_UPDATE to indicate
if User has added increments in pushbuffer
If yes, number_of_increment value is received in fence.value from User
If User is adding increments in the pushbuffer then we don't need to do any job
tracking in the kernel
So fail the submit if we evaluate need_job_tracking to true and
FLAGS_USER_FENCE_UPDATE is set
User is responsible for ensuring all pre-requisites for a fast submit and to
prevent kernel job tracking
Since user space adds increments in the pushbuffer, just handle the threshold
book keeping in kernel.
Bug 200326065
Jira NVGPU-179
Change-Id: Ic0f0b1aa69e3389a4c3305fb6a559c5113719e0f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1661854
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add new user API NVGPU_IOCTL_CHANNEL_GET_USER_SYNCPOINT which will expose
per-channel allocated syncpoint to user space
API will also return current value of the syncpoint
On supported platforms, this API will also return a RW semaphore address
(corresponding to syncpoint shim) to user space
Add new characteristics flag NVGPU_GPU_FLAGS_SUPPORT_USER_SYNCPOINT to indicate
support for this new API
Add new flag NVGPU_SUPPORT_USER_SYNCPOINT for use of core driver
Set this flag for GV11B and GP10B for now
Add a new API (*syncpt_address) in struct gk20a_channel_sync to get GPU_VA
address of a syncpoint
Add new API nvgpu_nvhost_syncpt_read_maxval() which will read and return MAX
value of syncpoint
Bug 200326065
Jira NVGPU-179
Change-Id: I9da6f17b85996f4fc6731c0bf94fca6f3181c3e0
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1658009
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The following changes implements the initial (as per bringup) nvlink driver.
(1) SW initialization of nvlink core driver structures
(2) Nvlink interrupt handling
(3) Device initialization (IOCTRL, pll and clocks, device level intr)
(4) Falcon support for minion
(5) Minion load and bootstrapping
(6) Link initialization and DL PROD settings
(7) Device Interface init (and switching HSHUB to nvlink)
(8) HS set/get mode for both link and sublink
(9) Topology discovery and VBIOS settings.
(10) Ensures we get physical contiguous memory when Nvlink is enabled
This driver includes a hack for the current single dev/single link limitation.
JIRA: EVLR-2331
JIRA: EVLR-2330
JIRA: EVLR-2329
JIRA: EVLR-2328
Change-Id: Idca9a819179376cc655784482b24b575a52fa9e5
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1656790
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Until issue related to low frequencies root caused,
limit min frequency to known safe value: 216.75Mhz.
This change needs to be reverted, once orginal issue
root-caused and fixed.
Bug 2051863
Bug 2056266
Change-Id: If6e56f59ee5fa06967fde1128b58a7fc97be74e9
Signed-off-by: seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1657595
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
On GV11B, CBC base is calculated in similar fashion than it's
calculated on dGPUs. Thus, remove gv11b_ltc_cbc_fix_config() as it
would incorrectly multiply the CBC base by the LTC count.
Bug 2054860
Change-Id: Iaed717161547468c17e12236149d970c497885b3
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1654506
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add User space API NVGPU_AS_IOCTL_GET_SYNC_RO_MAP to get read-only syncpoint
address map in user space
We already map whole syncpoint shim to each address space with base address
being vm->syncpt_ro_map_gpu_va
This new API exposes this base GPU_VA address of syncpoint map, and unit size
of each syncpoint to user space.
User space can then calculate address of each syncpoint as
syncpoint_address = base_gpu_va + (syncpoint_id * syncpoint_unit_size)
Note that this syncpoint address is read_only, and should be only used for
inserting semaphore acquires.
Adding semaphore release with this address would result in MMU_FAULT
Define new HAL g->ops.fifo.get_sync_ro_map and set this for all GPUs supported
on Xavier SoC
Bug 200327559
Change-Id: Ica0db48fc28fdd0ff2a5eb09574dac843dc5e4fd
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1649365
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add characteristic flag NVGPU_GPU_FLAGS_SUPPORT_SYNCPOINT_ADDRESS to indicate if
platform supports semaphore GPU_VA address for a syncpoint
Define NVGPU_SUPPORT_SYNCPOINT_ADDRESS for core driver book keeping
Set this flag for both GV100 and GV11B since Xavier SoC supports a semaphore
GPU_VA address for a syncpoint through syncpoint SHIM
Bug 200327559
Change-Id: I1f31673c9fd59f493d0b35a80d23151fc063ae06
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1649364
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The GPU has multiple different operating modes in respect to IOMMU'ability.
As such there needs to be a clean way to tell the driver whether it is
IOMMU'able or not. This state also does not always reflect what is possible:
all becasue the GPU can generate IOMMU'ed memory requests doesn't mean it
wants to.
The nvgpu_iommuable() API has now existed for a little while which is a
useful way to convey whether nvgpu should consider the GPU as IOMMU'able.
However, there is also the g->mm.bypass_smmu flag which used to be able to
override what the GPU decided it should do. Typically it was assigned
the same value as nvgpu_iommuable() but that was not necessarily a
requirment.
This patch removes all the usages of g->mm.bypass_smmu and instead uses the
nvgpu_iommuable() function. All places where the check against
g->mm.bypass_smmu have been replaced with nvgpu_iommuable(). The code
should now be much cleaner.
Subsequently other checks can also be placed in the nvgpu_iommuable()
function. For example, when NVLINK comes online and the GPU should no
longer consider DMA addresses and instead use scatter-gather lists
directly the ngpu_iommuable() function will be able to check the state of
NVLINK and then act accordingly.
Change-Id: I0da6262386de15709decac89d63d3eecfec20cd7
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1648332
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
nvlink core library no longer exposes the set_init_state()
interface as it wishes to block init_state changes from
endpoint drivers.
Now, the core driver is responsible for initializing init_state
variables using set_init_state() interface. Hence, we remove
this redundant code.
Change-Id: I81c4922cf48f7918e69795579b39b7fa0c299644
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1646437
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Adeel Raza <araza@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Set the DMA mask to 34 bits so that large DMA allocs can be done.
Currently the DMA mask is left unset which limits the size of the
maximum DMA allocation to 32 bits.
The 34 bit mask was chosen because it works for all chips (even
gm20b supports 34 bit physical addresses). However, newer chips
could use larger masks in the future if they desire.
Bug 200377221
Change-Id: Iaa0543f77ff4e2bd6616f38e4464240375bb37b6
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1641762
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
After moving devfreq enable to end of finalize power on,
intermittent issues related to gpu booting with devfreq
enabled are fixed.
Enabled devfreq for gv11b by enabling ""nvhost_podgov"
governor in platform data.
Reused scaling functions from gp10b/gk20a.
Removed emc floor on railgate for power saving.
Added max emc frequency as floor in rail-ungate for
faster gpu boot.
Bug 2049965
Bug 2039013
Bug 200377508
Change-Id: Ia1dec278b663b9f7ed859dd953a60f3eae7ef9a0
Signed-off-by: seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1644702
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
VM and CDE code assumes that dma_buf_attachment is stored as a pointer
in the private dma_buf_drvdata, so it is not tracked. In Linux trees
without dma_buf_*_drvdata() support this is not true, so change the
code to explicitly track dma_buf_attachment.
JIRA NVGPU-4
Change-Id: I692f05a19a6469195d5444a7e5ff6e92f77ae272
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1648004
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Enabling gpu scaling driver after finalize poweron,
will make gpu booting happen at initially set
frequency(1GHz).
Also doing platform specific init scale after enabling
scaling driver.
Bug 2049965
Bug 2039013
Bug 200377508
Change-Id: I633f8f5a25d9de18cbb3a022913b8b725ccd87e5
Signed-off-by: seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1644703
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The message to tell RM server to unbind channel has to be sent after
client unbinds the channel and before client calls tsg release. The
channel has to belong to a tsg on RM server before client submit a
runlist to remove the channel. Or there's a bare channel problem.
By moving .tsg_unbind_channl one layer lower, gk20a_tsg_unbind_channel()
will be common functions for all chip, and it'll call tsg release after
call .tsg_unbind_channel. So vgpu won't need to worry about tsg was
released before sending msg to RM server.
Bug 200382695
Bug 200382785
Change-Id: I32acc122f3f9d5d0628049ccf673225f9e90c87a
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1645383
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Patch 7240b3c2 enabled secure allocation for gv11b
But since we allocate secure buffers in poweron path, and secure allocation
needs GPU to be in off state, this results in deadlock in poweron path
To solve this, we already cause early VPR resize for older chips by calling
gk20a_tegra_secure_page_alloc() from late_probe
Implement same for gv11b.
Add late_probe callback and add a call to gk20a_tegra_secure_page_alloc()
Bug 2038249
Change-Id: I8c17b069962b26edbd0639a7c0d6c2fdaa352935
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1648831
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Tested-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Seema Khowala <seemaj@nvidia.com>