Aggressive sync destroy is used on some platforms where the number of
syncpoints is limited. It can cause sync objects to get allocated and
freed in the submit path and when jobs are cleaned up, so it requires
deferred cleanup. Allocations do not belong in job tracking on a
deterministic submit path.
Although this has been technically allowed before, deterministic
channels have likely not been a priority on those old platforms with
aggressive sync destroy set.
Update the virtualized gp10b platform data to match on a gp10b-vgpu
compat string instead of gk20a-vgpu; gk20a (Tegra T124) hasn't been
supported for a long time. Delete the aggressive sync destroy field
from this platform: it has enough syncpoints that they need not be
allocated dynamically, so having this property set for gp10b-vgpu has
likely been a mistake.
This is not a completely pure cherry-pick: also extend the gpu
characteristics to advertise full deterministic submit support only
when aggressive sync destroy is off. Unlike many other flags, this
platform flag cannot be adjusted by the user.
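A minimal sketch of the characteristics change; the flag name follows
nvgpu's uapi naming, and the surrounding field names are assumptions:

    /* Full deterministic submit must not allocate in the submit path,
     * so don't advertise it when aggressive sync destroy may allocate
     * and free sync objects there. Field names are assumptions. */
    if (platform->aggressive_sync_destroy_thresh != 0U)
        gpu->flags &= ~NVGPU_GPU_FLAGS_SUPPORT_DETERMINISTIC_SUBMIT_FULL;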
Jira NVGPU-4548
Change-Id: I283f546d48b79ac94b943d88e5dce55710858330
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2322042
(cherry picked from commit b1ba2b997b2174e365bcb0782ef3e67260ff9e57)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2328411
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Currently, on platforms where the can_railgate device characteristic
is disabled, suspend can block because deterministic channels hold busy
references. This patch first holds off any new jobs on deterministic
channels and then releases the busy references taken by those channels.
Following this, suspend also waits (with a timeout) for the device to
go idle, i.e. for nvgpu's internal usage counter to become zero. This
ensures there are no further jobs in progress and allows the system to
enter suspend.
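In rough pseudocode, the suspend path becomes the following sketch
(both helper names are assumptions, not necessarily the exact nvgpu
functions):

    /* 1) block new jobs on deterministic channels and drop their
     *    power refs so the usage counter can reach zero */
    gk20a_channel_deterministic_idle(g);        /* assumed name */

    /* 2) wait, with a timeout, for the usage counter to hit zero */
    if (nvgpu_wait_for_idle(g) != 0)            /* assumed name */
        return -EBUSY;  /* still busy; abort the suspend */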
Bug 200598228
Bug 2930266
Change-Id: Id02b4d41a9c2dd64303b2e2449dbed48c12aea4c
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2328489
(cherry picked from commit 9d1e07ca18)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2330159
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
dma_buf_kmap was introduced a decade ago to map a dma_buf partially,
one page at a time by page index, back when 32-bit systems were fairly
common; it was added to avoid exhausting vmalloc space. Starting from
kernel 5.6 it is deprecated, as vmap calls should succeed with the
larger vmalloc space available.
Use dma_buf_vmap/vunmap instead of dma_buf_kmap/kunmap when mapping
the notifier memory in gk20a_channel_wait_semaphore. Also update the
debug prints and add a speculation barrier at the start of
gk20a_channel_wait.
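A sketch of the replacement, assuming the pre-5.11 dma_buf_vmap()
signature that returns the CPU address directly (the offset and the
payload read are illustrative):

    void *va = dma_buf_vmap(dmabuf);    /* maps the whole buffer */
    if (va == NULL)
        return -ENOMEM;

    u32 payload = *(u32 *)((u8 *)va + offset);
    /* ... compare payload against the wait value ... */
    dma_buf_vunmap(dmabuf, va);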
Bug 2925664
Change-Id: I49078fa81f050a57a5b66a793e62006dd66e3ba3
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2326513
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
The following commit updated the debug message in
nvgpu_vm_find_mapping() regarding reuse of a mapping:
commit 2f00d9adfc4fc91a6b84b14cc513f9b855d39cad
Author: Sagar Kamble <skamble@nvidia.com>
gpu: nvgpu: fix null pointer access in nvgpu_vm_find_mapping
That reuse log is about the mapping, not the SGT. Fix the log and add
a comment detailing how the SGT is handled differently for the dmabuf
drvdata cases.
Bug 2834141
Change-Id: I3630de1c45a2bf55ff18bdb426f0597efe83f72c
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2328427
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Guard nvgpu_os_fence_syncpt_fdget() with an nvgpu_has_syncpoints()
check. Even when CONFIG_TEGRA_GK20A_NVHOST is set, the platform data bit
can be disabled independently; on Linux we have a runtime flag to
disable them, too. If nvgpu doesn't have syncpt support, don't try
reading syncpt-based sync files.
If a sema-only-backed channel sync is given a syncpoint-based prefence
fd, we can't wait for it with the current design that couples waits and
increments in one interface. This should eventually be fixed, but for
now the extra check at least guards another interesting case.
A sync file with a zero fence count can be trivially accepted as
either a valid syncpoint fence or a sema fence. If only semas are
supported, the syncpt check that happens first would turn the empty fd
into a syncpt-based sync fence, and the sema wait layer would then
wrongly reject it.
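The resulting order is roughly this sketch (nvgpu_os_fence_sema_fdget
is an assumed name; the other two helpers are named in this and a
related patch):

    struct nvgpu_os_fence fence = {0};
    int err = -EINVAL;

    if (nvgpu_has_syncpoints(g))
        err = nvgpu_os_fence_syncpt_fdget(&fence, c, fd);
    if (err != 0)  /* no syncpt support, or not a syncpt fence */
        err = nvgpu_os_fence_sema_fdget(&fence, c, fd);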
Jira NVGPU-4548
Change-Id: Ib40c2d9a6a25812c5e24eef52c1d1a4f81eeed83
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2325733
(cherry picked from commit 877f99d7c9977dfea14480a1b0488c990b813d1d)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2326044
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
nvhost_syncpt_read_minval() only reads the min value that nvhost has
cached. That makes sense for host-managed syncpoints, but the user
syncpoint that needs set_safe_state is client-managed; its min and max
values are not tracked internally. Use nvhost_syncpt_read_ext_check()
to read the actual syncpoint value from HW and set the "safe state"
(65536 increments) based on that.
The safe state is analogous to "set min equal to max" when max is
expected to be no more than the current value plus a big number. Using
the cached min value would make this safe state lose its meaning when
more than that big number of increments have happened since the
syncpoint was allocated.
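The resulting logic is roughly this sketch; nvhost_syncpt_read_ext_check()
is named above, while the min/max setters are assumed names:

    u32 val = 0;

    /* read the live HW value instead of nvhost's cached minval */
    if (nvhost_syncpt_read_ext_check(host1x_pdev, id, &val) == 0) {
        val += 65536U;  /* "safe state": current value + big number */
        nvhost_syncpt_set_minval(host1x_pdev, id, val);  /* assumed */
        nvhost_syncpt_set_maxval(host1x_pdev, id, val);  /* assumed */
    }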
Jira NVGPU-4548
Change-Id: I395be75f1696e48e344f5503420864efeb3621de
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2323060
(cherry picked from commit ae571178ca63c4fa3e6bf70a4da0221393e975ee)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2326380
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Refactor user-managed syncpoints out of the channel sync
infrastructure that deals with jobs submitted via the kernel API. The
user syncpt only needs to expose the id and GPU address of the reserved
syncpoint. None of the rest (fences, priv cmdbufs) is needed for that,
so coupling it with the user-allocated syncpts has not been ideal.
With user syncpts now provided by channel_user_syncpt, remove the
user_managed flag from the kernel sync api.
This allows all the kernel submit sync code to be conditionally
compiled in only when needed, and separates the user sync functionality
more clearly from the rest behind a minimal API.
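The minimal API is along these lines (a sketch; the exact signatures
are assumptions modeled on the channel_user_syncpt name above):

    struct nvgpu_channel_user_syncpt *
    nvgpu_channel_user_syncpt_create(struct channel_gk20a *ch);
    u32 nvgpu_channel_user_syncpt_get_id(
            struct nvgpu_channel_user_syncpt *s);
    u64 nvgpu_channel_user_syncpt_get_address(
            struct nvgpu_channel_user_syncpt *s);
    void nvgpu_channel_user_syncpt_destroy(
            struct nvgpu_channel_user_syncpt *s);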
[this is squashed with commit 5111caea601a (gpu: nvgpu: guard user
syncpt with nvhost config) from
https://git-master.nvidia.com/r/c/linux-nvgpu/+/2325009]
Jira NVGPU-4548
Change-Id: I99259fc9cbd30bbd478ed86acffcce12768502d3
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2321768
(cherry picked from commit 1095ad353f5f1cf7ca180d0701bc02a607404f5e)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319629
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
If the os fence is the only kind that's supported, fail a submit if
the user wants fences but doesn't explicitly request sync fences (i.e.
expects syncpoints). Syncpoint support is advertised to userspace in
the gpu characteristics, so userspace already knows how to request the
correct sync type.
Do this check at the ioctl level. The in-kernel users that need
submits (cde, copyengine) can work without syncpoints, and sync fences
are used only in userspace.
Also fail a submit if CONFIG_SYNC is not set and sync fences are
requested: lack of kernel support doesn't guarantee that userspace
won't still wrongly ask for them.
Clarify the deferred cleanup requirements. The sync framework is
needed only for post sync fences, but deferred cleanup is always needed
with semaphores because the internal tracking is done with dynamically
allocated (although small) objects.
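At the ioctl level the check amounts to roughly this sketch (the flag
names follow nvgpu's uapi naming; nvgpu_has_syncpoints is from a
related patch):

    /* a fence is wanted, but not as a sync fence, and there are no
     * syncpoints to back a raw syncpt fence: reject the submit */
    if ((flags & NVGPU_SUBMIT_GPFIFO_FLAGS_FENCE_GET) != 0U &&
        (flags & NVGPU_SUBMIT_GPFIFO_FLAGS_SYNC_FENCE) == 0U &&
        !nvgpu_has_syncpoints(g))
            return -EINVAL;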
Jira NVGPU-4548
Change-Id: I2e5a6554930cb413b2bb46ddfe388e41390bc7e4
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2321715
(cherry picked from commit d870956170906eae1088846ec05266c859669771)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2318157
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
g->ops.channel.set_syncpt() updates the inst block's allowed-syncpt
field for the syncpoint that's allocated for kernel-mode submits (if
any). This is different from the userspace-owned syncpoint used with
usermode submits, so don't call set_syncpt when creating the user
syncpt. If the kernel sync object doesn't exist, the call would even
fail. The allowed syncpt is written during nvgpu_channel_setup_bind()
(if usermode submit isn't requested), after the kernel syncpt is
allocated.
Originally, a single syncpoint was shared between the kernel and
userspace. Now that they're separate, this path for updating the
allowed syncpt would need further modifications to work with the user
sync, should it ever be used on platforms that care about the allowed
syncpt.
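In other words, the setup path now does roughly this (a sketch; the
flag variable and the argument list are assumptions):

    /* only the kernel-mode syncpt goes into the inst block;
     * a user syncpt skips this entirely */
    if (!user_syncpt_requested && g->ops.channel.set_syncpt != NULL)
        err = g->ops.channel.set_syncpt(ch, syncpt_id);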
Jira NVGPU-4548
Change-Id: If800e704ef8a3289e04c8d75a623affd6779e309
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2320357
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
1) Added support for the fuse clock directly in the nvgpu Linux build
for GV11B and GP10B.
2) Added a common function, platform_acquire_clock, that is used by
both GP10B and GV11B to acquire their respective clocks.
3) Removed use of the tegra_fuse_clock_enable/disable APIs.
The clock parsing code is changed to verify the clock-names obtained
via DT against the static clock names in the platform code before
proceeding with clk_enable.
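A sketch of that verification, assuming standard of-helpers and a
static clk_names table in the platform data:

    int i;

    for (i = 0; i < platform->num_clks; i++) {
        const char *name = NULL;

        if (of_property_read_string_index(np, "clock-names",
                                          i, &name) != 0)
            break;
        /* only enable clocks listed in the static platform data */
        if (strcmp(name, platform->clk_names[i]) != 0)
            continue;
        platform->clk[i] = devm_clk_get(dev, name);
        if (!IS_ERR(platform->clk[i]))
            clk_prepare_enable(platform->clk[i]);
    }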
Bug 2887230
Change-Id: I177cde9c4bf1a8be6f3437f36e1c6f75cd9c9279
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2307136
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
nvgpu_has_syncpoints is more general than channel synchronization, so
move it from channel_sync.c to nvhost.c, and move its declaration from
gk20a.h to nvhost.h.
As the debugfs knob is Linux-specific, move it from struct gk20a to
struct nvgpu_os_linux.
Jira NVGPU-4548
Change-Id: I4236086744993c3daac042f164de30939c01ee77
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2318814
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add the following changes to build dgpu sources for kernels after
4.14 (see the sketch after this list):
1. tegra_alloc_fd is defined by the downstream tegra mc driver built
with the flag CONFIG_NV_TEGRA_MC; hence make the nvgpu config
NVGPU_USE_TEGRA_ALLOC_FD dependent on it.
2. dma_buf_ops.map_atomic was removed in kernel version 4.17.
3. gk20a_vidbuf_ops.set_drvdata and .get_drvdata are now set only
under the config CONFIG_NVGPU_DMABUF_HAS_DRVDATA.
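Points 2 and 3 then end up as guards of roughly this shape (a sketch;
the member functions are placeholders, and set_drvdata/get_drvdata are
downstream tegra extensions of dma_buf_ops):

    static const struct dma_buf_ops gk20a_vidbuf_ops = {
        /* ... */
    #if LINUX_VERSION_CODE < KERNEL_VERSION(4, 17, 0)
        .map_atomic  = gk20a_vidbuf_kmap_atomic,  /* gone in 4.17 */
    #endif
    #ifdef CONFIG_NVGPU_DMABUF_HAS_DRVDATA
        .set_drvdata = gk20a_vidbuf_set_drvdata,
        .get_drvdata = gk20a_vidbuf_get_drvdata,
    #endif
    };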
Bug 2834141
Change-Id: If2351beddfd6f22a1a1da4499cacf6a1880ede76
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2316486
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Some of the APIs that access debugger registers are not protected
from ELPG. This can trigger PRI access timeouts on the corresponding
registers if the GR engine is power gated.
Add nvgpu_pg_elpg_protected_call() to protect against ELPG.
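The protection follows the usual disable/call/re-enable pattern,
roughly (a sketch; the elpg disable/enable helper names are
assumptions):

    int nvgpu_pg_elpg_protected_call(struct gk20a *g,
                                     int (*func)(struct gk20a *g))
    {
        int err = nvgpu_pg_elpg_disable(g);       /* assumed name */

        if (err != 0)
            return err;
        err = func(g);  /* GR can't be power gated while we're here */
        (void)nvgpu_pg_elpg_enable(g);            /* assumed name */
        return err;
    }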
Bug 2820066
Change-Id: I467ea28aaea1c0e36c2d6aabce6a2daea6ee9911
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2306383
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
The Linux kernel has a default 32-bit segment boundary for any device
that doesn't explicitly configure it. When nvgpu tries to allocate
memory larger than 4GB, the iommu_dma_map_sg() function in the kernel
will take this boundary into account and add internal padding to the
allocated IOVA space:
|<---IOVA space 1--->|<---padding--->|<---IOVA space 2--->|
When DMA reads/writes the memory using this discontinuous IOVA space,
it may end up accessing the padding instead of IOVA space 2.
So this patch adds a dma_set_seg_boundary() call to the nvgpu driver,
maximizing the segment boundary up to DMA_BIT_MASK to ensure a
contiguous IOVA space.
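The fix itself is essentially a one-liner at probe time (the 64-bit
mask is an assumption):

    /* lift the default 4GB segment boundary so large allocations
     * get one contiguous IOVA range */
    dma_set_seg_boundary(dev, DMA_BIT_MASK(64));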
Bug 200558567
Change-Id: I979d56681dddca56f1b02fce83dc81147a6b0d82
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2304150
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Pritesh Raithatha <praithatha@nvidia.com>
Reviewed-by: Puneet Saxena <puneets@nvidia.com>
Reviewed-by: Chris Dragan <kdragan@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Added dependencies between the Kconfig options as follows, where
'->' indicates a 'depends on' relation:
SUPPORT_CDE -> COMPRESSION -> DMABUF_HAS_DRVDATA
DGPU -> GK20A_PCI
Defined a Kconfig option for VPR, and one for DGPU that likewise
depends on GK20A_PCI. DGPU-related sources are now compiled under the
config flag DGPU. Also update conditional compilation of the driver
paths w.r.t. the DGPU, VPR and COMPRESSION flags.
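In the sources this shows up as guards of the following shape (a
hypothetical example; the call sites are placeholders and the config
prefix is assumed):

    #ifdef CONFIG_NVGPU_DGPU
        err = nvgpu_pci_init(g);    /* dgpu-only path, placeholder */
    #endif
    #ifdef CONFIG_NVGPU_COMPRESSION
        err = nvgpu_cde_init(l);    /* compression path, placeholder */
    #endif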
Bug 2834141
Change-Id: Ia0a39d6d4cf8b36e7f955b7355a5ab41783f821c
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2299627
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
In the upstream kernel, ACCESS_ONCE is now deprecated for the reason
given in the following related commit:
commit 381f20fceba8e ("security: use READ_ONCE instead of deprecated
ACCESS_ONCE")
ACCESS_ONCE() does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step.
Replace usages of ACCESS_ONCE with READ_ONCE and WRITE_ONCE in nvgpu.
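The mechanical replacement pattern is (ch->state is just an
illustrative field):

    /* before: v = ACCESS_ONCE(ch->state);
     *         ACCESS_ONCE(ch->state) = v + 1; */
    v = READ_ONCE(ch->state);        /* single, tear-free read */
    WRITE_ONCE(ch->state, v + 1);    /* single, tear-free write */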
Bug 2834141
Change-Id: I9904c49e1a4d7b17ed2fe54360051d08595a2982
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2294096
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
The current clk unit has multiple header files under the include
folder, a mix of public structs accessed outside the unit and private
structs accessed only within the clk unit. This patch segregates them
based on accessibility: all private items are moved out of include into
ucode_clk_inf.h, which only clk can access, and all public items are
moved into include/clk.h, which other units can access. The clk_xxx.h
files are removed.
NVGPU-4689
Change-Id: I469270ae539e09a3f6fe6187207791732407863e
Signed-off-by: Abdul Salam <absalam@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2298220
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
This fixes the following MISRA violation:
MISRA 5.1: declarations with identical names.
The first 31 characters of the identifiers
"nvgpu_nvhost_syncpt_unit_interface_get_aperture" and
"nvgpu_nvhost_syncpt_unit_interface_get_byte_offset" are identical.
JIRA NVGPU-4811
Change-Id: Ib862c4acd53cf748b47c1edffa91b5f033c08953
Signed-off-by: Dinesh <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2298136
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
The current clk unit has multiple header files under the pmuif folder,
a mix of public structs accessed outside the unit and private structs
accessed only within the clk unit. This patch segregates them based on
accessibility: all private items are moved out of pmuif into
ucode_clk_inf.h, which only clk can access, and all public items are
moved into include/clk.h, which other units can access.
This will help in documenting the public items.
NVGPU-4491
Change-Id: Iccb0571e05ecb3cb13363390bed8c7214409b543
Signed-off-by: Abdul Salam <absalam@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2292318
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>