dma_buf_kmap was introduced a decade ago to map a dma_buf partially
by the input number of pages, when 32-bit was fairly common. It was
added to not exhaust vmalloc space. Starting from kernel 5.6, it is
deprecated as vmap calls should succeed with larger available
vmalloc space.
Use dma_buf_vmap/vunmap instead of dma_buf_kmap/kunmap for handling
mapping of notifier memory in gk20a_channel_wait_semaphore.
Also update the debug prints and add speculation barrier to the
start of gk20a_channel_wait.
Bug 2925664
Change-Id: I49078fa81f050a57a5b66a793e62006dd66e3ba3
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2326513
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Refactor user managed syncpoints out of the channel sync infrastructure
that deals with jobs submitted via the kernel api. The user syncpt only
needs to expose the id and gpu address of the reserved syncpoint. None
of the rest (fences, priv cmdbufs) is needed for that, so it hasn't been
ideal to couple with the user-allocated syncpts.
With user syncpts now provided by channel_user_syncpt, remove the
user_managed flag from the kernel sync api.
This allows moving all the kernel submit sync code to be conditionally
compiled in only when needed, and separates the user sync functionality
in a more clear way from the rest with a minimal API.
[this is squashed with commit 5111caea601a (gpu: nvgpu: guard user
syncpt with nvhost config) from
https://git-master.nvidia.com/r/c/linux-nvgpu/+/2325009]
Jira NVGPU-4548
Change-Id: I99259fc9cbd30bbd478ed86acffcce12768502d3
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2321768
(cherry picked from commit 1095ad353f5f1cf7ca180d0701bc02a607404f5e)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319629
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
If the os fence is the only kind that's supported, fail a submit if the
user wants fences but doesn't explicitly request sync fences, expecting
syncpoints. Syncpoint support is advertised to userspace in the gpu
characteristics, so userspace already has the knowledge to request the
correct sync type.
Do this check at the ioctl level. The in-kernel stuff that needs submits
(cde, copyengine) can work without syncpoints and sync fences are used
only in userspace.
Fail a submit also if CONFIG_SYNC is not set and sync fences are
requested. Lack of kernel support doesn't guarantee that userspace would
still wrongly want that.
Clarify the deferred cleanup requirements. The sync framework is needed
only for post sync fences, but deferred cleanup is still always needed
with semaphores because the internal tracking is done with dynamically
allocated (although small) objects.
Jira NVGPU-4548
Change-Id: I2e5a6554930cb413b2bb46ddfe388e41390bc7e4
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2321715
(cherry picked from commit d870956170906eae1088846ec05266c859669771)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2318157
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
g->ops.channel.set_syncpt() updates the inst block's allowed syncpt
field for the syncpoint that's allocated for kernelmode submits (if
any). This is different from the userspace-owned syncpoint that is used
with usermode submits, so don't call set_syncpt when creating the user
syncpt.
If the kernel sync object wouldn't exist, the call would even fail. The
allowed syncpt is written during nvgpu_channel_setup_bind() (if usermode
submit isn't requested), after the kernel syncpt is allocated.
Originally, just one syncpoint used to be shared for the kernel and
userspace; now that they're separate, this different path to update the
allowed syncpt would need other modifications to work with the user sync
if one would be used on platforms that care about the allowed syncpt.
Jira NVGPU-4548
Change-Id: If800e704ef8a3289e04c8d75a623affd6779e309
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2320357
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- "include/trace/events/gk20a.h" file was having GPL2 license
(which should not used for QNX code). This file was used for
compiling linux userspace driver("libnvgpu-drv.so") and was used for
unit testing on QNX.
- This patch removes stubs in "include/trace/events/gk20a.h" file.
(which were used for linux userspace driver.)
- For QNX driver, "nvgpu_rmos/trace/events/gk20a.h" was used.
This patch moves that file to "include/nvgpu/posix/trace_gk20a.h" and
does relevant license change. This same file will be used for linux
userspace driver.
- This patch also creates a new file "include/nvgpu/trace.h" which
selects proper trace file depending on the config.
Bug 2802414
Change-Id: Icdfb251e5698073f986753a969e804161af3ecc5
Signed-off-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2286388
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
To achieve permanent fault coverage, the CTAs launched by
each kernel in the mission and redundant contexts must execute on
different hardware resources. This feature proposes modifications
in the software to modify the virtual SM id to TPC mapping across
the mission and redundant contexts. The virtual SM identifier to TPC
mapping is done by nvgpu when setting up the patch context.
The recommendation for the redundant setting is to offset the
assignment by one TPC, and not by one GPC. This will ensure that both
GPC and TPC diversity. The SM and Quadrant diversity will happen
naturally. For kernels with few CTAs, the diversity is guaranteed
to be 100%. In case of completely random CTA allocation,
e.g. large number of CTAs in the waiting queue, the diversity is
1 - 1/#SM, or 87.5% for GV11B, 97.9% for TU104.
Added NvGpu CFLAGS to enable/disable the SM diversity support
"CONFIG_NVGPU_SM_DIVERSITY".
This support is only enabled on gv11b and tu104 QNX non safety build.
JIRA NVGPU-4685
Change-Id: I8e3eaa72d8cf7aff97f61e4c2abd10b2afe0fe8b
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2268026
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
To achieve permanent fault coverage, the CTAs launched by
each kernel in the mission and redundant contexts must execute
on different hardware resources.
This feature requires a change in software to make it possible
to modify the virtual SM id to TPC mapping across mission and
redundant contexts.
This CL adds only SM diversity flags which are exposed to
its clients through ioctl/devctl interfaces.
Actual virtual SM id to TPC mapping implementation
will be part of upcoming patch sets.
Added NvGpu CFLAGS to identify the safety build
"CONFIG_NVGPU_BUILD_CONFIGURATION_IS_SAFETY"
JIRA NVGPU-4133
Change-Id: I5a18256780e6726e399e39c1c8d155d2ef07d7bd
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2250461
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Renamed gk20a_channel_* APIs to nvgpu_channel_* APIs.
Removed unused channel API int gk20a_wait_channel_idle
Renamed nvgpu_channel_free_usermode_buffers in os/linux-channel.c to
nvgpu_os_channel_free_usermode_buffers to avoid conflicts with the API
with the same name in channel unit.
Jira NVGPU-3248
Change-Id: I21379bd79e64da7e987ddaf5d19ff3804348acca
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2121902
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
cyclestats_snapshot data and lock is right now stored in struct nvgpu_gr
Use case itself is not specific to GR engine but in general it applies
to other units outside of GR too.
Hence it makes sense to move both data and lock to struct gk20a instead
of keeping them in struct nvgpu_gr
Update all cyclestats_snapshot code to refer data/lock from struct gk20a
Remove gr_priv.h header include from cyclestats_snapshot.c
Some of the functions were mistakenly declared in gr_gk20a.h.
Move them to cyclestats_snapshot.h and rename them to form nvgpu_css_*()
Jira NVGPU-1103
Change-Id: I3fb32fe96f0ca6613f4640c8bd227b9e0e02dca3
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2104848
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Move chip specific preempt code to hal/fifo
Move non-chip specific preempt code to common/fifo
Remove fifo.get_preempt_timeout
Rename gk20a_fifo_get_preempt_timeout -> nvgpu_preempt_get_timeout
Rename gk20a_fifo_preempt -> nvgpu_preempt_channel
Add fifo.preempt_trigger hal for issuing preempt
Add fifo.preempt_runlists_for_rc hal for preempting runlists during rc
Add fifo.preempt_poll_pbdma hal
Add nvgpu_preempt_poll_tsg_on_pbdma to be called from rc
JIRA NVGPU-3144
Change-Id: Idb089acaa0c6ca08de17487c3496459a61f0bcd4
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2100819
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Removed unused struct from gr_gk20a.h
Change static allocation for struct gr_gk20a to dynamic type.
Change all the files that being affected by that change.
Call gr allocation from corresponding init_support functions, which
are part of the probe functions.
nvgpu_pci_init_support in pci.c
vgpu_init_support in vgpu_linux.c
gk20a_init_support in module.c
Call gr free before the gk20a free call in nvgpu_free_gk20a.
Rename struct gr_gk20a to struct nvgpu_gr
JIRA NVGPU-3132
Change-Id: Ief5e664521f141c7378c4044ed0df5f03ba06fca
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2095798
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add api nvgpu_gr_setup_set_preemption_mode() in common.gr.setup to
set various preemption modes
Define new hal g->ops.gr.setup.set_preemption_mode() that calls above
common api
Move corresponding code from gr_gp10b.c to common.gr.setup unit
Jira NVGPU-1886
Change-Id: I7cb0187a4809156e5f90f39727a782b17219afa3
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2092170
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add below apis in common.gr.setup to allocate/free context
nvgpu_gr_setup_alloc_obj_ctx()
nvgpu_gr_setup_free_gr_ctx()
Define two new hals
g->ops.gr.setup.alloc_obj_ctx()
g->ops.gr.setup.free_gr_ctx()
Move corresponding code from gr_gk20a.c to common.gr.setup unit
Jira NVGPU-1886
Change-Id: Icf170a6ed8979afebcedaa98e3df1483437b427b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2092169
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
force_reset_ch obtains a tsg from a channel first before proceeding
with other work. Thus, force_reset_ch is moved as part of tsg unit to
avoid circular dependency between channel and tsg. TSGs can depend on
channels but channel cannot depend on TSGs.
Jira NVGPU-2978
Change-Id: Ib1879681287971d2a4dbeb26ca852d6b59b50f6a
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2084927
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add new unit common.gr.setup that provides runtime setup interfaces to
other units outside of GR unit or to OS-specific code
Move zcull setup call to this unit.
New unit now exposes nvgpu_gr_setup_bind_ctxsw_zcull() to setup zcull
This API internally calls common.gr.zcull API nvgpu_gr_zcull_ctx_setup()
Add new hal g->ops.gr.setup.bind_ctxsw_zcull() and remove
g->ops.gr.zcull.bind_ctxsw_zcull()
Remove nvgpu_channel_gr_zcull_setup() from channel unit
Also remove ctx/subctx header includes sicne channel code need not
configure zcull
Remove gm20b_gr_bind_ctxsw_zcull() since binding is done from common
code
Jira NVGPU-1886
Change-Id: I6f04d19a8b8c003734702c5f6780a03ffc89b717
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2086602
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
On gp10b, ramfc contains information related to syncpoint
protection, which restricts the syncpoint increment operation
to a safe set of syncpoints. This information must be
updated when a syncpoint is assigned to a channel.
Added the following ramfc HALs
- ramfc.get_syncpt
- ramfc.set_syncpt
And replaced
- fifo.resetup_ramfc
With
- channel.set_syncpt
Use new ramfc HALs, move resetup_ramfc implementation
from fifo to common channel code:
- nvgpu_channel_set_syncpt
NVGPU-1750
Change-Id: I036a0b7b2d9fd6ccd9f30094ae33e6c38a96e0cc
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2075938
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
timeout_ms_max is renamed as ctxsw_timeout_max_ms
timeout_debug_dump is renamed as ctxsw_timeout_debug_dump
timeout_accumulated_ms is renamed as ctxsw_timeout_accumulated_ms
timeout_gpfifo_get is renamed as ctxsw_timeout_gpfifo_get
gk20a_channel_update_and_check_timeout is renamed as
nvgpu_channel_update_and_check_ctxsw_timeout
JIRA NVGPU-1312
Change-Id: Ib5c8829c76df95817e9809e451e8c9671faba726
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2076847
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
We have 3 header files for FECS tracing support
include/nvgpu/gr/fecs_trace.h : common header
include/nvgpu/ctxsw_trace.h : header that includes both common and
os-specific functions
os/linux/ctxsw_trace.h : linux specific header
Remove the second header since it is not needed.
Move all structures that are needed in common code to
include/nvgpu/gr/fecs_trace.h
Move all function declarations that are needed in common code to
include/nvgpu/gr/fecs_trace.h
Move all linux specific declarations in os/linux/ctxsw_trace.h and
rename this file as os/linux/fecs_trace_linux.h
Also rename os/linux/ctxsw_trace.c to os/linux/fecs_trace_linux.c
Jira NVGPU-1880
Change-Id: I05cc4489c4b6a64880b7d59c02b22cd2244d5e22
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2070766
Reviewed-by: svc-misra-checker <svc-misra-checker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The following changes are done in this patch.
1) gk20a_fifo_get_engine_info() is moved to common/fifo/engine.c
and is renamed to gk20a_fifo_get_active_engine_info() to reflect
accurately the purpose of the function.
2) move the definition of enum fifo_engine to <nvgpu/engines.h> and
add the prefix NVGPU_
3) move the following functions related to engines in fifo_gk20a.c to
common/fifo/engines.c and replace their signature by adding the prefix
nvgpu_engine and removing gk20a_fifo.
gk20a_fifo_get_active_engine_info
gk20a_fifo_engine_enum_from_type
gk20a_fifo_get_engine_ids
gk20a_fifo_is_valid_engine_id
gk20a_fifo_get_gr_engine_id
gk20a_fifo_act_eng_interrupt_mask
gk20a_fifo_engine_interrupt_mask
gk20a_fifo_get_all_ce_engine_reset_mask
Jira NVGPU-1315
Change-Id: I63d9dcd905a0bebcc9a4c65776cf6ec7a0837acf
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2011298
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Drop the "runlist_" part in the runlist section of the HAL ops. For
example:
- old: g->ops.runlist.runlist_wait_pending
- new: g->ops.runlist.wait_pending
At the same time, drop the "fifo_" part from the function names. For
example:
- old: gk20a_fifo_runlist_wait_pending
- new: gk20a_runlist_wait_pending
Also rename eng_runlist_base_size to count_max. The size of the
eng_runlist_base register array depicts the maximum possible number of
runlists in the chip for which count_max is more descriptive.
Jira NVGPU-1309
Change-Id: Ie9e94b9f65cd10d3e682d19954f240adb6e311be
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2017403
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Split out ops that belong to channel unit to a new section called
channel. Channel is a broad concept; this includes just the code that
accesses channel registers (ccsr_*). This is effectively just renaming;
the implementation still stays put.
The word "channel" is also dropped from certain HAL entries to avoid
redundancy (e.g., channel.disable_channel -> channel.disable).
fifo.get_num_fifos gets an entirely new name: channel.count.
Jira NVGPU-1307
Change-Id: I9a08103e461bf3ddb743aa37ababee3e0c73c861
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2017261
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add NVGPU_DBG_GPU_IOCTL_CYCLE_STATS to debugger node, to
install/uninstall a buffer for cycle stats.
Add NVGPU_DBG_GPU_IOCTL_CYCLE_STATS_SNAPSHOT to debugger
node, to attach/flush/detach a buffer for Mode-E streamout.
Those ioctls will apply to the first channel in the debug session.
Bug 200464613
Jira NVGPU-1442
Change-Id: I0b96d9a07c016690140292fa5886fda545697ee6
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2002060
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add new unit common/gr/ctx.c to manage GR context
This unit provides interfaces to allocate/free/map/unmap GR context,
patch context, pm context, ctxsw {preempt/spill/betacb/pagepool/rtvcb}
buffers.
It also provides APIs to set size of above buffers
Add new header file include/nvgpu/gr/ctx.h to declare all the interfaces.
Move nvgpu_gr_ctx, patch_desc, pm_ctx_desc, zcull_ctx_desc structures
to this unit
Add new structure nvgpu_gr_ctx_desc to hold context description
parameters. For now we add sizes of all the buffers here.
Add this structure to gr_gk20a for global reference
Remove gr_gp10b_alloc_buffer() since it is no longer used
Rename g->ops.gr.alloc_gfxp_rtv_cb() to g->ops.gr.init_gfxp_rtv_cb()
since this HAL now only sets the size of rtvcb ctxsw buffer
Remove gr->ctx_vars.buffer_size and gr->ctx_vars.buffer_total_size
since they were redundant. We already have gr->ctx_vars.golden_image_size
to denote golden image size
Jira NVGPU-1527
Change-Id: I8847b347f80235209dd5e28d979e79984ab85408
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1987702
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Extract non-chip-specific code that manages the runlists (init, update,
reschedule etc.) to a new file in the common directory. Move the
declarations to a new matching runlist.h header.
Jira NVGPU-1309
Change-Id: I3c7e0032899516487037f47ddc9a7e7aa4b0b33a
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1978058
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>