Implement a new API, nvgpu_prof_ioctl_exec_reg_ops(), to support regops
on the new profiler objects.
Add two new staging buffers to hold regops copied from userspace, and
to convert and execute regops in common code.
Buffers are allocated and released along with the profiler object.
The new API implements the following (sketched in code below):
- copy regops data in chunks of 4K from userspace
- store them in the first staging buffer
- convert the new regop structs into common regop structs and copy
  the content into the second staging buffer
- trigger gops.regops.exec_regops() with the second staging buffer as
  the operation pointer
- convert the common regop structs back into new regop structs and
  copy them back to userspace
Export a number of helper functions from ioctl_dbg.h, e.g.
nvgpu_get_regops_op_values_common().
Update the regop execution code to skip executing a regop if its status
is not valid. This can only happen when userspace requests
CONTINUE_ON_ERROR mode.
Add more documentation to some of the fields in UAPI header.
Note that the maximum atomic operations reported by the new API are the
same as in the legacy API, and are incorrect. This will be fixed in
upcoming patches.
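As a hedged illustration of the flow above, a minimal sketch of the
chunked copy-and-convert loop; the helper names, struct layouts and
fields used here are assumptions, not the actual nvgpu implementation:

    #define REGOPS_CHUNK_SIZE 4096U

    static int nvgpu_prof_exec_reg_ops_sketch(
            struct nvgpu_profiler_object *prof,
            u8 __user *user_ops, u32 num_ops)
    {
        struct gk20a *g = prof->g;
        const u32 per_chunk = REGOPS_CHUNK_SIZE /
                (u32)sizeof(struct nvgpu_profiler_reg_op);
        u32 done = 0U;

        while (done < num_ops) {
            u32 n = min(per_chunk, num_ops - done);
            size_t bytes = (size_t)n *
                    sizeof(struct nvgpu_profiler_reg_op);
            u8 __user *chunk = user_ops +
                    (size_t)done * sizeof(struct nvgpu_profiler_reg_op);
            int err;

            /* Copy one <= 4K chunk of new-style regops from userspace
             * into the first staging buffer. */
            if (copy_from_user(prof->regops_umd_buf, chunk, bytes) != 0)
                return -EFAULT;

            /* Convert them into common regop structs in the second
             * staging buffer. */
            nvgpu_regops_umd_to_common(prof->regops_umd_buf,
                    prof->regops_staging_buf, n);

            /* Execute with the second staging buffer as the op
             * pointer. */
            err = g->ops.regops.exec_regops(g, prof->tsg,
                    prof->regops_staging_buf, n, &prof->regops_flags);
            if (err != 0)
                return err;

            /* Convert results back into new regop structs and copy
             * them out to userspace. */
            nvgpu_regops_common_to_umd(prof->regops_staging_buf,
                    prof->regops_umd_buf, n);
            if (copy_to_user(chunk, prof->regops_umd_buf, bytes) != 0)
                return -EFAULT;

            done += n;
        }
        return 0;
    }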
Bug 2510974
Jira NVGPU-5360
Change-Id: I9f82052b22143aec33f6e778c0784386744b699e
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2394208
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
- use release instead of free for the fence destroy identifier
- nvhost_dev is a struct name, so use nvhost_device
- compare nvgpu_nvhost_syncpt_read_ext_check retval properly
Also, if the syncpt read fails when checking for fence expiration,
behave as if the wait isn't expired. Possibly getting stuck is safer
than possibly continuing too early.
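A hedged sketch of the fixed return-value handling; the wrapper
function and the simplified threshold compare are assumptions used for
illustration:

    /* Treat a failed syncpt read as "not expired": possibly stalling a
     * waiter is safer than possibly letting it continue too early. */
    static bool os_fence_syncpt_expired_sketch(
            struct nvgpu_nvhost_dev *nvhost_device, u32 id, u32 thresh)
    {
        u32 val;
        int err;

        /* Compare the retval properly: 0 is success, a negative errno
         * means the read failed. */
        err = nvgpu_nvhost_syncpt_read_ext_check(nvhost_device, id,
                &val);
        if (err != 0)
            return false;

        /* Simplified, wrap-safe expiry test for the sketch. */
        return (s32)(val - thresh) >= 0;
    }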
Jira NVGPU-5617
Change-Id: Ied529e25f8c43f1c78fd9eac73b9cd6c3550ead5
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2398399
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
struct nvgpu_gr is currently initialized during probe, from OS-specific
code. To support multiple instances of the graphics engine, nvgpu needs
to initialize nvgpu_gr after the number of engine instances has been
enumerated in the poweron path.
Hence, move nvgpu_gr_alloc() to the poweron path, after the gr manager
has been initialized.
Some of the members of nvgpu_gr are initialized in the probe path, and
they too are in OS-specific code. Move them to common code in
nvgpu_gr_alloc().
Add a field, fecs_feature_override_ecc_val, to struct gk20a to store
the override flag read from the device tree. This flag is later copied
to nvgpu_gr in the poweron path.
Update tpc_pg_mask_store() to check for g->gr being NULL before
accessing the golden image pointer.
Update tpc_fs_mask_store() to return an error if g->gr is not
initialized, since this path needs the nvgpu_gr struct. Also fix the
incorrect NULL pointer check in tpc_fs_mask_store(), which breaks the
write path to this sysfs node.
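A minimal sketch of the guarded sysfs write path; the surrounding
helpers and the tpc_fs_mask_user field are assumptions here:

    static ssize_t tpc_fs_mask_store(struct device *dev,
            struct device_attribute *attr, const char *buf, size_t count)
    {
        struct gk20a *g = get_gk20a(dev);
        unsigned long val;

        if (kstrtoul(buf, 10, &val) < 0)
            return -EINVAL;

        /* nvgpu_gr is only allocated in the poweron path now, so bail
         * out cleanly if the write arrives before that. */
        if (g->gr == NULL)
            return -ENODEV;

        g->tpc_fs_mask_user = val; /* consumed on next gr init */

        return count;
    }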
Jira NVGPU-5648
Change-Id: Ifa2f66f3663dc2f7c8891cb03b25e997e148ab06
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2397259
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Create perfmon event handling for nvgpu-next.
The nvgpu-next PMU sends perfmon events in the form of
RPC events. The events are:
- Change event: indicates whether the event is an
  increase or a decrease event.
- Init event: indicates that perfmon init is
  done in the PMU.
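A rough sketch of how such RPC events could be dispatched; the event
IDs, struct fields and handler names below are assumptions for
illustration only:

    static void pmu_perfmon_handle_rpc_event_sketch(struct gk20a *g,
            struct pmu_nvgpu_rpc_perfmon_event *event)
    {
        switch (event->hdr.function) {
        case PMU_RPC_ID_PERFMON_CHANGE_EVENT:
            /* Change event: says whether load went up or down. */
            if (event->data.state_id == PMU_PERFMON_EVENT_INCREASE)
                nvgpu_pmu_perfmon_event_increase(g);
            else
                nvgpu_pmu_perfmon_event_decrease(g);
            break;
        case PMU_RPC_ID_PERFMON_INIT_EVENT:
            /* Init event: perfmon init finished inside the PMU. */
            g->pmu.perfmon_ready = true;
            break;
        default:
            break;
        }
    }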
Jira NVGPU-5202
Jira NVGPU-5205
Jira NVGPU-5206
Signed-off-by: rmylavarapu <rmylavarapu@nvidia.com>
Change-Id: Ida7e77dbaf70d2b594a0801c91a168dcb4a860bd
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2395358
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
The channel watchdog feature has always been a blocker for deterministic
submits. Instead of waiting for a submit call to happen just to reject
it, nack the setup_bind ioctl right away if deterministic mode is
requested and the watchdog has not been disabled beforehand. This also
avoids confusion with usermode submits, where leaving the watchdog
enabled would have appeared to work but the watchdog would never see
updates from userspace.
Also disallow any watchdog adjustment other than disabling it once the
channel has been set up for deterministic mode.
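A small sketch of the up-front check described above; the flag and
field names are assumptions:

    static int channel_setup_bind_check_sketch(struct nvgpu_channel *ch,
            struct nvgpu_setup_bind_args *args)
    {
        bool deterministic = (args->flags &
                NVGPU_SETUP_BIND_FLAGS_DETERMINISTIC) != 0U;

        /* Nack setup_bind now instead of failing later in submit:
         * deterministic channels need the watchdog disabled first. */
        if (deterministic && ch->wdt.enabled)
            return -EINVAL;

        return 0;
    }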
Jira NVGPU-5582
Change-Id: I0ba4584bbc035197d952e5b562197c36aa483867
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2369819
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
channel_joblist_peek() returns NULL if the list is empty.
nvgpu_channel_joblist_is_empty() has only ever been used together with
that function; remove it and check the peek result against NULL to see
whether there are jobs in flight.
This removes some duplication, simplifies the call sites slightly, and
gets rid of a Coverity nag about a possible NULL pointer from peek that
cannot actually happen once the emptiness has been checked.
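A sketch of the simplified call-site pattern; the locking helpers here
are assumptions:

    static struct nvgpu_channel_job *peek_job_sketch(
            struct nvgpu_channel *ch)
    {
        struct nvgpu_channel_job *job;

        channel_joblist_lock(ch);
        /* Before: if (!nvgpu_channel_joblist_is_empty(ch)) peek...
         * After: peek once and test the result for NULL directly. */
        job = channel_joblist_peek(ch);
        channel_joblist_unlock(ch);

        return job; /* NULL means no jobs in flight */
    }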
Jira NVGPU-5772
Change-Id: I814e9c510d99b88e59539359992fb44d4e7ce2ea
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2397394
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add a "priv" fence struct type and use that in the fence type to
emphasize that the inner data is not meant to be seen.
The fence unit needs to have an outside-visible fence type so that
fences can be allocated directly as a struct field in job metadata for
performance and simplicity, so hiding the type entirely wouldn't work.
A couple of places in channel code need to touch the priv data
directly. Those can be thought of as technically the fence unit's code
scattered outside the fence files, but they do mean that the
architecture is not perfect yet.
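A sketch of the resulting type split; the priv members shown are
assumptions:

    /* Internals: not meant to be touched outside the fence unit. */
    struct nvgpu_fence_priv {
        struct nvgpu_ref ref;
        struct nvgpu_os_fence os_fence;
    };

    /*
     * Outside-visible wrapper: the full type stays visible so a fence
     * can be embedded directly as a struct field in preallocated job
     * metadata, but the "priv" name signals hands-off.
     */
    struct nvgpu_fence_type {
        struct nvgpu_fence_priv priv;
    };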
Jira NVGPU-5773
Change-Id: Ifa3c95757ae31eef0e32f2605293e23e210b065f
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2395071
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Split up nvgpu_channel_clean_up_jobs() on the clean_all parameter so
that there's one version for the asynchronous ("deferred") cleanup and
another for the synchronous deterministic cleanup that occurs in the
submit path.
Forking another version like this adds some repetition, but it lets us
look at both versions clearly in order to come up with a coherent plan.
For example, it might be feasible to have the light cleanup of pooled
items also in the nondeterministic path, deferring heavy cleanup to
another, entirely separate job queue.
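A sketch of the split's shape; the internals and helper names here are
assumptions:

    /* Deferred path: drain every completed job, from the worker. */
    static void clean_up_deferred_jobs_sketch(struct nvgpu_channel *ch)
    {
        struct nvgpu_channel_job *job;

        while ((job = channel_joblist_peek(ch)) != NULL &&
                job_is_complete(ch, job)) {
            /* Unlinks the job; may sleep, allocate or take locks. */
            free_job_resources(ch, job);
        }
    }

    /* Deterministic path: light, non-blocking cleanup in submit. */
    static void clean_up_deterministic_job_sketch(
            struct nvgpu_channel *ch)
    {
        struct nvgpu_channel_job *job = channel_joblist_peek(ch);

        if (job != NULL && job_is_complete(ch, job)) {
            /* Only recycles pooled items; no frees, no blocking. */
            release_pooled_job_resources(ch, job);
        }
    }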
Jira NVGPU-5493
Change-Id: I5423fd474e5b8f7b273383f12302126f47076bd3
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2346065
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Move the sema-specific APIs to the sema-specific os fence header, and
do the same for syncpts. Stub out the os fence type and the initialized
check if os fence support is not enabled at build time. Guard the sema
header with CONFIG_NVGPU_SW_SEMAPHORE.
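A sketch of the build-time stubbing; the config symbol guarding os
fence support and the struct contents are assumptions:

    #ifdef CONFIG_NVGPU_OS_FENCE
    struct nvgpu_os_fence {
        const struct nvgpu_os_fence_ops *ops;
        void *priv;
    };

    static inline bool nvgpu_os_fence_is_initialized(
            struct nvgpu_os_fence *fence)
    {
        return fence->ops != NULL;
    }
    #else
    /* Empty stub type (an empty struct is fine in kernel GNU C) and an
     * is-initialized check that is always false. */
    struct nvgpu_os_fence {
    };

    static inline bool nvgpu_os_fence_is_initialized(
            struct nvgpu_os_fence *fence)
    {
        return false;
    }
    #endif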
Jira NVGPU-5773
Change-Id: I838debd66a800b00cde76e65458b13eee367b55f
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2395070
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add new APIs to bind/unbind PM resources to/from profiler objects:
nvgpu_profiler_bind_pm_resources()
nvgpu_profiler_unbind_pm_resources()
Implement support to bind/unbind SMPC/HWPM/HWPM_STREAMOUT in various
functions in common/profiler/profiler.c.
Unbind all the PM resources explicitly in
nvgpu_profiler_unbind_context() while closing the profiler object.
If resources are bound during a resource reservation request, unbind
the resources explicitly before reserving the new resource. It is the
application's responsibility to bind the PM resources again.
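A rough sketch of the bind entry point; the resource enumeration,
fields and helper names here are assumptions:

    int nvgpu_profiler_bind_pm_resources(
            struct nvgpu_profiler_object *prof)
    {
        struct gk20a *g = prof->g;
        int err = 0;

        if (prof->bound)
            return -EINVAL;

        /* Bind only what was reserved earlier on this object. */
        if (prof->reserved[NVGPU_PROFILER_PM_RESOURCE_TYPE_SMPC])
            err = prof_bind_smpc(g, prof);

        if (err == 0 &&
                prof->reserved[NVGPU_PROFILER_PM_RESOURCE_TYPE_HWPM])
            err = prof_bind_hwpm(g, prof); /* incl. HWPM streamout */

        prof->bound = (err == 0);
        return err;
    }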
Bug 2510974
Jira NVGPU-5360
Change-Id: Ib2a0e017eaa23d0d376438771e8bf4e340865f03
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2389655
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Add two new functions to reserve/release PM resources:
nvgpu_prof_ioctl_reserve_pm_resource()
nvgpu_prof_ioctl_release_pm_resource()
Add a ctxsw field to struct nvgpu_profiler_object to store the
per-resource context switch enable flag.
Force the resource reservation to be released while unbinding the
context from the profiler object or while closing the profiler object.
Add this code in nvgpu_profiler_unbind_context(), since both of the
above paths call this function.
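A sketch of the forced release on unbind; the reservation bookkeeping
shown is an assumption:

    void nvgpu_profiler_unbind_context(struct nvgpu_profiler_object *prof)
    {
        u32 i;

        /* Both the explicit unbind IOCTL and the close path land here,
         * so forcing the release in one place covers both. */
        for (i = 0U; i < NVGPU_PROFILER_PM_RESOURCE_TYPE_COUNT; i++) {
            if (prof->reserved[i])
                nvgpu_profiler_release_pm_resource(prof, i);
        }

        prof->tsg = NULL;
        prof->context_init = false;
    }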
Bug 2510974
Jira NVGPU-5360
Change-Id: If334148e8df86360fba4162d1611187f3f04d01b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2389654
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
The lockless allocator that spins in alloc and free ops, using cmpxchg
to mitigate race conditions, has only ever been used for the post
fences in preallocated job resources. Now each post fence has a clear
owner (the job struct, which is already allocated properly) and a clear
lifetime, so this allocator no longer has a purpose. Delete it to avoid
bitrot. (The design of the job queue has always been such that there's
minimal contention in any case.)
Jira NVGPU-5773
Change-Id: Ied98d977c2c75bacfd3d010ce60c80fe709231e0
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2392705
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
As the submit job metadata has been simplified, the fence pool for job
tracking fences is now just complex code for very simple purposes, so
delete it. It's enough to hold the fence memory in the job struct itself
instead of having separately allocated objects with different lifetimes.
Each channel uses preallocated job arrays based on the prespecified
inflight job count. The fences are used for tracking job completion,
and a new job cannot be submitted before a previous wait has completed.
This means that even with a ring buffer with space for only one job,
the previous job's memory cannot get reclaimed by a new submit, because
the submits are ordered.
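A sketch of the resulting layout: the tracking fence is embedded in the
job struct itself, sharing its lifetime (the other members shown are
assumptions):

    struct nvgpu_channel_job {
        struct priv_cmd_entry *wait_cmd;
        struct priv_cmd_entry *incr_cmd;
        /* Embedded, not pool-allocated: no separate alloc/free, and
         * it cannot outlive or predecease its job. */
        struct nvgpu_fence_type post_fence;
    };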
Jira NVGPU-5773
Change-Id: I0c777df700aa7cfda6f971efa47aa72c5462b53a
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2392704
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add the super surface gpu_va details to the super surface header
member, as needed by the PMU ucode.
This is required for the iGPU PMU ucode on nvgpu-next to process the
command line args and ACK back with an INIT message; without this, the
PMU ucode ends up hung on a DMA wait.
Update the super-surface details in the cmd line args so the PMU ucode
knows the starting address of the super surface in SYSMEM.
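A sketch of publishing the super surface GPU VA in both places; the
struct and member names here are assumptions:

    static void pmu_setup_super_surface_sketch(struct nvgpu_pmu *pmu)
    {
        u64 gpu_va = pmu->super_surface_buf.gpu_va;

        /* 1. The header member inside the super surface itself. */
        pmu->super_surface_hdr.address = gpu_va;

        /* 2. The command line args, so the ucode can find the super
         * surface in SYSMEM and ACK init instead of stalling on a
         * DMA wait. */
        pmu->args.super_surface_addr_lo = u64_lo32(gpu_va);
        pmu->args.super_surface_addr_hi = u64_hi32(gpu_va);
    }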
JIRA NVGPU-5186
Change-Id: I56d7d3e28527e46707663c97bc8e2a58000c7f5a
Signed-off-by: mkumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2376364
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Rework the regops execution API to accommodate the below updates for
the new profiler design:
- gops.regops.exec_regops() should accept a TSG pointer instead of a
  channel pointer.
- Remove the individual boolean parameters and add one flags field.
The below new flags are added to this API:
NVGPU_REG_OP_FLAG_MODE_ALL_OR_NONE
NVGPU_REG_OP_FLAG_MODE_CONTINUE_ON_ERROR
NVGPU_REG_OP_FLAG_ALL_PASSED
NVGPU_REG_OP_FLAG_DIRECT_OPS
Update other APIs, e.g. gr_gk20a_exec_ctx_ops() and validate_reg_ops(),
as per the new API changes.
Add a new API, gk20a_is_tsg_ctx_resident(), to check context residency
from a TSG pointer.
Convert gr_gk20a_ctx_patch_smpc() to a HAL, gops.gr.ctx_patch_smpc().
Set this HAL only for gm20b, since it is not required for later chips.
Also, remove the subcontext code from this function, since gm20b does
not support subcontexts.
Remove the stale comment about missing vGPU support in
exec_regops_gk20a().
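A sketch of the reworked API shape; the exact bit values and the full
parameter list are assumptions:

    /* Input mode flags... */
    #define NVGPU_REG_OP_FLAG_MODE_ALL_OR_NONE       BIT32(0)
    #define NVGPU_REG_OP_FLAG_MODE_CONTINUE_ON_ERROR BIT32(1)
    /* ...output status flag... */
    #define NVGPU_REG_OP_FLAG_ALL_PASSED             BIT32(2)
    /* ...and direct (non-context) op execution. */
    #define NVGPU_REG_OP_FLAG_DIRECT_OPS             BIT32(3)

    /* Old: a channel pointer plus several booleans.
     * New: a TSG pointer and a single in/out flags word. */
    int exec_regops_gk20a(struct gk20a *g, struct nvgpu_tsg *tsg,
            struct nvgpu_dbg_reg_op *ops, u32 num_ops, u32 *flags);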
Bug 2510974
Jira NVGPU-5360
Change-Id: I3c25c34277b5ca88484da1e20d459118f15da102
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2389733
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
In nvgpu_dbg_gpu_ioctl_smpc_ctxsw_mode(), check if the SMPC global mode
capability is supported instead of checking for the function pointer.
Enable the capability only for Turing, since pre-Turing GPUs don't
support it.
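A sketch of the capability check; the enable-flag name is an
assumption:

    static int smpc_global_mode_check_sketch(struct gk20a *g)
    {
        /* A capability flag replaces the old HAL function-pointer
         * check; only Turing sets it. */
        if (!nvgpu_is_enabled(g, NVGPU_SUPPORT_SMPC_GLOBAL_MODE))
            return -EINVAL;

        return 0;
    }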
Bug 2510974
Jira NVGPU-5360
Change-Id: I352fb2a91b836cd8ef727966a53a28255d8ea834
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2389653
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
- NvRmGpuDeviceSetSmDebugMode uses the regops interface.
- NvRmGpuDeviceTriggerSuspend, NvRmGpuDeviceWaitForPause,
and NvRmGpuDeviceResumeFromPause should return an error on Pascal+. Use
the regops interface to suspend/resume.
- On non-CILP devices (Maxwell), NvRmGpuDeviceTriggerSuspend,
NvRmGpuDeviceWaitForPause, NvRmGpuDeviceResumeFromPause and
NvRmGpuDeviceSetSmDebugMode are used when a debugger (including
coredump, memcheck) is attached or when a CUDA application uses a
syscall that requires a trap handler (assert, cnp).
Bug 2558022
Bug 2559631
Bug 2706068
JIRA NVGPU-5502
Change-Id: I9eb2ab0c8c75c50f53523d8bf39c75f98b34f3f0
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2376159
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Create new dev nodes for device and context profilers. Examples of dev
nodes on iGPU:
/dev/nvhost-prof-dev-gpu - device scope profiler
/dev/nvhost-prof-ctx-gpu - context scope profiler
Add the below APIs to open/close the above dev nodes:
nvgpu_prof_dev_fops_open()
nvgpu_prof_ctx_fops_open()
nvgpu_prof_fops_release()
Add a common API, nvgpu_prof_fops_ioctl(), to handle IOCTL calls on
these dev nodes. Add IOCTL NVGPU_PROFILER_IOCTL_BIND_CONTEXT to bind
the TSG to profiler objects.
Add nvgpu_tsg_get_from_file() to retrieve the TSG struct pointer from a
file descriptor. Also store the profiler object pointer in the TSG
struct.
Enable the NVGPU_SUPPORT_PROFILER_V2_DEVICE capability on gv11b and
tu104. Note that this is not yet enabled for vGPU.
Keep the NVGPU_SUPPORT_PROFILER_V2_CONTEXT capability disabled, since
this will take longer to support.
Add a new IOCTL, NVGPU_PROFILER_IOCTL_UNBIND_CONTEXT, so that userspace
can explicitly unbind the context and release the resources before
closing the profiler descriptor.
Add a context_init flag to the profiler object for bookkeeping.
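A sketch of the common IOCTL dispatcher; argument marshalling is
simplified and the max-arg-size constant is an assumption:

    long nvgpu_prof_fops_ioctl(struct file *filp, unsigned int cmd,
            unsigned long arg)
    {
        struct nvgpu_profiler_object *prof = filp->private_data;
        u8 buf[NVGPU_PROFILER_IOCTL_MAX_ARG_SIZE];

        if (_IOC_SIZE(cmd) > sizeof(buf))
            return -EINVAL;

        if ((_IOC_DIR(cmd) & _IOC_WRITE) != 0 &&
                copy_from_user(buf, (void __user *)arg,
                        _IOC_SIZE(cmd)))
            return -EFAULT;

        switch (cmd) {
        case NVGPU_PROFILER_IOCTL_BIND_CONTEXT:
            return nvgpu_prof_ioctl_bind_context(prof,
                (struct nvgpu_profiler_bind_context_args *)buf);
        case NVGPU_PROFILER_IOCTL_UNBIND_CONTEXT:
            return nvgpu_prof_ioctl_unbind_context(prof);
        default:
            return -ENOTTY;
        }
    }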
Bug 2510974
Jira NVGPU-5360
Change-Id: Ie07e0cfd5a9da9d80008f79c955c7ef93b4bc60f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2384354
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
If recovery sequence profiling is enabled, skip the debug dump that
happens during an MMU fault. This prevents the debug dump from
dominating the time spent in the recovery sequence; the debug dump is
severely limited in speed by the (lack of) UART bandwidth.
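A sketch of the skip; the profiling-enable flag name is an assumption:

    static void mmu_fault_debug_dump_sketch(struct gk20a *g)
    {
        /* The dump is UART-bound and slow; skip it when timing the
         * recovery sequence. */
        if (nvgpu_is_enabled(g, NVGPU_PROFILE_RECOVERY_SEQUENCE))
            return;

        gk20a_debug_dump(g);
    }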
JIRA NVGPU-5606
Change-Id: Ifc7c326d33d9115d58b13c0fa42ec4bb7acb3075
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2382591
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>