In linux kernel v4.14 and below gpu sysfs node is created under
/sys/devices. In linux kernel v5.x it is created under
/sys/devices/platform.
Create symbolic link gpu.0 under /sys/devices/ as various tests
and scripts expect it to be there.
Bug 200665782
Change-Id: I807ce72fad94438f927df25e829082e771b72543
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2426544
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Fuse registers should be queried with physical gpc-id and not the
logical ones. For tu104 and before chips physical gpc-ids are same as
logical for non-floorswept config but for newer chips it may differ.
Also, logical to physical mapping is not present for a floorswept gpc so
query gpc_tpc mask only upto actual gpcs that are present.
Jira NVGPU-6080
Change-Id: I84c4a3c1f256fdd1927f4365af26e9892fe91beb
Signed-off-by: shashank singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2417721
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
The simulator ring buffer DMA interface supports buffers of the following sizes:
4, 8, 12 and 16K. At present, it is configured to 4K and it happens to match
with the kernel PAGE_SIZE, which is used to wrap back the GET/PUT pointers once
4K is reached. However, this is not always true; for instance, take 64K pages.
Hence, replace PAGE_SIZE with SIM_BFR_SIZE.
Introduce macro NVGPU_CPU_PAGE_SIZE which aliases to PAGE_SIZE and replace
latter with former.
Bug 200658101
Jira NVGPU-6018
Change-Id: I83cc62b87291734015c51f3e5a98173549e065de
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2420728
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Mapping of large buffers to GMMU end up needing many
pages for the PTE tables. Allocating these one by one
can end up being a performance bottleneck, particularly
in the virtualized case.
This is adding the following changes:
- As the TLB invalidation doesn't have access to mem_off,
allow top-level allocation by alloc_cache_direct().
- Define NVGPU_PD_CACHE_SIZE, the allocation size for a new slab
for the PD cache, effectively set to 64K bytes
- Use the PD cache for any allocation < NVGPU_PD_CACHE_SIZE
When freeing up cached entries, avoid prefetch errors by
invalidating the entry (memset to 0).
- Try to fall back to direct allocation of smaller chunk for
contiguous allocation failures.
- Unit test changes.
Bug 200649243
Change-Id: I0a667af0ba01d9147c703e64fc970880e52a8fbc
Signed-off-by: dt <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2404371
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Right now PMA byte buffer address is allocated in the range of
0x1ffc010000. The register that stores this address is only 32-bit and
there is no corresponding _hi() register, so the address must fit in
32 bits.
Update nvgpu_vm_init() parameters in nvgpu_perfbuf_init_vm() so that a
low_hole of only 4K is used. This allows the address to be allocated
in the range of 0x4000000.
Also map byte buffer before PMA stream buffer so that byte buffer always
gets lower address.
There is only one PMA stream buffer allowed to be mapped right now so
this works for now. But in future multiple buffers can be mapped and this
solution needs to be reworked.
Bug 2510974
Jira NVGPU-5360
Change-Id: Ief1a9ee54d554e3bc13c7a9567934dcbeaefbcc6
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2418520
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Support new IOCTL to manage PMA stream meta data by adding below API
nvgpu_prof_ioctl_pma_stream_update_get_put()
Add nvgpu_perfbuf_update_get_put() to handle all the updates coming
from userspace and to pass all required information.
Add gops.perf.update_get_put() to handle all HW accesses required in
perf HW unit.
Add gops.perf.bind_mem_bytes_buffer_addr() to bind the available bytes
buffer while binding HWPM streamout.
Bug 2510974
Jira NVGPU-5360
Change-Id: Ibacc2299b845e47776babc081759dfc4afde34fe
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2406484
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add two new IOCTL APIs to allocate/free pma stream. Add two new
functions to handle this :
nvgpu_prof_ioctl_alloc_pma_stream()
nvgpu_prof_ioctl_free_pma_stream()
Allocation of pma stream includes below steps :
- Initializing perfbuf VM
- Mapping PMA buffer into perfbuf VM
- Mapping PMA byte buffer into perfbuf VM
- Mapping PMA byte buffer to CPU virtual address space
Store all of above data in struct nvgpu_profiler_object for
reference. OS specific data is stored in struct
nvgpu_profiler_object_priv
Update HWPM streamout bind/unbind sequence to enable/disable perfbuf
respectively.
Also take care of releasing the pma stream resources in profiler object
close path if they are not explicitly released by user space by IOCTL
call.
Bug 2510974
Jira NVGPU-5360
Change-Id: I126633746cabc4e293c7ad7c49806866a897949d
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2406483
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Enable CONFIG_GK20A_DEVFREQ and apply corresponding
changes for kernel-5.9
- Remove frequency clipping of devfreq min/max frequency
constraint due to devfreq already takes care of it.
- Register available GPU frequencies to OPP framework
due to devfreq access available device frequencies
through OPP frameworks during device frequency
transition.
Bug 200639056
Change-Id: I72e4a7825ae9ca814791dc283138d17a5cfbe8e2
Signed-off-by: Aaron Tian <atian@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2400107
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Following changes are made in this patch.
1) Change unnamed structs within gpu_ops to named structs
with the prefix gops_*.
2) Each named struct gops_ are moved into a separate gops specific file
under include/nvgpu/gops/
3) struct gpu_ops is moved into a separate file include/nvgpu/gpu_ops.h
and all other dependent struct gops_* are included in this header.
4) Direct references to include/nvgpu/gops are removed from files as its enough
to include gk20a.h.
Change-Id: Ieb22cb853be567e3bef14f5f8a04674eebd902ea
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2398776
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
* Removed unnecessary irqs_enabled flag, and
Replaced enable/disable irq logics with nvgpu variant functions.
* Added nvgpu_interrupts data structure to hold interrupt details.
* Interpret all stall irqs first and followed by nonstall irq from dt.
* Used interrupt size checks for enable/disable irqs instead of
comparing stall and nonstall interrupt lines.
Now adding new stall interrupt lines as easy as just updating macro.
Jira NVGPU-6019
Change-Id: I5a5eaa8d333c68ee87d25d2b45ec244ec8d7b297
Signed-off-by: Sagar Kadamati <skadamati@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2400777
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
The max values that the Linux nvhost driver tracks are adding some
complexity to our wrapper APIs. Max values are used only for internal
submit syncpoint tracking, so implement that tracking in the sync code
by just storing the last value that the syncpoing will reach after all
jobs are complete.
The value is a simple u32. It's accessed from functions in the submit
path that already is serialized, so there's no worrying about atomic
modifications.
Previously nvhost_syncpt_set_min_eq_max_ext() was used to reset the
syncpoint when necessary. Now with the internal max value we'll use
nvhost_syncpt_set_minval(), so add a wrapper for it.
The maxval reported with the user syncpoint allocation is just the
current value at allocation time since no jobs have affected it yet;
there is no means for the kernel to track the max value of user
syncpoints.
Jira NVGPU-5506
Change-Id: I34672eaa7fe3af36b2fbac92d11babe2bc6a2d2b
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2400635
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
common.gr unit exports a separate API nvgpu_gr_prepare_sw to
initialize some SW pieces required for nvgpu_gr_enable_hw().
A separate API is really unnecessary since same initialization
can be performed in nvgpu_gr_alloc().
Remove nvgpu_gr_prepare_sw() and HAL gops.gr.gr_prepare_sw().
Initialize falcon and interrupt structures in loop from
nvgpu_gr_alloc().
Move nvgpu_netlist_init_ctx_vars() from nvgpu_gr_prepare_sw() to
common init path since netlist parsing need not be done from
common.gr unit. It just needs to happen before nvgpu_gr_enable_hw().
Also, trigger nvgpu_gr_free() from gr_remove_support() instead
of OS specific paths. Also remove nvgpu_gr_free() calls from
probe error paths since nvgpu_gr_alloc is no longer called in
probe path.
Move interrupt and falcon data structure free calls to nvgpu_gr_free().
Also remove corresponding unit testing code that tests
nvgpu_gr_prepare_sw() specifically.
Update some unit tests to initialize ecc counters and netlist.
Disable some unit tests that fail for reasons unknown.
Jira NVGPU-5648
Change-Id: I82ec8160f76530bc40e0c11a9f26ba1c8f9cf643
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2400166
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
The channel code needs the watchdog code and vice versa. Cut this
circular dependency with a few simplifications so that the watchdog
wouldn't depend on so much.
When calling watchdog APIs that cause stores or comparisons of channel
progress, provide a snapshot of the current progress instead of a whole
channel pointer. struct nvgpu_channel_wdt_state is added as an interface
for this to track gp_get and pb_get.
When periodically checking the watchdog state, make the channel code ask
whether a hang has been detected and abort the channel from within
channel code instead of asking the watchdog to abort the channel. The
debug dump verbosity flag is also moved back to the channel data.
Move the functionality to restart all channels' watchdogs to channel
code from watchdog code. Looping over active channels is not a good
feature for the watchdog; it's better for the channel handling to just
use the watchdog as a tracking tool.
Move a few unserviceable checks up in the stack to the callers of the
wdt code. They're a kludge but this will do for now and demonstrates
what needs to be eventually fixed.
This does not leave much code in the watchdog unit. Now the purpose of
the watchdog is to only isolate the logic to couple a timer and progress
snapshots with careful locking to start and stop the tracking.
Jira NVGPU-5582
Change-Id: I7c728542ff30d88b1414500210be3fbaf61e6e8a
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2369820
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Legacy profiler does not reserve PMA stream resource with PM
reservation system. Also, HWPM system reset is separately implemented
in membuf disable path. And it does not even restore perf unit SLCG
prod values.
Allcoate a dummy profiler object for debug session in perfbuf map
path. Free it in perfbuf unmap path.
This has advantage of synchronizing PMA stream reservation with new
profiler stack. And this also leverages HWPM system reset and SLCG
handling code during resource reservation.
Remove explicit HWPM reset from gops.perf.membuf_reset_streaming()
HALs
Bug 2510974
Jira NVGPU-5360
Change-Id: I54c5202b6251dea3d80a4dfc011e8a296339e07f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2399595
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Delete the struct nvgpu_engine_info as it's essentially identical to
struct nvgpu_device. Duplicating data structures is not ideal as it's
terribly confusing what does what.
Update all uses of nvgpu_engine_info to use struct nvgpu_device. This
is often a fairly straight forward replacement. Couple of places though
where things got interesting:
- The enum_type that engine_info uses is defined in engines.h and
has a bit of SW abstraction - in particular the GRCE type. The only
place this seemed to be actually relevant (the IOCTL providing device
info to userspace) the GRCE engines can be worked out by comparing
runlist ID.
- Addition of masks based on intr_id and reset_id; those can be
computed easily enough using BIT32() but this is an area that
could be improved on.
This reaches into a lot of extraneous code that traverses the fifo
active engines list and dramtically simplifies this. Now, instead of
having to go through a table of engine IDs that point to the list of
all host engines, the active engine list is just a list of pointers to
valid engines. It's now trivial to do a for-all-active-engines type
loop. This could even be turned into a generic macro or otherwise
abstracted in the future.
JIRA NVGPU-5421
Change-Id: I3a810deb55a7dd8c09836fd2dae85d3e28eb23cf
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319895
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Currently, we just deinitialize the nvlink in the shutdown path.
This alone is not sufficient and can lead to someone trying to use
dGPU while being shutdown.
Avoid triggers to dGPU usage by -
1. Set NVGPU_DRIVER_IS_DYING to let users
know that the driver is currently in the process of dying.
2. Disable IRQs
3. Prepare for poweroff using nvgpu_prepare_poweroff
4. Stop CPU from accessing GPU registers
5. Set GPU state to POWEROFF
Bug 200601517
JIRA NVGPU-5991
Change-Id: Ie185516618678bb893bcc3c3dcb514701483ecf2
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2393565
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Implement new API nvgpu_prof_ioctl_exec_reg_ops() to support regops on
new profiler objects.
Add two new staging buffers to hold regops copied from userspace, and
to convert and execute regops in common code.
Buffers are allocated and released along with the profiler object.
New API will implements this :
- copy regops data in chunks of 4K from userspace
- store them in staging buffer
- convert the new regop struct into common regop struct and also
copy the content into second staging buffer
- trigger gops.regops.exec_regops() with second staging buffer as
operation pointer
- convert common regop struct back into new regop struct and copy
back to userspace
Export bunch of helper functions from ioctl_dbg.h. e.g.
nvgpu_get_regops_op_values_common()
Update regop execution code to skip regop execution if regop status
is not valid. This is only possible when userspace requests for
CONTINUE_ON_ERROR mode.
Add more documentation to some of the fields in UAPI header.
Note that maximum atomic operations reported by new API are same
as legacy API and are incorrect. This will be fixed up in upcoming
patches.
Bug 2510974
Jira NVGPU-5360
Change-Id: I9f82052b22143aec33f6e778c0784386744b699e
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2394208
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
- use release instead of free for the fence destroy identifier
- nvhost_dev is a struct name, so use nvhost_device
- compare nvgpu_nvhost_syncpt_read_ext_check retval properly
Also, if the syncpt read fails when checking for fence expiration,
behave as if the wait isn't expired. Possibly getting stuck is safer
than possibly continuing too early.
Jira NVGPU-5617
Change-Id: Ied529e25f8c43f1c78fd9eac73b9cd6c3550ead5
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2398399
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
struct nvgpu_gr is right now initialized during probe and from OS
specific code. To support multiple instances of graphics engine,
nvgpu needs to initialize nvgpu_gr after number of engine instances
have been enumerated in poweron path.
Hence move nvgpu_gr_alloc() to poweron path and after gr manager has
been initialized.
Some of the members of nvgpu_gr are initialized in probe path and they
too are in OS specific code. Move them to common code in
nvgpu_gr_alloc()
Add field fecs_feature_override_ecc_val to struct gk20a to store the
override flag read from device tree. This flag is later copied to
nvgpu_gr in poweron path.
Update tpc_pg_mask_store() to check for g->gr being NULL before
accessing golden image pointer.
Update tpc_fs_mask_store() to return error if g->gr is not initialized.
This path needs nvgpu_gr struct initialized. Also fix the incorrect
NULL pointer check in tpc_fs_mask_store() which breaks the write path
to this sysfs.
Jira NVGPU-5648
Change-Id: Ifa2f66f3663dc2f7c8891cb03b25e997e148ab06
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2397259
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The channel watchdog feature has always been a blocker for deterministic
submits. Instead of waiting for a submit call to happen just to reject
it, nack already the setup_bind ioctl if deterministic is set and the
watchdog has not been disabled before. This can avoid confusion with
usermode submits where leaving the watchdog set would have worked but
the watchdog would never see updates from userspace.
Disallow also any other watchdog adjustments than disabling it when the
channel has been set up for deterministic mode.
Jira NVGPU-5582
Change-Id: I0ba4584bbc035197d952e5b562197c36aa483867
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2369819
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add a "priv" fence struct type and use that in the fence type to
emphasize that the inner data is not meant to be seen.
The fence unit needs to have an outside-visible fence type so that
fences can be allocated directly as a struct field in job metadata for
performance and simplicity, so hiding the type entirely wouldn't work.
A couple of places need to touch the priv data directly in channel code.
Those can be thought to be technically fence unit's code scattered
outside the fence files, but they mean that the architecture is not
perfect yet.
Jira NVGPU-5773
Change-Id: Ifa3c95757ae31eef0e32f2605293e23e210b065f
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2395071
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Split up nvgpu_channel_clean_up_jobs() on the clean_all parameter so
that there's one version for the asynchronous ("deferred") cleanup and
another for the synchronous deterministic cleanup that occurs in the
submit path.
Forking another version like this adds some repetition, but this lets us
look at both versions clearly in order to come up with a coherent plan.
For example, it might be feasible to have the light cleanup of pooled
items in also the nondeterministic path, and deferring heavy cleanup to
another, entirely separated job queue.
Jira NVGPU-5493
Change-Id: I5423fd474e5b8f7b273383f12302126f47076bd3
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2346065
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add new APIs to bind/unbind PM resources to/from profiler objects:
nvgpu_profiler_bind_pm_resources()
nvgpu_profiler_unbind_pm_resources()
Implement support to bind/unbind SMPC/HWPM/HWPM_STREAMOUT in various
functions in common/profiler/profiler.c.
Unbind all the PM resources explicitly in
nvgpu_profiler_unbind_context() while closing the profiler object.
If resources are bound during a resource reservation request,
unbind the resources explicitly before reserving new resource.
It is responsibility of application to bind the PM resources again.
Bug 2510974
Jira NVGPU-5360
Change-Id: Ib2a0e017eaa23d0d376438771e8bf4e340865f03
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2389655
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Add two new functions to reserve/release PM resources :
nvgpu_prof_ioctl_reserve_pm_resource()
nvgpu_prof_ioctl_release_pm_resource()
Add ctxsw field to struct nvgpu_profiler_object to store per-resource
context switch enable flag.
Force resource reservation release while unbinding the context from
profiler object or while closing the profiler object. Add this code
in nvgpu_profiler_unbind_context() since both above paths will call
this function.
Bug 2510974
Jira NVGPU-5360
Change-Id: If334148e8df86360fba4162d1611187f3f04d01b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2389654
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Rework regops execution API to accomodate below updates for new
profiler design
- gops.regops.exec_regops() should accept TSG pointer instead of
channel pointer.
- Remove individual boolean parameters and add one flag field.
Below new flags are added to this API :
NVGPU_REG_OP_FLAG_MODE_ALL_OR_NONE
NVGPU_REG_OP_FLAG_MODE_CONTINUE_ON_ERROR
NVGPU_REG_OP_FLAG_ALL_PASSED
NVGPU_REG_OP_FLAG_DIRECT_OPS
Update other APIs, e.g. gr_gk20a_exec_ctx_ops() and validate_reg_ops()
as per new API changes.
Add new API gk20a_is_tsg_ctx_resident() to check context residency
from TSG pointer.
Convert gr_gk20a_ctx_patch_smpc() to a HAL gops.gr.ctx_patch_smpc().
Set this HAL only for gm20b since it is not required for later chips.
Also, remove subcontext code from this function since gm20b does not
support subcontext.
Remove stale comment about missing vGPU support in exec_regops_gk20a()
Bug 2510974
Jira NVGPU-5360
Change-Id: I3c25c34277b5ca88484da1e20d459118f15da102
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2389733
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
In nvgpu_dbg_gpu_ioctl_smpc_ctxsw_mode(), check if SMPC global mode
capability is supported instead of checking for the function pointer.
Enable the capability only for Turing since pre-Turing GPUs don't
support it.
Bug 2510974
Jira NVGPU-5360
Change-Id: I352fb2a91b836cd8ef727966a53a28255d8ea834
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2389653
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
- NvRmGpuDeviceSetSmDebugMode uses regops interface.
- NvRmGpuDeviceTriggerSuspend, NvRmGpuDeviceWaitForPause,
and NvRmGpuDeviceResumeFromPause should return error on Pascal+. Use
regops interface to suspend/resume.
- On non-cilp devices(Maxwell), NvRmGpuDeviceTriggerSuspend,
NvRmGpuDeviceWaitForPause, NvRmGpuDeviceResumeFromPause and
NvRmGpuDeviceSetSmDebugMode are used when debugger(including coredump,
memcheck) is attached or when CUDA application uses a syscall that
requires traphandler(assert, cnp).
Bug 2558022
Bug 2559631
Bug 2706068
JIRA NVGPU-5502
Change-Id: I9eb2ab0c8c75c50f53523d8bf39c75f98b34f3f0
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2376159
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Create new dev nodes for device and context profilers. Example of dev
nodes on iGPU
/dev/nvhost-prof-dev-gpu - device scope profiler
/dev/nvhost-prof-ctx-gpu - context scope profiler
Add below APIs to open/close above dev nodes :
nvgpu_prof_dev_fops_open()
nvgpu_prof_ctx_fops_open()
nvgpu_prof_fops_release()
Add common API nvgpu_prof_fops_ioctl() to handle IOCTL call on these
dev nodes. Add IOCTL NVGPU_PROFILER_IOCTL_BIND_CONTEXT to bind the TSG
to profiler objects.
Add nvgpu_tsg_get_from_file() to retrieve TSG struct pointer from
file descriptor. Also store profiler object pointer into TSG struct.
Enable NVGPU_SUPPORT_PROFILER_V2_DEVICE capability on gv11b and tu104.
Note that this is not yet enabled for vGPU.
Keep NVGPU_SUPPORT_PROFILER_V2_CONTEXT capabiity disabled since this
will take longer to support.
Add new IOCTL NVGPU_PROFILER_IOCTL_UNBIND_CONTEXT so that userspace can
explicitly unbind the context and release the resources before closing
the profiler descriptor.
Add context_init flag to profiler object for book keeping.
Bug 2510974
Jira NVGPU-5360
Change-Id: Ie07e0cfd5a9da9d80008f79c955c7ef93b4bc60f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2384354
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>