Delete the struct nvgpu_engine_info as it's essentially identical to
struct nvgpu_device. Duplicating data structures is not ideal as it's
terribly confusing what does what.
Update all uses of nvgpu_engine_info to use struct nvgpu_device. This
is often a fairly straight forward replacement. Couple of places though
where things got interesting:
- The enum_type that engine_info uses is defined in engines.h and
has a bit of SW abstraction - in particular the GRCE type. The only
place this seemed to be actually relevant (the IOCTL providing device
info to userspace) the GRCE engines can be worked out by comparing
runlist ID.
- Addition of masks based on intr_id and reset_id; those can be
computed easily enough using BIT32() but this is an area that
could be improved on.
This reaches into a lot of extraneous code that traverses the fifo
active engines list and dramtically simplifies this. Now, instead of
having to go through a table of engine IDs that point to the list of
all host engines, the active engine list is just a list of pointers to
valid engines. It's now trivial to do a for-all-active-engines type
loop. This could even be turned into a generic macro or otherwise
abstracted in the future.
JIRA NVGPU-5421
Change-Id: I3a810deb55a7dd8c09836fd2dae85d3e28eb23cf
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319895
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
- NvRmGpuDeviceSetSmDebugMode uses regops interface.
- NvRmGpuDeviceTriggerSuspend, NvRmGpuDeviceWaitForPause,
and NvRmGpuDeviceResumeFromPause should return error on Pascal+. Use
regops interface to suspend/resume.
- On non-cilp devices(Maxwell), NvRmGpuDeviceTriggerSuspend,
NvRmGpuDeviceWaitForPause, NvRmGpuDeviceResumeFromPause and
NvRmGpuDeviceSetSmDebugMode are used when debugger(including coredump,
memcheck) is attached or when CUDA application uses a syscall that
requires traphandler(assert, cnp).
Bug 2558022
Bug 2559631
Bug 2706068
JIRA NVGPU-5502
Change-Id: I9eb2ab0c8c75c50f53523d8bf39c75f98b34f3f0
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2376159
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Create new dev nodes for device and context profilers. Example of dev
nodes on iGPU
/dev/nvhost-prof-dev-gpu - device scope profiler
/dev/nvhost-prof-ctx-gpu - context scope profiler
Add below APIs to open/close above dev nodes :
nvgpu_prof_dev_fops_open()
nvgpu_prof_ctx_fops_open()
nvgpu_prof_fops_release()
Add common API nvgpu_prof_fops_ioctl() to handle IOCTL call on these
dev nodes. Add IOCTL NVGPU_PROFILER_IOCTL_BIND_CONTEXT to bind the TSG
to profiler objects.
Add nvgpu_tsg_get_from_file() to retrieve TSG struct pointer from
file descriptor. Also store profiler object pointer into TSG struct.
Enable NVGPU_SUPPORT_PROFILER_V2_DEVICE capability on gv11b and tu104.
Note that this is not yet enabled for vGPU.
Keep NVGPU_SUPPORT_PROFILER_V2_CONTEXT capabiity disabled since this
will take longer to support.
Add new IOCTL NVGPU_PROFILER_IOCTL_UNBIND_CONTEXT so that userspace can
explicitly unbind the context and release the resources before closing
the profiler descriptor.
Add context_init flag to profiler object for book keeping.
Bug 2510974
Jira NVGPU-5360
Change-Id: Ie07e0cfd5a9da9d80008f79c955c7ef93b4bc60f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2384354
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Currently, NVGPU_SUPPORT_FECS_CTXSW_TRACE enabled flag is set to true
when fecs_trace s/w setup is executed successfully. Sometimes,
fecs_trace is required to be disabled for debugging. This change will
help disable/enable fecs_trace feature by modifying one of the enabled
flags.
Enable NVGPU_SUPPORT_FECS_CTXSW_TRACE during chip specific hal init.
Control fec_trace init and ctxsw dev open depending on
NVGPU_SUPPORT_FECS_CTXSW_TRACE flag status.
JIRA NVGPU-5616
Change-Id: Id0754a5af7cd95a67a1f0ae5de36115d44e1111b
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2357501
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
gk20a_perfbuf_map() allocates perfbuf VM, maps the user buffer into new
VM, and then triggers gops.perfbuf.perfbuf_enable(). This HAL then does
following :
- Allocate perfbuf instance block
- Initialize perfbuf instance block
- Reset stream buffer
- Program instance block address in PMA registers
- Program user buffer address into PMA registers
New profiler interface will have it's own API to setup PMA strem, and
it requires above setup to be done in two phases of perfbuf
initialization and then user buffer setup.
Split above functionalities into below functions
- nvgpu_perfbuf_init_vm()
- Allocate perfbuf VM
- Call gops.perfbuf.init_inst_block() to initialize perfbuf instance
block
- gops.perfbuf.init_inst_block()
- Allocate perfbuf instance block
- Initialize perfbuf instance block
- Program instance block address in PMA registers using
gops.perf.init_inst_block()
- In case of vGPU, trigger TEGRA_VGPU_CMD_PERFBUF_INST_BLOCK_MGT
command to gpu server
- gops.perf.init_inst_block()
- Reset stream buffer
- Program user buffer address into PMA registers
Also add corresponding cleanup functions as below :
gops.perf.deinit_inst_block()
gops.perfbuf.deinit_inst_block()
nvgpu_perfbuf_deinit_vm()
Bug 2510974
Jira NVGPU-5360
Change-Id: I486370f21012cbb7fea84fe46fb16db95bc16790
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2372984
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
While booting LS falcons, gr.falcon.bind_instblk gops is
used to bind WPR VA to gr falcon. Only FECS_METHOD must be
used to bind instblks. But at this point FECS falcon is not loaded
and running. Hence FECS_METHOD cannot be used to bind this instblk.
Besides that, this code is not required
for successful falcon boot and functioning of chips other
than gm20b.
JIRA NVGPU-5323
Change-Id: I148ccc77d65d5f01adbba6261369e7a292dccfc3
Signed-off-by: smadhavan <smadhavan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2369736
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Currently the vGPU engine management rewrites a lot of the common
device agnostic engine management code.
With the new top HAL parsing one device at a time, it is now more
easily possible to tie the vGPU into the new common device framework
by implementing the top HAL but with the vGPU engine list backend.
This lets the vGPU inherit all the common engine and device
management code. By doing so the vGPU HAL need only implement a
trivial and simple HAL.
This also gets us a step closer to merging all of the CE init
code: logically it just iterates through all CE engines whatever
they may be. The only reason this differs between chips is because
of the swap from CE0-2 to LCEs in the Pascal generation. This could
be abstracted by the unit code easily enough.
Also, the pbdma_id for each engine has to be added to the device
struct. Eventually this was going to happen anyway, since the
device struct will soon replace the nvgpu_engine_info struct.
It's a little bit of an abuse but might be worth it long term. If
not, it should not be difficult to replace uses of dev->pbdma_id
with a proper lookup of PBDMA ID based on the device info.
JIRA NVGPU-5421
Change-Id: Ie8dcd3b0150184d58ca0f78940c2e7ca72994e64
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2351877
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Current PM resource reservation system is limited to HWPM resources
only. And reservation tracking is done using boolean variables.
New upcoming profiler support requires reservation for all the PM
resources like SMPC and PMA stream. Using boolean variables is
not scalable and confusing. Plus the variables have to be replicated
on gpu server in case of virtualization.
Remove flag tracking mechanism and use list based approach to track
all PM reservations. Also, current HALs are defined on debugger object.
Implement new HALs in new pm_reservation object since it is really an
independent functionality.
Add new source file common/profiler/pm_reservation.c which implements
functions to reserve/release resources and to check if any resource
is reserved or not.
Add common/vgpu/pm_reservation_vgpu.c for vGPU which simply forwards
the request to gpu server.
Define new HAL object gops.pm_reservation and assign above functions
to below respective HALs :
g->ops.pm_reservation.acquire()
g->ops.pm_reservation.release()
g->ops.pm_reservation.release_all_per_vmid()
Last HAL above is only used for gpu server cleanup of guest OS.
Add below new common profiler functions that act as APIs to reserve/
release resources for rest of the units in nvgpu.
nvgpu_profiler_pm_resource_reserve()
nvgpu_profiler_pm_resource_release()
Initialize the meta data required for reservtion system in
nvgpu_pm_reservation_init() and call it during nvgpu_finalize_poweron.
Clean up the meta data before releasing struct gk20a.
Delete below HALs :
g->ops.debugger.check_and_set_global_reservation()
g->ops.debugger.check_and_set_context_reservation()
g->ops.debugger.release_profiler_reservation()
Bug 2510974
Jira NVGPU-5360
Change-Id: I4d9f89c58c791b3b2e63099a8a603462e5319222
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2367224
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
quad type reg_ops were only needed on Kepler, and not for any other chip
beginning Maxweel.
HAL g->ops.gr.access_smpc_reg() was incorrectly set for Volta and Turing
whereas it was only applicable to Kepler. Delete it.
There is no register in the quad type whitelist since the type itself is
not supported anymore. Remove the empty whitelists for all chips and
also delete below HALs:
g->ops.regops.get_qctl_whitelist()
g->ops.regops.get_qctl_whitelist_count()
hal/regops/regops_gv100.* files are not used anymore. Delete the files
instead of just deleting quad HALs in these files.
Bug 200628391
Change-Id: I4dcc04bef5c24eb4d63d913f492a8c00543163a2
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2366035
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
The FIFO pbdma map is an array of bit maps that link PBDMAs to runlists.
This array allows other software to query what PBDMA(s) serves a given
runlist. The PBDMA map is read verbatim from an array of host registers.
These registers are stored in a kmalloc()'ed array.
This causes a problem for the device management code. The device
management initialization executes well before the rest of the FIFO
PBDMA initialization occurs. Thus, if the device management code
queries the PBDMA mapping for a given device/runlist, the mapping has
yet to be populated.
In the next patches in this series the engine management code is subsumed
into the device management code. In other words the device struct is
reused by the engine management and all host SW does is pull pointers to
the host managed devices from the device manager. This means that all
engine initialization that used to be done on top of the device
management needs to move to the device code.
So, long story short, the PBDMA map needs to be read from the registers
directly, instead of an array that gets allocated long after the device
code has run.
This patch removes the pbdma map array, deletes two HALs that managed
that, and instead provides a new HAL to query this map directly from
the registers so that the device code can use it.
JIRA NVGPU-5421
Change-Id: I5966d440903faee640e3b41494d2caf4cd177b6d
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2361134
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
This adds a new device management unit in the common code responsible
for facilitating the parsing of the GPU top device list and providing
that info to other units in nvgpu.
The basic idea is to read this list once from HW and store it in a
set of lists corresponding to each device type (graphics, LCE, etc).
Many of the HALs in top can be deleted and instead implemented using
common code parsing the SW representation.
Every time the driver queries the device list it does so using a
device type and instance ID. This is common code. The HAL is responsible
for populating the device list in such a way that the driver can
query it in a chip agnostic manner.
Also delete some of the unit tests for functions that no longer
exist. This code will require new unit tests in time; those should be
quite simple to write once unit testing is needed.
JIRA NVGPU-5421
Change-Id: Ie41cd255404b90ae0376098a2d6e9f9abdd3f5ea
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319649
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
The ecc init, handling for the fb unit is refactored to improve reusability
for nvgpu-next.
The following changes have been done:
- fb.ecc:
This is a new subunit within fb and contains the following functions:
- init: Moved from fb.fb_ecc_init.
- free: Moved from fb.fb_ecc_free.
- l2tlb_error_mask: Fetch bit mask for corrected, uncorrected errors supported
by the unit.
- fb.intr:
This unit has been updated to include the following ecc interrupt, error
handlers:
- handle_ecc: Top level interrupt handler for fb ecc errors.
- handle_ecc_l2tlb: Handle errors within l2tlb memory.
- handle_ecc_hubtlb: Handle errors within hubtlb memory.
- handle_ecc_fillunit: Handle errors within fillunit memory
Jira: NVGPU-5032
Change-Id: I1a26c1823eb992e0e0175250b969f1186dff6e62
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2333271
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add NVGPU_SUPPORT_COMPRESSION to indicate if compression feature is
supported in nvgpu. If not, set cbc.init, cbc.ctrl and
cbc.alloc_comptags hals to NULL.
Add corresponding GPU characteristics flag and IOCTL mapping to sync
compression support status with nvrm_gpu.
JIRA NVGPU-4666
Change-Id: I2e685688ddac592b3bb918ee70c82ea5524d695a
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2338926
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Instead of one HAL op with a boolean flag to decide whether to do one
thing or another entirely different thing, use two separate HAL ops for
filling priv cmd bufs with semaphore wait and semaphore increment
commands. It's already two ops for syncpoints, and explicit commands are
more readable than boolean flags.
Change offset into cmdbuf in sem wait HAL to be relative to the cmdbuf,
so the HAL adds the cmdbuf internal offset to it.
While at it, modify the syncpoint cmdbuf HAL ops' prototypes to be
consistent.
Jira NVGPU-4548
Change-Id: Ibac1fc5fe2ef113e4e16b56358ecfa8904464c82
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2323319
(cherry picked from commit 08c1fa38c0fe4effe6ff7a992af55f46e03e77d0)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2328409
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Channel debug_dump hal function does not involve
any register related code.
Move gv11b_channel_debug_dump hal function to
common code nvgpu_channel_info_debug_dump function.
Check gpu hw version to limit instance variables
dump that differs between socs.
Add new hal pointer syncpt_debug_dump for pbdma.
Jira NVGPU-5109
Signed-off-by: Vinod G <vinodg@nvidia.com>
Change-Id: Icfca837ce8e4117387cffa6fadf6c094c7da5946
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2321016
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add new hal function
gp10b_gr_intr_handle_class_error.
Update handle_class_error hal function for
gp10b, gv11b and tu104 to
gp10b_gr_intr_handle_class_error from
gm20b_gr_intr_handle_class_error.
gr_trapped_data_mme_pc uses 12 bits from gp10b.
Move gm20b_gr_intr_handle_class_error hal function
to non-fusa section.
Jira NVGPU-4913
Signed-off-by: vinodg <vinodg@nvidia.com>
Change-Id: Ic93013ba43d4bf409527109f2f2d43db11c4238e
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2314249
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Currently, size of zbc index table is defined as a macro. This macro is
independent of the number of address bits in the ltc zbc index register.
Adding below hal will update zbc index table size as per number of
address bits.
Add hal to get gr_zbc_index_table_size:
u32 (*zbc_table_size)(struct gk20a *g);
ZBC index table address 0 is reserved. Logic to start zbc table index
from 1 is moved to corresponding hals.
JIRA NVGPU-4838
Change-Id: I700cadfdd1f3dc5f323055b8f44d769d6627920a
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2288479
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
From gv11b onwards, FECS ucode returns an ACK for set watchdog
timeout method. Failure to wait for this ACK was leading to races,
and in some cases, the ACK could be mistaken for the reply to the
next method.
In particular, this happened for the discover golden image size
method which is sent after set watchdog timeout.
With instrumented FECS ucode, it takes longer for the code to
process the set watchdog timeout method, and the write to ack
that method could happen after nvgpu driver clears the mailbox to
send the discover image size method.
With an invalid golden context image size, FECS ended up causing
an MMU fault while attempting to save past allocated buffer.
Added NVGPU_GR_FALCON_METHOD_SET_WATCHDOG_TIMEOUT to be used with
gops_gr_falcon.ctrl_ctxsw, and implemented 2 variants:
- gm20b_gr_falcon_ctrl_ctxsw, without ACK
- gv11b_gr_falcon_ctrl_ctxsw, with ACK
Added NVGPU_GR_FALCON_SUBMIT_METHOD_F_LOCKED flag to allow
executing above method without re-acquiring FECS lock. Longer term,
the 'flags' could be added to gop_gr_falcon.ctrl_ctxsw parameters.
Use gops_gr_falcon.ctrl_ctxsw instead of register writes to invoke
set watchdog timeout method in gm20b_gr_falcon_wait_ctxsw_ready.
Also replaced calls to gm20b_gr_falcon_ctrl_ctxsw to
gops_gr.falcon.ctrl_ctxsw when appropriate, since there are
multiple variants (gm20b, gp10b and gv11b).
Last, fixed clearing of mailbox 0 in gm20b_gr_falcon_bind_instblk.
Bug 200586923
Change-Id: I653b9a216555eec8cd4bb01d6f202bc77b75a939
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2287340
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
To achieve permanent fault coverage, the CTAs launched by
each kernel in the mission and redundant contexts must execute on
different hardware resources. This feature proposes modifications
in the software to modify the virtual SM id to TPC mapping across
the mission and redundant contexts. The virtual SM identifier to TPC
mapping is done by nvgpu when setting up the patch context.
The recommendation for the redundant setting is to offset the
assignment by one TPC, and not by one GPC. This will ensure that both
GPC and TPC diversity. The SM and Quadrant diversity will happen
naturally. For kernels with few CTAs, the diversity is guaranteed
to be 100%. In case of completely random CTA allocation,
e.g. large number of CTAs in the waiting queue, the diversity is
1 - 1/#SM, or 87.5% for GV11B, 97.9% for TU104.
Added NvGpu CFLAGS to enable/disable the SM diversity support
"CONFIG_NVGPU_SM_DIVERSITY".
This support is only enabled on gv11b and tu104 QNX non safety build.
JIRA NVGPU-4685
Change-Id: I8e3eaa72d8cf7aff97f61e4c2abd10b2afe0fe8b
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2268026
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>