Commit Graph

541 Commits

Author SHA1 Message Date
prsethi
144f548552 gpu:nvgpu: fix for shadow domain submission
There are three issues with shadow domain submission:
1. runlist mem is not being swapped with mem_hw for shadow domain when
there is non-shadow domain being bound to tsg which does not allow runlist
to have all the tsgs. To fix this nvgpu_runlist_swap_mem() is being called
for shadow domain as well.
2. tsg num_active_channels is being set as part of non-shadow domain which
does happen after shadow domain. Due to this, runlist tsg length is not
being set as part of runlist reconstruct and leaving tsg length 0 for last
tsg. To fix this, tsg num_active_channels always get set for shadow domain
as it gets configured first.
3. NV_BUILD_CONFIGURATION_VARIANT_IS_EMBEDDED is not solving the purpose
to differentiate l4t and embedded_linux builds so using
NV_BUILD_SYSTEM_TYPE in place of this to find out the build type.

L4t is using round robin scheduling and this issue coming with manual mode
scheduling so adding the fix only for manual mode scheduling support.

Bug 3884011

Change-Id: Ic55da8f75294eb32c8df6e35fb1fa47df78db8f8
Signed-off-by: prsethi <prsethi@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2833880
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2023-01-09 12:26:00 -08:00
Rajesh Devaraj
2e36ad9e35 gpu: nvgpu: add null check for gp_get, pb_get
This patch adds NULL check for gp_get and pb_get.

JIRA NVGPU-9325

Change-Id: If41c1c526c58a18cc91a95686e71bdfae9edb328
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2836366
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2023-01-03 19:10:11 -08:00
Atul Anand
5db14f3bfb nvgpu: Fix pm resource release sequence
The memory leak issue was due to nvgpu_profiler_unbind_context() calling
nvgpu_profiler_pm_resource_release() for all resources which clears the
flag required by nvgpu_profiler_free_pma_stream() to release the memory
for perf_buf instance block.
Fixing this issue by splitting nvgpu_profiler_unbind_context() to release
all the pm resources at a later time separately.

Bug 3510455
Change-Id: Ibab8d071693e600c46f7e7f16575e36e6f62af3c
Signed-off-by: atanand <atanand@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2825013
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-12-15 15:13:29 -08:00
Rajesh Devaraj
3b2b225c73 gpu: nvgpu: update pmu_early_init
Move the setting of power features related enable flags to separate
static function. Invoke this function when PMU is not supported.

JIRA NVGPU-9283

Change-Id: I429504c09d40c2cb115fce7550555f06b1e384ed
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2817658
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Ramalingam C <ramalingamc@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-12-07 01:51:30 -08:00
Debarshi Dutta
5d2dfc88a3 gpu: nvgpu: Replace CONFIG_NVS_KMD_BACKEND
Use CONFIG_KMD_SCHEDULING_WORKER_THREAD instead of
CONFIG_NVS_KMD_BACKEND to remove confusion about the CPU based
KMD scheduling worker thread.

The KMD based scheduling worker thread caters to both Manual Mode
CPU based scheduler as well as Automatic Round Robin CPU based
scheduler.

For the traditional submit path, add correct handling of the
CONFIG_NVS_PRESENT. CPU based worker thread should be part of
CONFIG_NVS_PRESENT. Eventually, when DCONFIG_KMD_SCHEDULING_WORKER_THREAD
is removed, the application must switch to GSP.

Jira NVGPU-8619

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I0886ef3b2e0124b6fe22c2bf0bf7d1fa98039d00
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2810217
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-11-23 08:07:24 -08:00
rmylavarapu
c8429c5de9 gpu: nvgpu: gsp sched: get device id from runlist pri base
Get the device id from runlist pri base instead of reading from
runlist structure which could be failing if the device node not
present inside the runlist struct.
This Change will call the get device id hal to get it from rlend_id
and runlist pri base.

NVGPU-8531

Change-Id: Ia81189a6c2281ed09ee52eb461f0cd87164c5fc4
Signed-off-by: rmylavarapu <rmylavarapu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2791605
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-11-17 02:04:28 -08:00
Sagar Kamble
96f675595c gpu: nvgpu: implement get and revoke share token ioctls
Add share token list to gk20a_ctrl_priv. Implement GET_SHARE_TOKEN
and REVOKE_SHARE_TOKEN ioctls. Revoke tokens while closing the
TSG for all active devices.

Bug 3677982
JIRA NVGPU-8681

Change-Id: I74455c21d881d5a0d381729fd695239722599980
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2792081
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Scott Long <scottl@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-11-10 11:49:54 -08:00
Sagar Kamble
675edd5053 gpu: nvgpu: maintain authorized devices in TSG
When the TSG is successfully created first time or is opened with share
token, the device instance id associated with the CTRL fd will be added
to the TSG private data structure as authorized device instance ids.

This is used for a security check when creating a TSG share token with
nvgpu_tsg_get_share_token.

Bug 3677982
JIRA NVGPU-8681

Change-Id: I67bb0514e1272dab15023cd3828a6a51e9a4c928
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2792080
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Scott Long <scottl@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-11-10 11:49:44 -08:00
Sagar Kamble
d1b28712b6 gpu: nvgpu: implement VEID alloc/free
Implement the ioctls NVGPU_TSG_IOCTL_CREATE_SUBCONTEXT and
NVGPU_TSG_IOCTL_DELETE_SUBCONTEXT. These will allocate and
free the VEID numbers.

Address space association with the VEIDs is verified to
ensure that channels association with VEIDs and address
space remains consistent.

Bug 3677982
JIRA NVGPU-8681

Change-Id: I2d913baf61a6bdeec412c58270c0024b80ca15c6
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2766765
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-11-01 00:05:18 -07:00
Debarshi Dutta
280b69e66d nvgpu: userspace: add unit test for nvs
Add a unit test to add verification for S/W parts of
NVGPU-KMD based scheduler

Jira NVGPU-8619

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I266cb4167074dc5f7da647ce627e96188fc6bdcb
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2767591
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-10-10 14:08:03 -07:00
Debarshi Dutta
17dc483a6b gpu: nvgpu: enclose NVS KMD inside a config
Use CONFIG_NVS_KMD_BACKEND to enclose all NVS KMD based scheduling
code.

Current configuration contains all the scheduling code managed within
CONFIG_NVS_PRESENT. Eventually, scheduling code shall only use GSP.
Hence, isolate KMD based scheduling code to a config
CONFIG_NVS_KMD_BACKEND. This shall make it easier to remove this code
later.

Jira NVGPU-8619

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I9dc668e0fa3e7706c111fda7a5e2415e1fc0dd03
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2769465
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-10-10 14:07:37 -07:00
Sagar Kamble
6d836becf5 gpu: nvgpu: retry unbind when force killing the channel
If NEXT bit remains set for a channel being unbound, it can lead to
MMU fault of type unbound inst block. When userspace is closing the
channel and NEXT bit is set, userspace retries.

When force killing the channel, nvgpu can retry few iterations to
ensure the channel is truly idle and unbound. If the channel is
really stuck then unbind will fail and TSG will be aborted.

Bug 3800844

Change-Id: I8fb024630ff2dd272245ae27116f3db6d6e0f788
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2787533
(cherry picked from commit 99e39f4b387743a93b05ba4b097c33b23fbbcf68)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2786479
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-10-10 08:17:12 -07:00
vivekku
5bb56723be gpu: nvgpu: gsp: Create functions to pass nvs data to gsp firmware
Changes:
- created functions to populate gsp interface data from nvs and runlist
structures.
- Handled both user domains and shadow domains.
- Provided support for four engines from two.

NVGPU-8531

Signed-off-by: vivekku <vivekku@nvidia.com>
Change-Id: I1d9ec9ded8a9b47a5b2a00c44dacbab22e3b04b1
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2743596
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-10-05 06:18:18 -07:00
Sagar Kamble
f1896e0a64 gpu: nvgpu: acquire tsg ctx_init_lock when changing ctx state
GR context associated with channel is updated in various driver paths.
Sequence to do the same is disable the TSG, preempt the TSG, update
the GR context or instance block and then enable the TSG.
These operations and runlist updates for channel have to be done under
TSG specific ctx_init_lock to avoid the race.

suspend_contexts and resume_contexts needs special handling which is
not covered in this patch.

Bug 3677982

Change-Id: I837257fe9d9ef3eb6f69f5d7e0707e0bb6d4ea72
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2720222
Reviewed-by: Scott Long <scottl@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-09-08 21:00:36 -07:00
Sagar Kamble
ef99d9f010 gpu: nvgpu: implement scg, pbdma and cilp rules
Only certain combination of channels of GFX/Compute object classes can
be assigned to particular pbdma and/or VEID. CILP can be enabled only
in certain configs. Implement checks for the configurations verified
during alloc_obj_ctx and/or setting preemption mode.

Bug 3677982

Change-Id: Ie7026cbb240819c1727b3736ed34044d7138d3cd
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2719995
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-09-08 21:00:30 -07:00
Sagar Kamble
693305c0fd gpu: nvgpu: subcontext add/remove support
Subcontext PDBs and valid mask in the instance blocks of the channels
in various subcontexts has to be updated when new subcontext is
created or a subcontext is removed.

Replayable fault state is cached in the channel structure. Replayable
fault state for subcontext is set based on first channel's bind
parameter. It was earlier programmed in function channel_setup_ramfc.

init_inst_block_core is updated to setup TSG level pdb map and mask.

Added new hal gv11b_channel_bind to enable the subcontext on channel
bind.

Bug 3677982

Change-Id: I58156c5b3ab6309b6a4b8e72b0e798d6a39c1bee
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2719994
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-09-08 21:00:20 -07:00
Sagar Kamble
f55fd5dc8c gpu: nvgpu: multiple address spaces support for subcontexts
This patch introduces following relationships among various nvgpu
objects to support multiple address spaces with subcontexts.
IOCTLs setting the relationships are shown in the braces.

nvgpu_tsg             1<---->n nvgpu_tsg_subctx (TSG_BIND_CHANNEL_EX)
nvgpu_tsg             1<---->n nvgpu_gr_ctx_mappings (ALLOC_OBJ_CTX)

nvgpu_tsg_subctx      1<---->1 nvgpu_gr_subctx (ALLOC_OBJ_CTX)
nvgpu_tsg_subctx      1<---->n nvgpu_channel (TSG_BIND_CHANNEL_EX)

nvgpu_gr_ctx_mappings 1<---->n nvgpu_gr_subctx (ALLOC_OBJ_CTX)
nvgpu_gr_ctx_mappings 1<---->1 vm_gk20a (ALLOC_OBJ_CTX)

On unbinding the channel, objects are deleted according
to dependencies.

Without subcontexts, gr_ctx buffers mappings are maintained in the
struct nvgpu_gr_ctx. For subcontexts, they are maintained in the
struct nvgpu_gr_subctx.

Preemption buffer with index NVGPU_GR_CTX_PREEMPT_CTXSW and PM
buffer with index NVGPU_GR_CTX_PM_CTX are to be mapped in all
subcontexts when they are programmed from respective ioctls.

Global GR context buffers are to be programmed only for VEID0.
Based on the channel object class the state is patched in
the patch buffer in every ALLOC_OBJ_CTX call unlike
setting it for only first channel like before.

PM and preemptions buffers programming is protected under TSG
ctx_init_lock.

tsg->vm is now removed. VM reference for gr_ctx buffers mappings
is managed through gr_ctx or gr_subctx mappings object.

For vGPU, gr_subctx and mappings objects are created to reference
VMs for the gr_ctx lifetime.

The functions nvgpu_tsg_subctx_alloc_gr_subctx and nvgpu_tsg_-
subctx_setup_subctx_header sets up the subcontext struct header
for native driver.

The function nvgpu_tsg_subctx_alloc_gr_subctx is called from
vgpu to manage the gr ctx mapping references.

free_subctx is now done when unbinding channel considering
references to the subcontext by other channels. It will unmap
the buffers in native driver case. It will just release the
VM reference in vgpu case.

Note that TEGRA_VGPU_CMD_FREE_CTX_HEADER ioctl is not called
by vgpu any longer as it would be taken care by native driver.

Bug 3677982

Change-Id: Ia439b251ff452a49f8514498832e24d04db86d2f
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2718760
Reviewed-by: Scott Long <scottl@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-09-08 20:59:59 -07:00
Sagar Kamble
b69c035520 gpu: nvgpu: init golden context image with nvgpu VEID0 channel
With subcontexts support added, nvgpu has to allocate VEID0 channel
itself to initialize the golden context image. Allocate the channel
and init the golden context image at the beginning of alloc_obj_ctx
call for first user channel.

It can't be initialized at the end of probe as tpc pg settings need
to be updated before golden context image is initialized.

Bug 3677982

Change-Id: Ia82f6ad6e088c2bc1578a6bd32b7c7a707a17224
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2756289
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-08-31 20:25:11 -07:00
Debarshi Dutta
143034daab gpu: nvgpu: modify wait_pending
The wait_pending HAL is now modified to simply
check the pending status of a given runlist.
The while loop is removed from this HAL.

A new function nvgpu_runlist_wait_pending_legacy() is
added that emulates the older wait_pending() HAL.

nvgpu_runlist_tick() is modified to accept a 64 bit
"preempt_grace_ns" value.

These changes prepare for upcoming control-fifo parser
changes.

Jira NVGPU-8619

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: If3f288eb6f2181743c53b657219b3b30d56d26bc
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2766100
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-08-30 23:45:43 -07:00
Debarshi Dutta
42beb7f4db gpu: nvgpu: simplify the runlist update sequence
Following changes are added here to simplify the overall
sequence.

1) Remove deferred update for runlists. NVS worker thread
shall submit the updated runlist.

2) Moved Runlist mem swap inside update itself. Protect
the swap() and hw_submit() path with a spinlock. This
is temporary till GSP.

3) Enable Control-Fifo mode from nvgpu driver.

Jira NVGPU-8609

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: Icc52e5d8ccec9d3653c9bc1cf40400fc01a08fde
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2757406
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-08-20 23:33:45 -07:00
Debarshi Dutta
13699c4c15 gpu: nvgpu: ensure worker thread is disabled during rg
A previous commit ID 44b6bfbc1 added a hack to prevent
the worker thread from calling nvgpu_runlist_tick()
in post_process if the next domain matches the previous.

This could potentially still face issues with multi-domains
in future.

A better way is to synchronize the thread to suspend/resume
alongwith the device's common OS agnostic suspend/resume
operations. This shall emulate the GSP as well.
This shall also take care of the power constraints
i.e. the worker thread can be expected to always work
with the power enabled and thus we can get rid of the complex
gk20a_busy() lock here for good.

Implemented a state-machine based approach for suspending/
resuming the NVS worker thread from the existing callbacks.

Remove support for NVS worker thread creation for VGPU.
hw_submit method is currently set to NULL for VGPU. VGPU
instead submits its updates via the runlist.reload() method.

Jira NVGPU-8609

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I51a20669e02bf6328dfe5baa122d5bfb75862ea2
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2750403
Reviewed-by: Prateek Sethi <prsethi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-08-20 23:33:29 -07:00
Sagar Kamble
7085934303 gpu: nvgpu: update channel TSG unbind fail paths
On tsg.unbind_channel hal failure, channel teardown was being done
again that was done already as part of the function
nvgpu_tsg_unbind_channel_common.

Just abort the TSG and return err in that case. Also, decrement
the TSG ch_count in the fail path.

Bug 3677982

Change-Id: I466f2b2c693d43ed64dc531b08bf740bf00f28a6
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2749970
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
2022-08-10 15:42:34 -07:00
atanand
eae4593343 gpu: nvgpu: add ioctl to configure implicit ERRBAR
Add ioctl support to configure implicit ERRBAR by setting/unsetting
NV_PGRAPH_PRI_GPCS_TPCS_SM_SCH_MACRO_SCHED register.

Add gpu characteritics flag: NVGPU_SCHED_EXIT_WAIT_FOR_ERRBAR_SUPPORTED
to allow userspace driver to determine if implicit ERRBAR ioctl is
supported.

Bug: 200782861

Change-Id: I530a4cf73bc5c844e8d73094d3e23949568fe335
Signed-off-by: atanand <atanand@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2718672
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-08-05 23:10:18 -07:00
Sagar Kamble
3fb2a2e209 gpu: nvgpu: track gr_ctx init state
On successful obj_ctx allocation, set ctx_initialized member in gr_ctx
to true and when it is true then only invoke free_gr_ctx.

With this we can get rid of tsg->vm check while calling free_gr_ctx.
tsg->vm will go away with multiple address spaces support in TSG.

Bug 3677982

Change-Id: I4a64842411ce4ab157010808e4e8e4d5cd254a7f
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2746803
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Scott Long <scottl@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-07-19 10:32:35 -07:00
Debarshi Dutta
1bdca92c50 gpu: nvgpu: modify rl_domain member
KMD needs to send the domain id and GPU_VA corresponding
to the struct runlist_domains to GSP. In the current
implementation, struct nvgpu_runlist_domain contains
the domain name instead of domain id. This requires
an additional search by name everytime an update
is needed to be submitted to the GSP.

Modify the struct nvgpu_runlist_domain to store domain id
instead of domain name. This simplifies the flow and avoids
unnecessary search.

Removed the conditional check for existence of shadow domain
as its a deadcode. Shadow Domain is not searchable in the list
of domains inside the struct nvgpu_runlist.

Jira NVGPU-8610

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I0d67cfa93d89186240290e933aa750702b14f4f0
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2744890
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-07-15 15:15:30 -07:00
Sagar Kamble
d75473a115 gpu: nvgpu: fix unit test traceability issues
Some of the functions with no traceability to unit tests are already
covered by callee API functions. Skip these functions in SWVR by
skipping doxygen for them.

Some of the functions are non-fusa like those in profile.h and
bsearch.h. Those were included as the header was included in
Doxygen sources. Mark then non-safe.

Some of the nvgpu functions were not added to Targets entries for
respective tests. Fix those.

JIRA NVGPU-7211

Change-Id: Iacf22dccdd9340100cf93814566d3979734c455d
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2612982
(cherry picked from commit a40f62654747102cc8ef53ddbd9f953c21c2b745)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2737672
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-07-15 07:15:34 -07:00
Sagar Kamble
f95cb5f4f8 gpu: nvgpu: maintain ctx buffers mappings separately from ctx mems
In order to maintain separate mappings of GR TSG and global context
buffers for different subcontexts, we need to separate the memory
struct and the mapping struct for the buffers. This patch moves
the mappings of all GR ctx buffers to new structure
nvgpu_gr_ctx_mappings.

This will be instantiated per subcontext in the upcoming patches.

Summary of changes:
  1. Various context buffers were allocated and mapped separately.
     All TSG context buffers are now stored in gr_ctx->mem[] array
     since allocation and mapping is unified for them.
  2. Mapping/unmapping and querying the GPU VA of the context
     buffers is now handled in ctx_mappings unit. Structure
     nvgpu_gr_ctx_mappings in nvgpu_gr_ctx holds the maps.
     On ALLOC_OBJ_CTX this struct is instantiated and deleted
     on free_gr_ctx.
  3. Introduce mapping flags for TSG and global context buffers.
     This is to map different buffers with different caching
     attribute. Map all buffers as cacheable except
     PRIV_ACCESS_MAP, RTV_CIRCULAR_BUFFER, FECS_TRACE, GR CTX
     and PATCH ctx buffers. Map all buffers as privileged.
  4. Wherever VM or GPU VA is passed in the obj_ctx allocation
     functions, they are now replaced by nvgpu_gr_ctx_mappings.
  5. free_gr_ctx API need not accept the VM as mappings struct
     will hold the VM. mappings struct will be kept in gr_ctx.
  6. Move preemption buffers allocation logic out of
     nvgpu_gr_obj_ctx_set_graphics_preemption_mode.
  7. set_preemption_mode and gr_gk20a_update_hwpm_ctxsw_mode
     functions need update to ensure buffers are allocated
     and mapped.
  8. Keep the unit tests and documentation updated.

With these changes there is clear seggregation of allocation and
mapping of GR context buffers. This will simplify further change
to add multiple address spaces support. With multiple address
spaces in a TSG, subcontexts created after first subcontext
just need to map the buffers.

Bug 3677982

Change-Id: I3cd5f1311dd85aad1cf547da8fa45293fb7a7cb3
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2712222
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-07-15 07:10:11 -07:00
Sagar Kamble
80efe558b1 gpu: nvgpu: add BVEC test for nvgpu_rc_pbdma_fault
Update nvgpu_rc_pbdma_fault with invalid checks and add BVEC test
for it.

Make ga10b_fifo_pbdma_isr static.

NVGPU-6772

Change-Id: I5485760c53e1fff1278557a5b25659a1fc0e4eaf
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2551617
(cherry picked from commit e917042d395d07cb902580bad3d5a7d0096cc303)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2623625
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-07-14 08:58:31 -07:00
Debarshi Dutta
d8e8eb65d3 nvgpu: gpu: separate runlist submit from construction
This patch primary separates runlist modification from
runlist submits.

Instead of submitting the runlist(domain) immediately after
modification, a worker thread interface is now being used to
synchronously schedule runlist submits. If the runlist being
scheduled is currently active, the submit happens instantly,
otherwise, it will happen in the next iteration when the nvs
thread will schedule the domain. This external interface uses
a condition variable to wait for the completion of the
synchronous submits.

A pending_update variable is used to synchronize domain memory
swaps just before being submitted.

To facilitate faster scheduling via the NVS thread, nvgpu_dom
itself contains an array of rl_domain pointers. This can then
be used to select the appropriate rl_domain directly for scheduling
as against the earlier approach of maintaining nvs domains and rl
domains in sync everytime.

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I1725c7cf56407cca2e3d2589833d1c0b66a7ad7b
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2739795
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-07-13 16:36:19 -07:00
Sagar Kamble
5b55088970 gpu: nvgpu: skip subctx pdb init during as-channel bind
While creating a new channel, ioctls are called in the below sequence:
  1. GPU_IOCTL_OPEN_CHANNEL
  2. AS_IOCTL_BIND_CHANNEL
  3. TSG_IOCTL_BIND_CHANNEL_EX
  4. CHANNEL_ALLOC_GPFIFO_EX
  5. CHANNEL_ALLOC_OBJ_CTX.

subctx pdbs and valid mask are programmed in the channel instance block
in the channel ioctls AS_IOCTL_BIND_CHANNEL & CHANNEL_ALLOC_GPFIFO_EX.

Programming them in the ioctl AS_IOCTL_BIND_CHANNEL is redundant.
Remove related hal g->ops.mm.init_inst_block_for_subctxs.

The hal init_inst_block will program context pdb and big page size.
The hal init_inst_block_core will program context pdb, big page size
and subctx 0 pdb. This is used by h/w units (fecs, pmu, hwpm, bar1,
bar2, sec2, gsp, perfbuf etc.).

For user channels, subctx pdbs are programmed as part of ramfc setup.

Bug 3677982

Change-Id: I6656b002d513404c1fd7c3d349933e80cca7e604
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2680907
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-06-28 23:33:31 -07:00
Krishna Reddy
961925be02 Revert "gpu: nvgpu: correct usage for gk20a_busy_noresume"
This reverts commit c1ea9e3955.

Reason for revert: ap_vulkan, ap_opengles, ap_mods tests failures
Bug 3661058
Bug 3661080 
Bug 3659004 

Change-Id: I929b5675a4fb0ddc8cbf3eeefc982b4ba04ddc59
Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2718996
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
2022-05-27 14:49:26 -07:00
Debarshi Dutta
c1ea9e3955 gpu: nvgpu: correct usage for gk20a_busy_noresume
Background: In case of a deferred suspend implemented by gk20a_idle,
the device waits for a delay before suspending and invoking
power gating callbacks. This helps minimize resume latency for any
resume calls(gk20a_busy) that occur before the delay.

Now, some APIs spread across the driver requires that if the device
is powered on, then they can proceed with register writes, but if its
powered off, then it must return. Examples of such APIs include
l2_flush, fb_flush and even nvs_thread. We have relied on
some hacks to ensure the device is kept powered on to prevent any such
delayed suspension to proceed. However, this still raced for some calls
like ioctl l2_flush, so gk20a_busy() was added (Refer to commit Id
dd341e7ecbaf65843cb8059f9d57a8be58952f63)

Upstream linux kernel has introduced the API pm_runtime_get_if_active
specifically to handle the corner case for locking the state during the
event of a deferred suspend.

According to the Linux kernel docs, invoking the API with
ign_usage_count parameter set to true, prevents an incoming suspend
if it has not already suspended.

With this, there is no longer a need to check whether
nvgpu_is_powered_off(). Changed the behavior of gk20a_busy_noresume()
to return bool. It returns true, iff it managed to prevent
an imminent suspend, else returns false. For cases where
PM runtime is disabled, the code follows the existing implementation.

Added missing gk20a_busy_noresume() calls to tlb_invalidate.

Also, moved gk20a_pm_deinit to after nvgpu_quiesce() in
the module removal path. This is done to prevent regs access
after registers are locked out at the end of nvgpu_quiesce. This
can happen as some free function calls post quiesce  might still
have l2_flush, fb_flush deep inside their stack, hence invoke
gk20a_pm_deinit to disable pm_runtime immediately after quiesce.

Kept the legacy implementation same for VGPU and
older kernels

Jira NVGPU-8487

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I972f9afe577b670c44fc09e3177a5ce8a44ca338
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2715654
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-05-25 04:59:46 -07:00
Debarshi Dutta
76cc8870e1 nvgpu: gpu: update default nvs domain implementation
In current form, the default domain acts like any schedulable
domain. TSGs are bound to it and it can be enumerated via the
public interfaces.

The new expectation for the default domain is meant to change
from the current form to a pseudo domain that cannot act like
an ordinary domain in other ways, i.e. it must not be reachable
by in particular the domain management API, it can't be removed,
does not show up in lists, and TSGs cannot be explicitly bound to
this domain. It won't participate in round-robin domain scheduling.
It is not really a domain, and acts like one only when activated in
the manual mode.

Following changes are made overall to support the above change in
definition.

1) Domain creation and attaching the domain to the scheduler are now
split into two separate functions. The new default domain (having ID
= UINT64_MAX) is created separately from a static function without
linking it with other domains in the scheduler.

2) struct nvgpu_nvs_scheduler explicitely stores the default domain
to support direct lookups.

3) TSGs are initially not bound to default domain/rl_domain.

Jira NVGPU-8165

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I916d11f4eea5124d8d64176dc77f3806c6139695
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2697477
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-05-12 00:24:58 -07:00
Debarshi Dutta
26525cb1cf gpu: nvgpu: runlist changes for default domain implementation
In order to support the concept of the default domain, a new
rl domain is created that shadows all the other domains i.e.
all channels of all TSGs are replicated here. This is scheduled
by default during GPU boot.

1) The shadow rl_domain is constructed during poweron sequence via
nvgpu_runlist_alloc_shadow_rl_domain(). struct nvgpu_runlist
is appended to store this separately as 'shadow_rl_domain'.
This is scheduled in background as long as no other user created
rl domains exist.

2) 'shadow_rl_domain' is scheduled out once user created rl domain
exist. At this point, any updates in the user created rl domains
are synchronized with the 'shadow_rl_domain'. i.e. 'shadow_rl_domain'
is also reconstructed to contain active channels and tsgs from the rl
domain.

3) 'shadow_rl_domain' is scheduled back in when the last user created
rl domain is removed.

4) In future for manual mode, driver shall support explicitely
   switching to 'shadow_rl_domain'. Also, we will move to an
   implementation where 'shadow_rl_domain' is switched out only when
   other domains are actively scheduled.  These changes will be
   implemented later.

Jira NVGPU-8165

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: Ia6a07d6bfe90e7f6c9e04a867f58c01b9243c3b0
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2704702
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-05-12 00:24:46 -07:00
Sagar Kamble
d82400d2b8 gpu: nvgpu: fix MISRA Rule 5.1 violation
BVEC changes for nvgpu_rc_pbdma_fault and nvgpu_rc_mmu_fault
started reporting below MISRA issue.

kernel/nvgpu/drivers/gpu/nvgpu/common/fifo/tsg.c:321:
  1. misra_c_2012_rule_5_1_violation: Declaration with identifier
     "nvgpu_tsg_unbind_channel_check_hw_state", which is ambiguous.
kernel/nvgpu/drivers/gpu/nvgpu/common/fifo/tsg.c:349:
  2. other_declaration: The first 31 characters of identifiers
     "nvgpu_tsg_unbind_channel_check_ctx_reload" and
     "nvgpu_tsg_unbind_channel_check_hw_state" are identical.

Do below renames to fix the issue. Doing both for consistency.

s/nvgpu_tsg_unbind_channel_check_hw_state/nvgpu_tsg_unbind_channel_hw_state_check
s/nvgpu_tsg_unbind_channel_check_ctx_reload/nvgpu_tsg_unbind_channel_ctx_reload_check

JIRA NVGPU-6772

Change-Id: Ib92cabe11c486621351bf15ddb86e20d16d514c4
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2584152
(cherry picked from commit a619f259c6a4ffccb05550767212989af60c2a90)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2706551
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-05-11 04:18:12 -07:00
Richard Zhao
1ce899ce46 gpu: nvgpu: fix compile error of new compile flags
Preparing to push hvrtos gpu server changes which requires bellow CFLAGS:
        -Werror -Wall -Wextra \
        -Wmissing-braces -Wpointer-arith -Wundef \
        -Wconversion -Wsign-conversion \
        -Wformat-security \
        -Wmissing-declarations -Wredundant-decls -Wimplicit-fallthrough

Jira GVSCI-11640

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: I25167f17f231ed741f19af87ca0aa72991563a0f
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2653746
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-05-07 15:11:49 -07:00
Jinesh Parakh
d4cb2eb3c0 gpu: nvgpu: Fix Dereference Coverity issues
Fixed following Coverity Defects:

fw.c : Dereference after null check
channel.c : Dereference before null check
log.c : Dereference before null check

CID 10064128
CID 10056456
CID 10127934

Bug 3460991

Signed-off-by: Jinesh Parakh <jparakh@nvidia.com>
Change-Id: I9c075f5c38c2254d5c656af58bb002714bd53396
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2685320
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-28 07:36:10 -07:00
Dinesh T
e4cf52123f gpu: nvgpu: Add ce halt function
This is adding CE halt fuction to reset CE properly
by setting stall req and waiting for stallack.

Bug 200641946

Change-Id: I501ccf68a4f6fe95911e73fa2eb65bde93a9f3e9
Signed-off-by: Dinesh T <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678366
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-11 20:44:38 -08:00
srajum
8381647662 gpu: nvgpu: fixing MISRA violations
- MISRA Directive 4.7
  Calling function "nvgpu_tsg_unbind_channel(tsg, ch, true)" which returns
  error information without testing the error information.

- MISRA Rule 10.3
  Implicit conversion from essential type "unsigned 64-bit int" to different
  or narrower essential type "unsigned 32-bit int"

- MISRA Rule 5.7
  A tag name shall be a unique identifier

JIRA NVGPU-5955

Change-Id: I109e0c01848c76a0947848e91cc6bb17d4cf7d24
Signed-off-by: srajum <srajum@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2572776
(cherry picked from commit 073daafe8a11e86806be966711271be51d99c18e)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678681
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-10 16:01:18 -08:00
srajum
069fe05dca gpu: nvgpu: remove whitelisting for wrongly reported violations by tool
- Earlier we whitelisted wrongly reported static analysis violations
  by tool, raised coverity tool bugs for these cases.

- These bugs are fixed with new version of tool, so no need fo whitelisting.

JIRA NVGPU-7119

Change-Id: Ib2341db0d46fa7fac4c0cc9a6c1bdc8704377ef1
Signed-off-by: srajum <srajum@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2604365
(cherry picked from commit dc2d8ddaa409aefe0e04e0bacb3a8a977f6dbd64)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2677523
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-10 16:01:06 -08:00
Konsta Hölttä
2ab6184955 gpu: nvgpu: debug dump tsg domain name
Include the scheduling domain name in the channel debug dump. The domain
name of a channel is the domain name of its parent TSG, if any. Copy
just the name into the dump info to avoid refcounting concerns.

While at it, reword the deterministic flag for less ambiguity.

Jira NVGPU-6791

Change-Id: I06041277f938e20f23de9aa419cfffbaa028035e
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2673101
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-01 00:09:12 -08:00
Konsta Hölttä
f10ee4ab0e gpu: nvgpu: add domain name API
Add nvgpu_nvs_domain_get_name() to minimize messing up with nvs
internals and to help code organization when nvs is not built in yet. A
stub to help compilation returns NULL because no domains can exist when
the stub is built in, and thus it won't be used.

Jira NVGPU-6788

Change-Id: If663f7c0e8434ef00dd3a3f40f6404a35b477f2b
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2673120
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-01 00:09:01 -08:00
Konsta Hölttä
2a8914619d gpu: nvgpu: bind sched domains as fds
Replace id-based lookup with fd-based lookup when binding a TSG to a
domain. The device node based domain interface naturally provides access
control; this way userspace tools can limit which uid/gid can access
each domain.

Also, explicitly disallow binding channels to a TSG that has no runlist
domain yet. Normally a TSG is in the default domain if nothing else has
been specified, but the default domain can be deleted.

Jira NVGPU-6788

Change-Id: I2af96dfc002367d894eaf0c175006332f790c27f
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2651165
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-01 00:08:55 -08:00
Shashank Singh
5ec241a1d8 gpu: nvgpu: remove non stall intr from top handler for safety
On safety nonstall interrupt is not used and should be compiled out to
rule out any chance of interference with safety code. Remove top handler
support of nonstall interrupt for safety which is currently not
applicable to linux.

Jira NVGPU-7066
Jira NVGPU-4078

Change-Id: I278efc8da6ddd0f22c128af6630cfd1b20ba4784
Signed-off-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2589006
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671586
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-21 02:31:38 -08:00
Shashank Singh
19a3b86f06 gpu: nvgpu: remove unused code from common.nvgpu on safety build
- remove unused code from common.nvgpu unit on safety build. Also,
remove the code which uses them in other places.
- document use of compiler intrinsics as mandated in code inspection
  checklist.

Jira NVGPU-6876

Change-Id: Ifd16dd197d297f56a517ca155da4ed145015204c
Signed-off-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2561584
(cherry picked from commit 900391071e9a7d0448cbc1bb6ed57677459712a4)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2561583
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-17 04:58:32 -08:00
Konsta Hölttä
359e83b45a gpu: nvgpu: tsg: release default nvs domain ref
A reference to the default scheduling domain is taken when a TSG is
opened. Although the explicit bind is designed to support only one bind,
the TSG is bound to the default one implicitly at that point. Release
the reference to avoid leaking it.

The domain might be null at that point if the default domain has been
removed; in that case there's just no domain to put back.

Change-Id: I7db43f7bbb2a8c86c391280eb7fa32431c8982da
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2663420
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-02-06 10:09:34 -08:00
Konsta Hölttä
8736c0d467 gpu: nvgpu: add and use sw-only timers
The nvgpu timeout API has an internal override for presilicon mode by
default: in presi simulation environments the timeouts never trigger.
This behaviour is intended in the original usecase of the timer unit
with hardware polling loops. In pure software logic though, the timer
must trigger after the specified timeout even in presi mode so add a new
init function to produce a timer for software logic. Use this new kind
of timer in channel and scheduling worker threads.

The channel worker currently times out for just the purpose of the
channel watchdog timer which has its own internal timer. Although that's
just software, the general expectation is that the watchdog does not
trigger in presilicon tests that run slower than usual. The internal
watchdog timer thus keeps the non-sw mode.

Bug 3521828

Change-Id: I48ae8522c7ce2346a930e766528d8b64195f81d8
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2662541
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-02-04 22:02:33 -08:00
Sagar Kamble
29a0a146ac gpu: nvgpu: fix coverity defects
Fix following coverity defects:
  ioctl_prof.c resource leak
  ioctl_dbg.c logically dead code
  global_ctx.c identical code for branches
  therm_dev.c resource leak
  pmu_pstate.c unused value
  nvgpu_mem.c dead default in switch
  tsg.c Dereference before null check
  nvlink_gv100.c logically dead code
  nvlink.c Out-of-bounds write
  fifo_vgpu.c Dereference null return value
  pmu_pg.c Dereference before null check
  fw_ver_ops.c Identical code for different branches
  boardobjgrp.c Dereference after null check
  boardobjgrp.c Dereference before null check
  boardobjgrp.c Dereference after null check
  engines.c Dereference before null check
  nvgpu_init.c Unused value

CID 10127875
CID 10127820
CID 10063535
CID 10059311
CID 10127863
CID 9875900
CID 9865875
CID 9858045
CID 9852644
CID 9852635
CID 9852232
CID 9847593
CID 9847051
CID 9846056
CID 9846055
CID 9846054
CID 9842821

Bug 3460991

Change-Id: I91c215a545d07eb0e5b236849d5a8440ed6fe18d
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2657444
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-01-28 04:50:12 -08:00
Richard Zhao
9ab1271269 gpu: nvgpu: common: fix compile error of new compile flags
It's preparing to add bellow CFLAGS:
    -Werror -Wall -Wextra \
    -Wmissing-braces -Wpointer-arith -Wundef \
    -Wconversion -Wsign-conversion \
    -Wformat-security \
    -Wmissing-declarations -Wredundant-decls -Wimplicit-fallthrough

Jira GVSCI-11640

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: Ia8f508c65071aa4775d71b8ee5dbf88a33b5cbd5
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2555056
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-01-13 12:36:14 -08:00
Konsta Hölttä
55afe1ff4c gpu: nvgpu: improve nvs uapi
- Make the domain scheduler timeslice type nanoseconds to future proof
  the interface
- Return -ENOSYS from ioctls if the nvs code is not initialized
- Return the number of domains also when user supplied array is present
- Use domain id instead of name for TSG binding
- Improve documentation in the uapi headers
- Verify that reserved fields are zeroed
- Extend some internal logging
- Release the sched mutex on alloc error
- Add file mode checks in the nvs ioctls. The create and remove ioctls
  require writable file permissions, while the query does not; this
  allows filesystem based access control on domain management on the
  single dev node.

Jira NVGPU-6788

Change-Id: I668eb5972a0ed1073e84a4ae30e3069bf0b59e16
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2639017
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-12-15 06:05:25 -08:00