Commit Graph

36 Commits

Author SHA1 Message Date
Richard Zhao
a587d94f5a gpu: nvgpu: init nvs scheduler for vf
nvs does not have a clean cut. runlist submit path uses nvs worker no
matter whether the feature is enabled.

Jira GVSCI-15773

Change-Id: I6f6db1e766b8079ad6ca4a6b530b3ec27094f840
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2863443
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2023-03-21 02:31:21 -07:00
prsethi
505690f505 gpu: nvgpu: add validation check for domain name
Currently there is no validation checks for domain name used in domain
create command which can cause some security risk.
Patch enable the validation for domain name by only allowing char from
([a-z], [A-Z], [0-9], -, _) list.

Bug 3994374

Change-Id: Ia2cb6f533ed136e74e7a72934ad5267803d1236d
Signed-off-by: prsethi <prsethi@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2871515
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2023-03-17 04:17:14 -07:00
vivekku
a2a86eed27 gpu: nvgpu: gsp: migration from KMD to GSP
Changes:
- submit shadow domain for legacy used cases in case user domain is not
present.
- disabling config flags for KMD to submit user domain.

Bug 3935433
NVGPU-9664

Change-Id: I498226df36d0b482d1af369526adb369d921b6ca
Signed-off-by: vivekku <vivekku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2843968
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2023-03-17 03:55:20 -07:00
vivekku
35960f8f40 gpu: nvgpu: gsp: call runlist update and send ctrl fifo info
Changes:
- function calls to add and delete domains
- updating runlist
- integrating control fifo changes with ioctls to send queue info to GSP
FW

Bug 3884011

Change-Id: I5ad29eb9501cc2df66843c074ee6a00aae91af23
Signed-off-by: vivekku <vivekku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2826482
Reviewed-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2023-03-17 03:55:08 -07:00
prsethi
6b2c080f8f gpu:nvgpu: add enable flag for KMD_SCHEDULING_WORKER_THREAD support
Currently KMD_SCHEDULING_WORKER_THREAD can be enabled/disabled using
compile time flag but this flag does give ability to control the
feature based on the chip.
GSP is enabled only on ga10b where KMD_SCHEDULING_WORKER_THREAD should
be disabled while should be enabled for other chips at the same time
to support GVS tests.
Change adds enabled flag to control KMD_SCHEDULING_WORKER_THREAD based
on the chip.

Bug 3935433

Change-Id: I9d2f34cf172d22472bdc4614073d1fb88ea204d7
Signed-off-by: prsethi <prsethi@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2867023
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2023-03-17 03:55:02 -07:00
prsethi
8c710694e8 gpu:nvgpu: fix for consecutive domain submission
When a user-domain gets removed, tsg belongs to this also gets removed
and runlist update happens accordingly. If same tsg was submitted to gpu
then updated runlist also needs to re-submit. This works fine with the
existing legacy cases but if GPU is running the shadow domain submitted
by manual mode scheduler and domain belongs to this gets removed then
updated runlist is not being submitted to GPU. This runlist buffer
inconsistency causes mmu fault later.

This change adds a "remove" field in the runlist domain which gets set
to true when runlist update happens for the channel removal. Later
worker thread submit the updated runlist if this flag set to true.

Bug 3884011

Change-Id: I3ce08a5a281e20661915746e70ac0dcd711f3f38
Signed-off-by: prsethi <prsethi@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2838808
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2023-01-09 12:26:12 -08:00
Debarshi Dutta
63e8de5106 gpu: nvgpu: Remove NVGPU_SUPPORT_NVS_CTRL_FIFO
Now that we are planning to enable CTRL_FIFO support with NVS,
there is no need for a separate enabled flag for the same.

CTRL_FIFO support is instead determined by the presence of
NVGPU_SUPPORT_NVS enable flag alone.

For non-auto platforms, Control-Fifo can be disabled by restricting
access to /dev/nvsched_ctrl_fifo.

Jira NVGPU-8619

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I9dbec60e5668f38e1460c43800584e88b16a2550
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2814435
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-11-24 00:47:37 -08:00
Debarshi Dutta
a8bdb67b2e gpu: nvgpu: add doxygen comments for NVS
Add doxygen comments for Domain Management APIs
of NVS.

Added NULL handling where required.

Jira NVGPU-8619

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I23f45b95c070c8249bb83a336239b2b2d1a852a4
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2805043
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-11-24 00:39:21 -08:00
Debarshi Dutta
5d2dfc88a3 gpu: nvgpu: Replace CONFIG_NVS_KMD_BACKEND
Use CONFIG_KMD_SCHEDULING_WORKER_THREAD instead of
CONFIG_NVS_KMD_BACKEND to remove confusion about the CPU based
KMD scheduling worker thread.

The KMD based scheduling worker thread caters to both Manual Mode
CPU based scheduler as well as Automatic Round Robin CPU based
scheduler.

For the traditional submit path, add correct handling of the
CONFIG_NVS_PRESENT. CPU based worker thread should be part of
CONFIG_NVS_PRESENT. Eventually, when DCONFIG_KMD_SCHEDULING_WORKER_THREAD
is removed, the application must switch to GSP.

Jira NVGPU-8619

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I0886ef3b2e0124b6fe22c2bf0bf7d1fa98039d00
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2810217
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-11-23 08:07:24 -08:00
Debarshi Dutta
53aa8b7244 gpu: nvgpu: disable multi-domain RR scheduling for NVS
Disable multi-domain RR scheduling for NVS code for auto platforms
and keep it enabled only for L4T.

Bug 3839407

Change-Id: I7c7977f83f72756a9c5e8f9a9f22dde2c7766fb9
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2802937
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-11-17 07:16:03 -08:00
Debarshi Dutta
280b69e66d nvgpu: userspace: add unit test for nvs
Add a unit test to add verification for S/W parts of
NVGPU-KMD based scheduler

Jira NVGPU-8619

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I266cb4167074dc5f7da647ce627e96188fc6bdcb
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2767591
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-10-10 14:08:03 -07:00
Debarshi Dutta
b2e3810514 gpu: nvgpu: add support for manual mode
NVS worker thread is changed to support manual mode
exclusively with multi-domain round-robin scheduling.

If control-fifo is enabled, NVS worker thread parses the
ring buffer.

Jira NVGPU-8619

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: Icc78e0749d5e4ebdb52f0c503ec303947011b163
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2757241
Reviewed-by: Vivek Kumar (SW-TEGRA) <vivekku@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-10-10 14:07:58 -07:00
Debarshi Dutta
17dc483a6b gpu: nvgpu: enclose NVS KMD inside a config
Use CONFIG_NVS_KMD_BACKEND to enclose all NVS KMD based scheduling
code.

Current configuration contains all the scheduling code managed within
CONFIG_NVS_PRESENT. Eventually, scheduling code shall only use GSP.
Hence, isolate KMD based scheduling code to a config
CONFIG_NVS_KMD_BACKEND. This shall make it easier to remove this code
later.

Jira NVGPU-8619

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I9dc668e0fa3e7706c111fda7a5e2415e1fc0dd03
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2769465
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-10-10 14:07:37 -07:00
vivekku
5bb56723be gpu: nvgpu: gsp: Create functions to pass nvs data to gsp firmware
Changes:
- created functions to populate gsp interface data from nvs and runlist
structures.
- Handled both user domains and shadow domains.
- Provided support for four engines from two.

NVGPU-8531

Signed-off-by: vivekku <vivekku@nvidia.com>
Change-Id: I1d9ec9ded8a9b47a5b2a00c44dacbab22e3b04b1
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2743596
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-10-05 06:18:18 -07:00
ht
125cc72c39 gpu: nvgpu: Fix devg_nvgpu_igpu process crash-2.
As part of the negative test case we replace the ACR binaries with
corrupted one(by editing the binary in hex editor). The expectaion
is that the process should log the error and exit properly but instead
the process crashed.
The root cause was because NVGPU driver was trying to pause the thread
using nvgpu_nvs_worker_pause but the but NVS isn't initialized at that
point. NVS is initialized after acr init.

Mitigated this failure by adding a checking condition in
nvgpu_nvs_worker_pause.

Bug 3670576

Change-Id: Ibfe66b253be034e7ca2c3ed298dc28d27e1d6de9
Signed-off-by: ht <ht@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2782937
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Prateek Sethi <prsethi@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-09-29 15:06:46 -07:00
Debarshi Dutta
667867a199 gpu: nvgpu: Resolve failed cond init.
Following changes are added to fix the issue.

1) Threads having higher priority e.g. RT may preempt
threads with sched-normal priority. As a consequence, higher priority
threads might not still see initialization of data in another thread
resulting in failures such as accessing a condition value before initialization.

Any initialization in the parent thread must be accompanied by a barrier
to make it visible in other thread. Added appropriate barriers to prevent
reordering of the initialization in the thread construction path.

2) There is a race condition between nvgpu_cond_signal() and
nvgpu_cond_destroy() in the asynchronous submit code and corresponding
worker thread's process_item callback for NVS. This may lead to
data corruption and resulting in the above errors as well. Fixed
that by adding a refcount based mechanism for ownership sharing
of the struct nvgpu_nvs_worker_item between the two threads.

Bug 3778235

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: Ie9b9ba57bc1dcbb8780801be79863adc39690f72
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2771535
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Prateek Sethi <prsethi@nvidia.com>
Reviewed-by: Ketan Patil <ketanp@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-09-27 23:25:55 -07:00
Debarshi Dutta
50f95f789c gpu: nvgpu: improvements to NVS code
Fix the bug in NVS worker initialization code. Ensure main thread
waits for NVS worker to start.

Jira NVGPU-8619

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I2a719bad691099881f3ac4468d32f9e81ece3800
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2773376
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
2022-09-14 09:40:16 -07:00
Debarshi Dutta
143034daab gpu: nvgpu: modify wait_pending
The wait_pending HAL is now modified to simply
check the pending status of a given runlist.
The while loop is removed from this HAL.

A new function nvgpu_runlist_wait_pending_legacy() is
added that emulates the older wait_pending() HAL.

nvgpu_runlist_tick() is modified to accept a 64 bit
"preempt_grace_ns" value.

These changes prepare for upcoming control-fifo parser
changes.

Jira NVGPU-8619

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: If3f288eb6f2181743c53b657219b3b30d56d26bc
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2766100
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-08-30 23:45:43 -07:00
prsethi
e4d1a739da gpu: nvgpu: nvs: plug nvs with safety code
- Change enables CONFIG_NVS_PRESENT for safety build.
- Fixes misra vioations.
- Renames sched.h to nvs_sched.h to avoid the conflict with QNX system
sched.h file for the safety support.
- Disable test_channel_close, test_tsg_unbind_channel,
test_channel_enable_disable_tsg, test_gv11b_fifo_preempt_tsg,
test_tsg_unbind_channel_check_hw_state and test_rc_deinit unit tests.

Jira NVGPU-8619

Change-Id: I7c983de2f4910fcb23687ec23368a060ce89c918
Signed-off-by: prsethi <prsethi@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2763579
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-08-29 17:31:03 -07:00
prsethi
440cf0c75e gpu: nvgpu: nvs: supporting changes to plug nvs with QNX
- Remove '()' from logging macro to fix compilation issue with QNX
- Add NSEC_PER_MSEC which is missing for QNX.

Jira NVGPU-8619

Change-Id: I0bc5c5a9c6979a0a78e29d26a40ca7927b25e5d0
Signed-off-by: prsethi <prsethi@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2754721
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-08-29 17:30:45 -07:00
Debarshi Dutta
42beb7f4db gpu: nvgpu: simplify the runlist update sequence
Following changes are added here to simplify the overall
sequence.

1) Remove deferred update for runlists. NVS worker thread
shall submit the updated runlist.

2) Moved Runlist mem swap inside update itself. Protect
the swap() and hw_submit() path with a spinlock. This
is temporary till GSP.

3) Enable Control-Fifo mode from nvgpu driver.

Jira NVGPU-8609

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: Icc52e5d8ccec9d3653c9bc1cf40400fc01a08fde
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2757406
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-08-20 23:33:45 -07:00
Debarshi Dutta
1d4b7b1c5d gpu: nvgpu: modify priority of NVS worker thread
In linux threaded interrupts run with a Realtime priority
of 50. This bumps up the priority of bottom-half handlers
over regular kernel/User threads even during process
context.

In the current implementation scheduler thread still
runs in normal kernel thread priority. In order to
allow a seamless scheduling experience, the worker
thread is now created with a Realtime priority of 1.
This allows for the Worker thread to work at a priority
lower than interrupt handlers but higher than the regular
kernel threads.

Linux kernel allows setting priority with the help of
sched_set_fifo() API. Only two modes are supported
i.e. sched_set_fifo() and sched_set_fifo_low().

For more reference, refer to this article
https://lwn.net/Articles/818388/.

Added an implementation of nvgpu_thread_create_priority()
for linux thread using the above two APIs.

Jira NVGPU-860

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I0a5a611bf0e0a5b9bb51354c6ff0a99e42e76e2f
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2751736
Reviewed-by: Prateek Sethi <prsethi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-08-20 23:33:34 -07:00
Debarshi Dutta
13699c4c15 gpu: nvgpu: ensure worker thread is disabled during rg
A previous commit ID 44b6bfbc1 added a hack to prevent
the worker thread from calling nvgpu_runlist_tick()
in post_process if the next domain matches the previous.

This could potentially still face issues with multi-domains
in future.

A better way is to synchronize the thread to suspend/resume
alongwith the device's common OS agnostic suspend/resume
operations. This shall emulate the GSP as well.
This shall also take care of the power constraints
i.e. the worker thread can be expected to always work
with the power enabled and thus we can get rid of the complex
gk20a_busy() lock here for good.

Implemented a state-machine based approach for suspending/
resuming the NVS worker thread from the existing callbacks.

Remove support for NVS worker thread creation for VGPU.
hw_submit method is currently set to NULL for VGPU. VGPU
instead submits its updates via the runlist.reload() method.

Jira NVGPU-8609

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I51a20669e02bf6328dfe5baa122d5bfb75862ea2
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2750403
Reviewed-by: Prateek Sethi <prsethi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-08-20 23:33:29 -07:00
Debarshi Dutta
44b6bfbc1d gpu: nvgpu: skip post_process() operations for single domain.
There is a possible deadlock that gets triggered when
device is being resumed() and NVS worker thread tries to submit
the data as part of the post_process() operation.

The NVS worker thread works asynchronously in the post_process() part
w.r.t the USER threads and thus an initial implementation requires
acquiring the busy lock() arriving at a deadlock scenario.

This quick change shall disallow post_process() from executing
during the case where we have only one scheduling domain present(legacy)
Any submits meant to be updated are handled via the synchronous
wakeup_process_item() callback.

This implementation is being modified to allow the worker thread
to be suspended/resumed during GPU railgate/unrailgate in upcoming
releases and currently is in a state of flux.

Bug 3723127

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I318cda0fbdd5651884cf21f748c86687679e6fdb
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2750293
Reviewed-by: Prateek Sethi <prsethi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-07-25 18:41:43 -07:00
Debarshi Dutta
1bdca92c50 gpu: nvgpu: modify rl_domain member
KMD needs to send the domain id and GPU_VA corresponding
to the struct runlist_domains to GSP. In the current
implementation, struct nvgpu_runlist_domain contains
the domain name instead of domain id. This requires
an additional search by name everytime an update
is needed to be submitted to the GSP.

Modify the struct nvgpu_runlist_domain to store domain id
instead of domain name. This simplifies the flow and avoids
unnecessary search.

Removed the conditional check for existence of shadow domain
as its a deadcode. Shadow Domain is not searchable in the list
of domains inside the struct nvgpu_runlist.

Jira NVGPU-8610

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I0d67cfa93d89186240290e933aa750702b14f4f0
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2744890
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-07-15 15:15:30 -07:00
Debarshi Dutta
62c03dfaef gpu: nvgpu: add support for nvs control_fifo
Add a device node for management of nvs control fifo buffers for
scheduling domains. The current design consists of a master structure
struct nvgpu_nvs_domain_sched_ctrl for management of users as well
as control queues. Initially all users are added as non-exclusive users.

Subsequent changes will add support for IOCTLS to manage opening of
Send/Receive and Event buffers, querying characteristics etc.

In subsequent changes, a user that tries to open a Send/Receive queue
will first try to reserve itself as an exclusive user and only if that
succeeds can proceed with creation of both Send/Receive queues.

Exclusive users will be reset to non-exclusive users just before they
close their device node handle.

Jira NVGPU-8128

Change-Id: I15a83f70cd49c685510a9fd5ea4476ebb3544378
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2691404
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-07-15 07:08:22 -07:00
Debarshi Dutta
d8e8eb65d3 nvgpu: gpu: separate runlist submit from construction
This patch primary separates runlist modification from
runlist submits.

Instead of submitting the runlist(domain) immediately after
modification, a worker thread interface is now being used to
synchronously schedule runlist submits. If the runlist being
scheduled is currently active, the submit happens instantly,
otherwise, it will happen in the next iteration when the nvs
thread will schedule the domain. This external interface uses
a condition variable to wait for the completion of the
synchronous submits.

A pending_update variable is used to synchronize domain memory
swaps just before being submitted.

To facilitate faster scheduling via the NVS thread, nvgpu_dom
itself contains an array of rl_domain pointers. This can then
be used to select the appropriate rl_domain directly for scheduling
as against the earlier approach of maintaining nvs domains and rl
domains in sync everytime.

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I1725c7cf56407cca2e3d2589833d1c0b66a7ad7b
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2739795
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-07-13 16:36:19 -07:00
Debarshi Dutta
76cc8870e1 nvgpu: gpu: update default nvs domain implementation
In current form, the default domain acts like any schedulable
domain. TSGs are bound to it and it can be enumerated via the
public interfaces.

The new expectation for the default domain is meant to change
from the current form to a pseudo domain that cannot act like
an ordinary domain in other ways, i.e. it must not be reachable
by in particular the domain management API, it can't be removed,
does not show up in lists, and TSGs cannot be explicitly bound to
this domain. It won't participate in round-robin domain scheduling.
It is not really a domain, and acts like one only when activated in
the manual mode.

Following changes are made overall to support the above change in
definition.

1) Domain creation and attaching the domain to the scheduler are now
split into two separate functions. The new default domain (having ID
= UINT64_MAX) is created separately from a static function without
linking it with other domains in the scheduler.

2) struct nvgpu_nvs_scheduler explicitely stores the default domain
to support direct lookups.

3) TSGs are initially not bound to default domain/rl_domain.

Jira NVGPU-8165

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I916d11f4eea5124d8d64176dc77f3806c6139695
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2697477
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-05-12 00:24:58 -07:00
Konsta Hölttä
f10ee4ab0e gpu: nvgpu: add domain name API
Add nvgpu_nvs_domain_get_name() to minimize messing up with nvs
internals and to help code organization when nvs is not built in yet. A
stub to help compilation returns NULL because no domains can exist when
the stub is built in, and thus it won't be used.

Jira NVGPU-6788

Change-Id: If663f7c0e8434ef00dd3a3f40f6404a35b477f2b
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2673120
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-01 00:09:01 -08:00
Konsta Hölttä
3a64fdefc4 gpu: nvgpu: domains as files for access control
Create device nodes for user-created scheduling domains. This helps
leverage filesystem based access control: domains can be chosen to be
available for a limited set of users on a system.

The device nodes are dynamic: they can be removed while the driver is
running normally. This is a bit different from the nodes that exist
until the driver is unloaded, so the devno/domain mapping is stored in a
separate list. The usual container_of pattern would suffer from an
unavoidable race condition if a domain file was opened while the same
domain would get removed.

As usual, domain refcounting prevents a domain from being removed. Now
the open device files hold refs and thus any open domain files prevent a
domain from getting removed, in addition to the userspace-invisible ref
that is taken when a TSG is bound to a domain.

While at it, make the query ioctl guarded by the sched domain mutex, as
domains might technically get added or removed during the querying code.

Jira NVGPU-6788

Change-Id: Ief2a09a442c4e70f1f2be8a32359341071d74659
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2651164
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-01 00:08:49 -08:00
Konsta Hölttä
8736c0d467 gpu: nvgpu: add and use sw-only timers
The nvgpu timeout API has an internal override for presilicon mode by
default: in presi simulation environments the timeouts never trigger.
This behaviour is intended in the original usecase of the timer unit
with hardware polling loops. In pure software logic though, the timer
must trigger after the specified timeout even in presi mode so add a new
init function to produce a timer for software logic. Use this new kind
of timer in channel and scheduling worker threads.

The channel worker currently times out for just the purpose of the
channel watchdog timer which has its own internal timer. Although that's
just software, the general expectation is that the watchdog does not
trigger in presilicon tests that run slower than usual. The internal
watchdog timer thus keeps the non-sw mode.

Bug 3521828

Change-Id: I48ae8522c7ce2346a930e766528d8b64195f81d8
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2662541
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-02-04 22:02:33 -08:00
Sagar Kamble
d424598b7b gpu: nvgpu: stop nvs thread during unload
nvs worker thread is created on each resume and deinitialized on every
suspend. nvgpu can be resumed when process is getting killed. Thread
creation can fail when the process is getting killed. That will lead
to driver resume failure.

To avoid the issue above, don't stop the nvs worker thread in suspend
and let the first created thread handle the nvs work always.
Deinitialize the nvs worker thread during nvgpu unload.

Also, log the error returned by nvgpu_thread_create in the function
nvgpu_worker_start.

bug 3480192

Change-Id: I8d5d9e7716a950b162cc3c2d9fcfde07c4edfcf6
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2646218
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-12-29 09:35:03 -08:00
Konsta Hölttä
55afe1ff4c gpu: nvgpu: improve nvs uapi
- Make the domain scheduler timeslice type nanoseconds to future proof
  the interface
- Return -ENOSYS from ioctls if the nvs code is not initialized
- Return the number of domains also when user supplied array is present
- Use domain id instead of name for TSG binding
- Improve documentation in the uapi headers
- Verify that reserved fields are zeroed
- Extend some internal logging
- Release the sched mutex on alloc error
- Add file mode checks in the nvs ioctls. The create and remove ioctls
  require writable file permissions, while the query does not; this
  allows filesystem based access control on domain management on the
  single dev node.

Jira NVGPU-6788

Change-Id: I668eb5972a0ed1073e84a4ae30e3069bf0b59e16
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2639017
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-12-15 06:05:25 -08:00
Konsta Hölttä
d086c678fd gpu: nvgpu: add domain scheduler worker
Move away from the prototype call in channel wdt worker and create a
separate worker thread for the domain scheduler. The details of runlist
domains are still encapsulated in the runlist code; the domain scheduler
controls when to switch domains. Switching happens based on domain
timeslices or when the current domain is deleted.

The worker thread is paused on railgate and spun back on poweron. The
scheduler data was also left dangling, so fix that by deinitializing all
nvs-related when gk20a_remove_support() is called. The runlist domains
already get freed as part of fifo removal.

Jira NVGPU-6427

Change-Id: I64f42498f8789448d9becdd209b7878ef0fdb124
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2632579
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-12-14 06:26:16 -08:00
Konsta Hölttä
632644b44a gpu: nvgpu: couple runlist domains and nvs
Now that the main nvsched code exists in the nvgpu build, make it
control the runlist domains. As a new nvs domain is created, create the
relevant runlist data too. To support the default domain, create a
default nvs domain at boot.

The scheduling domain code owns the responsibility of domain lifetime,
and runlist domains exist to serve that logic although the RL domains
are directly used by channel and TSG logic. Add refcounting to the
scheduler uapi level to make sure that busy domains (that still have TSG
participants) do not get removed too early.

Adjust error injection sensitive unit tests to match the updated logic.

Jira NVGPU-6425
Jira NVGPU-6427

Change-Id: I1beec97c54c60ad334165b1c0acb5e827c24f2ac
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2632287
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-12-07 07:07:12 -08:00
Konsta Hölttä
1d14a4412f gpu: nvgpu: scheduler management uapi
Add ioctls for creating, removing and querying scheduling domains and
interface with the "nvsched" entity that will be the core scheduler.
Include the scheduler in the Linux build.

The core scheduler code will ultimately hold data on and control what
gets scheduled, but this intermediate layer in nvgpu-rm needs a bit of
bookeeping to manage the userspace interface.

To keep changes isolated, this does not touch the internal runlist
domains yet. The core scheduler logic will eventually control the
runlist domains.

Jira NVGPU-6788

Change-Id: I7b4064edb6205acbac2d8c593dad019d517243ce
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2463625
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-12-07 07:07:01 -08:00