linux-nvgpu

mirror of git://nv-tegra.nvidia.com/linux-nvgpu.git synced 2025-12-22 17:36:20 +03:00

Author	SHA1	Message	Date
Richard Zhao	a587d94f5a	gpu: nvgpu: init nvs scheduler for vf nvs does not have a clean cut. runlist submit path uses nvs worker no matter whether the feature is enabled. Jira GVSCI-15773 Change-Id: I6f6db1e766b8079ad6ca4a6b530b3ec27094f840 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2863443 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2023-03-21 02:31:21 -07:00
prsethi	505690f505	gpu: nvgpu: add validation check for domain name Currently there are no validation checks for the domain name used in the domain create command, which can pose a security risk. This patch enables validation of the domain name by allowing only characters from the list ([a-z], [A-Z], [0-9], -, _). Bug 3994374 Change-Id: Ia2cb6f533ed136e74e7a72934ad5267803d1236d Signed-off-by: prsethi <prsethi@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2871515 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2023-03-17 04:17:14 -07:00
vivekku	a2a86eed27	gpu: nvgpu: gsp: migration from KMD to GSP Changes: - submit the shadow domain for legacy use cases when a user domain is not present. - disable config flags for KMD to submit user domain. Bug 3935433 NVGPU-9664 Change-Id: I498226df36d0b482d1af369526adb369d921b6ca Signed-off-by: vivekku <vivekku@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2843968 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2023-03-17 03:55:20 -07:00
vivekku	35960f8f40	gpu: nvgpu: gsp: call runlist update and send ctrl fifo info Changes: - function calls to add and delete domains - updating runlist - integrating control fifo changes with ioctls to send queue info to GSP FW Bug 3884011 Change-Id: I5ad29eb9501cc2df66843c074ee6a00aae91af23 Signed-off-by: vivekku <vivekku@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2826482 Reviewed-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com> Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com> GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>	2023-03-17 03:55:08 -07:00
prsethi	6b2c080f8f	gpu:nvgpu: add enable flag for KMD_SCHEDULING_WORKER_THREAD support Currently KMD_SCHEDULING_WORKER_THREAD can be enabled/disabled using a compile-time flag, but this flag does not give the ability to control the feature based on the chip. GSP is enabled only on ga10b, where KMD_SCHEDULING_WORKER_THREAD should be disabled, while at the same time it should be enabled for other chips to support GVS tests. This change adds an enabled flag to control KMD_SCHEDULING_WORKER_THREAD based on the chip. Bug 3935433 Change-Id: I9d2f34cf172d22472bdc4614073d1fb88ea204d7 Signed-off-by: prsethi <prsethi@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2867023 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2023-03-17 03:55:02 -07:00
prsethi	8c710694e8	gpu:nvgpu: fix for consecutive domain submission When a user domain gets removed, the TSG belonging to it also gets removed and the runlist update happens accordingly. If the same TSG was submitted to the GPU, the updated runlist also needs to be re-submitted. This works fine with the existing legacy cases, but if the GPU is running the shadow domain submitted by the manual mode scheduler and a domain belonging to it gets removed, the updated runlist is not submitted to the GPU. This runlist buffer inconsistency causes an MMU fault later. This change adds a "remove" field in the runlist domain, which is set to true when a runlist update happens for channel removal. Later, the worker thread submits the updated runlist if this flag is set to true. Bug 3884011 Change-Id: I3ce08a5a281e20661915746e70ac0dcd711f3f38 Signed-off-by: prsethi <prsethi@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2838808 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2023-01-09 12:26:12 -08:00
Debarshi Dutta	63e8de5106	gpu: nvgpu: Remove NVGPU_SUPPORT_NVS_CTRL_FIFO Now that we are planning to enable CTRL_FIFO support with NVS, there is no need for a separate enabled flag for the same. CTRL_FIFO support is instead determined by the presence of NVGPU_SUPPORT_NVS enable flag alone. For non-auto platforms, Control-Fifo can be disabled by restricting access to /dev/nvsched_ctrl_fifo. Jira NVGPU-8619 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I9dbec60e5668f38e1460c43800584e88b16a2550 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2814435 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-11-24 00:47:37 -08:00
Debarshi Dutta	a8bdb67b2e	gpu: nvgpu: add doxygen comments for NVS Add doxygen comments for Domain Management APIs of NVS. Added NULL handling where required. Jira NVGPU-8619 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I23f45b95c070c8249bb83a336239b2b2d1a852a4 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2805043 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-11-24 00:39:21 -08:00
Debarshi Dutta	5d2dfc88a3	gpu: nvgpu: Replace CONFIG_NVS_KMD_BACKEND Use CONFIG_KMD_SCHEDULING_WORKER_THREAD instead of CONFIG_NVS_KMD_BACKEND to remove confusion about the CPU based KMD scheduling worker thread. The KMD based scheduling worker thread caters to both Manual Mode CPU based scheduler as well as Automatic Round Robin CPU based scheduler. For the traditional submit path, add correct handling of the CONFIG_NVS_PRESENT. CPU based worker thread should be part of CONFIG_NVS_PRESENT. Eventually, when DCONFIG_KMD_SCHEDULING_WORKER_THREAD is removed, the application must switch to GSP. Jira NVGPU-8619 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I0886ef3b2e0124b6fe22c2bf0bf7d1fa98039d00 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2810217 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-11-23 08:07:24 -08:00
Debarshi Dutta	53aa8b7244	gpu: nvgpu: disable multi-domain RR scheduling for NVS Disable multi-domain RR scheduling for NVS code for auto platforms and keep it enabled only for L4T. Bug 3839407 Change-Id: I7c7977f83f72756a9c5e8f9a9f22dde2c7766fb9 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2802937 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-11-17 07:16:03 -08:00
Debarshi Dutta	280b69e66d	nvgpu: userspace: add unit test for nvs Add a unit test to add verification for S/W parts of NVGPU-KMD based scheduler Jira NVGPU-8619 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I266cb4167074dc5f7da647ce627e96188fc6bdcb Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2767591 Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>	2022-10-10 14:08:03 -07:00
Debarshi Dutta	b2e3810514	gpu: nvgpu: add support for manual mode NVS worker thread is changed to support manual mode exclusively with multi-domain round-robin scheduling. If control-fifo is enabled, NVS worker thread parses the ring buffer. Jira NVGPU-8619 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: Icc78e0749d5e4ebdb52f0c503ec303947011b163 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2757241 Reviewed-by: Vivek Kumar (SW-TEGRA) <vivekku@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>	2022-10-10 14:07:58 -07:00
Debarshi Dutta	17dc483a6b	gpu: nvgpu: enclose NVS KMD inside a config Use CONFIG_NVS_KMD_BACKEND to enclose all NVS KMD based scheduling code. Current configuration contains all the scheduling code managed within CONFIG_NVS_PRESENT. Eventually, scheduling code shall only use GSP. Hence, isolate KMD based scheduling code to a config CONFIG_NVS_KMD_BACKEND. This shall make it easier to remove this code later. Jira NVGPU-8619 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I9dc668e0fa3e7706c111fda7a5e2415e1fc0dd03 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2769465 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-10-10 14:07:37 -07:00
vivekku	5bb56723be	gpu: nvgpu: gsp: Create functions to pass nvs data to gsp firmware Changes: - created functions to populate gsp interface data from nvs and runlist structures. - Handled both user domains and shadow domains. - Provided support for four engines from two. NVGPU-8531 Signed-off-by: vivekku <vivekku@nvidia.com> Change-Id: I1d9ec9ded8a9b47a5b2a00c44dacbab22e3b04b1 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2743596 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com> GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>	2022-10-05 06:18:18 -07:00
ht	125cc72c39	gpu: nvgpu: Fix devg_nvgpu_igpu process crash-2. As part of the negative test case we replace the ACR binaries with corrupted ones (by editing the binary in a hex editor). The expectation is that the process should log the error and exit properly, but instead the process crashed. The root cause was that the NVGPU driver was trying to pause the thread using nvgpu_nvs_worker_pause, but NVS isn't initialized at that point. NVS is initialized after acr init. Mitigated this failure by adding a check in nvgpu_nvs_worker_pause. Bug 3670576 Change-Id: Ibfe66b253be034e7ca2c3ed298dc28d27e1d6de9 Signed-off-by: ht <ht@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2782937 Reviewed-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: Prateek Sethi <prsethi@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>	2022-09-29 15:06:46 -07:00
Debarshi Dutta	667867a199	gpu: nvgpu: Resolve failed cond init. Following changes are added to fix the issue. 1) Threads having higher priority e.g. RT may preempt threads with sched-normal priority. As a consequence, higher priority threads might not still see initialization of data in another thread resulting in failures such as accessing a condition value before initialization. Any initialization in the parent thread must be accompanied by a barrier to make it visible in other thread. Added appropriate barriers to prevent reordering of the initialization in the thread construction path. 2) There is a race condition between nvgpu_cond_signal() and nvgpu_cond_destroy() in the asynchronous submit code and corresponding worker thread's process_item callback for NVS. This may lead to data corruption and resulting in the above errors as well. Fixed that by adding a refcount based mechanism for ownership sharing of the struct nvgpu_nvs_worker_item between the two threads. Bug 3778235 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: Ie9b9ba57bc1dcbb8780801be79863adc39690f72 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2771535 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Prateek Sethi <prsethi@nvidia.com> Reviewed-by: Ketan Patil <ketanp@nvidia.com> GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>	2022-09-27 23:25:55 -07:00
Debarshi Dutta	50f95f789c	gpu: nvgpu: improvements to NVS code Fix the bug in NVS worker initialization code. Ensure main thread waits for NVS worker to start. Jira NVGPU-8619 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I2a719bad691099881f3ac4468d32f9e81ece3800 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2773376 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>	2022-09-14 09:40:16 -07:00
Debarshi Dutta	143034daab	gpu: nvgpu: modify wait_pending The wait_pending HAL is now modified to simply check the pending status of a given runlist. The while loop is removed from this HAL. A new function nvgpu_runlist_wait_pending_legacy() is added that emulates the older wait_pending() HAL. nvgpu_runlist_tick() is modified to accept a 64 bit "preempt_grace_ns" value. These changes prepare for upcoming control-fifo parser changes. Jira NVGPU-8619 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: If3f288eb6f2181743c53b657219b3b30d56d26bc Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2766100 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-08-30 23:45:43 -07:00
prsethi	e4d1a739da	gpu: nvgpu: nvs: plug nvs with safety code - Enables CONFIG_NVS_PRESENT for the safety build. - Fixes MISRA violations. - Renames sched.h to nvs_sched.h to avoid the conflict with the QNX system sched.h file for safety support. - Disables the test_channel_close, test_tsg_unbind_channel, test_channel_enable_disable_tsg, test_gv11b_fifo_preempt_tsg, test_tsg_unbind_channel_check_hw_state and test_rc_deinit unit tests. Jira NVGPU-8619 Change-Id: I7c983de2f4910fcb23687ec23368a060ce89c918 Signed-off-by: prsethi <prsethi@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2763579 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-08-29 17:31:03 -07:00
prsethi	440cf0c75e	gpu: nvgpu: nvs: supporting changes to plug nvs with QNX - Remove '()' from logging macro to fix compilation issue with QNX - Add NSEC_PER_MSEC which is missing for QNX. Jira NVGPU-8619 Change-Id: I0bc5c5a9c6979a0a78e29d26a40ca7927b25e5d0 Signed-off-by: prsethi <prsethi@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2754721 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-08-29 17:30:45 -07:00
Debarshi Dutta	42beb7f4db	gpu: nvgpu: simplify the runlist update sequence Following changes are added here to simplify the overall sequence. 1) Remove deferred update for runlists. NVS worker thread shall submit the updated runlist. 2) Moved Runlist mem swap inside update itself. Protect the swap() and hw_submit() path with a spinlock. This is temporary till GSP. 3) Enable Control-Fifo mode from nvgpu driver. Jira NVGPU-8609 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: Icc52e5d8ccec9d3653c9bc1cf40400fc01a08fde Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2757406 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-08-20 23:33:45 -07:00
Debarshi Dutta	1d4b7b1c5d	gpu: nvgpu: modify priority of NVS worker thread In linux threaded interrupts run with a Realtime priority of 50. This bumps up the priority of bottom-half handlers over regular kernel/User threads even during process context. In the current implementation scheduler thread still runs in normal kernel thread priority. In order to allow a seamless scheduling experience, the worker thread is now created with a Realtime priority of 1. This allows for the Worker thread to work at a priority lower than interrupt handlers but higher than the regular kernel threads. Linux kernel allows setting priority with the help of sched_set_fifo() API. Only two modes are supported i.e. sched_set_fifo() and sched_set_fifo_low(). For more reference, refer to this article https://lwn.net/Articles/818388/. Added an implementation of nvgpu_thread_create_priority() for linux thread using the above two APIs. Jira NVGPU-860 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I0a5a611bf0e0a5b9bb51354c6ff0a99e42e76e2f Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2751736 Reviewed-by: Prateek Sethi <prsethi@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>	2022-08-20 23:33:34 -07:00
Debarshi Dutta	13699c4c15	gpu: nvgpu: ensure worker thread is disabled during rg A previous commit ID `44b6bfbc1` added a hack to prevent the worker thread from calling nvgpu_runlist_tick() in post_process if the next domain matches the previous. This could potentially still face issues with multi-domains in future. A better way is to synchronize the thread to suspend/resume along with the device's common OS agnostic suspend/resume operations. This shall emulate the GSP as well. This shall also take care of the power constraints, i.e. the worker thread can be expected to always work with the power enabled, and thus we can get rid of the complex gk20a_busy() lock here for good. Implemented a state-machine based approach for suspending/resuming the NVS worker thread from the existing callbacks. Remove support for NVS worker thread creation for VGPU. hw_submit method is currently set to NULL for VGPU. VGPU instead submits its updates via the runlist.reload() method. Jira NVGPU-8609 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I51a20669e02bf6328dfe5baa122d5bfb75862ea2 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2750403 Reviewed-by: Prateek Sethi <prsethi@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>	2022-08-20 23:33:29 -07:00
Debarshi Dutta	44b6bfbc1d	gpu: nvgpu: skip post_process() operations for single domain. There is a possible deadlock that gets triggered when device is being resumed() and NVS worker thread tries to submit the data as part of the post_process() operation. The NVS worker thread works asynchronously in the post_process() part w.r.t the USER threads and thus an initial implementation requires acquiring the busy lock() arriving at a deadlock scenario. This quick change shall disallow post_process() from executing during the case where we have only one scheduling domain present(legacy) Any submits meant to be updated are handled via the synchronous wakeup_process_item() callback. This implementation is being modified to allow the worker thread to be suspended/resumed during GPU railgate/unrailgate in upcoming releases and currently is in a state of flux. Bug 3723127 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I318cda0fbdd5651884cf21f748c86687679e6fdb Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2750293 Reviewed-by: Prateek Sethi <prsethi@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-07-25 18:41:43 -07:00
Debarshi Dutta	1bdca92c50	gpu: nvgpu: modify rl_domain member KMD needs to send the domain id and GPU_VA corresponding to the struct runlist_domains to GSP. In the current implementation, struct nvgpu_runlist_domain contains the domain name instead of domain id. This requires an additional search by name every time an update needs to be submitted to the GSP. Modify the struct nvgpu_runlist_domain to store domain id instead of domain name. This simplifies the flow and avoids unnecessary search. Removed the conditional check for existence of shadow domain as it is dead code. Shadow Domain is not searchable in the list of domains inside the struct nvgpu_runlist. Jira NVGPU-8610 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I0d67cfa93d89186240290e933aa750702b14f4f0 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2744890 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-07-15 15:15:30 -07:00
Debarshi Dutta	62c03dfaef	gpu: nvgpu: add support for nvs control_fifo Add a device node for management of nvs control fifo buffers for scheduling domains. The current design consists of a master structure struct nvgpu_nvs_domain_sched_ctrl for management of users as well as control queues. Initially all users are added as non-exclusive users. Subsequent changes will add support for IOCTLS to manage opening of Send/Receive and Event buffers, querying characteristics etc. In subsequent changes, a user that tries to open a Send/Receive queue will first try to reserve itself as an exclusive user and only if that succeeds can proceed with creation of both Send/Receive queues. Exclusive users will be reset to non-exclusive users just before they close their device node handle. Jira NVGPU-8128 Change-Id: I15a83f70cd49c685510a9fd5ea4476ebb3544378 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2691404 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-07-15 07:08:22 -07:00
Debarshi Dutta	d8e8eb65d3	nvgpu: gpu: separate runlist submit from construction This patch primarily separates runlist modification from runlist submits. Instead of submitting the runlist (domain) immediately after modification, a worker thread interface is now being used to synchronously schedule runlist submits. If the runlist being scheduled is currently active, the submit happens instantly; otherwise, it will happen in the next iteration when the nvs thread will schedule the domain. This external interface uses a condition variable to wait for the completion of the synchronous submits. A pending_update variable is used to synchronize domain memory swaps just before being submitted. To facilitate faster scheduling via the NVS thread, nvgpu_dom itself contains an array of rl_domain pointers. This can then be used to select the appropriate rl_domain directly for scheduling as against the earlier approach of maintaining nvs domains and rl domains in sync every time. Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I1725c7cf56407cca2e3d2589833d1c0b66a7ad7b Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2739795 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-07-13 16:36:19 -07:00
Debarshi Dutta	76cc8870e1	nvgpu: gpu: update default nvs domain implementation In current form, the default domain acts like any schedulable domain. TSGs are bound to it and it can be enumerated via the public interfaces. The new expectation is that the default domain changes from the current form to a pseudo domain that cannot act like an ordinary domain in other ways, i.e. it must not be reachable by, in particular, the domain management API, it can't be removed, it does not show up in lists, and TSGs cannot be explicitly bound to it. It won't participate in round-robin domain scheduling. It is not really a domain, and acts like one only when activated in the manual mode. Following changes are made overall to support the above change in definition. 1) Domain creation and attaching the domain to the scheduler are now split into two separate functions. The new default domain (having ID = UINT64_MAX) is created separately from a static function without linking it with other domains in the scheduler. 2) struct nvgpu_nvs_scheduler explicitly stores the default domain to support direct lookups. 3) TSGs are initially not bound to default domain/rl_domain. Jira NVGPU-8165 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I916d11f4eea5124d8d64176dc77f3806c6139695 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2697477 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-12 00:24:58 -07:00
Konsta Hölttä	f10ee4ab0e	gpu: nvgpu: add domain name API Add nvgpu_nvs_domain_get_name() to minimize messing up with nvs internals and to help code organization when nvs is not built in yet. A stub to help compilation returns NULL because no domains can exist when the stub is built in, and thus it won't be used. Jira NVGPU-6788 Change-Id: If663f7c0e8434ef00dd3a3f40f6404a35b477f2b Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2673120 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-01 00:09:01 -08:00
Konsta Hölttä	3a64fdefc4	gpu: nvgpu: domains as files for access control Create device nodes for user-created scheduling domains. This helps leverage filesystem based access control: domains can be chosen to be available for a limited set of users on a system. The device nodes are dynamic: they can be removed while the driver is running normally. This is a bit different from the nodes that exist until the driver is unloaded, so the devno/domain mapping is stored in a separate list. The usual container_of pattern would suffer from an unavoidable race condition if a domain file was opened while the same domain would get removed. As usual, domain refcounting prevents a domain from being removed. Now the open device files hold refs and thus any open domain files prevent a domain from getting removed, in addition to the userspace-invisible ref that is taken when a TSG is bound to a domain. While at it, make the query ioctl guarded by the sched domain mutex, as domains might technically get added or removed during the querying code. Jira NVGPU-6788 Change-Id: Ief2a09a442c4e70f1f2be8a32359341071d74659 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2651164 Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-01 00:08:49 -08:00
Konsta Hölttä	8736c0d467	gpu: nvgpu: add and use sw-only timers The nvgpu timeout API has an internal override for presilicon mode by default: in presi simulation environments the timeouts never trigger. This behaviour is intended in the original usecase of the timer unit with hardware polling loops. In pure software logic though, the timer must trigger after the specified timeout even in presi mode so add a new init function to produce a timer for software logic. Use this new kind of timer in channel and scheduling worker threads. The channel worker currently times out for just the purpose of the channel watchdog timer which has its own internal timer. Although that's just software, the general expectation is that the watchdog does not trigger in presilicon tests that run slower than usual. The internal watchdog timer thus keeps the non-sw mode. Bug 3521828 Change-Id: I48ae8522c7ce2346a930e766528d8b64195f81d8 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2662541 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: Sagar Kamble <skamble@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-02-04 22:02:33 -08:00
Sagar Kamble	d424598b7b	gpu: nvgpu: stop nvs thread during unload nvs worker thread is created on each resume and deinitialized on every suspend. nvgpu can be resumed when process is getting killed. Thread creation can fail when the process is getting killed. That will lead to driver resume failure. To avoid the issue above, don't stop the nvs worker thread in suspend and let the first created thread handle the nvs work always. Deinitialize the nvs worker thread during nvgpu unload. Also, log the error returned by nvgpu_thread_create in the function nvgpu_worker_start. bug 3480192 Change-Id: I8d5d9e7716a950b162cc3c2d9fcfde07c4edfcf6 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2646218 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-12-29 09:35:03 -08:00
Konsta Hölttä	55afe1ff4c	gpu: nvgpu: improve nvs uapi - Make the domain scheduler timeslice type nanoseconds to future proof the interface - Return -ENOSYS from ioctls if the nvs code is not initialized - Return the number of domains also when user supplied array is present - Use domain id instead of name for TSG binding - Improve documentation in the uapi headers - Verify that reserved fields are zeroed - Extend some internal logging - Release the sched mutex on alloc error - Add file mode checks in the nvs ioctls. The create and remove ioctls require writable file permissions, while the query does not; this allows filesystem based access control on domain management on the single dev node. Jira NVGPU-6788 Change-Id: I668eb5972a0ed1073e84a4ae30e3069bf0b59e16 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2639017 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-12-15 06:05:25 -08:00
Konsta Hölttä	d086c678fd	gpu: nvgpu: add domain scheduler worker Move away from the prototype call in channel wdt worker and create a separate worker thread for the domain scheduler. The details of runlist domains are still encapsulated in the runlist code; the domain scheduler controls when to switch domains. Switching happens based on domain timeslices or when the current domain is deleted. The worker thread is paused on railgate and spun back on poweron. The scheduler data was also left dangling, so fix that by deinitializing all nvs-related when gk20a_remove_support() is called. The runlist domains already get freed as part of fifo removal. Jira NVGPU-6427 Change-Id: I64f42498f8789448d9becdd209b7878ef0fdb124 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2632579 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-12-14 06:26:16 -08:00
Konsta Hölttä	632644b44a	gpu: nvgpu: couple runlist domains and nvs Now that the main nvsched code exists in the nvgpu build, make it control the runlist domains. As a new nvs domain is created, create the relevant runlist data too. To support the default domain, create a default nvs domain at boot. The scheduling domain code owns the responsibility of domain lifetime, and runlist domains exist to serve that logic although the RL domains are directly used by channel and TSG logic. Add refcounting to the scheduler uapi level to make sure that busy domains (that still have TSG participants) do not get removed too early. Adjust error injection sensitive unit tests to match the updated logic. Jira NVGPU-6425 Jira NVGPU-6427 Change-Id: I1beec97c54c60ad334165b1c0acb5e827c24f2ac Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2632287 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-12-07 07:07:12 -08:00
Konsta Hölttä	1d14a4412f	gpu: nvgpu: scheduler management uapi Add ioctls for creating, removing and querying scheduling domains and interface with the "nvsched" entity that will be the core scheduler. Include the scheduler in the Linux build. The core scheduler code will ultimately hold data on and control what gets scheduled, but this intermediate layer in nvgpu-rm needs a bit of bookkeeping to manage the userspace interface. To keep changes isolated, this does not touch the internal runlist domains yet. The core scheduler logic will eventually control the runlist domains. Jira NVGPU-6788 Change-Id: I7b4064edb6205acbac2d8c593dad019d517243ce Signed-off-by: Alex Waterman <alexw@nvidia.com> Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2463625 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-12-07 07:07:01 -08:00

36 Commits
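
Note on commit 505690f505 (domain name validation): the commit message restricts domain names to the characters [a-z], [A-Z], [0-9], '-' and '_'. The following is a minimal sketch of such a check in plain C, not the driver's actual code. The function name domain_name_is_valid, the max_len parameter and the rejection of NULL or empty names are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not taken from the nvgpu source): returns true only
 * if every character of name is in [a-z], [A-Z], [0-9], '-' or '_'.
 * Assumptions: NULL and empty names are rejected, and names longer than
 * max_len characters are rejected.
 */
static bool domain_name_is_valid(const char *name, size_t max_len)
{
	size_t i;

	if (name == NULL || name[0] == '\0') {
		return false;
	}

	for (i = 0; name[i] != '\0'; i++) {
		char c = name[i];
		bool allowed = (c >= 'a' && c <= 'z') ||
			       (c >= 'A' && c <= 'Z') ||
			       (c >= '0' && c <= '9') ||
			       c == '-' || c == '_';

		/* Reject a disallowed character or a name exceeding max_len. */
		if (!allowed || i >= max_len) {
			return false;
		}
	}

	return true;
}
```

With this sketch, a name such as "gfx-domain_0" passes, while "bad name" or "a/b" is rejected because of the space and the slash. An allow-list like this keeps path separators and shell metacharacters out of names, which matters if the name is later used for a device node.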