linux-nvgpu

mirror of git://nv-tegra.nvidia.com/linux-nvgpu.git synced 2025-12-22 17:36:20 +03:00

Author	SHA1	Message	Date
Debarshi Dutta	d8e8eb65d3	nvgpu: gpu: separate runlist submit from construction This patch primarily separates runlist modification from runlist submits. Instead of submitting the runlist(domain) immediately after modification, a worker thread interface is now being used to synchronously schedule runlist submits. If the runlist being scheduled is currently active, the submit happens instantly, otherwise, it will happen in the next iteration when the nvs thread will schedule the domain. This external interface uses a condition variable to wait for the completion of the synchronous submits. A pending_update variable is used to synchronize domain memory swaps just before being submitted. To facilitate faster scheduling via the NVS thread, nvgpu_dom itself contains an array of rl_domain pointers. This can then be used to select the appropriate rl_domain directly for scheduling as against the earlier approach of maintaining nvs domains and rl domains in sync every time. Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I1725c7cf56407cca2e3d2589833d1c0b66a7ad7b Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2739795 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-07-13 16:36:19 -07:00
Debarshi Dutta	76cc8870e1	nvgpu: gpu: update default nvs domain implementation In current form, the default domain acts like any schedulable domain. TSGs are bound to it and it can be enumerated via the public interfaces. The new expectation for the default domain is meant to change from the current form to a pseudo domain that cannot act like an ordinary domain in other ways, i.e. it must not be reachable by in particular the domain management API, it can't be removed, does not show up in lists, and TSGs cannot be explicitly bound to this domain. It won't participate in round-robin domain scheduling. It is not really a domain, and acts like one only when activated in the manual mode. Following changes are made overall to support the above change in definition. 1) Domain creation and attaching the domain to the scheduler are now split into two separate functions. The new default domain (having ID = UINT64_MAX) is created separately from a static function without linking it with other domains in the scheduler. 2) struct nvgpu_nvs_scheduler explicitly stores the default domain to support direct lookups. 3) TSGs are initially not bound to default domain/rl_domain. Jira NVGPU-8165 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I916d11f4eea5124d8d64176dc77f3806c6139695 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2697477 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-12 00:24:58 -07:00
Konsta Hölttä	f10ee4ab0e	gpu: nvgpu: add domain name API Add nvgpu_nvs_domain_get_name() to minimize messing up with nvs internals and to help code organization when nvs is not built in yet. A stub to help compilation returns NULL because no domains can exist when the stub is built in, and thus it won't be used. Jira NVGPU-6788 Change-Id: If663f7c0e8434ef00dd3a3f40f6404a35b477f2b Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2673120 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-01 00:09:01 -08:00
Konsta Hölttä	3a64fdefc4	gpu: nvgpu: domains as files for access control Create device nodes for user-created scheduling domains. This helps leverage filesystem based access control: domains can be chosen to be available for a limited set of users on a system. The device nodes are dynamic: they can be removed while the driver is running normally. This is a bit different from the nodes that exist until the driver is unloaded, so the devno/domain mapping is stored in a separate list. The usual container_of pattern would suffer from an unavoidable race condition if a domain file was opened while the same domain would get removed. As usual, domain refcounting prevents a domain from being removed. Now the open device files hold refs and thus any open domain files prevent a domain from getting removed, in addition to the userspace-invisible ref that is taken when a TSG is bound to a domain. While at it, make the query ioctl guarded by the sched domain mutex, as domains might technically get added or removed during the querying code. Jira NVGPU-6788 Change-Id: Ief2a09a442c4e70f1f2be8a32359341071d74659 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2651164 Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-01 00:08:49 -08:00
Konsta Hölttä	8736c0d467	gpu: nvgpu: add and use sw-only timers The nvgpu timeout API has an internal override for presilicon mode by default: in presi simulation environments the timeouts never trigger. This behaviour is intended in the original usecase of the timer unit with hardware polling loops. In pure software logic though, the timer must trigger after the specified timeout even in presi mode so add a new init function to produce a timer for software logic. Use this new kind of timer in channel and scheduling worker threads. The channel worker currently times out for just the purpose of the channel watchdog timer which has its own internal timer. Although that's just software, the general expectation is that the watchdog does not trigger in presilicon tests that run slower than usual. The internal watchdog timer thus keeps the non-sw mode. Bug 3521828 Change-Id: I48ae8522c7ce2346a930e766528d8b64195f81d8 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2662541 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: Sagar Kamble <skamble@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-02-04 22:02:33 -08:00
Sagar Kamble	d424598b7b	gpu: nvgpu: stop nvs thread during unload nvs worker thread is created on each resume and deinitialized on every suspend. nvgpu can be resumed when process is getting killed. Thread creation can fail when the process is getting killed. That will lead to driver resume failure. To avoid the issue above, don't stop the nvs worker thread in suspend and let the first created thread handle the nvs work always. Deinitialize the nvs worker thread during nvgpu unload. Also, log the error returned by nvgpu_thread_create in the function nvgpu_worker_start. bug 3480192 Change-Id: I8d5d9e7716a950b162cc3c2d9fcfde07c4edfcf6 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2646218 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-12-29 09:35:03 -08:00
Konsta Hölttä	55afe1ff4c	gpu: nvgpu: improve nvs uapi - Make the domain scheduler timeslice type nanoseconds to future proof the interface - Return -ENOSYS from ioctls if the nvs code is not initialized - Return the number of domains also when user supplied array is present - Use domain id instead of name for TSG binding - Improve documentation in the uapi headers - Verify that reserved fields are zeroed - Extend some internal logging - Release the sched mutex on alloc error - Add file mode checks in the nvs ioctls. The create and remove ioctls require writable file permissions, while the query does not; this allows filesystem based access control on domain management on the single dev node. Jira NVGPU-6788 Change-Id: I668eb5972a0ed1073e84a4ae30e3069bf0b59e16 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2639017 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-12-15 06:05:25 -08:00
Konsta Hölttä	d086c678fd	gpu: nvgpu: add domain scheduler worker Move away from the prototype call in channel wdt worker and create a separate worker thread for the domain scheduler. The details of runlist domains are still encapsulated in the runlist code; the domain scheduler controls when to switch domains. Switching happens based on domain timeslices or when the current domain is deleted. The worker thread is paused on railgate and spun back on poweron. The scheduler data was also left dangling, so fix that by deinitializing all nvs-related data when gk20a_remove_support() is called. The runlist domains already get freed as part of fifo removal. Jira NVGPU-6427 Change-Id: I64f42498f8789448d9becdd209b7878ef0fdb124 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2632579 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-12-14 06:26:16 -08:00
Konsta Hölttä	632644b44a	gpu: nvgpu: couple runlist domains and nvs Now that the main nvsched code exists in the nvgpu build, make it control the runlist domains. As a new nvs domain is created, create the relevant runlist data too. To support the default domain, create a default nvs domain at boot. The scheduling domain code owns the responsibility of domain lifetime, and runlist domains exist to serve that logic although the RL domains are directly used by channel and TSG logic. Add refcounting to the scheduler uapi level to make sure that busy domains (that still have TSG participants) do not get removed too early. Adjust error injection sensitive unit tests to match the updated logic. Jira NVGPU-6425 Jira NVGPU-6427 Change-Id: I1beec97c54c60ad334165b1c0acb5e827c24f2ac Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2632287 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-12-07 07:07:12 -08:00
Konsta Hölttä	1d14a4412f	gpu: nvgpu: scheduler management uapi Add ioctls for creating, removing and querying scheduling domains and interface with the "nvsched" entity that will be the core scheduler. Include the scheduler in the Linux build. The core scheduler code will ultimately hold data on and control what gets scheduled, but this intermediate layer in nvgpu-rm needs a bit of bookkeeping to manage the userspace interface. To keep changes isolated, this does not touch the internal runlist domains yet. The core scheduler logic will eventually control the runlist domains. Jira NVGPU-6788 Change-Id: I7b4064edb6205acbac2d8c593dad019d517243ce Signed-off-by: Alex Waterman <alexw@nvidia.com> Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2463625 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-12-07 07:07:01 -08:00

10 Commits
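
Note on commit 55afe1ff4c: the message says the create and remove ioctls require a writable file while the query does not, and that ioctls return -ENOSYS when the nvs code is not initialized. The following is a minimal Python model of that gating rule, not the driver's C code. The command names, the function names, the order of the checks, and the -EACCES and -ENOTTY return values are assumptions made for illustration; the log above does not state which error the driver returns for a read-only file.

```python
import errno
import os

# Hypothetical command names; the real ioctl numbers are defined in the
# nvgpu uapi headers and are not shown in this log.
CREATE_DOMAIN = "create"
REMOVE_DOMAIN = "remove"
QUERY_DOMAINS = "query"

# Per the commit message, only create and remove need a writable file.
NEEDS_WRITE = frozenset({CREATE_DOMAIN, REMOVE_DOMAIN})
KNOWN_COMMANDS = NEEDS_WRITE | {QUERY_DOMAINS}

# Mask that isolates the access mode bits from open(2)-style flags.
ACCESS_MODE_MASK = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def opened_for_write(open_flags):
    """Return True if the flags describe a file opened write-only or read-write."""
    return (open_flags & ACCESS_MODE_MASK) in (os.O_WRONLY, os.O_RDWR)


def check_nvs_ioctl(cmd, open_flags, nvs_initialized=True):
    """Return 0 if the command may proceed, or a negative errno value."""
    if not nvs_initialized:
        # Stated in the commit message: -ENOSYS when nvs is not initialized.
        return -errno.ENOSYS
    if cmd not in KNOWN_COMMANDS:
        # Assumption: unknown commands are rejected with -ENOTTY.
        return -errno.ENOTTY
    if cmd in NEEDS_WRITE and not opened_for_write(open_flags):
        # Assumption: the specific error code is not given in the log.
        return -errno.EACCES
    return 0
```

With this rule, a user who can only open the device node read-only can still enumerate domains but cannot create or remove them, which is the filesystem-based access control the commit message describes.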