gpu: nvgpu: skip post_process() operations for single domain.

There is a possible deadlock that can be triggered when the device is being resumed and the NVS worker thread tries to submit data as part of the post_process() operation. The NVS worker thread runs post_process() asynchronously with respect to user threads, so the initial implementation requires acquiring the busy lock, which results in a deadlock.

This quick change disallows post_process() from executing when only one scheduling domain (the legacy case) is present. Any submits that need to be updated are handled via the synchronous wakeup_process_item() callback.

This implementation is being modified to allow the worker thread to be suspended/resumed during GPU railgate/unrailgate in upcoming releases, and it is currently in a state of flux.

Bug 3723127

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I318cda0fbdd5651884cf21f748c86687679e6fdb
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2750293
Reviewed-by: Prateek Sethi <prsethi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
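The control flow of this fix can be illustrated with a minimal userspace sketch, assuming a simplified round-robin scheduler. `struct domain`, `struct sched`, `tick()` and the `post_process()` stub are hypothetical stand-ins, not nvgpu's actual types or functions, and a pthread mutex plays the role of `g->sched_mutex`. When the next domain is the active one, the tick releases the mutex and returns the timeslice without running post-processing.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an NVS scheduling domain. */
struct domain {
	const char *name;
	uint64_t timeslice_ns;
	struct domain *next; /* round-robin successor; points to itself if alone */
};

/* Hypothetical stand-in for the scheduler state. */
struct sched {
	pthread_mutex_t sched_mutex; /* models g->sched_mutex */
	struct domain *active;
	int post_process_calls; /* counts how often post-processing ran */
};

/*
 * Hypothetical post-processing step. In the driver, this is the part that
 * (per the commit message) would need the busy lock; here it only records
 * that it ran and switches the active domain.
 */
static void post_process(struct sched *s, struct domain *next)
{
	s->post_process_calls++;
	s->active = next;
}

/* One scheduler tick: returns the timeslice for the domain to run next. */
static uint64_t tick(struct sched *s)
{
	struct domain *next;
	uint64_t timeslice;

	pthread_mutex_lock(&s->sched_mutex);
	next = s->active->next;

	if (next == s->active) {
		/* Single domain: nothing to switch, so skip post-processing. */
		timeslice = next->timeslice_ns;
		pthread_mutex_unlock(&s->sched_mutex);
		return timeslice;
	}

	/* Multiple domains: switch to the next one and post-process. */
	timeslice = next->timeslice_ns;
	post_process(s, next);
	pthread_mutex_unlock(&s->sched_mutex);
	return timeslice;
}
```

With a single domain whose successor is itself, `tick()` returns that domain's timeslice and `post_process()` never runs. With two domains, it switches and post-processes once per tick. The early return keeps the single-domain (legacy) path from reaching the step that would need extra locking, leaving pending updates to the synchronous wakeup path.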
2025-12-22 17:36:20 +03:00 · 2022-07-25 11:37:10 +05:30
parent ee5053f7be
commit 44b6bfbc1d
1 changed file with 19 additions and 0 deletions
--- a/drivers/gpu/nvgpu/common/nvs/nvs_sched.c
+++ b/drivers/gpu/nvgpu/common/nvs/nvs_sched.c
@@ -117,6 +117,25 @@ static u64 nvgpu_nvs_tick(struct gk20a *g)
 		nvs_next = sched->shadow_domain->parent;
 	}

+	if (nvs_next->priv == domain) {
+		/*
+		 * This entire thread is going to be changed soon.
+		 * The above check ensures that there are no other domains
+		 * besides the active domain. So, it's safe to simply return here.
+		 * Any active domain updates are taken care of during
+		 * nvgpu_nvs_worker_wakeup_process_item().
+		 *
+		 * This is a temporary hack for legacy cases where we do not have
+		 * any active domains available. This needs to be relooked at
+		 * during implementation of manual mode.
+		 *
+		 * A better fix is to ensure this thread is suspended during Railgate.
+		 */
+		timeslice = nvs_next->timeslice_ns;
+		nvgpu_mutex_release(&g->sched_mutex);
+		return timeslice;
+	}
+
 	timeslice = nvs_next->timeslice_ns;
 	nvgpu_domain_next = nvs_next->priv;