gpu: nvgpu: engine preempt timeout in safety

Preempt TSG occurs in non-mission mode, when unbinding a channel from a TSG, or aborting a TSG. Should a preempt not complete on an engine, we expect other HW safety mechanisms, such as the FECS watchdog, to detect the issues that prevented saving the current context.

Add BUG_ON when attempting to recover from a preempt timeout, to make sure we got such an error and that sw_quiesce has been requested.

Jira NVGPU-4230

Change-Id: Ia26a61e703f74eb28d29e72e75664ca4ec97a586
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2265082
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2025-12-25 11:04:51 +03:00 · 2019-12-18 17:22:37 -05:00
parent 07a0fe707f
commit 6b62e0f79a
1 changed file with 4 additions and 0 deletions
--- a/drivers/gpu/nvgpu/common/rc/rc.c
+++ b/drivers/gpu/nvgpu/common/rc/rc.c
@@ -168,7 +168,11 @@ void nvgpu_rc_preempt_timeout(struct gk20a *g, struct nvgpu_tsg *tsg)
 	nvgpu_tsg_set_error_notifier(g, tsg,
 		NVGPU_ERR_NOTIFIER_FIFO_ERROR_IDLE_TIMEOUT);

+#ifdef CONFIG_NVGPU_RECOVERY
 	nvgpu_rc_tsg_and_related_engines(g, tsg, true, RC_TYPE_PREEMPT_TIMEOUT);
+#else
+	BUG_ON(!g->sw_quiesce_pending);
+#endif
 }

 void nvgpu_rc_gr_fault(struct gk20a *g, struct nvgpu_tsg *tsg,
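
The compile-time switch in the hunk above can be illustrated outside the driver with a minimal, self-contained sketch. Everything prefixed `sketch_` or `SKETCH_` is a hypothetical stand-in and not part of nvgpu: `struct sketch_gpu` models only the `sw_quiesce_pending` flag, `SKETCH_CONFIG_RECOVERY` plays the role of `CONFIG_NVGPU_RECOVERY`, and `SKETCH_BUG_ON` records a violation instead of halting (the kernel's `BUG_ON` would stop execution), so that both outcomes can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct gk20a; only the flag used here is modeled. */
struct sketch_gpu {
	bool sw_quiesce_pending; /* set once a SW quiesce has been requested */
	int recovery_calls;      /* counts calls into the recovery path */
	int bug_hits;            /* counts violations recorded by SKETCH_BUG_ON */
};

/*
 * Assumption: unlike the kernel's BUG_ON, this macro only counts the
 * violation, so the sketch can be exercised without terminating.
 */
#define SKETCH_BUG_ON(g, cond) do { if (cond) (g)->bug_hits++; } while (0)

#ifdef SKETCH_CONFIG_RECOVERY
/* Stand-in for nvgpu_rc_tsg_and_related_engines(). */
static void sketch_recover(struct sketch_gpu *g)
{
	g->recovery_calls++;
}
#endif

/* Mirrors the shape of the patched tail of nvgpu_rc_preempt_timeout(). */
static void sketch_preempt_timeout(struct sketch_gpu *g)
{
#ifdef SKETCH_CONFIG_RECOVERY
	/* Recovery builds: attempt to recover the TSG and related engines. */
	sketch_recover(g);
#else
	/* Safety builds: no recovery; a quiesce must already be pending. */
	SKETCH_BUG_ON(g, !g->sw_quiesce_pending);
#endif
}
```

Built without `SKETCH_CONFIG_RECOVERY`, calling `sketch_preempt_timeout()` on a state whose `sw_quiesce_pending` is false records one violation, while a state with the flag set passes silently. Built with the macro defined, the check is compiled out and only the recovery stub runs.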