gpu: nvgpu: cancel job clean up before aborting channel

It is possible that the job clean up worker is still running when we
abort the channel. The worker can race with the abort path and
sometimes results in the panic below:

[  245.483566] Unable to handle kernel paging request at virtual address 800000000
...
[  245.548991] PC is at gk20a_channel_abort_clean_up+0xb8/0x140
[  245.554683] LR is at gk20a_channel_abort_clean_up+0xac/0x140
...
[  247.301860] [<ffffffc000479390>] gk20a_channel_abort_clean_up+0xb8/0x140
[  247.312853] [<ffffffc0004794d4>] gk20a_channel_abort+0xbc/0xc8
[  247.322970] [<ffffffc0004794f8>] gk20a_disable_channel+0x18/0x30
[  247.333267] [<ffffffc000479628>] gk20a_free_channel+0x118/0x584
[  247.343473] [<ffffffc000479aa0>] gk20a_channel_close+0xc/0x14
[  247.353479] [<ffffffc000479b80>] gk20a_channel_release+0xd8/0x104

Fix this by cancelling the job clean up worker before aborting
the channel

Bug 1777281

Change-Id: Ic24c7c03b27cfb5cd164a52efdb1e2813a41a10a
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1174416
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
commit 1002f40a3b (parent 1e01a49fdc)
Author:    Deepak Nibade <dnibade@nvidia.com>
Date:      2016-07-01 12:35:27 +05:30
Committer: Terje Bergstrom <tbergstrom@nvidia.com>

@@ -69,6 +69,8 @@ static void gk20a_free_error_notifiers(struct channel_gk20a *ch);
 static u32 gk20a_get_channel_watchdog_timeout(struct channel_gk20a *ch);
 static void gk20a_channel_clean_up_jobs(struct work_struct *work);
+static void gk20a_channel_cancel_job_clean_up(struct channel_gk20a *c,
+				bool wait_for_completion);
 
 /* allocate GPU channel */
 static struct channel_gk20a *allocate_channel(struct fifo_gk20a *f)
@@ -460,6 +462,8 @@ void gk20a_channel_abort_clean_up(struct channel_gk20a *ch)
 	struct channel_gk20a_job *job, *n;
 	bool released_job_semaphore = false;
 
+	gk20a_channel_cancel_job_clean_up(ch, true);
+
 	/* ensure no fences are pending */
 	mutex_lock(&ch->sync_lock);
 	if (ch->sync)
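
For reference, below is a minimal sketch of how a cancel helper like
gk20a_channel_cancel_job_clean_up() is typically built on the kernel
workqueue API, assuming the clean up worker is a delayed_work item. The
simplified structure and the clean_up field names are illustrative
assumptions, not the exact nvgpu definitions.

#include <linux/mutex.h>
#include <linux/workqueue.h>

/*
 * Simplified stand-in for the clean up state embedded in the channel
 * structure; the field names here are assumptions for illustration.
 */
struct channel_clean_up_sketch {
	bool scheduled;
	struct mutex lock;
	struct delayed_work wq;
};

/*
 * Sketch of a cancel helper. When called from the abort path,
 * wait_for_completion is true so that a worker instance that is
 * already running finishes (and stops touching the job list) before
 * the abort proceeds to tear the jobs down.
 */
static void channel_cancel_job_clean_up_sketch(
		struct channel_clean_up_sketch *clean_up,
		bool wait_for_completion)
{
	if (wait_for_completion)
		cancel_delayed_work_sync(&clean_up->wq);
	else
		cancel_delayed_work(&clean_up->wq);

	/* Allow the worker to be scheduled again by later submits. */
	mutex_lock(&clean_up->lock);
	clean_up->scheduled = false;
	mutex_unlock(&clean_up->lock);
}

With gk20a_channel_cancel_job_clean_up(ch, true) issued at the top of
gk20a_channel_abort_clean_up(), the worker can no longer run
concurrently with the job list teardown, which closes the window that
produced the paging fault above.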