gpu: nvgpu: cancel job clean up before aborting channel

It is possible that the job clean up worker is still running when we
abort the channel. The worker can race with the abort path and
sometimes results in the panic below:

[  245.483566] Unable to handle kernel paging request at virtual address 800000000
...
[  245.548991] PC is at gk20a_channel_abort_clean_up+0xb8/0x140
[  245.554683] LR is at gk20a_channel_abort_clean_up+0xac/0x140
...
[  247.301860] [<ffffffc000479390>] gk20a_channel_abort_clean_up+0xb8/0x140
[  247.312853] [<ffffffc0004794d4>] gk20a_channel_abort+0xbc/0xc8
[  247.322970] [<ffffffc0004794f8>] gk20a_disable_channel+0x18/0x30
[  247.333267] [<ffffffc000479628>] gk20a_free_channel+0x118/0x584
[  247.343473] [<ffffffc000479aa0>] gk20a_channel_close+0xc/0x14
[  247.353479] [<ffffffc000479b80>] gk20a_channel_release+0xd8/0x104

Fix this by cancelling the job clean up worker before aborting
the channel

Bug 1777281

Change-Id: Ic24c7c03b27cfb5cd164a52efdb1e2813a41a10a
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1174416
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
commit 1002f40a3b (parent 1e01a49fdc)
Author:    Deepak Nibade <dnibade@nvidia.com>
Date:      2016-07-01 12:35:27 +05:30
Committer: Terje Bergstrom <tbergstrom@nvidia.com>

@@ -69,6 +69,8 @@ static void gk20a_free_error_notifiers(struct channel_gk20a *ch);
 static u32 gk20a_get_channel_watchdog_timeout(struct channel_gk20a *ch);
 static void gk20a_channel_clean_up_jobs(struct work_struct *work);
+static void gk20a_channel_cancel_job_clean_up(struct channel_gk20a *c,
+				bool wait_for_completion);
 
 /* allocate GPU channel */
 static struct channel_gk20a *allocate_channel(struct fifo_gk20a *f)
@@ -460,6 +462,8 @@ void gk20a_channel_abort_clean_up(struct channel_gk20a *ch)
 	struct channel_gk20a_job *job, *n;
 	bool released_job_semaphore = false;
 
+	gk20a_channel_cancel_job_clean_up(ch, true);
+
 	/* ensure no fences are pending */
 	mutex_lock(&ch->sync_lock);
 	if (ch->sync)
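
For reference, below is a minimal sketch of how a cancel helper like
gk20a_channel_cancel_job_clean_up() is typically built on the kernel
workqueue API, assuming the clean up worker is a delayed_work item. The
simplified structure and the clean_up field names are illustrative
assumptions, not the exact nvgpu definitions.

#include <linux/mutex.h>
#include <linux/workqueue.h>

/*
 * Simplified stand-in for the clean up state embedded in the channel
 * structure; the field names here are assumptions for illustration.
 */
struct channel_clean_up_sketch {
	bool scheduled;
	struct mutex lock;
	struct delayed_work wq;
};

/*
 * Sketch of a cancel helper. When called from the abort path,
 * wait_for_completion is true so that a worker instance that is
 * already running finishes (and stops touching the job list) before
 * the abort proceeds to tear the jobs down.
 */
static void channel_cancel_job_clean_up_sketch(
		struct channel_clean_up_sketch *clean_up,
		bool wait_for_completion)
{
	if (wait_for_completion)
		cancel_delayed_work_sync(&clean_up->wq);
	else
		cancel_delayed_work(&clean_up->wq);

	/* Allow the worker to be scheduled again by later submits. */
	mutex_lock(&clean_up->lock);
	clean_up->scheduled = false;
	mutex_unlock(&clean_up->lock);
}

With gk20a_channel_cancel_job_clean_up(ch, true) issued at the top of
gk20a_channel_abort_clean_up(), the worker can no longer run
concurrently with the job list teardown, which closes the window that
produced the paging fault above.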