gpu: nvgpu: fix race for channel sync read/write

The CTS test dEQP-VK.api.object_management.max_concurrent.device_group
crashes with an invalid userspace memory access.

Currently, nvgpu_submit_prepare_syncs() races with
nvgpu_channel_clean_up_jobs(). The race is exposed only when
aggressive_sync_destroy_thresh is set to a non-zero value, because
that is the only case in which nvgpu_channel_clean_up_jobs() deletes
the channel sync.

nvgpu_submit_prepare_syncs() takes a reference on c->sync to submit a
job and releases the channel sync_lock immediately. Meanwhile,
nvgpu_worker_poll_work() triggers nvgpu_channel_clean_up_jobs(), which
destroys the still-referenced c->sync.

Hence, if aggressive_sync_destroy_thresh is non-zero, this patch
protects the channel's sync pointer by holding the channel sync_lock
for the entire execution of nvgpu_submit_prepare_syncs().
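
The locking pattern described above can be sketched in isolation. The
following is a minimal user-space model, assuming pthread mutexes in
place of nvgpu_mutex; struct channel, struct chan_sync, prepare_syncs()
and clean_up_job() are hypothetical stand-ins and not the driver's
actual definitions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the channel sync object. */
struct chan_sync {
	int refs;
};

/* Hypothetical stand-in for the channel. */
struct channel {
	pthread_mutex_t sync_lock;
	struct chan_sync *sync;
	bool aggressive_destroy; /* models aggressive_sync_destroy_thresh != 0 */
};

/*
 * Submit path: take sync_lock up front and keep it until every use of
 * c->sync is finished. Both the success and the failure exit go through
 * "out", so the lock is released exactly once.
 */
static int prepare_syncs(struct channel *c)
{
	int err = 0;

	if (c->aggressive_destroy) {
		pthread_mutex_lock(&c->sync_lock);
		if (c->sync == NULL) {
			c->sync = calloc(1, sizeof(*c->sync));
			if (c->sync == NULL) {
				err = -1;
				goto out; /* lock is dropped at "out" */
			}
		}
		c->sync->refs++;
	}

	/* ... commands that dereference c->sync would be built here ... */

out:
	if (c->aggressive_destroy) {
		pthread_mutex_unlock(&c->sync_lock);
	}
	return err;
}

/*
 * Cleanup path: drops one reference under the same lock and frees the
 * sync object when the last reference goes away. Because prepare_syncs()
 * holds sync_lock for its whole body, this cannot run in the middle of it.
 */
static void clean_up_job(struct channel *c)
{
	if (!c->aggressive_destroy) {
		return;
	}
	pthread_mutex_lock(&c->sync_lock);
	if (c->sync != NULL) {
		assert(c->sync->refs > 0);
		if (--c->sync->refs == 0) {
			free(c->sync);
			c->sync = NULL;
		}
	}
	pthread_mutex_unlock(&c->sync_lock);
}
```

In this model, two calls to prepare_syncs() followed by two calls to
clean_up_job() leave c->sync freed and set to NULL, and the free can
never happen while prepare_syncs() is still using the pointer.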

Bug 2613870

Change-Id: I030d8df7af10d4ed86f921b5cf60de2b1d60e5d3
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2181360
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Vedashree Vidwans
2019-08-22 11:43:42 -07:00
committed by mobile promotions
parent 83fea157a3
commit 5fd301c61b

@@ -66,13 +66,11 @@ static int nvgpu_submit_prepare_syncs(struct nvgpu_channel *c,
 			c->sync = nvgpu_channel_sync_create(c, false);
 			if (c->sync == NULL) {
 				err = -ENOMEM;
-				nvgpu_mutex_release(&c->sync_lock);
 				goto fail;
 			}
 			new_sync_created = true;
 		}
 		nvgpu_channel_sync_get_ref(c->sync);
-		nvgpu_mutex_release(&c->sync_lock);
 	}
 
 	if ((g->ops.channel.set_syncpt != NULL) && new_sync_created) {
@@ -164,6 +162,9 @@ static int nvgpu_submit_prepare_syncs(struct nvgpu_channel *c,
 		goto clean_up_incr_cmd;
 	}
 
+	if (g->aggressive_sync_destroy_thresh != 0U) {
+		nvgpu_mutex_release(&c->sync_lock);
+	}
 	return 0;
 
 clean_up_incr_cmd:
@@ -182,6 +183,9 @@ clean_up_wait_cmd:
 		job->wait_cmd = NULL;
 	}
 fail:
+	if (g->aggressive_sync_destroy_thresh != 0U) {
+		nvgpu_mutex_release(&c->sync_lock);
+	}
 	*wait_cmd = NULL;
 	return err;
 }