gpu: nvgpu: Guard profiler_objects list operations with a lock

Both the profiler and debugger device nodes access and update the
g->profiler_objects list. These list operations were not guarded by a
lock, which led to synchronisation issues. The stress-ng test repeatedly
opens and closes sessions at random on all the device nodes exposed by
the GPU, which results in a kernel panic at random stages of the test.

Failure signature: the profiler node receives a release call and, as
part of it, nvgpu_profiler_free attempts to delete the prof_obj_entry
and free the prof memory. Simultaneously, the debugger node also
receives a release call and, as part of gk20a_dbg_gpu_dev_release, nvgpu
accesses g->profiler_objects to check for any profiling sessions
associated with the debugger node. The two paths race on the list, which
results in a kernel panic at address 0x8 because nvgpu tries to read
prof_obj->session_id, which is at offset 0x8.

This change guards access to and updates of the g->profiler_objects list
with a mutex (g->prof_obj_lock).
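
The locking pattern described above can be illustrated with a minimal
userspace sketch. This is not the nvgpu code: it assumes pthreads in
place of nvgpu_mutex_acquire/nvgpu_mutex_release, and the types and
helpers (struct gpu, struct prof_obj, prof_add, prof_free,
prof_count_session) are hypothetical stand-ins. It only shows that both
the path that unlinks and frees an entry and the path that walks the
list take the same lock.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the nvgpu structures. */
struct prof_obj {
	int prof_handle;
	int session_id;
	struct prof_obj *next;
};

struct gpu {
	pthread_mutex_t prof_obj_lock;      /* guards profiler_objects */
	struct prof_obj *profiler_objects;  /* singly linked list head */
};

static void gpu_init(struct gpu *g)
{
	pthread_mutex_init(&g->prof_obj_lock, NULL);
	g->profiler_objects = NULL;
}

/* Insert a new object at the head of the list, under the lock. */
static int prof_add(struct gpu *g, int handle, int session_id)
{
	struct prof_obj *p = malloc(sizeof(*p));

	if (p == NULL)
		return -1;
	p->prof_handle = handle;
	p->session_id = session_id;

	pthread_mutex_lock(&g->prof_obj_lock);
	p->next = g->profiler_objects;
	g->profiler_objects = p;
	pthread_mutex_unlock(&g->prof_obj_lock);
	return 0;
}

/*
 * Analogue of the profiler release path: unlink the entry while holding
 * the lock, then free it. Returns 1 if an entry was removed, 0 if not.
 */
static int prof_free(struct gpu *g, int handle)
{
	struct prof_obj **pp;
	struct prof_obj *victim = NULL;
	int found;

	pthread_mutex_lock(&g->prof_obj_lock);
	for (pp = &g->profiler_objects; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->prof_handle == handle) {
			victim = *pp;
			*pp = victim->next;
			break;
		}
	}
	pthread_mutex_unlock(&g->prof_obj_lock);

	found = (victim != NULL);
	free(victim); /* free(NULL) is a no-op */
	return found;
}

/*
 * Analogue of the debugger release path: walk the list under the same
 * lock, so no entry can be freed while session_id is being read.
 */
static int prof_count_session(struct gpu *g, int session_id)
{
	struct prof_obj *p;
	int n = 0;

	pthread_mutex_lock(&g->prof_obj_lock);
	for (p = g->profiler_objects; p != NULL; p = p->next) {
		if (p->session_id == session_id)
			n++;
	}
	pthread_mutex_unlock(&g->prof_obj_lock);
	return n;
}
```

Without the lock in prof_count_session, a concurrent prof_free could
free an entry between the reader loading the pointer and dereferencing
it, which is the kind of race described in the failure signature.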

Bug 4858627

Change-Id: I1e2cf8d27d195bbc9c012cf511029de9eaadb038
Signed-off-by: Kishan Palankar <kpalankar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3239897
GVS: buildbot_gerritrpt <buildbot_gerritrpt@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
Kishan Palankar
2024-10-30 05:57:46 +00:00
committed by mobile promotions
parent 7a1c4e54ad
commit 2eabcdb8a4
5 changed files with 42 additions and 9 deletions

@@ -1,5 +1,5 @@
/*
-* Copyright (c) 2020-2022, NVIDIA CORPORATION. All rights reserved.
+* Copyright (c) 2020-2024, NVIDIA CORPORATION. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
@@ -130,7 +130,9 @@ static int nvgpu_prof_fops_open(struct gk20a *g, struct file *filp,
free_umd_buf:
nvgpu_kfree(g, prof_priv->regops_umd_copy_buf);
free_prof:
+nvgpu_mutex_acquire(&g->prof_obj_lock);
nvgpu_profiler_free(prof);
+nvgpu_mutex_release(&g->prof_obj_lock);
free_priv:
nvgpu_kfree(g, prof_priv);
return err;
@@ -211,7 +213,9 @@ int nvgpu_prof_fops_release(struct inode *inode, struct file *filp)
nvgpu_prof_free_pma_stream_priv_data(prof_priv);
+nvgpu_mutex_acquire(&g->prof_obj_lock);
nvgpu_profiler_free(prof);
+nvgpu_mutex_release(&g->prof_obj_lock);
nvgpu_kfree(g, prof_priv->regops_umd_copy_buf);
nvgpu_kfree(g, prof_priv->regops_staging_buf);