Both the profiler and debugger device nodes access and update the list
g->profiler_objects. List operations were not guarded by a lock, which
led to synchronisation issues. The stress-ng test repeatedly opens and
closes sessions at random on all the device nodes exposed by the GPU.
This results in a kernel panic at random stages of the test.
Failure signature - The profiler node receives a release call and, as
part of it, nvgpu_profiler_free attempts to delete the prof_obj_entry
and free the prof memory. Simultaneously, the debugger node also
receives a release call and, as part of gk20a_dbg_gpu_dev_release,
nvgpu attempts to access g->profiler_objects to check for any profiling
sessions associated with the debugger node. The two paths race on the
list, which results in a kernel panic at address 0x8 because nvgpu
tries to access prof_obj->session_id, which is at offset 0x8.
As part of this change, g->profiler_objects list access/update is
guarded with a mutex lock.
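The locking pattern can be sketched in user space as follows. This is a
minimal illustration only, not the nvgpu code: it assumes a pthread mutex
in place of the driver's mutex, and the names struct prof_obj, prof_add,
prof_remove and prof_session_exists are hypothetical. The point it shows
is that the release path that unlinks an entry and the path that walks
the list take the same lock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a profiler object; not the nvgpu structure. */
struct prof_obj {
	struct prof_obj *next;
	unsigned int session_id;
};

/* One lock guards every read and update of the shared list. */
static pthread_mutex_t prof_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct prof_obj *prof_list;

/* Open path: allocate an entry and link it while holding the lock. */
static bool prof_add(unsigned int session_id)
{
	struct prof_obj *obj = malloc(sizeof(*obj));

	if (obj == NULL)
		return false;
	obj->session_id = session_id;

	pthread_mutex_lock(&prof_list_lock);
	obj->next = prof_list;
	prof_list = obj;
	pthread_mutex_unlock(&prof_list_lock);
	return true;
}

/* Profiler release path: unlink under the lock, free once unlinked. */
static bool prof_remove(unsigned int session_id)
{
	struct prof_obj **pp;
	struct prof_obj *victim = NULL;
	bool found = false;

	pthread_mutex_lock(&prof_list_lock);
	for (pp = &prof_list; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->session_id == session_id) {
			victim = *pp;
			*pp = victim->next;
			found = true;
			break;
		}
	}
	pthread_mutex_unlock(&prof_list_lock);

	free(victim); /* free(NULL) is a no-op */
	return found;
}

/* Debugger release path: the walk holds the same lock, so it cannot
 * dereference an entry that another thread is unlinking and freeing. */
static bool prof_session_exists(unsigned int session_id)
{
	struct prof_obj *obj;
	bool found = false;

	pthread_mutex_lock(&prof_list_lock);
	for (obj = prof_list; obj != NULL; obj = obj->next) {
		if (obj->session_id == session_id) {
			found = true;
			break;
		}
	}
	pthread_mutex_unlock(&prof_list_lock);
	return found;
}
```

Freeing the entry after dropping the lock is safe in this sketch because
the entry is already unreachable from the list by then.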
Bug 4858627
Change-Id: I1e2cf8d27d195bbc9c012cf511029de9eaadb038
Signed-off-by: Kishan Palankar <kpalankar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3239897
GVS: buildbot_gerritrpt <buildbot_gerritrpt@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
In Linux v6.12, commit 88a2f6468d01 ("struct fd: representation change")
removed the 'struct file' pointer from 'struct fd'. This breaks building
the NVGPU driver that tries to directly access the 'file' pointer
from the 'fd' structure. Fix this by using the helper macros 'fd_empty'
and 'fd_file' as necessary to fix the build.
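The call-site change can be illustrated with a user-space mock. The
struct layouts, the flag mask and the function example_private_data
below are assumptions made for the sketch and are not the kernel
definitions; only the accessor names fd_empty and fd_file come from the
text above. The sketch shows callers going through the accessors instead
of reading a 'file' member.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock types for illustration only; the real ones live in the kernel. */
struct file {
	void *private_data;
};

struct fd {
	unsigned long word; /* pointer plus flag bits; layout is assumed */
};

#define FD_FLAG_MASK 3UL /* hypothetical: low two bits carry flags */

/* Mock accessors named after the kernel helpers mentioned in the text. */
#define fd_file(f)  ((struct file *)(uintptr_t)((f).word & ~FD_FLAG_MASK))
#define fd_empty(f) ((f).word == 0UL)

/* Before the change a caller would test and dereference f.file directly.
 * With the accessors, the representation of struct fd stays hidden. */
static void *example_private_data(struct fd f)
{
	if (fd_empty(f))
		return NULL;
	return fd_file(f)->private_data;
}
```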
Bug 4593750
Change-Id: I4c66b8e59be9df196b851983a1b6dbf0dda905ee
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3217454
(cherry picked from commit 9708dc5effde47d013240d898186416d9ed55fe0)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3226682
Reviewed-by: Brad Griffis <bgriffis@nvidia.com>
GVS: buildbot_gerritrpt <buildbot_gerritrpt@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
The job->post_fence dma_fence is created for a syncpoint by nvgpu
during submission. This fence is not freed after job completion because
an extra dma_fence reference (dma_fence_get) is taken in
nvgpu_nvhost_fence_install.
sync_file_create, called from this function, already takes one ref,
which is dropped when the sync file is closed. The dma_fence_get
called from nvgpu_nvhost_fence_install is not paired with a
corresponding put.
Remove this dma_fence_get call to fix the dma_fence memleak.
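The reference imbalance can be modelled with a plain counter. This is a
user-space sketch under stated assumptions: struct fence and every
function below are hypothetical stand-ins, not the dma_fence or
sync_file API, and the lifecycle is reduced to creation, install, close
and job completion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted fence; not struct dma_fence. */
struct fence {
	int refcount;
	bool released;
};

static void fence_init(struct fence *f)
{
	f->refcount = 1; /* creation reference, held by the job */
	f->released = false;
}

static void fence_get(struct fence *f)
{
	f->refcount++;
}

static void fence_put(struct fence *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		f->released = true;
}

/* Models the sync file: takes its own ref, drops it on close. */
static void sync_file_open(struct fence *f)  { fence_get(f); }
static void sync_file_close(struct fence *f) { fence_put(f); }

/* Install path; extra_get models the unpaired get described above. */
static void fence_install(struct fence *f, bool extra_get)
{
	if (extra_get)
		fence_get(f);
	sync_file_open(f);
}

/* Returns true if the fence is released once the sync file is closed
 * and the job has dropped its creation reference. */
static bool fence_lifecycle_releases(bool extra_get)
{
	struct fence f;

	fence_init(&f);
	fence_install(&f, extra_get);
	sync_file_close(&f);
	fence_put(&f);
	return f.released;
}
```

With the extra get the count ends at one and the fence is never
released; without it every get has a matching put.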
Bug 4788227
Change-Id: I003756dc9e023751b28161d322740309e08dedb5
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3199426
(cherry picked from commit a6818ea83516c8bd8961f03ad78421366d269572)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3211402
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
GVS: buildbot_gerritrpt <buildbot_gerritrpt@nvidia.com>
Using kernel version for detecting kernel changes does not work for some
3rd party Linux distributions that back port kernel changes to their
kernel. The conftest script has a test for detecting if the 'vm_flags'
variable can be set directly or if the appropriate helper functions must
be used. Update the NVGPU driver to use the definition provided by
conftest to determine if the 'vm_flags' variable can be set directly or
not.
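A feature-macro check of this kind might look like the sketch below. It
is a user-space mock: struct vm_area_struct, the flag values and
example_vma_set_flags are illustrative, and the macro name
NV_VM_AREA_STRUCT_HAS_CONST_VM_FLAGS is an assumption about what
conftest defines. The idea is that the code branches on a tested
property of the kernel rather than on LINUX_VERSION_CODE.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the one field that matters here; not the kernel structure. */
struct vm_area_struct {
	unsigned long vm_flags;
};

/* Illustrative flag values only. */
#define VM_DONTCOPY   0x1UL
#define VM_DONTEXPAND 0x2UL

/* Mock of the helper that must be used when vm_flags cannot be
 * written directly. */
static inline void vm_flags_set(struct vm_area_struct *vma,
				unsigned long flags)
{
	vma->vm_flags |= flags;
}

/* Hypothetical wrapper: the branch is chosen by a conftest-provided
 * define (name assumed), not by the kernel version number. */
static inline void example_vma_set_flags(struct vm_area_struct *vma,
					 unsigned long flags)
{
	assert(vma != NULL);
#if defined(NV_VM_AREA_STRUCT_HAS_CONST_VM_FLAGS)
	vm_flags_set(vma, flags);
#else
	vma->vm_flags |= flags;
#endif
}
```

A distribution kernel that back-ports the change gets the helper path
because conftest detects the property, whatever version it reports.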
Bug 4014315
Change-Id: I6ebfbfa622259e15560152bf70315451a52fba81
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3164870
(cherry picked from commit 2c9097363d29a235eb5c41530cdd3896694599d2)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3172302
GVS: buildbot_gerritrpt <buildbot_gerritrpt@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Upstream commit 690da22dbfa8 ("asm-generic/io.h: kill vmalloc.h
dependency") removed the vmalloc.h header file from io.h and this causes
the NVGPU driver build to fail with the latest -next kernels. Fix this
by ensuring vmalloc.h is included. Note that it is fine to make this
change for all current supported kernels.
Bug 4593750
Change-Id: I426a4ead6607f0f41f31ce635f4d83222fa57b07
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3111678
(cherry picked from commit 5c166b94c93ee7bdcff5c97a3e8316864e82cc70)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3142060
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Brad Griffis <bgriffis@nvidia.com>
GVS: buildbot_gerritrpt <buildbot_gerritrpt@nvidia.com>
- Update the HW header to increase the ENGINE_DELAY_BEFORE
field from 4 clk utils to 10 clk utils.
- The increase from 4 to 10 means the ELCG will
wait 2^10 utilsclk after all engines/priv report
idle.
- With clk utils set to 4, the target hangs when
performance_cudaGraphs is run.
- This change is needed for CUDA tests to run and
pass on QNX.
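The effect of the field change can be worked out with a one-line
helper. The sketch assumes the field is an exponent, as the 2^10 figure
above suggests, so that the old value 4 corresponds to 2^4 cycles; the
macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Field values named in the text; the macro names are hypothetical. */
#define ENGINE_DELAY_BEFORE_OLD 4u
#define ENGINE_DELAY_BEFORE_NEW 10u

/* Idle wait in utilsclk cycles, assuming the field is a power-of-two
 * exponent. */
static inline uint32_t elcg_delay_cycles(uint32_t field)
{
	assert(field < 32u); /* keep the shift well defined */
	return UINT32_C(1) << field;
}
```

Under that assumption the wait grows from 16 to 1024 utilsclk cycles, a
factor of 64.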
Bug 3821730
Bug 4475968
Change-Id: Iec579021d16eef03b481e5e45bc9362734cb0f3d
Signed-off-by: Divya <dsinghatwari@nvidia.com>
(cherry picked from commit 2b1df1cef738def5e41a884d442698bb226cbb57)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3009686
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3108175
Tested-by: Viresh Kumar <vireshk@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
- The ENGINE_DELAY_BEFORE field value gets reset to 0x4 during the
nvgpu gr init path.
- Set the elcg_idle_filters just before nvgpu_gr_init_support. By
this point all the GR reset and hw init of GR has happened and only
the sw init of GR remains.
- This change helps to retain the value of the ENGINE_DELAY_BEFORE
field at 0xA.
Bug 4315638
Bug 3821730
Bug 4475968
Change-Id: Ieef38eb63e596f1f95f1c19a121fbbf5fca34ab7
Signed-off-by: Divya <dsinghatwari@nvidia.com>
(cherry picked from commit 5832c9884a079592beda7bb77c267903fda6d775)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3015609
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3097398
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Tested-by: Viresh Kumar <vireshk@nvidia.com>
In Linux v6.8, the function strlcpy() was removed. The function
strscpy() was added in Linux v4.3 and is preferred over strlcpy().
See upstream Linux commit 30035e45753b ("string: provide strscpy()") for
more details. The Linux checkpatch.pl script warns against using
strlcpy().
The function strscpy() takes the same arguments as strlcpy(), but
returns a type of ssize_t instead of size_t. Update the NVGPU to use
strscpy() instead of strlcpy().
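The difference in return value matters to callers that check for
truncation. The function below is a user-space sketch of the
strscpy()-style contract, written for illustration and not taken from
the kernel: it returns the number of characters copied, or -E2BIG when
the source does not fit, and always NUL-terminates a non-empty
destination. The name sketch_strscpy is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h> /* ssize_t */

/* Copy src into dst (capacity count bytes). Returns the number of
 * characters copied, excluding the NUL, or -E2BIG on truncation or
 * when count is zero. */
static ssize_t sketch_strscpy(char *dst, const char *src, size_t count)
{
	size_t i;

	if (count == 0)
		return -E2BIG;

	for (i = 0; i < count; i++) {
		dst[i] = src[i];
		if (src[i] == '\0')
			return (ssize_t)i;
	}

	/* Source did not fit: keep what was copied and terminate it. */
	dst[count - 1] = '\0';
	return -E2BIG;
}
```

A caller that used to compare the strlcpy() result against the buffer
size would instead test for a negative return value.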
Bug 4448428
Change-Id: I0464b13720de20288a50375b167740ea514ca130
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3059558
(cherry picked from commit 5a12d5469192620e5c5b9e8828c728c148f10425)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3062999
Reviewed-by: Brad Griffis <bgriffis@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Upstream commit 7e3f926bf453 ("rcu/kvfree: Eliminate k[v]free_rcu()
single argument macro") removes the single-argument kvfree_rcu() and
kfree_rcu() macros. Code that previously used these single-argument
macros should instead use kvfree_rcu_mightsleep() or
kfree_rcu_mightsleep().
Use kfree_rcu_mightsleep() instead of kfree_rcu() if using kernel v6.5
or newer.
Bug 4276500
Change-Id: I1adb4125c83019ea6c0e7d37175716ee7ed659cc
Signed-off-by: Kartik Rajput <kkartik@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2978710
(cherry picked from commit 2ede069c81db420c2d5fbd00bee4030fd943f3f7)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3037021
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
Reviewed-by: Laxman Dewangan <ldewangan@nvidia.com>
Tested-by: Jonathan Hunter <jonathanh@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
The compiler option -Wmissing-prototypes is being enabled globally in
the upstream Linux kernel and this causes build failures for nvgpu. The
build failures occur either because the driver is missing an include
file which has the prototype or because the function is not declared
static when it should be (i.e. there are no external users).
Fix the various build failures and enable -Wmissing-prototypes to
prevent any new instances from occurring.
Bug 4404965
Change-Id: I551922836e37b0c94c158232d6277f4053e9d2d3
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3027483
(cherry picked from commit e8cbf90db2d0db7277db9e3eec9fb88d69c7fcc7)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3035518
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
When compiling NVGPU with the GCC option '-Wmissing-prototypes' the
following error is observed ...
nvgpu/drivers/gpu/nvgpu/os/linux/nvlink.c:42:5:
error: no previous prototype for nvgpu_nvlink_train
[-Werror=missing-prototypes]
| int nvgpu_nvlink_train(struct gk20a *g, u32 link_id, bool from_off)
| ^~~~~~~~~~~~~~~~~~
The function nvgpu_nvlink_train() is no longer used and has not been
used for a long time. Therefore, fix the above by removing this legacy
function.
Bug 4404965
Change-Id: Ib5d13b024a072d20cb569cfa77a86a74274d3fe7
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3029179
(cherry picked from commit e378ca55059fdf649d3300ae98ba80ac6262c893)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3035510
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
The ap_devtools_judy kernel warning test saw
consistent failures because empty newline
warnings were being reported as kernel failures.
These newline warnings were found to be reported
only from the changed kernel warning.
Since no other nvgpu warnings have newline
endings, remove the newline ending from the
warning that causes the error in this case.
Change-Id: Iaaf415085708eb970ae74f01c18be989ca068776
Signed-off-by: Pruthav Sanwatsarkar <pruthavs@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3014285
Reviewed-by: Martin Radev <mradev@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
When a profiler session is terminated abnormally, the PMA
control path is left in an active/incorrect state by the
existing teardown sequence.
This change ensures we clear the PMA command slice
registers before we wait for the routers to be idle.
Once the PMM routers are idle, we clear the PMA channel
registers to drain all the in-flight records.
Bug 4123716
Change-Id: I0659dc89b00f468c2f2df5af952ac68c70387746
Signed-off-by: Kishan <kpalankar@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
(cherry picked from commit 64bcf057bf0930f414a700a378d33ee098bdf2e2)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2973882
Reviewed-by: Ramalingam C <ramalingamc@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
For POR like safety GPU, we may specify devfreq
timer and governor to keep GPU running with high
performance.
This change supports module parameters for
specifying devfreq governor and devfreq timer.
safety nvgpu module parameter example:
devfreq_timer="delayed" devfreq_gov="performance"
Regarding devfreq timers, a delayed timer can
ensure that the devfreq monitor polls on time.
However, a deferrable timer might potentially
cause a delay in polling time.
Regarding devfreq governors, the default governor
of nvgpu is nvhost_podgov, which scales the gpu
frequency based on GPU load reported by PMU.
Using the performance governor will keep the GPU
operating at a higher GPU frequency,
providing better performance.
Bug 4084478
Change-Id: I9dfef11648203c6af281e980d3a5790b36742414
Signed-off-by: shaochunk <shaochunk@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2978810
Reviewed-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Instead of relying on kernel version to determine if the 'dev' member of
the 'struct class' is const, add a compile time test to the conftest.sh
script to determine this at compile time for the kernel being used. This
is beneficial for working with 3rd party Linux kernels that may have
back-ported upstream changes into their kernel and so the kernel version
checks do not work.
Bug 4119327
Change-Id: I5b3c886fa4bac7560c2c26534bed9f495d57195b
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2986637
(cherry picked from commit f9d48796dc2a292bb02a90677774a5568f9f1651)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2990696
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
The MODULE_IMPORT_NS macro adds a namespace tag to the module
information. The DMA_BUF namespace is required for Linux v5.16+ kernels
for drivers that use DMA BUF, and there is no reason not to populate this
for earlier kernels. Furthermore, some 3rd party kernels prior to v5.16
may require this too and so drop the version check around the DMA BUF
namespace.
The MODULE_IMPORT_NS macro was introduced in Linux v5.4 and so if the
kernel defines this, then add the DMA_BUF namespace.
Bug 4119327
Change-Id: Ic9acfeedce3a783434fe9cc2ded788c522629bbd
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2986636
(cherry picked from commit fd8ed471078450172062ca2a11f13d99ad57e1cb)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2990695
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Whether the header file iosys-map.h is present in the kernel
is currently determined by kernel version. However, for Linux v5.15,
iosys-map.h has been backported in order to support simple-framebuffer
for early display. Therefore, we cannot rely on the kernel version to
indicate whether iosys-map is present. This is also true for 3rd party
Linux kernels that backport changes as well. Fix this by adding a
compile time flag, that will be set accordingly by the conftest script
if this header is present.
Bug 4119327
Bug 4228080
Change-Id: I303b1060643b18709a236be5e0268d39cf540054
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2974081
(cherry picked from commit 41c1afb165122e98004005b8513d131b492269e9)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2946965
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
The soft-dependency for the governor_pod_scaling_v2 driver was
initially added to ensure this driver is loaded when built as an
out-of-tree module. The driver has now been renamed to
governor_pod_scaling, but instead of simply updating the soft-dependency
name, a new soft-dependency was added specifically with the new driver
name. This is not necessary and it also breaks frequency scaling
support for kernels prior to v5.15. Fix this by removing
this compile time check and legacy soft-dependency.
Bug 4074863
Bug 4223170
Change-Id: Ie3179446896c388ec13b63a1af368149ed4b145c
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2948084
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Brad Griffis <bgriffis@nvidia.com>
Reviewed-by: Johnny Liu <johnliu@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
In order to conform to the upstream path, we migrated GPU
firmware to /lib/firmware/nvidia* for l4t.
Search for the GPU fw in both /lib/firmware/nvidia/*
and /lib/firmware/*.
Bug 2975694
Signed-off-by: shaochunk <shaochunk@nvidia.com>
Change-Id: I6c3cadbabd3291d75ecacd8fbb0de80d23566494
(cherry picked from commit 92d0afc571fabbdc144fa25a18931d9250fd78bf)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2934925
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
When building NVGPU with virtualization enabled, we need to ensure that
the ccflag CONFIG_TEGRA_VIRTUALIZATION is defined. The Tegra HV driver
is only compiled when CONFIG_TEGRA_VIRTUALIZATION is defined. When NVGPU
is compiled without defining CONFIG_TEGRA_VIRTUALIZATION, then function
stubs in the header file "soc/tegra/virt/hv-ivc.h" will be used and these
stubs will return an error when called causing virtualization to fail.
Bug 4159372
Bug 4170085
Change-Id: Iab3cd47e25e086e31f8cc3337c0a732645ed4a7a
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2930315
(cherry picked from commit 6aa1600cb5b4b6ea783188c6d9e4cc56eedc17ea)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2931658
Reviewed-by: Laxman Dewangan <ldewangan@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
In raw addressing mode of CBC backing storage, comptaglines do not
need to be allocated or programmed in the PTEs. Introduce a
flag to detect if the hardware supports raw mode and use that to skip
all the comptagline allocations and respective page table programming.
JIRA NVGPU-9717
Change-Id: I0a16881fc3e897c3c408b30d1835f30564649dad
Signed-off-by: Prathap Kumar Valsan <prathapk@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2908278
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Implement a hw semaphore which is used to track the gpfifo submission.
This implementation is used when userd.gp_get() is not defined and
the feature flag NVGPU_SUPPORT_SEMA_BASED_GPFIFO_GET is set.
At the end of each submitted job, submit a semaphore to write the
gpfifo get pointer at the hw semaphore addr. At the next job submission
processing we will read the gpfifo.get from the designated hw semaphore
location.
JIRA NVGPU-9588
Change-Id: Ic88ace1a3f60e3f38f159e1861464ebcaea04469
Signed-off-by: Ramalingam C <ramalingamc@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2898143
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Martin Radev <mradev@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
Tested-by: Martin Radev <mradev@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>