Both profiler and debugger device nodes access and update the list,
g->profiler_objects. List operations were currently not guarded by
lock thus leading to synchronisation issues. Stress-ng test attempts
to trigger repeated random open close sessions on all the device nodes
exposed by gpu. This results in kernel panic at random stages of test.
Failure signature - Profiler node receives a release call and as part
of it, nvgpu_profiler_free attempts to delete the prof_obj_entry and
free the prof memory. Simulataneously debugger node also receives a
release call and as part of gk20a_dbg_gpu_dev_release, nvgpu attempts
to access g->profiler_objects to check for any profiling sessions
associated with debugger node. There is a race to access the list which
results in kernel panic for address 0x8 because nvgpu tries to access
prof_obj->session_id which is at offset 0x8.
As part of this change, g->profiler_objects list access/update is
guarded with a mutex lock.
Bug 4858627
Change-Id: I1e2cf8d27d195bbc9c012cf511029de9eaadb038
Signed-off-by: Kishan Palankar <kpalankar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3239897
GVS: buildbot_gerritrpt <buildbot_gerritrpt@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
- ENGINE_DELAY_BEFORE field value gets reset to 0x4 during
nvgpu gr init path.
- Set the elcg_idle_filters just before nvgpu_gr_init_support
so that till this point all the GR reset and hw init of GR
has happened and we just have to do sw init of GR.
- This change helps to reatin the value of ENGINE_DELAY_BEFORE
field to 0xA
Bug 4315638
Bug 3821730
Bug 4475968
Change-Id: Ieef38eb63e596f1f95f1c19a121fbbf5fca34ab7
Signed-off-by: Divya <dsinghatwari@nvidia.com>
(cherry picked from commit 5832c9884a079592beda7bb77c267903fda6d775)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3015609
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3097398
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Tested-by: Viresh Kumar <vireshk@nvidia.com>
The ap_devtools_judy kernel warning test saw
consistent failrures because of empty newline
warns being reported as kernel failures. These
new line warnings were found to be reported
only from the changed kernel warning.
Since all other nvgpu warns dont have new line
endings, removing the new line ending for the
error causing warn in this case.
Change-Id: Iaaf415085708eb970ae74f01c18be989ca068776
Signed-off-by: Pruthav Sanwatsarkar <pruthavs@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/3014285
Reviewed-by: Martin Radev <mradev@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
When profiler session is terminated abnormally, PMA
control path is still in active/incorrect state with
existing teardown sequence.
This change ensures we clear PMA command slice
registers before we wait for routers to be idle.
Once PMM routers are idle, we clear PMA channel
registers to drain all the in-flight records.
Bug 4123716
Change-Id: I0659dc89b00f468c2f2df5af952ac68c70387746
Signed-off-by: Kishan <kpalankar@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
(cherry picked from commit 64bcf057bf0930f414a700a378d33ee098bdf2e2)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2973882
Reviewed-by: Ramalingam C <ramalingamc@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
In raw addressing mode of CBC backing storage, comptaglines are not
required to be allocated or need to programmed in the ptes. Introduce a
flag to detect if the hardware supports raw mode and use that to skip
all the comptagline allocations and respective page table programming.
JIRA NVGPU-9717
Change-Id: I0a16881fc3e897c3c408b30d1835f30564649dad
Signed-off-by: Prathap Kumar Valsan <prathapk@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2908278
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Implement a hw semaphore which is used to track the gpfifo submission.
This is implementation used when the userd.gp_get() is not defined and
also the feature flag NVGPU_SUPPORT_SEMA_BASED_GPFIFO_GET is set.
At the end of each job submitted, submit a semaphore to write the
gpfifo get pointer at hw semaphore addr. At next job submission
processing we will read the gpfifo.get from the designated hw semaphore
location.
JIRA NVGPU-9588
Change-Id: Ic88ace1a3f60e3f38f159e1861464ebcaea04469
Signed-off-by: Ramalingam C <ramalingamc@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2898143
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Martin Radev <mradev@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
Tested-by: Martin Radev <mradev@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
This patch adds the following functions which can be used to set/query
skyline configuration:
nvgpu_gr_config_set_singleton_mask
nvgpu_gr_config_get_singleton_mask
nvgpu_gr_config_set_num_singletons
nvgpu_gr_config_get_num_singletons
nvgpu_gr_config_set_num_tpc_in_skyline
nvgpu_gr_config_get_num_tpc_in_skyline
nvgpu_gr_config_set_gpc_skyline
nvgpu_gr_config_get_gpc_skyline
nvgpu_gr_config_set_virtual_gpc_id
nvgpu_gr_config_get_virtual_gpc_id
nvgpu_gr_config_set_sm_info_virtual_gpc_index
nvgpu_gr_config_get_sm_info_virtual_gpc_index
JIRA NVGPU-9897
Change-Id: If80e19f709472d74b42cb9c4b47dc4ce4f9a54dc
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2888783
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Changes:
- update nvgpu_nvs_ctrl_queue to have queue direction as it is required
by gsp scheduler to erase queue individually
- queue direction is updated during ioctl call to create queue and is
used only by gsp scheduler. So no other moduler should be affected by
it.
- need to pass the size of struct which is u32 so downgrading it from
u64 to u32 is intentional, misra C violation 10.3 can be ignored here
Bug 4027512
Change-Id: I6ef6e4b06124e25da3d004a2d8822516c3ac2105
Signed-off-by: vivekku <vivekku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2881804
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Remove NVGPU_GPU_IOCTL_ALLOC_AS_FLAGS_USERSPACE_MANAGED and
NVGPU_AS_ALLOC_USERSPACE_MANAGED flags which are used for supporting
userspace managed address-space. This functionality is not implemented
fully in kernel neither going to be implemented in near future.
Jira NVGPU-9832
Bug 4034184
Change-Id: I3787d92c44682b02d440e52c7a0c8c0553742dcc
Signed-off-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2882168
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Safety build temporal requirement is that on FECS power up it should go
through entire initialization methods.
init_golden_image callback is being called from devctl/ioctl path and
triggers FECS method 10 and 11. As these methods are part of APP init,
not being called during resume and causing quiesce on safety build.
To fix this issue, calling the callback from poweron API.
Bug 4082813
Bug 4037712
Change-Id: I2d27203d3cb4326ae7d8bd6025693fd61d5237df
Signed-off-by: prsethi <prsethi@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2893218
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
- During nvgpu_poweron, PERFMON_INIT RPC and
ACR_INIT_WPR_REGION command is sent to PMU in two different threads.
- For perfmon RPC method is used and for ACR, CMD-MSG queue is used.
- Since the pmu thread and poweron thread run in parallel, the
pmu sequence acquired by both can have the same seq_id.
- For Perfmon RPC, nvgpu_pmu_seq_free_release() is called
followed by nvgpu_pmu_seq_release().
- This causes clearing of sequence for the next command.
- To resolve this, instead of nvgpu_pmu_seq_free_release(),
just free the rpc-payload after getting ack for perfmon and
then do sequence release.
- This ensures that the ACR cmd sent just after perfmon RPC does
not get the same seq_id and the sequence is not cleared.
Bug 4074021
Change-Id: Id9972cb719458062d8c7d9e226a25599026c052b
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2889840
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
- On older chips, PMU uses CMD-MSG queue method to
communicate with NvGPU.
- From Turing onwards, PMU uses RPC method for this.
- During poweroff, we release pmu_sequence and reset the
members of the structure.
- For chips that use RPC, we need to free the payload as well
and then reset the members.
- Add pmu_seq_cleanup hal for this.
Bug 4019694
Bug 4059157
Change-Id: Ieb474fe4ed81f54d78480214cde53b51d45652c6
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2882267
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
- During driver unload, shutdown or RG path as part of
pmu destroy, pmu sequences have to be cleaned up to
free payload memory and allocation info which is stored
as part of pmu_sequence.
- While doing so there can be race condition with pmu_isr
or nvgpu_pmu_rpc_execute path where it waits for fw ack.
- This race condition can lead to freeing of payload memory
before nvgpu_pmu_sequences_cleanup() does.
- This can lead to memory corruption or double free issue
when the cleanup code again tries to free the payload mem.
- To resolve this add a new function nvgpu_pmu_seq_free_release()
which will check for seq->id in pmu seq tbl before freeing the
memory and other info from pmu_sequence.
- Use this nvgpu_pmu_seq_free_release() in non-blocking RPC calls
and also when fw ack fails or driver is dying scenario.
- For blocking call, synchronise freeing of rpc payload memory by
using a new boolean seq_free_status.
Bug 4019694
Bug 4059157
Change-Id: Id45a6914a2d383a654539a87861c471a77fb6850
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2882210
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
On deleting the subcontext, tsg->subctx_vms[] entries are set to NULL as
per the subcontext id. For async subcontexts the index logic was used
from that of tsg->async_veids bitmask. However subctx_vms is an array
shared by all subcontexts hence index should be subcontext id aka veid.
Also update the description of function nvgpu_tsg_validate_ch_subctx_vm
as some of the functionality is now moved to another function
nvgpu_tsg_create_sync_subcontext_internal.
Bug 3979886
Change-Id: Ic290fb175b34988c6ffabe9c9dc4ec124d2c70af
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2879025
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
In case of process crash or forceful closure of the channels, userspace
may not release the VEID. In that case, creating further subcontexts
may not be possible.
Hence, when the channel is closed forcibly (linux), release the VEID on
closure of the last channel in the subcontext.
With this, normally on linux, channel close will not relase the VEID
However, on qnx it will release the VEID. So delete subcontext devctl
call on qnx will be nop in normal case hence changed the error print
and error return to success.
Also added check in the subcontext delete ioctl fn that all channels
are unbound before deleting the subcontext. This is to ensure that
channels don't refer to dangling subcontext pointer.
Bug 3979886
Change-Id: I434944b01740720011abce3664394ae8cb0d4e2e
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2858060
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
This patch adds nvenc support for TU104
- Fetch engine/dev info for nvenc
- Falcon NS boot (fw loading) support
- Engine context creation for nvenc
- Skip golden image for multimedia engines
- Avoid subctx for nvenc as it is a non-VEID engine
- Job submission/flow changes for nvenc
- Code refactoring to scale up the support for other multimedia
engines in future.
Bug 3763551
Change-Id: I03d4e731ebcef456bcc5ce157f3aa39883270dc0
Signed-off-by: Santosh BS <santoshb@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2859416
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>