In the current logic for nvgpu_timeout_expired(), the function always
returns 0 if fault injection is enabled. This only helps for testing
timeout-not-expired scenarios. However, if nvgpu_timeout_expired() is
used in a while(true) loop, it is impossible to break the infinite loop.
This patch modifies nvgpu_timeout_expired() to not expire while the fault
injection counter is non-zero. The function will now return -ETIMEDOUT
when fault injection is enabled and the counter is zero.
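A minimal sketch of the counter-based behaviour described above, not the nvgpu implementation. struct sketch_fault_inj, sketch_fi_enable(), sketch_timeout_expired() and sketch_poll_until_expired() are hypothetical names, and the counter is assumed to hold the number of calls that may still return 0 before the timeout is reported.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fault injection state; not the nvgpu structure. */
struct sketch_fault_inj {
	bool enabled;
	unsigned int counter; /* calls left before the timeout "expires" */
};

/* Arm fault injection so that the first `calls_before_expiry` calls return 0. */
static void sketch_fi_enable(struct sketch_fault_inj *fi,
			     unsigned int calls_before_expiry)
{
	assert(fi != NULL);
	fi->enabled = true;
	fi->counter = calls_before_expiry;
}

/*
 * Returns 0 while the counter is non-zero (decrementing it on each call)
 * and -ETIMEDOUT once fault injection is enabled and the counter is zero.
 * With fault injection disabled, this sketch simply reports "not expired".
 */
static int sketch_timeout_expired(struct sketch_fault_inj *fi)
{
	assert(fi != NULL);
	if (!fi->enabled) {
		return 0;
	}
	if (fi->counter > 0U) {
		fi->counter--;
		return 0;
	}
	return -ETIMEDOUT;
}

/* A while(true) poll loop that can now terminate; returns its iteration count. */
static unsigned int sketch_poll_until_expired(struct sketch_fault_inj *fi)
{
	unsigned int iterations = 0U;

	while (true) {
		if (sketch_timeout_expired(fi) == -ETIMEDOUT) {
			break;
		}
		iterations++;
	}
	return iterations;
}
```

With the counter armed to 3, the loop body runs three times and the fourth call reports the timeout, which is what lets a test leave the loop.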
Jira NVGPU-4675
Change-Id: I494031698ade19cf1ec5b75e4dbe5a1157da2aa7
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2275290
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Philip Elcan <pelcan@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
To achieve permanent fault coverage, the CTAs launched by
each kernel in the mission and redundant contexts must execute on
different hardware resources. This feature modifies the software so
that the virtual SM id to TPC mapping differs between the mission and
redundant contexts. The virtual SM identifier to TPC mapping is done by
nvgpu when setting up the patch context.
The recommendation for the redundant setting is to offset the
assignment by one TPC, and not by one GPC. This ensures both
GPC and TPC diversity. The SM and Quadrant diversity will happen
naturally. For kernels with few CTAs, the diversity is guaranteed
to be 100%. In case of completely random CTA allocation,
e.g. large number of CTAs in the waiting queue, the diversity is
1 - 1/#SM, or 87.5% for GV11B, 97.9% for TU104.
Added the NvGpu CFLAGS option "CONFIG_NVGPU_SM_DIVERSITY" to
enable/disable SM diversity support.
This support is only enabled on gv11b and tu104 QNX non-safety builds.
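A rough sketch of the one-TPC offset and of the 1 - 1/#SM figure, under stated assumptions. The round-robin mission mapping and all sketch_* names are hypothetical; the real nvgpu mapping is built from the GPC/TPC configuration and is not reproduced here, and GPC placement is not modelled. SM counts of 8 and 48 are the values that reproduce the percentages quoted above.

```c
#include <assert.h>

/*
 * Hypothetical mission mapping: virtual SM ids are spread over the TPCs
 * round-robin. This only illustrates the idea of a per-context mapping.
 */
static unsigned int sketch_mission_tpc(unsigned int vsm_id,
				       unsigned int num_tpc)
{
	assert(num_tpc > 0U);
	return vsm_id % num_tpc;
}

/* Redundant mapping: the same assignment offset by one TPC, with wrap-around. */
static unsigned int sketch_redundant_tpc(unsigned int vsm_id,
					 unsigned int num_tpc)
{
	assert(num_tpc > 0U);
	return (sketch_mission_tpc(vsm_id, num_tpc) + 1U) % num_tpc;
}

/* Expected diversity under completely random CTA allocation: 1 - 1/#SM. */
static double sketch_random_diversity(unsigned int num_sm)
{
	assert(num_sm > 0U);
	return 1.0 - (1.0 / (double)num_sm);
}
```

As long as there is more than one TPC, every virtual SM id lands on a different TPC in the two contexts; sketch_random_diversity(8U) gives 0.875 and sketch_random_diversity(48U) is roughly 0.979.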
JIRA NVGPU-4685
Change-Id: I8e3eaa72d8cf7aff97f61e4c2abd10b2afe0fe8b
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2268026
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The event_id_list and event_id_list_locks fields are only
needed in nvgpu_tsg when CONFIG_NVGPU_CHANNEL_TSG_CONTROL
is defined.
Conditionally compile those fields and related code,
so that they are removed from the safety build.
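The pattern can be sketched as below. struct sketch_tsg, struct sketch_list_node and sketch_tsg_init() are hypothetical stand-ins; only the two field names and the CONFIG_NVGPU_CHANNEL_TSG_CONTROL switch come from the message, and the lock is reduced to a plain int for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Pass -DCONFIG_NVGPU_CHANNEL_TSG_CONTROL to build the control path in. */

/* Hypothetical list node; a stand-in for the driver's list type. */
struct sketch_list_node {
	struct sketch_list_node *prev;
	struct sketch_list_node *next;
};

struct sketch_tsg {
	unsigned int tsgid;
#ifdef CONFIG_NVGPU_CHANNEL_TSG_CONTROL
	/* Present only when TSG control is compiled in. */
	struct sketch_list_node event_id_list;
	int event_id_list_locks; /* stand-in for a mutex */
#endif
};

static void sketch_tsg_init(struct sketch_tsg *tsg, unsigned int id)
{
	assert(tsg != NULL);
	tsg->tsgid = id;
#ifdef CONFIG_NVGPU_CHANNEL_TSG_CONTROL
	/* Related code is guarded by the same switch as the fields. */
	tsg->event_id_list.prev = &tsg->event_id_list;
	tsg->event_id_list.next = &tsg->event_id_list;
	tsg->event_id_list_locks = 0;
#endif
}
```

Without the define, the struct carries neither field and the initialization code for them is not compiled at all.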
Jira NVGPU-4376
Change-Id: I8678aa1b8cd4166aa37bcb42cda1eb9c703fd32f
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2273261
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
MISRA Advisory Directive 4.5 states that identifiers in the same
name space with overlapping visibility should be typographically
unambiguous.
The presence of both the roundup(x,y) and round_up(x,y) macros in
the posix utils.h header incurs a violation of this rule.
These macros were added to keep in sync with the Linux kernel variants.
However, there is a key distinction between how these two macros
work in the Linux kernel: roundup(x,y) can handle any y alignment while
round_up(x,y) is intended to work only when y is a power-of-two.
Passing a non-power-of-two alignment to round_up(x,y) results in an
incorrect value being returned (silently).
Because all current uses of roundup(x,y) and round_up(x,y) in
nvgpu specify a y value that is a power-of-two and the underlying
posix macro implementations assume as much, it is best to remove
roundup(x,y) from nvgpu altogether to avoid any confusion.
So this change converts all uses of roundup(x,y) to round_up(x,y).
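The difference can be illustrated with two small functions modeled on the division-based and mask-based forms. sketch_roundup_any() and sketch_round_up_pow2() are hypothetical names used for illustration; they are not the nvgpu posix macros.

```c
#include <assert.h>

/* Division-based: works for any non-zero alignment y. */
static unsigned long sketch_roundup_any(unsigned long x, unsigned long y)
{
	assert(y != 0UL);
	return ((x + (y - 1UL)) / y) * y;
}

/* Mask-based: only correct when y is a power of two. */
static unsigned long sketch_round_up_pow2(unsigned long x, unsigned long y)
{
	return ((x - 1UL) | (y - 1UL)) + 1UL;
}
```

For x = 10 and y = 8 both return 16. For x = 10 and y = 6 the division-based form returns 12, while the mask-based form returns 14, which is not a multiple of 6 and is produced without any error.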
Jira NVGPU-3178
Change-Id: I0ee974d3e088fa704e251a38f6b7ada5a7600aec
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2271385
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
For some code that is tested using public APIs, it is not safe for the
parent thread to continue while the child thread is running. The
per-thread fault injection pointer points to the same container for both
the parent thread and the created one, so there is a chance of a race
in the fault injection functionality. Serialize the run so that the race
is mitigated. The caller of the nvgpu_thread_create() API should ensure
that the created thread is stopped, using fault injection or otherwise.
Bug 200580790
Change-Id: I334c07c4bac6e43d67de9bfc581dad021e421acd
Signed-off-by: shashank singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2268133
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Tested-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- Created ucode_perf_vfe_inf.h and moved all VFE
  interface structs and macros into this header
- Created nvgpu_clk_fll_get_fmargin_idx to get the
  freq margin index
- Created nvgpu_vfe_var_get_s_param to read s_param
- Removed macros and header includes which are
  not needed
NVGPU-4448
Change-Id: I89f946d555bcbc7823665d2a5a761049f7a5e963
Signed-off-by: rmylavarapu <rmylavarapu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2260150
GVS: Gerrit_Virtual_Submit
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
MISRA rule 11.2 doesn't allow conversions of a pointer from or to an
incomplete type. These types of conversions may result in an incorrectly
aligned pointer and may further result in undefined behavior.
This patch addresses rule 11.2 violations related to pointers to and
from struct nvgpu_sgl by replacing struct nvgpu_sgl pointers with
void pointers.
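A minimal sketch of the void-pointer approach, assuming an ops table through which generic code walks scatter-gather entries. struct sketch_mem_sgl, struct sketch_sgt_ops and the sketch_* functions are hypothetical and do not mirror the nvgpu definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Concrete scatter-gather entry; its layout is known only to the backend. */
struct sketch_mem_sgl {
	struct sketch_mem_sgl *next;
	unsigned long length;
};

/* Ops table that hands entries around as void pointers. */
struct sketch_sgt_ops {
	void *(*sgl_next)(void *sgl);
	unsigned long (*sgl_length)(void *sgl);
};

/* The backend converts the void pointer back to its own complete type. */
static void *sketch_mem_sgl_next(void *sgl)
{
	struct sketch_mem_sgl *entry = sgl;

	return entry->next;
}

static unsigned long sketch_mem_sgl_length(void *sgl)
{
	struct sketch_mem_sgl *entry = sgl;

	return entry->length;
}

static const struct sketch_sgt_ops sketch_mem_ops = {
	.sgl_next = sketch_mem_sgl_next,
	.sgl_length = sketch_mem_sgl_length,
};

/* Generic code walks the list without ever naming an incomplete type. */
static unsigned long sketch_sgt_total_length(const struct sketch_sgt_ops *ops,
					     void *first)
{
	unsigned long total = 0UL;
	void *sgl;

	assert(ops != NULL);
	for (sgl = first; sgl != NULL; sgl = ops->sgl_next(sgl)) {
		total += ops->sgl_length(sgl);
	}
	return total;
}
```

The generic walker only sees void pointers, so no cast to or from a pointer to an incomplete struct is needed; each backend converts to its own fully defined type.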
Jira NVGPU-3736
Change-Id: I8fd5766eacace596f2761b308bce79f22f2cb207
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2267876
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The disable_syncpoints debugfs knob allows the user to disable syncpt
support at runtime. This knob was incorrectly defined as a u32. Convert
it into a boolean variable.
JIRA NVGPU-3873
Change-Id: If1cfe07fa7b795c0d1b507395bd6e4fa547e3615
Signed-off-by: Adeel Raza <araza@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2262193
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- The only sysfs node supported on the safety build
was "/dev/gpu_powered_on".
- In QNX, the GPU is always powered on, so this sysfs node doesn't
convey any extra info. Hence, this patch removes sysfs from the safety
build.
Bug 200573132
Change-Id: If5f2a6ac81eefb28e71fb919843328cbe87e417c
Signed-off-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2256767
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
To achieve permanent fault coverage, the CTAs launched by
each kernel in the mission and redundant contexts must execute
on different hardware resources.
This feature requires a change in software to make it possible
to modify the virtual SM id to TPC mapping across mission and
redundant contexts.
This CL adds only SM diversity flags which are exposed to
its clients through ioctl/devctl interfaces.
Actual virtual SM id to TPC mapping implementation
will be part of upcoming patch sets.
Added NvGpu CFLAGS to identify the safety build
"CONFIG_NVGPU_BUILD_CONFIGURATION_IS_SAFETY"
JIRA NVGPU-4133
Change-Id: I5a18256780e6726e399e39c1c8d155d2ef07d7bd
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2250461
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This patch constructs the initial setup for the sync unit.
There are currently three simple tests. The first test initializes the
necessary environment, such as regspace init and hal init. The second
test simply fails the creation of the sync, and the last test is meant
as a deinit step.
JIRA NVGPU-913
Change-Id: I1db72d9833c3c4bc3c3903a7d81cce06e9983509
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2248493
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Change the approach to how the fault injection state is stored to
facilitate propagating fault injection state to child threads. Rather
than each unit maintaining a thread-local object, there is a thread-local
container stored in posix-fault-injection itself. This container is
initialized for each test module so that it is independent of other
test modules (for parallel test module execution). When child threads
are created with nvgpu_thread_create(), the fault injection container is
configured for the child.
JIRA NVGPU-3981
Change-Id: I9b580dc7f1621a7770eef8eba796f3918f2738bf
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2238474
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Previously, unit interrupt enabling/disabling and the corresponding MC
level interrupt enabling/disabling were not done at the same time.
With this change, stall and nonstall interrupts for units are programmed
at MC level along with the individual unit interrupts. Access to the MC
interrupt registers still goes through the mc.intr_lock spinlock.
To do this, the CE and GR interrupt mask functions are separated.
mc.intr_enable is only used when there is global interrupt
control to be set. Removed mc_gp10b.c as mc_gp10b_intr_enable
is now removed. Removed the following functions: mc_gv100_intr_enable,
mc_gv11b_intr_enable and intr_tu104_enable. Removed intr_pmu_unit_config
as the generic unit interrupt control function can be used instead.
JIRA NVGPU-4336
Change-Id: Ibd296d4a60fda6ba930f18f518ee56ab3f9dacad
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2196178
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The pmu init thread typically returns immediately
without calling nvgpu_thread_should_stop().
pmu_pg_kill_task() checks if the thread is running, and
if it is, calls nvgpu_thread_stop().
However, there's a race condition where the init thread could
have exited between the time that kill_task() checked the
running flag and the time we actually stop the thread, leading
to a kernel crash.
Fix this by making the running flag in the nvgpu_thread struct
atomic. Both the thread proxy function and the thread_stop()
function will set the flag to false.
In the case of nvgpu_thread_proxy(), if the flag is already false,
then nvgpu_thread_stop() has already reset it, at which point we
just wait for nvgpu_thread_should_stop() to return true.
In the case of nvgpu_thread_stop(), if the flag is already false,
then the thread proxy function has already exited, and there is
nothing more to do.
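The handshake can be sketched with C11 atomics as below. This is an illustration under stated assumptions, not the kernel-side code: struct sketch_thread, enum sketch_action and the sketch_* functions are hypothetical, and the actions are returned as values instead of actually stopping or waiting on a thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_thread {
	atomic_bool running;
};

enum sketch_action {
	SKETCH_EXIT_NOW,      /* proxy: no stop in flight, just exit */
	SKETCH_WAIT_FOR_STOP, /* proxy: stop() got there first, wait for it */
	SKETCH_STOP_THREAD,   /* stop: thread is still alive, go stop it */
	SKETCH_NOTHING_TO_DO  /* stop: thread has already exited */
};

static void sketch_thread_start(struct sketch_thread *t)
{
	assert(t != NULL);
	atomic_init(&t->running, true);
}

/* Called by the thread proxy function once the thread body has returned. */
static enum sketch_action sketch_proxy_on_exit(struct sketch_thread *t)
{
	/* Atomically clear the flag and learn who cleared it first. */
	bool was_running = atomic_exchange(&t->running, false);

	return was_running ? SKETCH_EXIT_NOW : SKETCH_WAIT_FOR_STOP;
}

/* Called by whoever wants the thread gone. */
static enum sketch_action sketch_thread_stop(struct sketch_thread *t)
{
	bool was_running = atomic_exchange(&t->running, false);

	return was_running ? SKETCH_STOP_THREAD : SKETCH_NOTHING_TO_DO;
}
```

Because the exchange is a single atomic step, exactly one side observes the flag as true, so there is no window between checking the flag and acting on it.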
Bug 2591298
Change-Id: I9ba6b63c30a5c3e1df11e790094836b44373122b
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2230358
GVS: Gerrit_Virtual_Submit
Reviewed-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
tegra_get_chip_id() is going to be deprecated soon, and this patch
removes calls to it. The Chip-ID is already read via DT, so calls to
tegra_get_chip_id() can be avoided by adding metadata to store the
Chip-ID information in struct gk20a_platform.
Bug 200524194
Bug 200551105
Change-Id: I5f9f5abf679cf9afe98840e20144d76eb0238426
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2236311
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add support in nvgpu to parse and get the freq cap from DT.
The patch does the following:
- Parses the DT and gets the freq cap value during probe.
- During clk_arb init, compares this with P0.Max and takes the lower one.
- Sends change_seq with the new value and sets the dgpu freq.
- Uses the lower value for "get points", "get default" and "set VF".
Bug 200556366
Change-Id: Ie10243f9bf83cb5ae07ebcc4cdc8efaffa56c309
Signed-off-by: Abdul Salam <absalam@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2204644
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
gk20a_pm_runtime_suspend() can fail and invoke
gk20a_pm_finalize_poweron(), which can cause double mapping of the
usermode mmap region via io_remap_pfn_range(). Avoid this by using a
boolean variable to track whether the region is already mapped.
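A minimal sketch of the guard, with the real mapping call replaced by a counting stand-in. struct sketch_usermode_region and the sketch_* functions are hypothetical names and do not reflect the driver's actual structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_usermode_region {
	bool is_mapped;
	unsigned int map_calls; /* how often the stand-in mapper ran */
};

/* Stand-in for the real mapping call; always succeeds in this sketch. */
static int sketch_do_map(struct sketch_usermode_region *r)
{
	r->map_calls++;
	return 0;
}

/* Map the region only if it is not mapped already. */
static int sketch_map_once(struct sketch_usermode_region *r)
{
	int err;

	assert(r != NULL);
	if (r->is_mapped) {
		return 0; /* already mapped, so a repeated poweron is a no-op */
	}
	err = sketch_do_map(r);
	if (err == 0) {
		r->is_mapped = true;
	}
	return err;
}

/* Clear the flag when the mapping is torn down. */
static void sketch_unmap(struct sketch_usermode_region *r)
{
	assert(r != NULL);
	r->is_mapped = false;
}
```

Calling sketch_map_once() twice in a row maps the region once; only after sketch_unmap() does the next call map it again.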
Bug 2707416
Change-Id: I4d8cbe427400a5b986348a19af145367cc08ffc6
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2229312
GVS: Gerrit_Virtual_Submit
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The devinit executes in parallel with PCIE link training
to reduce exit latency. Therefore, all PCIE settings that
normally occur during devinit after the PCIE link is up are
deferred until nvgpu has resumed control.
Bug 2661545
Change-Id: Ifdd4f645b2e1791d93567cc34d6ab0691a25d101
Signed-off-by: Abdul Salam <absalam@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2210625
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Move graphics-related defs and functions under the CONFIG_NVGPU_GRAPHICS
switch.
Move classes not supported in GV11B under the CONFIG_NVGPU_NON_FUSA
switch.
Add missing valid class numbers to the gpu_class.is_valid HAL.
Also remove unused class defs from the class.h header.
Many QNX safety tests still use the graphics 3D class.
Until those tests are fixed, allow the 3D graphics class
as a valid class for the safety build.
JIRA NVGPU-4301
Change-Id: Ifd2a13bee3210821799c2bca10e7245eb3c79121
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2224658
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>