Passing branch of nvgpu_timeout_peek_expired was not covered due to jump
over it in nvgpu_falcon_mem_scrub_wait. Remove that jump to cover the
branch.
Add unit test for covering the error handling in case of read from
DMEM control register returns invalid data using fault injection.
JIRA NVGPU-4814
Change-Id: I9f99186bd2b1c5f39ead130d3161d3e7fa622ac4
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2272937
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
MISRA Advisory Directive 4.5 states that identifiers in the same
name space with overlapping visibility should be typographically
unambiguous.
In both the nvgpu_locate_pte_last_level() and gp10b_get_pde0_pgz()
routines this violation is raised because a variable used as
a for-loop index ('i') is ambiguous with a parameter that points
to a struct gk20a_mmu_level ('l').
To fix these violations the loop index 'i' is renamed to 'idx'.
Jira NVGPU-3178
Change-Id: I2cec904201075b48ab6ccfbd0ff6d7e9dcac2867
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2271456
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Adeel Raza <araza@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
MISRA rule 11.2 doesn't allow conversions of a pointer from or to an
incomplete type. These type of conversions may result in a pointer
aligned incorrectly and may further result in undefined behavior.
This patch addresses rule 11.2 violations related to pointers to and
from struct nvgpu_sgl. This patch replaces struct nvgpu_sgl pointers by
void pointers.
Jira NVGPU-3736
Change-Id: I8fd5766eacace596f2761b308bce79f22f2cb207
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2267876
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Preempt TSG occurs in non-mission mode, when unbinding channel
from TSG, or aborting TSG. Should a preempt not complete on
engine, we expect other HW safety mechanisms such as FECS
watchdog to detect issues that prevented saving current context.
Add BUG_ON when attempting to recover from preempt timeout,
to make sure we got such error, and sw_quiesce has been
requested.
Jira NVGPU-4230
Change-Id: Ia26a61e703f74eb28d29e72e75664ca4ec97a586
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2265082
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
As a part of refactoring, we need to move the volt unit from perf to pmu
as it belongs there and also move the arbitor specific functions under
CLK_ARB as they will be removed from safety build.
This patch does the following
*Move volt struct from perf to pmu
*Move volt setup from pmu_pstate to volt
*Move clk freq related functions into CLK_ARB
NVGPU-4491
NVGPU-4492
Change-Id: I7180cd12bbf91cc4d2e79b6e2d71c16e494c8ff0
Signed-off-by: Abdul Salam <absalam@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2268215
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add new unit test to cover gops.gr.init.ecc_scrub_reg HAL function
gops.gr.init.ecc_scrub_reg HAL can generate TIMEOUT errors which are
not returned to caller currently. Update this HAL to return int value
for error propagation.
Jira NVGPU-4458
Change-Id: I98f4d5af2ef17cc4301951fec4d660638c8ef72c
Signed-off-by: dnibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2265456
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Runlist update occurs in non-mission mode, when
adding/removing channel/TSGs. The pending bit
is a debug only feature. As a result logging a
warning is sufficient.
We expect other HW safety mechanisms such as
PBDMA timeout to detect issues that caused pending
to not clear. It's possible bad base address could
cause some MMU faults too.
Worst case we rely on the application level task
monitor to detect the GPU tasks are not completing
on time.
Jira NVGPU-4322
Change-Id: I7233770349db5dfad6904170a1e9a2d5eada70b2
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2265094
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
fbpa related functions are not supported on igpu safety. Don't
compile them if CONFIG_NVGPU_DGPU is not set.
Also compile out fb and ramin hals that are dgpu specific.
Update the tests for the same.
JIRA NVGPU-4529
Change-Id: I1cd976c3bd17707c0d174a62cf753590512c3a37
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2265402
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add unit tests to cover the invalid falcon port access, falcon sw init
switch cases, nvgpu_falcon_set_irq, nvgpu_timeout_init failure branch
coverage.
Compile out the functions nvgpu_falcon_get_mem_size & falcon_bootstrap
as they are needed by LS PMU and VBIOS code. For iGPU safety the
falcon functions needing these will call the HAL APIs directly.
This way we avoid the unreachable code as well. Updated the
prototype of falcon bootstrap HAL API as that doesn't return
any error.
With these changes, we get 100% line coverage for common.falcon unit.
JIRA NVGPU-2214
Change-Id: I1fe653d97c1a6a1521d7da38f171928dda58c5b5
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2258311
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
PMU IRQs were not enabled assuming entire functionality for LS PMU.
Debugging early init issues of PMU falcon ECC errors triggered
during nvgpu power-on will be cumbersome if interrupts are not
enabled early. FMEA analysis of the nvgpu init path also
requires this interrupt be enabled earlier.
Hence, Enable the PMU ECC IRQ early during nvgpu_finalize_poweron.
pmu_enable_irq is updated to enable interrupts differently for
safety and non-safety. PMU interrupts disabling is moved out
of nvgpu_pmu_destroy to nvgpu_prepare_poweroff. Prepared new
wrapper API nvgpu_pmu_enable_irq.
PMU ECC init and isr mutex init is moved to the beginning of
nvgpu_pmu_early_init as for safety, ls pmu code path is
disabled. Fixed the pmu_early_init dependent and mc
interrupt related unit tests.
Update the doxygen for changed functions.
JIRA NVGPU-4439
Change-Id: I1a1e792d2ad2cc7a926c8c1456d4d0d6d1f14d1a
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2251732
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
pwr_csb slcg, blcg gating registers are covered by pmu slcg/blcg hence
its load functions are not used. Hence, delete the generated data and
functions. slcg, blcg ctxsw_firmware and pg_gr gating reglists are
null hence delete the generated data and functions.
JIRA NVGPU-2175
Change-Id: Ib04d9845331c9a287666d3b8c974e1d3b66a7677
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2263272
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add negative tests that inject memory allocation failures and
HAL function call errors to verify error handling path in
common.gr.obj_ctx unit.
Update common.gr.setup test to cover invalid class input while
setting preemption mode.
Jira NVGPU-4457
Change-Id: I74d1ba63ba8aace6087b51fd50e2c136822d3a00
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2260939
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- nvgpu_gr_ctx_load_golden_ctx_image() does not return any error, change
the return type to void
- Check for preemption modes greater than CILP in
nvgpu_gr_ctx_check_valid_preemption_mode
- Check if received class is valid or not in
nvgpu_gr_setup_set_preemption_mode
- Compile out entire nvgpu_gr_obj_ctx_init_ctxsw_preemption_mode since
it is really not doing anything in safety
- Remove the switch statement in nvgpu_gr_obj_ctx_set_compute_preemption_mode
since it is not possible to receive any other value than supported.
Previous function calls ensure that input values are validated.
- nvgpu_gr_obj_ctx_commit_global_ctx_buffers() does not return any
error, change the return type to void
- gops.gr.init.preemption_state HAL is not needed in safety since it
only configures gfxp related timeout
- remove redundant call to gops.gr.init.wait_idle in
nvgpu_gr_obj_ctx_commit_hw_state. We trigger wait despite earlier
failure in same function call.
Jira NVGPU-4457
Change-Id: I06a474ef7cc1b16fbc3846e0cad1cda6bb2bf2af
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2260938
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Split GR ECC initialization into GPC/TPC and FECS ECC init as FECS ECC
errors during acr_construct_execute need to be reported and handled
hence FECS ECC counters are required to be initialized before
acr_construct_execute.
GPC/TPC ECC counters are dependent on the GR config that will be
initialized only after acr_construct_execute.
nvgpu_gr_intr_init_support is moved to nvgpu_gr_prepare_sw.
FECS ECC interrupt is enabled by default hence interrupt is not
enabled through gr_fecs_host_int_enable_r in nvgpu_gr_prepare_sw.
JIRA NVGPU-4439
Change-Id: Ifc9912f0578015a6ba1e9d38765c42633632b15f
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2261987
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
get_gpc_mask hal is set only for tu104. Add CONFIG_NVGPU_DGPU check
in the code for using that hal.
gr_config_alloc_struct_mem function is called from nvgpu_gr_config_init
gr_config_free_mem is called gr_config_alloc_struct_mem on failure.
No need to call gr_config_free_mem from nvgpu_gr_config_init again
for failure.
nvgpu_gr_config_init allocate nvgpu_gr_config struct.
config->sm_to_cluster will never get allocated before.So no need to
check for config->sm_to_cluster and do a memset.
Jira NVGPU-4531
Change-Id: I928041c110019bec885f9d5b6978db3032bc493c
Signed-off-by: vinodg <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2262229
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
handle_tex_exception hal is not set for safety build. Add
CONFIG_NVGPU_HAL_NON_FUSA checking for that hal.
log_mme_exception hal is supported only for turing. Add
CONFIG_NVGPU_DGPU checking for that hal.
gr_intr_handle_class_error always return -EINVAL. Change the
return as void to avoid unwanted error checking.
nvgpu_gr_intr_get_channel_from_ctx function parameter curr_tsgid will
never be NULL based on the current call. Remove unwanted
(curr_tsgid != NULL) check from this function.
Jira NVGPU-4454
Change-Id: I165d1cc5f9e308dfb11d905b59151b44f63a31bb
Signed-off-by: vinodg <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2259763
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Currently, GPU fifo sw_ready flag is not reset after fifo clean_up
execution. This patch resets g->fifo.sw_ready flag in
nvgpu_fifo_cleanup_sw_common() to indicate fifo attributes are reset.
Also, pbdma setup and cleanup functions are optional and may not be
populated. This patch modifies nvgpu_fifo_cleanup_sw_common() to
executes nvgpu_pbdma_cleanup_sw() if pbdma.cleanup_sw is populated.
Jira NVGPU-4339
Change-Id: I6fd53577afdd0a15c75f15b54a916e70e850d1b0
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2237809
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Minion Ucode is enabling HS Ucode Encyrption. Minion Ucode builds
will put out separate Debug-signed and Prod-signed Encrypted image
files. The driver will load prod image or debug image depending
on the setting of DEBUG fuse setting.
Add support to read the SCP_CTL_STAT register to differentiate
debug and prod boards and load correct binary accordingly.
Update the binary name to support two minion ucodes binaries in
the build.
JIRA NVLINK-283
Bug 2701677
Change-Id: I5348e9705708eeab4ce639b0721f10882d8970a7
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2258097
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Petlozu Pravareshwar <petlozup@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- Set ENOMEM error if GPU mapping fails in nvgpu_gr_ctx_alloc().
- In nvgpu_gr_ctx_free_patch_ctx(), it is not possible to have a
valid patch buffer without GPU virtual address space. Hence instead
of checking for gpu_va, use nvgpu_mem_is_valid() to check if GPU
mappings can be removed.
- Check if RTV circular buffer is ready within DGPU config.
nvgpu_gr_global_ctx_buffer_map() already checks if buffer is ready
or not and returns error if buffer is not ready.
- g->ops.gr.ctxsw_prog.init_ctxsw_hdr_data() will always be set for
each chip. Remove unnecessary NULL check.
Jira NVGPU-4373
Change-Id: Ib490f81f8b8299f87cffbb8a33fde8cf98e6c288
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2259073
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In nvgpu_gr_setup_set_preemption_mode function, nvgpu_channel_enable_tsg
and nvgpu_channel_disable_tsg calls are changed to gops calls.
In this function beginning nvgpu_tsg_from_ch is checked for NULL pointer
so nvgpu_channel_enable_tsg and nvgpu_channel_disable_tsg functions
never fail with NULL tsg pointer for branch coverage.
Jira NVGPU-3968
Change-Id: If2e1fee493426e56d62865b015dc8b21bad494b6
Signed-off-by: vinodg <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2258788
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
* Updated types and added error checks
* Modified GR condition for ctxsw disable count
CERT-C error check was added to detect error on integer overflow
But below logic couldn't detect first overflow, so updated condition
INT_MAX < gr->ctxsw_disable_count --> it became true after overflow
So, we didn't detected in first overflow and lead to assert on enable
JIRA NVGPU-3400
Change-Id: I6b0265a464f8f19efa7b0761612c6e9ffb3bd2bd
Signed-off-by: Sagar Kadamati <skadamati@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2206282
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Fix bug where the CE mask includes other engine types besides just CEs
in nvgpu_ce_engine_interrupt_mask().
The intent of this API is to return mask of CE interrupts. However, the
if clause in the for loop is only excluding engine interrupts if the CE
stall or non-stall ISR is NULL. So, it does not distinquish between CE
or GR engine interrupts if the CE ISR is non-null.
Since the expectation is to not return CE interrupts if the ISRs are
NULL, just return a 0 mask if either ISR is NULL without having to
bother with the loop.
If the ISRs are set in the CE HAL, within the loop, only add interrupts
to the mask returned if the engine type is actually a CE engine (i.e. do
not include GR engine interrupts).
JIRA NVGPU-2224
Change-Id: Ic0048b00f16590fec50bb0858bd3f4498a00650d
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2256269
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>