A call to exit the PMU state machine/kthread must
be prioritized over any other state change.
It was possible to set the state as PMU_STATE_EXIT,
signal the kthread and overwrite the state before
the kthread has had the chance to exit its loop.
This may lead to a "lost" signal, resulting in
indefinite wait during the destroy sequence.
Faulting sequence:
1. pmu_state = PMU_STATE_EXIT in nvgpu_pmu_destroy()
2. cond_signal()
3. pmu_state = PMU_STATE_LOADING_PG_BUF
4. PMU kthread wakes up
5. PMU kthread processes PMU_STATE_LOADING_PG_BUF
6. PMU kthread sleeps
7. nvgpu_pmu_destroy() waits indefinitely
This patch adds a sticky flag to indicate PMU_STATE_EXIT,
irrespective of any subsequent changes to pmu_state.
The PMU PG init kthread may wait on a call to
NVGPU_COND_WAIT_INTERRUPTIBLE, which requires a
corresponding call to nvgpu_cond_signal_interruptible()
as the core kernel code requires this task mask to
wake-up an interruptible task.
Bug 2658750
Bug 200532122
Change-Id: I61beae80673486f83bf60c703a8af88b066a1c36
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2181926
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Change the directly called init functions to function pointers in the
HAL. This makes it more consistent. This also allows for writing more
comprehensive unit tests for nvgpu.common.init.
JIRA NVGPU-2239
Change-Id: I05d739a8f8a2e7d385322d93154206eb0bfddc10
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2173920
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Rename reserved identifier __here to label_here to fix
around 180 DCL37-C violations - "The reserved identifier "__here",
which is reserved for use as identifiers with file scope in both
the ordinary and tag name spaces, is declared.
JIRA NVGPU-3908
Change-Id: Iecaaa03674800bec919d71dc5fd226d362fc6f56
Signed-off-by: Nitin Kumbhar <nkumbhar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2178457
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: Adeel Raza <araza@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Advisory Rule 2.7 states that there should be no unused
parameters in functions.
This patch removes the unused 'post_event', 'fault_ch' and
'hww_global_esr' parameters from the following:
* gv11b_gr_intr_handle_l1_tag_exception()
* gv11b_gr_intr_handle_lrf_exception()
* gv11b_gr_intr_handle_cbu_exception()
* gv11b_gr_intr_handle_l1_data_exception()
* gv11b_gr_intr_handle_icache_exception()
These changes allowed the same parameters to be removed from the
the gr.intr.handle_tpc_sm_ecc_exception() interface.
Jira NVGPU-3178
Change-Id: I4d5dcbf2a5325e38782cdac67f9dd0b223fa1a18
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2171220
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Changes added to support "rmmod nvgpu" in dgpu simulation after gpu
poweron.
nvgpu_engine-wait_for_idle got stuck in busy mode for nvdec and nvec
engines in simulation as simulation doesnt support timeout.
These engines are not valid engines in nvgpu engine list.
Add nvgpu_engine_check_valid_id before checking engine status.
Simulation crash on accessing 0xb81604 top interrupt register.
Add func_priv_cpu_intr_top__size_1_v() function to get the supported
size than using default MAX_INTR_TOP_REGS.
nvlink is not supprted in dgpu simulation. Avoid warning for
-ENODEV return.
Avoid register read following gpu power off completion.
Bug 2498574
Change-Id: I9f9f1cf1ac4620242bda1d2cc0f29f51f81a6711
Signed-off-by: vinodg <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2179930
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
CE app functionality from nvgpu is non-safe for igpu. CE engines init
/reset/cg related functionality is required in safety. Hence move the
CE app logic under CONFIG_NVGPU_DGPU flag and update the sources
accordingly.
JIRA NVGPU-3814
Change-Id: I37aa00b1184baccd5fe569ec315be60ac42dac9b
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2168956
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Advisory Rule 2.7 states that there should be no unused
parameters in functions.
This patch removes unused function parameters from the following:
* nvgpu_channel_ctxsw_timeout_debug_dump_state()
* nvgpu_channel_destroy()
* nvgpu_tsg_destroy()
* nvgpu_rc_pdbma_fault()
Jira NVGPU-3178
Change-Id: I12ad0d287fd7980533663a9776428ef5d4fd1fb9
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2176066
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- In GV11B, read fuse_status_opt_tpc_gpc register
to read which TPCs are floorswept.
- The driver will also read sysfs node: tpc_pg_mask
- Based on these two values "can_tpc_powergate" will
be set to true or false and mask will be used to write to
fuse_ctrl_opt_tpc_gpc register to powergate the TPC.
- can_tpc_powergate = true indicates that the mask value
sent from userspace is valid and can be used to power gate
the desired TPC
- can_tpc_powergate = false indicates that the mask value
sent from userspace is not valid and cannot be used to
power gate the desired TPC.
Bug 200532639
Change-Id: Ib0806e4c96305a13b3574e8063ad8e16770aa7cd
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2170736
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Rule INT31-C requires that integer conversions do not result in lost or
misinterpreted data.
Rule INT32-C requires that operations on signed integers do not result
in overflow.
Rule EXP34-C requires that pointer dereferences never include NULL.
Fix violations of these types in nvgpu.common.utils.
JIRA NVGPU-3868
Change-Id: Ifcf4bc6536ca2df2adcb53b40b3e58316cc3e457
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2168576
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: Nitin Kumbhar <nkumbhar@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Rule INT30-C requires that unsigned integer operations do not
wrap. Fix these violations by using the safe ops.
Rule INT31-C requires that integer conversions do not result in lost or
misinterpreted data.
Rule INT33-C requires ensurance that division does not divide by 0.
Fix violations of these types in ptimer.
JIRA NVGPU-3868
Change-Id: Ib513da20eb14fa058630aa7a1f0aa5002373dd08
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2168575
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Some recovery functions are currently exported in libnvgpu_safe.export.
Once CONFIG_NVGPU_RECOVERY is permanently disabled for safety build,
we can remove those functions from the export file.
Until we can disable it, make sure that related functions do exist,
even when CONFIG_NVGPU_RECOVERY is disabled, by using #ifdefs inside
the functions, instead of redeclaring functions as static inline.
Jira NVGPU-3871
Change-Id: Ib682ae81268b35cd1050a55cc73653fb6637b87c
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2170433
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Added following gr init related gv11b hal for safety golden context
creation:
void gv11b_gr_init_restore_stats_counter_bundle_data(struct gk20a *g,
struct netlist_av_list *sw_bundle_init);
int gv11b_gr_init_load_sw_bundle_init(struct gk20a *g,
struct netlist_av_list *sw_bundle_init);
gv11b_gr_init_restore_stats_counter_bundle_data implements functionality
required to re-store stats bundle data, to avoid fe and mme mismatch.
gv11b_gr_init_load_sw_bundle_init implements functionality required for
disable stats idle clock counter to avoid mismatches
with two golden context saves.
JIRA NVGPU-3558
Change-Id: I73886770dac30934cbd3989b19ba87553286453d
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2167211
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Added helper function to compare two golden context images:
bool nvgpu_gr_global_ctx_compare_golden_images(struct gk20a *g,
bool is_sysmem,
struct nvgpu_gr_global_ctx_local_golden_image *local_golden_image1,
struct nvgpu_gr_global_ctx_local_golden_image *local_golden_image2,
size_t size);
In case of sysmmem, direct mem comparison can be used and for vidmem.
only word by word comparison can be done.
Since this code is used only for safety, all implementation is under
NV_BUILD_CONFIGURATION_IS_SAFETY flag.
JIRA NVGPU-3558
Change-Id: Ie3d0ac19e561b19d44e90a9d6188eaade0cdec44
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2167209
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Raghuram Kothakota <rkothakota@nvidia.com>
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
INT31-C requires that integer conversions do not result in lost or
misinterpreted data.
Fix violations of INT31-C in utils unit. Add safe sub function for
u8 variables to fix the violation.
Jira NVGPU-3609
Change-Id: Ife51f5cc00c7127dd87d5d7b1b3c19ecf7bbfa4d
Signed-off-by: ajesh <akv@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2169974
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Rule 11.3 forbids pointer cast between two different object types.
Rule 13.5 doesn't allow right hand operand of a logical operator to have
persistent side effects.
This patch fixes mentioned rules in nvgpu.common.mm.
Jira NVGPU-3864
Change-Id: I08b7fb4d3fb623f14f8760a50648b39b3e53b233
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2168522
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add CONFIG_NVGPU_RECOVERY in order to conditionally compile
recovery code. This code will be removed from safety build
when sw quiesce state is implemented, and negative tests are
disabled or modified such that they do not expect recovery
to happen.
Added static inline functions for recovery handlers, when
CONFIG_NVGPU_RECOVERY is not defined. These inline functions
can later be wired to the sw quiesce functions.
Also moved gv11b recovery code to non-fusa, as it will ultimately
be removed from safety build.
Jira NVGPU-3871
Change-Id: Ia705b059fab6120899c7e15082f2a0f51ff51dc9
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2166074
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
There was a header file circular dependency that was preventing
including some files. For example, for utils.h to include safe_ops.h
would include bug.h which included log.h which included bitops.h which
included utils.h. To break this loop, the macro nvgpu_do_assert_print()
into a function in a new file assert.c. With this change, log.h is no
longer required in bug.h.
This change also required adding a few includes in C files that were
picking up definitions through the chain above.
JIRA NVGPU-3868
Change-Id: Icf95677bb36e4aa034cba25594cf71f2d028c289
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2168528
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
nvgpu_rc_pbdma_fault just checks for the id and id_type from struct
nvgpu_pbdma_status_info. These contain invalid values during chsw_load
and chsw_switch. This patch corrects the above bug by checking for the
chsw status and then loading the values for id and type.
The current code reads the pbdma_status info after clearing the
interrupt. Other interrupts can cause enough delay between clearing the
interrupt and pbdma switching the channel leading to invalid channel/tsg
ID. Correct that by reading the pbdma_status info register before
clearing of the pbdma interrupt to correctly read the context
information before the pbdma can switch out the context.
Bug 2648298
Change-Id: Ic2f0682526e00d14ad58f0411472f34388183f2b
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2165047
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Rule 2.2 doesn't allow unused variable assignments. The reason is
presence of unused variable assignments may indicate error in program's
logic.
Rule 21.x doesn't allow reserved identifier or macro names starting with
'_' to be reused or defined.
Jira NVGPU-3864
Change-Id: I8ee31c0ee522cd4de00b317b0b4463868ac958ef
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2163723
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Compile-out non-secure gr falcon boot related code for safety build by
adding non-secure gr falcon related code under following flag:
CONFIG_NVGPU_GR_FALCON_NON_SECURE_BOOT
Added nvgpu_gr_falcon_load_ctxsw_ucode and related functions under
CONFIG_NVGPU_GR_FALCON_NON_SECURE_BOOT flag and enabled this flag only
for non-safety builds.
JIRA NVGPU-3741
Change-Id: I817d8a7be6a675eee514faf7bb93f1382c6da5ce
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2158935
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add documentation for channel functions used in safety build.
Made nvgpu_channel_commit_va return void, as it can never fail.
Removed nvgpu_channel_recover prototype (function moved to rc unit).
Compile out refcount tracking definitions when
GK20A_CHANNEL_REFCOUNT_TRACKING is disabled.
Removed unused nvgpu_channel_set_ctx_mmu_error function.
Jira NVGPU-3588
Change-Id: Ia65a6c60ff30837230d81ca0e5f6dadafcc3af4e
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2159674
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
-Added PCIE device info for TU104-QS chip & marked as FUSA SKU
using device flag
-is_fusa_sku flag will be set if device flag has FUSA SKU flag set
& this will be checked in driver to execute functionality
specific to FUSA SKU
JIRA NVGPU-3727
Change-Id: I49ea357133ce0b9bbf52dae72afcf8139ab01346
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2161163
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
emem related handling is specific to DGPU hence compile it out.
falcon_sw_init for platforms other than gv11b is also compiled
out.
Restored the DGPU falcons handling as the falcon common sw
init handling has case for initialization of unsupported
falcons that needs to be covered.
JIRA NVGPU-898
Change-Id: I41e886aba2aead052f3ef00278309759dd410df3
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2160233
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>