Enable build flags for dGPU in safety, when
NVGPU_FORCE_DGPU_SAFETY_PROFILE is set.
Use libnvgpu-dgpu_safe.exports for dGPU safety build.
Add build flags for tu104 HAL initialization (to solve
undefined symbols in safety build).
Temporarily add non-fusa files needed to build dGPU in safety.
The related functions will have to move to fusa files.
Jira NVGPU-4611
Change-Id: I41db0c039c7f15d9191cdb811b4906e779d5cc88
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2310276
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add new hal function
gp10b_gr_intr_handle_class_error.
Switch the handle_class_error hal function for
gp10b, gv11b and tu104 from
gm20b_gr_intr_handle_class_error to
gp10b_gr_intr_handle_class_error.
gr_trapped_data_mme_pc uses 12 bits from gp10b onwards.
Move gm20b_gr_intr_handle_class_error hal function
to non-fusa section.
Jira NVGPU-4913
Signed-off-by: vinodg <vinodg@nvidia.com>
Change-Id: Ic93013ba43d4bf409527109f2f2d43db11c4238e
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2314249
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
The clk unit currently has multiple header files under the pmuif folder.
These contain a mix of public structs, which are accessed outside the
unit, and private structs, which are accessed only within the clk unit.
This patch segregates them based on their accessibility.
All private items are moved from pmuif into ucode_clk_inf.h, which only
clk can access.
All public items are moved into include/clk.h, which other units can
access.
This will help in documenting the public items.
Jira NVGPU-4491
Change-Id: Iccb0571e05ecb3cb13363390bed8c7214409b543
Signed-off-by: Abdul Salam <absalam@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2292318
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
From gv11b onwards, FECS ucode returns an ACK for set watchdog
timeout method. Failure to wait for this ACK was leading to races,
and in some cases, the ACK could be mistaken for the reply to the
next method.
In particular, this happened for the discover golden image size
method which is sent after set watchdog timeout.
With instrumented FECS ucode, it takes longer for the code to
process the set watchdog timeout method, and the write to ack
that method could happen after nvgpu driver clears the mailbox to
send the discover image size method.
With an invalid golden context image size, FECS ended up causing
an MMU fault while attempting to save past the allocated buffer.
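The race described above can be sketched as a small deterministic
simulation. This is only an illustration under stated assumptions: the
function name, the ACK value and the image size are hypothetical, and the
mailbox is modelled as a plain variable rather than a FECS register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values, chosen only for this sketch. */
#define SKETCH_WDT_ACK     0x1U     /* ACK for set watchdog timeout */
#define SKETCH_IMAGE_SIZE  0x4000U  /* real reply to discover image size */

/*
 * Simulate the driver side of two back-to-back FECS methods against a
 * slow (instrumented) ucode. Returns the value the driver would take as
 * the golden context image size.
 */
static uint32_t sketch_discover_image_size(bool wait_for_wdt_ack)
{
	uint32_t mailbox = 0U;
	/* Set watchdog timeout was just submitted; the ACK is still owed. */
	bool ack_outstanding = true;

	if (wait_for_wdt_ack) {
		/* Driver polls until the ucode posts its ACK, then consumes it. */
		mailbox = SKETCH_WDT_ACK;
		ack_outstanding = false;
	}

	/* Driver clears the mailbox and submits discover image size. */
	mailbox = 0U;

	/* A slow ucode posts the late ACK only now. */
	if (ack_outstanding) {
		mailbox = SKETCH_WDT_ACK;
	}

	/* Driver treats the first non-zero mailbox value as the reply. */
	if (mailbox != 0U) {
		return mailbox;
	}

	/* Otherwise the real reply from the ucode arrives. */
	mailbox = SKETCH_IMAGE_SIZE;
	return mailbox;
}
```

Without the wait, the stale ACK is returned in place of the image size;
with the wait, the driver sees the real reply.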
Added NVGPU_GR_FALCON_METHOD_SET_WATCHDOG_TIMEOUT to be used with
gops_gr_falcon.ctrl_ctxsw, and implemented 2 variants:
- gm20b_gr_falcon_ctrl_ctxsw, without ACK
- gv11b_gr_falcon_ctrl_ctxsw, with ACK
Added NVGPU_GR_FALCON_SUBMIT_METHOD_F_LOCKED flag to allow
executing above method without re-acquiring FECS lock. Longer term,
the 'flags' could be added to gops_gr_falcon.ctrl_ctxsw parameters.
Use gops_gr_falcon.ctrl_ctxsw instead of register writes to invoke
set watchdog timeout method in gm20b_gr_falcon_wait_ctxsw_ready.
Also replaced calls to gm20b_gr_falcon_ctrl_ctxsw with
gops_gr_falcon.ctrl_ctxsw when appropriate, since there are
multiple variants (gm20b, gp10b and gv11b).
Last, fixed clearing of mailbox 0 in gm20b_gr_falcon_bind_instblk.
Bug 200586923
Change-Id: I653b9a216555eec8cd4bb01d6f202bc77b75a939
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2287340
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
For integration testing in safety build, we need to test ctxsw firmware
error codes. In order to test this, we need some instrumentation in the
nvgpu driver.
Added the instrumentation below for CTXSW FW error code integration testing.
1. Reduce the ctxsw watchdog timeout, so that the ctxsw watchdog always triggers.
Use CONFIG_NVGPU_CTXSW_FW_ERROR_WDT_TESTING for testing the watchdog.
2. Submit an unsupported method to CTXSW FW, which triggers an error.
Use CONFIG_NVGPU_CTXSW_FW_ERROR_CODE_TESTING for testing the error code.
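A minimal sketch of how the two flags could gate the instrumentation.
Only the two CONFIG_ names come from this change; the helper names, the
timeout values and the method numbers are hypothetical placeholders.

```c
#include <stdint.h>

/* Hypothetical values; the real timeouts and method numbers are not given here. */
#define SKETCH_WDT_NORMAL          0x7fffffffU
#define SKETCH_WDT_TESTING         0x1U
#define SKETCH_METHOD_VALID        0x10U
#define SKETCH_METHOD_UNSUPPORTED  0xffffU

/* Pick a very short watchdog timeout only when WDT testing is compiled in. */
static uint32_t sketch_ctxsw_wdt_timeout(void)
{
#ifdef CONFIG_NVGPU_CTXSW_FW_ERROR_WDT_TESTING
	return SKETCH_WDT_TESTING;
#else
	return SKETCH_WDT_NORMAL;
#endif
}

/* Submit an unsupported method only when error code testing is compiled in. */
static uint32_t sketch_ctxsw_method(void)
{
#ifdef CONFIG_NVGPU_CTXSW_FW_ERROR_CODE_TESTING
	return SKETCH_METHOD_UNSUPPORTED;
#else
	return SKETCH_METHOD_VALID;
#endif
}
```

With neither flag defined, both helpers return the normal values, so a
regular build is unaffected.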
JIRA NVGPU-4471
Change-Id: Ib45a946b3e38d3b6dd5bbee277c5d3e7c55521c0
Signed-off-by: sagar <skadamati@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2284048
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
The functions below in common.gr hal subunits include unnecessary
asserts to ensure the value is not truncated when converted to u32.
gm20b_gr_init_commit_global_attrib_cb()
gp10b_gr_init_commit_global_bundle_cb()
gp10b_gr_init_commit_global_pagepool()
gv11b_gr_init_commit_global_attrib_cb()
Make use of nvgpu_safe_cast_u64_to_u32() and remove the unnecessary
asserts.
gp10b_gr_init_commit_global_bundle_cb() checks whether size <=
U32_MAX. But since size is declared as u32, it is always
<= U32_MAX, so there is no point in the check.
Remove the unnecessary check.
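The two points above can be illustrated with a small sketch. The helper
below is a hypothetical stand-in, not nvgpu_safe_cast_u64_to_u32() itself:
how the real helper reacts to an out-of-range value is not described
here, so this version simply reports failure through its return value.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical checked narrowing cast: stores the value in *out and
 * returns true only when 'in' fits in 32 bits.
 */
static bool sketch_cast_u64_to_u32(uint64_t in, uint32_t *out)
{
	if (in > (uint64_t)UINT32_MAX) {
		return false;
	}
	*out = (uint32_t)in;
	return true;
}

/*
 * The removed check: a value already held in a 32-bit unsigned type can
 * never exceed UINT32_MAX, so this comparison is always true.
 */
static bool sketch_u32_fits(uint32_t size)
{
	return size <= UINT32_MAX;
}
```

Centralizing the range check in one cast helper is what makes the
per-call-site asserts redundant.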
Jira NVGPU-4778
Change-Id: I9562afd1b31c3c6b095f607cbdf725d33d87effb
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2279898
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
To achieve permanent fault coverage, the CTAs launched by
each kernel in the mission and redundant contexts must execute on
different hardware resources. This feature modifies, in software,
the virtual SM id to TPC mapping across
the mission and redundant contexts. The virtual SM identifier to TPC
mapping is done by nvgpu when setting up the patch context.
The recommendation for the redundant setting is to offset the
assignment by one TPC, and not by one GPC. This will ensure
both GPC and TPC diversity. The SM and Quadrant diversity will happen
naturally. For kernels with few CTAs, the diversity is guaranteed
to be 100%. In case of completely random CTA allocation,
e.g. large number of CTAs in the waiting queue, the diversity is
1 - 1/#SM, or 87.5% for GV11B, 97.9% for TU104.
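As a worked check of the 1 - 1/#SM figure, a minimal sketch. The helper
name is hypothetical, and the SM counts used in the note below (8 and 48)
are assumptions inferred from the percentages quoted above.

```c
/*
 * Expected diversity under completely random CTA allocation:
 * the chance that the redundant CTA does not land on the same SM.
 */
static double sketch_sm_diversity(unsigned int sm_count)
{
	if (sm_count == 0U) {
		return 0.0;
	}
	return 1.0 - (1.0 / (double)sm_count);
}
```

sketch_sm_diversity(8U) gives 0.875 and sketch_sm_diversity(48U) gives
roughly 0.979, matching the 87.5% and 97.9% figures.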
Added the nvgpu CFLAG "CONFIG_NVGPU_SM_DIVERSITY" to enable/disable
SM diversity support.
This support is only enabled on gv11b and tu104 QNX non-safety builds.
JIRA NVGPU-4685
Change-Id: I8e3eaa72d8cf7aff97f61e4c2abd10b2afe0fe8b
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2268026
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add new unit test to cover gops.gr.init.ecc_scrub_reg HAL function.
gops.gr.init.ecc_scrub_reg HAL can generate TIMEOUT errors which are
not currently returned to the caller. Update this HAL to return an int
value for error propagation.
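A minimal sketch of the error propagation, assuming a bounded polling
loop. All names here are hypothetical; the actual scrub registers and
timeout handling are not shown in this change description.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical poll callback: returns true once scrubbing has finished. */
typedef bool (*sketch_scrub_done_fn)(uint32_t attempt);

/*
 * Returning int instead of void lets the caller see a timeout:
 * 0 on success, -ETIMEDOUT when the scrub never completes.
 */
static int sketch_ecc_scrub_reg(sketch_scrub_done_fn done, uint32_t max_attempts)
{
	uint32_t i;

	for (i = 0U; i < max_attempts; i++) {
		if (done(i)) {
			return 0;
		}
	}
	return -ETIMEDOUT;
}

/* Test doubles: one completes on the third poll, one never completes. */
static bool sketch_done_on_third_poll(uint32_t attempt)
{
	return attempt >= 2U;
}

static bool sketch_never_done(uint32_t attempt)
{
	(void)attempt;
	return false;
}
```

A unit test can then drive both outcomes by swapping the callback, which
is the kind of coverage a void return type could not expose.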
Jira NVGPU-4458
Change-Id: I98f4d5af2ef17cc4301951fec4d660638c8ef72c
Signed-off-by: dnibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2265456
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- nvgpu_gr_ctx_load_golden_ctx_image() does not return any error, change
the return type to void
- Check for preemption modes greater than CILP in
nvgpu_gr_ctx_check_valid_preemption_mode
- Check if received class is valid or not in
nvgpu_gr_setup_set_preemption_mode
- Compile out entire nvgpu_gr_obj_ctx_init_ctxsw_preemption_mode since
it is really not doing anything in safety
- Remove the switch statement in nvgpu_gr_obj_ctx_set_compute_preemption_mode
since it is not possible to receive any other value than supported.
Previous function calls ensure that input values are validated.
- nvgpu_gr_obj_ctx_commit_global_ctx_buffers() does not return any
error, change the return type to void
- gops.gr.init.preemption_state HAL is not needed in safety since it
only configures gfxp related timeout
- Remove redundant call to gops.gr.init.wait_idle in
nvgpu_gr_obj_ctx_commit_hw_state. The wait was triggered despite an
earlier failure in the same function.
Jira NVGPU-4457
Change-Id: I06a474ef7cc1b16fbc3846e0cad1cda6bb2bf2af
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2260938
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Split GR ECC initialization into GPC/TPC and FECS ECC init. FECS ECC
errors during acr_construct_execute need to be reported and handled,
hence FECS ECC counters are required to be initialized before
acr_construct_execute.
GPC/TPC ECC counters are dependent on the GR config that will be
initialized only after acr_construct_execute.
nvgpu_gr_intr_init_support is moved to nvgpu_gr_prepare_sw.
FECS ECC interrupt is enabled by default hence interrupt is not
enabled through gr_fecs_host_int_enable_r in nvgpu_gr_prepare_sw.
JIRA NVGPU-4439
Change-Id: Ifc9912f0578015a6ba1e9d38765c42633632b15f
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2261987
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
handle_tex_exception hal is not set for safety build. Add
CONFIG_NVGPU_HAL_NON_FUSA checking for that hal.
log_mme_exception hal is supported only for turing. Add
CONFIG_NVGPU_DGPU checking for that hal.
gr_intr_handle_class_error always returns -EINVAL. Change the
return type to void to avoid unwanted error checking.
The nvgpu_gr_intr_get_channel_from_ctx function parameter curr_tsgid is
never NULL for the current callers. Remove the unwanted
(curr_tsgid != NULL) check from this function.
Jira NVGPU-4454
Change-Id: I165d1cc5f9e308dfb11d905b59151b44f63a31bb
Signed-off-by: vinodg <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2259763
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Compile out unused code in gr.falcon for safety build.
By default NVGPU_SEC_SECUREGPCCS is enabled in Safety build. Add
CONFIG_NVGPU_GR_FALCON_NON_SECURE_BOOT checking around the non-secure code.
In gm20b_gr_falcon_wait_ctxsw_ready, the watchdog timer value is
calculated based on clock rate, which is not needed for safety code.
Add CONFIG_NVGPU_HAL_NON_FUSA checking for unused code in that function.
gm20b_gr_falcon_gr_code_less_equal handles the last op status we check.
SKIP or any other status following it fails the
check for a valid op status. So code reaching this function can
only be for GR_IS_UCODE_OP_LESSER_EQUAL. Remove the check for
opc_status != GR_IS_UCODE_OP_LESSER_EQUAL in this function.
Jira NVGPU-4453
Change-Id: I156cac59f52779fa7f78052c1f0115d0e8f03bf9
Signed-off-by: vinodg <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2258768
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- Remove non-safe TPC powergate feature from the safety
build by introducing a new flag:
CONFIG_NVGPU_TPC_POWERGATE
- Move nvgpu_init_power_gate_gr() under the same compile-time flag,
and move HAL function gr_gv11b_powergate_tpc() to tpc_gv11b.c
- Also, remove the negative test scenario and
usage of tpc_powergate from unit tests
JIRA NVGPU-4149
Change-Id: If489482401e94de499e472b16b1bc091b00992e6
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2242323
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Move code used only with graphics under
CONFIG_NVGPU_GRAPHICS check.
The gm20b_gr_init_load_sw_bundle_init hal gets called
without a CONFIG_NVGPU_GR_GOLDEN_CTX_VERIFICATION check.
Remove dead code in
nvgpu_gr_ctx_check_valid_preemption_mode function.
Jira NVGPU-3968
Change-Id: I399126123006ae44dba29b3c08378d11fe82e543
Signed-off-by: vinodg <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2247346
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
To enable ecc interrupts early during nvgpu_finalize_poweron, ecc
support has to be enabled early. ecc support was being initialized
together for GR, LTC, PMU, FB units late in the poweron sequence.
Move the ecc init for each unit to respective unit's init functions.
And separate out the hal ecc functions from GR ecc unit to
respective hal units.
JIRA NVGPU-4336
Change-Id: I2c42fb6ba3192dece00be61411c64a56ce16740a
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2239153
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Move graphics related defs and functions under CONFIG_NVGPU_GRAPHICS
switch.
Move classes not supported in GV11B under CONFIG_NVGPU_NON_FUSA
switch.
Add missing valid class numbers to gpu_class.is_valid HAL.
Also remove unused class defs from class.h header.
A lot of qnx safety tests are still using the graphics 3d class.
Until those tests are fixed, allow the 3d graphics class
as a valid class for the safety build.
JIRA NVGPU-4301
Change-Id: Ifd2a13bee3210821799c2bca10e7245eb3c79121
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2224658
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
gk20a.h will include gops_mc.h to contain the mc ops definitions. Add
doxygen comments for the HAL functions that are called directly.
Also move mc_gp10b_intr_pmu_unit_config to non-fusa HAL file.
JIRA NVGPU-2524
Change-Id: I4f326332d7842211b004b372d79fac9fe6ed40e7
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2226017
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Moved mmu replayable fault related code under CONFIG_NVGPU_REPLAYABLE_FAULT
switch, so that it will be compiled out for safety build.
The following hals and their related code are also moved under the
CONFIG_NVGPU_REPLAYABLE_FAULT switch:
void (*handle_replayable_fault)(struct gk20a *g);
int (*mmu_invalidate_replay)(struct gk20a *g, u32 invalidate_replay_val);
JIRA NVGPU-4302
Change-Id: I191ee0c181b276a04bc1531488862380af81a5c9
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2227176
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
When a method is submitted to the FECS ucode using
gm20b_gr_falcon_submit_fecs_method_op, the status of the operation
is updated in the mailbox register. The driver can choose to skip validation of
the return status by setting op.cond.ok/fail = GR_IS_UCODE_OP_SKIP. At present
the driver continues to check for mailbox status despite the flag being set and
eventually times out.
Update gm20b_gr_falcon_submit_fecs_method_op so that mailbox status check is
skipped if op.cond.ok/fail is set to GR_IS_UCODE_OP_SKIP.
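The intended behavior can be sketched as follows. This is a simplified
model with hypothetical names: only the idea of a SKIP condition comes
from the text above, and the mailbox is modelled as a fixed value rather
than a register that is re-read on every poll.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical condition codes; SKIP means "do not validate this status". */
enum sketch_op_cond {
	SKETCH_OP_SKIP = 0,
	SKETCH_OP_EQUAL
};

/*
 * Returns 0 on success, -EINVAL when the fail condition matches and
 * -ETIMEDOUT when nothing matches within max_polls.
 */
static int sketch_wait_status(enum sketch_op_cond ok, uint32_t ok_val,
		enum sketch_op_cond fail, uint32_t fail_val,
		uint32_t mailbox, uint32_t max_polls)
{
	uint32_t i;

	/* Both conditions skipped: do not poll the mailbox at all. */
	if ((ok == SKETCH_OP_SKIP) && (fail == SKETCH_OP_SKIP)) {
		return 0;
	}

	for (i = 0U; i < max_polls; i++) {
		if ((ok == SKETCH_OP_EQUAL) && (mailbox == ok_val)) {
			return 0;
		}
		if ((fail == SKETCH_OP_EQUAL) && (mailbox == fail_val)) {
			return -EINVAL;
		}
	}
	return -ETIMEDOUT;
}
```

The early return is the point of the fix: without it, a caller that asked
for SKIP would still sit in the polling loop until the timeout.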
Change-Id: I45514933898924debedd727dc0c83570755e5b12
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2214039
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
FECS_FEATURE_OVERRIDE_ECC bits for SM_L0_CACHE and SM_L1_CACHE
need to be checked against NV_PGRAPH_PRI_FECS_FEATURE_OVERRIDE_ECC_1
register.
Correct the error of checking those bits against
NV_PGRAPH_PRI_FECS_FEATURE_OVERRIDE_ECC register.
Jira NVGPU-4095
Change-Id: I09737b83496f9e728e0b022bd6a4e75741bd0c49
Signed-off-by: vinodg <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2210429
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This change eliminates MISRA Advisory Rule 18.4 violations in the
following cases:
* nvgpu_submit_append_gpfifo_user_direct()
* nvgpu_submit_append_gpfifo_common()
- use array-indexing to access gpfifo entry lists
* gv11b_gr_intr_record_sm_error_state()
- use array-indexing to access sm_error_states table
Advisory Rule 18.4 states that the +, -, +=, and -= operators should
not be applied to an expression of pointer type.
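A small before/after sketch of the kind of rewrite involved. The struct
and function names are hypothetical and only stand in for the gpfifo
entry lists mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a gpfifo entry; field names are illustrative. */
struct sketch_gpfifo_entry {
	uint32_t entry0;
	uint32_t entry1;
};

/* Flagged by Rule 18.4: '+' is applied to expressions of pointer type. */
static void sketch_append_ptr_arith(struct sketch_gpfifo_entry *dst,
		size_t start, const struct sketch_gpfifo_entry *src, size_t count)
{
	size_t i;

	for (i = 0U; i < count; i++) {
		*(dst + start + i) = *(src + i);
	}
}

/* Same behavior, written with array indexing. */
static void sketch_append_indexed(struct sketch_gpfifo_entry *dst,
		size_t start, const struct sketch_gpfifo_entry *src, size_t count)
{
	size_t i;

	for (i = 0U; i < count; i++) {
		dst[start + i] = src[i];
	}
}

/* Returns 1 when both variants produce identical destination lists. */
static int sketch_variants_match(void)
{
	const struct sketch_gpfifo_entry src[2] = { { 1U, 2U }, { 3U, 4U } };
	struct sketch_gpfifo_entry a[4] = { { 0U, 0U } };
	struct sketch_gpfifo_entry b[4] = { { 0U, 0U } };
	size_t i;

	sketch_append_ptr_arith(a, 1U, src, 2U);
	sketch_append_indexed(b, 1U, src, 2U);

	for (i = 0U; i < 4U; i++) {
		if ((a[i].entry0 != b[i].entry0) || (a[i].entry1 != b[i].entry1)) {
			return 0;
		}
	}
	return (b[1].entry0 == 1U) && (b[2].entry1 == 4U);
}
```

Both variants compile to the same accesses; the indexed form just keeps
the arithmetic on the index instead of on the pointer.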
JIRA NVGPU-3798
Change-Id: I736930e4ba09a88888b0ef48f62496c4082ea5a1
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2210173
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>