Get the number of SMs from the GR instance specific nvgpu_gr_config
pointer instead of the global SM count in the following functions:
nvgpu_gr_fs_state_init()
gv11b_gr_init_sm_id_config()
Update nvgpu_gr_config_get_gpc_skip_mask() to return 0 when gpc_index
is greater than the available gpc_count. This is not MIG specific; code
review shows it is possible even today on existing chips.
See gm20b_gr_init_pd_skip_table_gpc().
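
The bounds check described above can be sketched as follows. This is a
minimal illustration, not the nvgpu code: struct gr_config_sketch and
gr_config_get_gpc_skip_mask_sketch() are hypothetical stand-ins, and
treating gpc_index == gpc_count as out of range is an assumption made
because the array holds gpc_count entries.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct nvgpu_gr_config. */
struct gr_config_sketch {
	uint32_t gpc_count;            /* number of available GPCs */
	const uint32_t *gpc_skip_mask; /* one entry per available GPC */
};

/*
 * Return the skip mask for gpc_index, or 0 when the index does not
 * name an available GPC. The >= boundary is an assumption (see above).
 */
static uint32_t gr_config_get_gpc_skip_mask_sketch(
		const struct gr_config_sketch *config, uint32_t gpc_index)
{
	if (gpc_index >= config->gpc_count) {
		return 0U;
	}

	return config->gpc_skip_mask[gpc_index];
}
```

A caller that walks a fixed-size skip table can then pass indices past
the last GPC and receive 0 instead of reading outside the array.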
Update nvgpu_gr_get_override_ecc_val() to return GR instance specific
value.
Execute gr_init_setup_hw() for each GR instance.
Disable the following failing unit tests:
nvgpu_gr_fs_state.test_gr_fs_state_error_injection
nvgpu_gr_init.test_gr_init_hal_config_error_injection
Jira NVGPU-5648
Change-Id: Ie8f1c0c304c634756786d85facf336a5c9ae8195
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2410702
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
The common.gr unit exports a separate API, nvgpu_gr_prepare_sw(), to
initialize some SW pieces required for nvgpu_gr_enable_hw().
A separate API is unnecessary since the same initialization
can be performed in nvgpu_gr_alloc().
Remove nvgpu_gr_prepare_sw() and HAL gops.gr.gr_prepare_sw().
Initialize the falcon and interrupt structures in a loop from
nvgpu_gr_alloc().
Move nvgpu_netlist_init_ctx_vars() from nvgpu_gr_prepare_sw() to
common init path since netlist parsing need not be done from
common.gr unit. It just needs to happen before nvgpu_gr_enable_hw().
Also, trigger nvgpu_gr_free() from gr_remove_support() instead
of OS specific paths, and remove the nvgpu_gr_free() calls from the
probe error paths since nvgpu_gr_alloc() is no longer called in the
probe path.
Move interrupt and falcon data structure free calls to nvgpu_gr_free().
Also remove corresponding unit testing code that tests
nvgpu_gr_prepare_sw() specifically.
Update some unit tests to initialize ecc counters and netlist.
Disable some unit tests that fail for reasons unknown.
Jira NVGPU-5648
Change-Id: I82ec8160f76530bc40e0c11a9f26ba1c8f9cf643
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2400166
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add new API nvgpu_grmgr_get_num_gr_instances() that returns the number
of GR instances enumerated by the GR manager. It simply returns the
number of enabled sys pipes, which is the same as the number of GR
instances.
For consistency until common.gr supports multiple GR instances
completely, add a temporary macro NVGPU_GR_NUM_INSTANCES and set it
to 1. If this macro is changed to 0 (for local MIG testing), fall
back to using nvgpu_grmgr_get_num_gr_instances() to get the enumerated
number of GR instances.
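
The macro fallback can be illustrated with the sketch below. Only
NVGPU_GR_NUM_INSTANCES and its default of 1 come from this change;
struct grmgr_sketch and both *_sketch() helpers are hypothetical
stand-ins, and the real code may well select the path at compile time
rather than at run time.

```c
#include <stdint.h>

/* Temporary instance count; 0 means "ask the GR manager". */
#ifndef NVGPU_GR_NUM_INSTANCES
#define NVGPU_GR_NUM_INSTANCES 1U
#endif

/* Hypothetical stand-in for the GR manager state. */
struct grmgr_sketch {
	uint32_t num_sys_pipes_enabled;
};

/* Number of GR instances equals the number of enabled sys pipes. */
static uint32_t grmgr_get_num_gr_instances_sketch(
		const struct grmgr_sketch *mgr)
{
	return mgr->num_sys_pipes_enabled;
}

/* Use the macro unless it is 0, then fall back to the enumerated count. */
static uint32_t gr_num_instances_sketch(const struct grmgr_sketch *mgr)
{
	uint32_t num = NVGPU_GR_NUM_INSTANCES;

	if (num == 0U) {
		num = grmgr_get_num_gr_instances_sketch(mgr);
	}

	return num;
}
```

With the default macro value the function returns 1 regardless of how
many sys pipes are enabled; building with the macro set to 0 returns
the enumerated count instead.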
Use a for loop to initialize other variables of struct nvgpu_gr.
Remove unnecessary NULL check in nvgpu_gr_alloc() since struct gk20a
pointer can never be NULL in this path. Also remove corresponding unit
test code.
Jira NVGPU-5648
Change-Id: Id151d634a23235381229044f2a9af89e390886f2
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2400151
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
To achieve permanent fault coverage, the CTAs launched by
each kernel in the mission and redundant contexts must execute on
different hardware resources. This change modifies the virtual SM id
to TPC mapping in software so that it differs between the mission and
redundant contexts. nvgpu performs the virtual SM id to TPC mapping
when setting up the patch context.
The recommendation for the redundant setting is to offset the
assignment by one TPC, and not by one GPC. This ensures both
GPC and TPC diversity. The SM and Quadrant diversity will happen
naturally. For kernels with few CTAs, the diversity is guaranteed
to be 100%. In case of completely random CTA allocation,
e.g. a large number of CTAs in the waiting queue, the diversity is
1 - 1/#SM, or 87.5% for GV11B and 97.9% for TU104.
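
As a worked illustration of the numbers above, the sketch below shows a
one-TPC offset and the 1 - 1/#SM estimate. Both functions are
hypothetical and are not the nvgpu mapping code; the modulo wrap-around
is an assumption, and the SM counts used in the note below (8 and 48)
are the ones implied by the quoted percentages.

```c
#include <stdint.h>

/*
 * Redundant-context TPC for a virtual SM: the mission TPC shifted by
 * one, wrapping at num_tpc (assumed). num_tpc must be non-zero.
 */
static uint32_t redundant_tpc_sketch(uint32_t mission_tpc, uint32_t num_tpc)
{
	return (mission_tpc + 1U) % num_tpc;
}

/* Diversity under completely random CTA allocation: 1 - 1/#SM. */
static double random_cta_diversity_sketch(uint32_t num_sm)
{
	return 1.0 - (1.0 / (double)num_sm);
}
```

For 8 SMs the estimate is 1 - 1/8 = 0.875, and for 48 SMs it is
1 - 1/48, roughly 0.979, matching the 87.5% and 97.9% figures.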
Add the NvGpu CFLAGS option "CONFIG_NVGPU_SM_DIVERSITY" to
enable/disable SM diversity support.
This support is enabled only on gv11b and tu104 QNX non-safety builds.
JIRA NVGPU-4685
Change-Id: I8e3eaa72d8cf7aff97f61e4c2abd10b2afe0fe8b
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2268026
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add a new unit test to cover the gops.gr.init.ecc_scrub_reg HAL
function.
The gops.gr.init.ecc_scrub_reg HAL can generate TIMEOUT errors that are
currently not returned to the caller. Update this HAL to return an int
value for error propagation.
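
The error propagation can be sketched as below. This is an assumed
shape, not the HAL itself: the poll callback, the retry count of 10 and
every *_sketch name are hypothetical, and -ETIMEDOUT is used here only
as a plausible errno value for the timeout.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical poll callback: returns true once the scrub has finished. */
typedef bool (*scrub_done_fn_sketch)(void *ctx);

/* Example callback: reports completion after *ctx more polls. */
static bool done_after_countdown_sketch(void *ctx)
{
	uint32_t *remaining = ctx;

	if (*remaining == 0U) {
		return true;
	}
	(*remaining)--;
	return false;
}

/* Poll until done or until max_polls is exhausted. */
static int ecc_scrub_wait_sketch(scrub_done_fn_sketch done, void *ctx,
		uint32_t max_polls)
{
	uint32_t i;

	for (i = 0U; i < max_polls; i++) {
		if (done(ctx)) {
			return 0;
		}
	}

	return -ETIMEDOUT;
}

/* Returns int so that the timeout reaches the caller instead of being lost. */
static int ecc_scrub_reg_sketch(scrub_done_fn_sketch done, void *ctx)
{
	return ecc_scrub_wait_sketch(done, ctx, 10U);
}
```

A unit test can then inject a callback that never completes within the
retry budget and check for the negative return value.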
Jira NVGPU-4458
Change-Id: I98f4d5af2ef17cc4301951fec4d660638c8ef72c
Signed-off-by: dnibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2265456
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Split GR ECC initialization into GPC/TPC ECC init and FECS ECC init.
FECS ECC errors during acr_construct_execute need to be reported and
handled, so the FECS ECC counters must be initialized before
acr_construct_execute.
GPC/TPC ECC counters depend on the GR config, which is initialized
only after acr_construct_execute.
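
The resulting ordering can be summarized with the sketch below. It only
records the sequence of steps; the step labels and helper names are
hypothetical, and only the relative order is taken from the text above.

```c
#include <stddef.h>

/* Hypothetical step labels; only their relative order is from the text. */
enum init_step_sketch {
	STEP_FECS_ECC_INIT,
	STEP_ACR_CONSTRUCT_EXECUTE,
	STEP_GR_CONFIG_INIT,
	STEP_GPC_TPC_ECC_INIT,
	STEP_COUNT
};

struct init_trace_sketch {
	enum init_step_sketch order[STEP_COUNT];
	size_t count;
};

/* Append one step to the trace, ignoring overflow. */
static void trace_step_sketch(struct init_trace_sketch *t,
		enum init_step_sketch step)
{
	if (t->count < (size_t)STEP_COUNT) {
		t->order[t->count] = step;
		t->count++;
	}
}

/* FECS ECC first, then ACR, then GR config, then GPC/TPC ECC. */
static void gr_ecc_init_order_sketch(struct init_trace_sketch *t)
{
	t->count = 0U;
	trace_step_sketch(t, STEP_FECS_ECC_INIT);
	trace_step_sketch(t, STEP_ACR_CONSTRUCT_EXECUTE);
	trace_step_sketch(t, STEP_GR_CONFIG_INIT);
	trace_step_sketch(t, STEP_GPC_TPC_ECC_INIT);
}
```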
nvgpu_gr_intr_init_support is moved to nvgpu_gr_prepare_sw.
The FECS ECC interrupt is enabled by default, so it is not
enabled through gr_fecs_host_int_enable_r in nvgpu_gr_prepare_sw.
JIRA NVGPU-4439
Change-Id: Ifc9912f0578015a6ba1e9d38765c42633632b15f
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2261987
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Rename the gr register space allocation and deallocation functions
to test_gr_init_setup and test_gr_remove_setup.
Add tests covering the following functions:
nvgpu_gr_init
nvgpu_gr_init_support
nvgpu_gr_suspend
nvgpu_gr_remove_support
Jira NVGPU-3970
Change-Id: I11418ddcb9946ef75de162fd5689fdbbbfb62e79
Signed-off-by: vinodg <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2194612
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Move gr_init_prepare to a separate gr unit test.
Sharing global register spaces between two different gr unit tests
corrupts memory when the tests run in multiple threads.
Add support for local register spaces with pre-initialized
register values for each gr unit test.
Jira NVGPU-3582
Change-Id: I4e47c1ca4f312335cd33a73a377f9fa9f12ccd5f
Signed-off-by: vinodg <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2189502
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This reverts commit 9bfdb2ba03f90f0cf828f08b99101a3a3e6c4532.
Bug 2693908
Change-Id: I3ef56773e46aad3626f16b84ea5e51c2fdcc3f1c
Signed-off-by: Bo Yan <byan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2189200
Add support for gr_prepare for sw and hw.
Add the needed registers using nvgpu_posix_io_add_reg_space calls.
Add unit tests covering the following functions:
nvgpu_gr_prepare_sw
nvgpu_gr_enable_hw
Copy the falcon ucode binaries under the userspace/firmware
directory.
Modify install-unit.sh to copy the firmware binaries
under the nvgpu-unnit/firmware directory.
Jira NVGPU-3582
Change-Id: If2131d2c48e828251208da86688b0594e62de82e
Signed-off-by: vinodg <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2184293
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>