Most of the Orin chip specific code is compiled out of safety build
with CONFIG_NVGPU_NON_FUSA and CONFIG_NVGPU_HAL_NON_FUSA. Remove the
config protection from Orin/GA10B specific code. Currently all code
is enabled. Code not required in safety will be compiled out later
in separate activity.
Other noteworthy changes in this patch related to safety build:
- In ga10b_ce_request_idle(), add a log print to dump num_pce so that
compiler does not complain about unused variable num_pce.
- In ga10b_fifo_ctxsw_timeout_isr(), protect variables active_eng_id and
recover under CONFIG_NVGPU_KERNEL_MODE_SUBMIT to fix compilation
errors of unused variables.
- Compile out HAL gops.pbdma.force_ce_split() from safety since this HAL
is GA100 specific and not required for GA10B.
- Compile out gr_ga100_process_context_buffer_priv_segment() with
CONFIG_NVGPU_DEBUGGER.
- Compile out VAB support with CONFIG_NVGPU_HAL_NON_FUSA.
- In ga10b_gr_intr_handle_sw_method(), protect left_shift_by_2 variable
with appropriate configs to fix unused variable compilation error.
- In ga10b_intr_isr_stall_host2soc_3(), compile ELPG function calls
with CONFIG_NVGPU_POWER_PG.
- In ga10b_pmu_handle_swgen1_irq(), move whole function body under
CONFIG_NVGPU_FALCON_DEBUG to fix unused variable compilation errors.
- Add below TU104 specific files in safety build since some of the code
in those files is required for GA10B. Unnecessary code will be
compiled out later on.
hal/gr/init/gr_init_tu104.c
hal/class/class_tu104.c
hal/mc/mc_tu104.c
hal/fifo/usermode_tu104.c
hal/gr/falcon/gr_falcon_tu104.c
- Compile out GA10B specific debugger/profiler related files from
safety build.
- Disable CONFIG_NVGPU_FALCON_DEBUG from safety debug build temporarily
to work around compilation errors seen with keeping this config
enabled. Config will be re-enabled in safety debug build later.
Jira NVGPU-7276
Change-Id: I35f2489830ac083d52504ca411c3f1d96e72fc48
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2627048
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
After CONFIG_UBSAN kernel compilation flag to know any shifting
cause overflow or not enablement ,this is identified.
The register "gr_fe_tpc_fs_r(gpc_index)" is read only after
Volta. The gops where we are computing the index is not needed.
Bug 200727116
Change-Id: Ib2306103389ba9df77fd59d012ec70e775104989
Signed-off-by: dt <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2573296
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Fuse registers should be queried with physical gpc-id and not the
logical ones. For tu104 and before chips physical gpc-ids are same as
logical for non-floorswept config but for newer chips it may differ.
Also, logical to physical mapping is not present for a floorswept gpc so
query gpc_tpc mask only upto actual gpcs that are present.
Jira NVGPU-6080
Change-Id: I84c4a3c1f256fdd1927f4365af26e9892fe91beb
Signed-off-by: shashank singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2417721
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add separate flag gpu_dbg_gr to enable common.gr specific debugging.
Add this flag to all the existing debug logs that use gpu_dbg_fn or
gpu_dbg_info for debugging. Also add many other debugging logs that
might be helpful in debugging.
Removing debug log in gv11b_gr_init_get_nonpes_aware_tpc() as it dumps
too much data that does not seem useful.
Batch all interrupt enable functions in gr_init_setup_hw() together for
readability.
Jira NVGPU-5648
Change-Id: I0b857650122cdb1f974b452d28c26e7f142baf61
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2411740
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Get number of SMs from GR instance specific nvgpu_gr_config pointer
instead of global SM count in below functions :
nvgpu_gr_fs_state_init()
gv11b_gr_init_sm_id_config()
Update nvgpu_gr_config_get_gpc_skip_mask() to return 0 in case gpc_index
is greater than available gpc_count. This is not MIG specific, but based
on code review possible even today for existing chips.
See gm20b_gr_init_pd_skip_table_gpc()
Update nvgpu_gr_get_override_ecc_val() to return GR instance specific
value.
Execute gr_init_setup_hw() for each GR instance.
Disable below failing unit tests:
nvgpu_gr_fs_state.test_gr_fs_state_error_injection
nvgpu_gr_init.test_gr_init_hal_config_error_injection
Jira NVGPU-5648
Change-Id: Ie8f1c0c304c634756786d85facf336a5c9ae8195
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2410702
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
To achieve permanent fault coverage, the CTAs launched by
each kernel in the mission and redundant contexts must execute on
different hardware resources. This feature proposes modifications
in the software to modify the virtual SM id to TPC mapping across
the mission and redundant contexts. The virtual SM identifier to TPC
mapping is done by nvgpu when setting up the patch context.
The recommendation for the redundant setting is to offset the
assignment by one TPC, and not by one GPC. This will ensure that both
GPC and TPC diversity. The SM and Quadrant diversity will happen
naturally. For kernels with few CTAs, the diversity is guaranteed
to be 100%. In case of completely random CTA allocation,
e.g. large number of CTAs in the waiting queue, the diversity is
1 - 1/#SM, or 87.5% for GV11B, 97.9% for TU104.
Added NvGpu CFLAGS to enable/disable the SM diversity support
"CONFIG_NVGPU_SM_DIVERSITY".
This support is only enabled on gv11b and tu104 QNX non safety build.
JIRA NVGPU-4685
Change-Id: I8e3eaa72d8cf7aff97f61e4c2abd10b2afe0fe8b
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2268026
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
HAL gops.gr.config.init_sm_id_table() initializes SM count in struct
nvgpu_gr_config. It is almost impossible that SM count is detected as
zero.
Hence remove the error check and add an assert instead.
This also helps with code coverage tests since it is difficult to
simulate error condition of having zero SMs detected.
Also, HAL gops.gr.config.init_sm_id_table() should always be defined
for each platform. Hence remove unnecessary check.
Jira NVGPU-4373
Change-Id: Ibd9b301b28d5bd2952367346a8f12fabcee2abd9
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2247845
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add macros for whitelisting coverity violations. These macros use pragma
directives. The pragma directives and whitelisting macros are only
enabled when a coverity scan is being run.
The whitelisting macros have been added to a new header called
static_analysis.h. The contents of safe_ops.h (CERT C safe ops) have
been moved into static_analysis.h because this will be the new header
for static analysis related macros/defines/etc.
JIRA NVGPU-3820
Change-Id: I9c63f20f670880b420415535738034619314b7c3
Signed-off-by: Adeel Raza <araza@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2180600
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Below 5.7 violations are reported in common.gr.config unit :
nvgpu/drivers/gpu/nvgpu/common/gr/gr_config.c:628:
identifier_reuse: Identifier "sm_info" is already used to represent a type.
Fix them by renaming struct sm_info to struct nvgpu_sm_info
Jira NVGPU-3225
Change-Id: I26f70a4ed2a5a845e0dc9daeb8fb5474e35d42fb
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2110986
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Remove gr_priv.h from outside gr files.
Add hal function in gr.init for get_no_of_sm. This helps
to avoid dereferencing gr in couple of files for g->gr->config and
avoid gr_priv.h include in those files.
Replace nvgpu_gr_config_get_no_of_sm call with
g->ops.gr.init.get_no_of_sm for files outside gr unit.
Jira NVGPU-3218
Change-Id: I435bb233f70986e31fbfcb900ada3b3bda0bc787
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2109182
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Removed unused struct from gr_gk20a.h
Change static allocation for struct gr_gk20a to dynamic type.
Change all the files that being affected by that change.
Call gr allocation from corresponding init_support functions, which
are part of the probe functions.
nvgpu_pci_init_support in pci.c
vgpu_init_support in vgpu_linux.c
gk20a_init_support in module.c
Call gr free before the gk20a free call in nvgpu_free_gk20a.
Rename struct gr_gk20a to struct nvgpu_gr
JIRA NVGPU-3132
Change-Id: Ief5e664521f141c7378c4044ed0df5f03ba06fca
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2095798
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Many of the functions in common.gr.obj_ctx and common.gr.fs_state units
directly dereference struct gr_gk20a to obtain other structures
e.g. API nvgpu_gr_obj_ctx_set_ctxsw_preemption_mode() obtains pointer
to nvgpu_gr_config struct by direct access g->gr.config
Such accesses add dependency of these units on gr.h and hence create
circular dependency with common.gr.gr unit
Fix this by receiving all required structures in the function parameter
list itself
Jira NVGPU-1886
Change-Id: Iee973ae33fc7e1707b8f025ad61683f725dedb53
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2094995
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Move get_nonpes_aware_tpc hal to hal.gr.init . This hal is
implemented for gv11b.
Update sm_id_numbering hal to pass the gr_config struct pointer
as parameter to avoid dereferencing from gr inside hal.
JIRA NVGPU-2951
Change-Id: I1e06b634cc36741e116e41e581a18c7f5b373945
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2093835
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
nvgpu_gr_init_fs_state is right now defined in common.gr.gr unit
This API also needs to be called from common.gr.obj_ctx unit so obj_ctx
unit depends on gr unit for this.
common.gr.gr unit already depends on common.gr.obj_ctx for context
initialization. So this causes a circular dependency
Fix this by moving this API to new standalone unit common.gr.fs_state
Rename it to nvgpu_gr_fs_state_init
Jira NVGPU-1887
Change-Id: I88ca8e1a7bc3c544459462493116f95d92b9ab01
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2090496
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>