Mirror of git://nv-tegra.nvidia.com/linux-nvgpu.git (synced 2025-12-22 09:12:24 +03:00)
gpu: nvgpu: Add CE interrupt handling
Handle the following CE interrupts:
a. LAUNCH_ERR
- Userspace error, triggered by a faulty launch.
- Handled via recovery, which resets the CE engine and tears down
the faulty channel.
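A minimal sketch of the LAUNCH_ERR path, assuming a write-1-to-clear status register that is simulated in memory. The bit position, `struct ce_regs`, `struct recovery_log` and `hypothetical_recover()` are illustrative stand-ins, not nvgpu APIs. The faulty channel ID is passed in directly here; how the driver actually identifies the faulty channel is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; the real field offset comes from the CE refmanual. */
#define CE_INTR_LAUNCHERR_BIT (1u << 0)

/* Simulated register standing in for MMIO; illustration only. */
struct ce_regs {
	uint32_t intr_status; /* assumed write-1-to-clear */
};

/* Records what recovery would do; a real driver would call its RC code. */
struct recovery_log {
	bool engine_reset;
	int torn_down_chid; /* -1 if no channel was torn down */
};

static void ce_write_intr_status(struct ce_regs *r, uint32_t val)
{
	/* Write-1-to-clear: every bit set in val is cleared in the status. */
	r->intr_status &= ~val;
}

static void hypothetical_recover(struct recovery_log *log, int faulty_chid)
{
	log->torn_down_chid = faulty_chid; /* tear down the faulty channel */
	log->engine_reset = true;          /* reset the CE engine */
}

/* Returns true if a pending LAUNCH_ERR was handled and acknowledged. */
static bool ce_handle_launch_err(struct ce_regs *r, struct recovery_log *log,
				 int faulty_chid)
{
	if ((r->intr_status & CE_INTR_LAUNCHERR_BIT) == 0u)
		return false;
	hypothetical_recover(log, faulty_chid);
	ce_write_intr_status(r, CE_INTR_LAUNCHERR_BIT);
	return true;
}
```

Only the LAUNCH_ERR bit is acknowledged; any other pending bits are left for their own handlers.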
b. INVALID_CONFIG
- Triggered when an LCE is mapped to a floorswept PCE.
- On iGPU, we use the default PCE-to-LCE (PCE2LCE) HW mapping. The
default mapping is the INIT value of NV_CE_PCE2LCE_CONFIG in the CE
refmanual.
- The NvGPU driver configures the mapping on dGPUs (currently only
on Turing).
- So this interrupt can only be triggered by a kernel or HW error.
- Recovery (killing the context and resetting the engine) will not
resolve this error.
- Trigger quiesce as part of handling.
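The condition that INVALID_CONFIG reports can be illustrated with a small check, assuming the mapping is modelled as one LCE index per PCE and floorsweeping as a bitmask of PCEs. The `PCE_UNMAPPED` encoding and `lces_on_floorswept_pces()` are hypothetical; the real NV_CE_PCE2LCE_CONFIG field layout is defined in the CE refmanual, and it is the hardware, not code like this, that raises the interrupt.

```c
#include <stddef.h>
#include <stdint.h>

#define PCE_UNMAPPED 0xFu /* hypothetical "no LCE" encoding */

/* Returns a mask with bit L set if LCE L is mapped to any floorswept PCE.
 * Bit P of floorswept_mask set means PCE P is floorswept (assumption). */
static uint32_t lces_on_floorswept_pces(const uint8_t *pce2lce, size_t num_pce,
					uint32_t floorswept_mask)
{
	uint32_t bad = 0u;

	for (size_t pce = 0; pce < num_pce && pce < 32; pce++) {
		uint8_t lce = pce2lce[pce];

		if (lce == PCE_UNMAPPED || lce >= 32u)
			continue; /* PCE not assigned to any LCE */
		if (floorswept_mask & (1u << pce))
			bad |= 1u << lce; /* LCE points at a missing PCE */
	}
	return bad;
}
```

A non-zero result corresponds to the misconfiguration described above; since the driver either uses the HW default or programs the mapping itself, such a result would point to a kernel or HW error.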
c. MTHD_BUFFER_FAULT
- The NvGPU driver allocates method fault buffers for all TSGs or
contexts, maps them in the BAR2 VA space, and writes the VA into the
channel instance block.
- Hence this can be triggered only by a kernel bug.
- Recovery will not help; quiesce is needed.
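Writing a BAR2 VA into the instance block amounts to splitting a 64-bit address across 32-bit words. In this sketch the word indices, the 4 KiB alignment check and `inst_write_mthd_buf_va()` are all assumptions; the real field offsets come from the RAMIN layout in the hardware headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical word indices within the instance block. */
#define INST_MTHD_BUF_LO_WORD 0u
#define INST_MTHD_BUF_HI_WORD 1u

/* Writes a BAR2 VA into two 32-bit instance-block words. Returns false
 * (and writes nothing) if the VA is not 4 KiB aligned, an assumed rule. */
static bool inst_write_mthd_buf_va(uint32_t *inst_words, uint64_t bar2_va)
{
	if ((bar2_va & 0xFFFull) != 0ull)
		return false;
	inst_words[INST_MTHD_BUF_LO_WORD] = (uint32_t)(bar2_va & 0xFFFFFFFFull);
	inst_words[INST_MTHD_BUF_HI_WORD] = (uint32_t)(bar2_va >> 32);
	return true;
}
```

Because the driver owns every step (allocation, BAR2 mapping, and this write), a fault on the buffer address implies a driver bug rather than a userspace error.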
d. FBUF_CRC_FAIL
- Triggered when the CRC entry read from the method fault buffer
does not match the CRC computed from the methods contained in the
buffer.
- This indicates memory corruption and is a fatal interrupt, which
requires at least the LCE, if not the entire GPU, to be reset before
operations can start again.
- Quiesce on memory corruption; a CE engine reset (via recovery)
will not help.
e. FBUF_MAGIC_CHK_FAIL
- Triggered when the MAGIC_NUM entry read from the method fault
buffer does not match NV_CE_MTHD_BUFFER_GLOBAL_HDR_MAGIC_NUM_VAL.
- This indicates memory corruption and is a fatal interrupt.
- Quiesce on memory corruption.
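Both fault-buffer checks (d and e) can be modelled in software. This sketch assumes a header holding a magic number and a CRC, checks the magic number first, and uses the standard CRC-32 (IEEE 802.3) as a stand-in for whatever CRC the hardware actually computes. `HYPOTHETICAL_MAGIC_NUM_VAL` is a placeholder, not the value of NV_CE_MTHD_BUFFER_GLOBAL_HDR_MAGIC_NUM_VAL, and the header layout is invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder only; the real value comes from the CE hardware headers. */
#define HYPOTHETICAL_MAGIC_NUM_VAL 0xCAFEF00Du

/* Hypothetical global header; the real buffer layout is hardware-defined. */
struct mthd_buf_hdr {
	uint32_t magic_num;
	uint32_t crc;
};

enum mthd_buf_check {
	MTHD_BUF_OK,
	MTHD_BUF_MAGIC_CHK_FAIL, /* analogous to FBUF_MAGIC_CHK_FAIL */
	MTHD_BUF_CRC_FAIL,       /* analogous to FBUF_CRC_FAIL */
};

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
	}
	return ~crc;
}

/* Either failure means the buffer contents are corrupt. */
static enum mthd_buf_check mthd_buf_validate(const struct mthd_buf_hdr *hdr,
					     const uint8_t *methods, size_t len)
{
	if (hdr->magic_num != HYPOTHETICAL_MAGIC_NUM_VAL)
		return MTHD_BUF_MAGIC_CHK_FAIL;
	if (crc32_ieee(methods, len) != hdr->crc)
		return MTHD_BUF_CRC_FAIL;
	return MTHD_BUF_OK;
}
```

Either failure indicates corrupted memory, which is why the handling above quiesces instead of resetting only the CE engine.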
f. STALLING_DEBUG
- Triggered only by a SW write, for debug purposes.
- Debug interrupt; currently ignored.
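The handling policy for a. through f. can be summarized as a lookup from interrupt to action. The enum names are hypothetical. This sketch also assumes that when several interrupts are pending at once the most severe action wins; that rule is not stated in the commit and is not claimed as documented driver behaviour.

```c
#include <stdint.h>

/* Hypothetical interrupt identifiers; real bit positions come from the
 * CE refmanual / hw headers. */
enum ce_intr {
	CE_INTR_LAUNCH_ERR,
	CE_INTR_INVALID_CONFIG,
	CE_INTR_MTHD_BUFFER_FAULT,
	CE_INTR_FBUF_CRC_FAIL,
	CE_INTR_FBUF_MAGIC_CHK_FAIL,
	CE_INTR_STALLING_DEBUG,
	CE_INTR_COUNT
};

/* Ordered by severity so actions can be compared with '>'. */
enum ce_action {
	CE_ACTION_NONE,    /* ignored (debug only) */
	CE_ACTION_RECOVER, /* tear down channel + reset CE engine */
	CE_ACTION_QUIESCE, /* recovery cannot fix it; stop the GPU */
};

static enum ce_action ce_intr_action(enum ce_intr intr)
{
	switch (intr) {
	case CE_INTR_LAUNCH_ERR:
		return CE_ACTION_RECOVER;
	case CE_INTR_INVALID_CONFIG:
	case CE_INTR_MTHD_BUFFER_FAULT:
	case CE_INTR_FBUF_CRC_FAIL:
	case CE_INTR_FBUF_MAGIC_CHK_FAIL:
		return CE_ACTION_QUIESCE;
	case CE_INTR_STALLING_DEBUG:
	default:
		return CE_ACTION_NONE;
	}
}

/* pending: bit i set means interrupt (enum ce_intr)i is pending. */
static enum ce_action ce_pending_action(uint32_t pending)
{
	enum ce_action worst = CE_ACTION_NONE;

	for (int i = 0; i < CE_INTR_COUNT; i++) {
		if ((pending & (1u << i)) == 0u)
			continue;
		enum ce_action a = ce_intr_action((enum ce_intr)i);
		if (a > worst)
			worst = a;
	}
	return worst;
}
```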
Move launch error handling from the GP10b HAL to the GV11b HAL because:
1. The LAUNCHERR_REPORT errcode METHOD_BUFFER_ACCESS_FAULT is not
defined on Pascal.
2. GP10b is not supported on dev-main ToT.
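The HAL split can be sketched as a per-chip function pointer that common code calls only when it is set, so errcodes are decoded only on chips that define them. `struct ce_hal_ops`, the `*_like_ops` tables and the errcode value are hypothetical and do not correspond to nvgpu's real ops structures or register encodings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding; the real LAUNCHERR_REPORT errcodes live in the
 * CE hw headers, and METHOD_BUFFER_ACCESS_FAULT is absent on Pascal. */
#define HYPOTHETICAL_ERRCODE_METHOD_BUFFER_ACCESS_FAULT 0x8u

/* Hypothetical per-chip HAL table. */
struct ce_hal_ops {
	const char *chip;
	/* NULL when the chip's HAL does not decode launch errors. */
	bool (*launch_err_is_mthd_buf_fault)(uint32_t errcode);
};

static bool gv11b_like_launch_err_is_mthd_buf_fault(uint32_t errcode)
{
	return errcode == HYPOTHETICAL_ERRCODE_METHOD_BUFFER_ACCESS_FAULT;
}

static const struct ce_hal_ops gv11b_like_ops = {
	"gv11b", gv11b_like_launch_err_is_mthd_buf_fault
};
static const struct ce_hal_ops gp10b_like_ops = { "gp10b", NULL };

/* Common code defers to the HAL and skips chips without the hook. */
static bool ce_launch_err_is_mthd_buf_fault(const struct ce_hal_ops *ops,
					    uint32_t errcode)
{
	if (ops->launch_err_is_mthd_buf_fault == NULL)
		return false;
	return ops->launch_err_is_mthd_buf_fault(errcode);
}
```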
JIRA NVGPU-8102
Change-Id: Idc84119bc23b5e85f3479fe62cc8720e98b627a5
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678893
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Committed by: mobile promotions
Parent: 15739c52e9
Commit: b80b2bdab8
@@ -98,8 +98,8 @@ gv100_dump_engine_status
gv100_read_engine_status_info
gv11b_ce_get_num_pce
gv11b_ce_init_prod_values
gv11b_ce_mthd_buffer_fault_in_bar2_fault
gv11b_ce_stall_isr
gv11b_ce_get_inst_ptr_from_lce
gv11b_channel_count
gv11b_channel_read_state
gv11b_channel_reset_faulted
@@ -275,6 +275,7 @@ nvgpu_bug_unregister_cb
nvgpu_can_busy
nvgpu_ce_engine_interrupt_mask
nvgpu_ce_init_support
nvgpu_ce_stall_isr
nvgpu_cg_blcg_fb_load_enable
nvgpu_cg_blcg_ltc_load_enable
nvgpu_cg_blcg_fifo_load_enable
@@ -792,6 +793,7 @@ nvgpu_rc_gr_fault
nvgpu_rc_sched_error_bad_tsg
nvgpu_rc_tsg_and_related_engines
nvgpu_rc_mmu_fault
nvgpu_rc_ce_fault
nvgpu_init_pramin
gk20a_bus_set_bar0_window
nvgpu_pramin_ops_init
@@ -98,8 +98,8 @@ gv100_dump_engine_status
gv100_read_engine_status_info
gv11b_ce_get_num_pce
gv11b_ce_init_prod_values
gv11b_ce_mthd_buffer_fault_in_bar2_fault
gv11b_ce_stall_isr
gv11b_ce_get_inst_ptr_from_lce
gv11b_channel_count
gv11b_channel_read_state
gv11b_channel_reset_faulted
@@ -283,6 +283,7 @@ nvgpu_bug_unregister_cb
nvgpu_can_busy
nvgpu_ce_engine_interrupt_mask
nvgpu_ce_init_support
nvgpu_ce_stall_isr
nvgpu_cg_blcg_fb_load_enable
nvgpu_cg_blcg_ltc_load_enable
nvgpu_cg_blcg_fifo_load_enable
@@ -811,6 +812,7 @@ nvgpu_rc_gr_fault
nvgpu_rc_sched_error_bad_tsg
nvgpu_rc_tsg_and_related_engines
nvgpu_rc_mmu_fault
nvgpu_rc_ce_fault
gp10b_priv_ring_isr_handle_0
gp10b_priv_ring_isr_handle_1
nvgpu_cic_mon_setup