mirror of
git://nv-tegra.nvidia.com/linux-nvgpu.git
synced 2025-12-24 10:34:43 +03:00
a. LAUNCH_ERR
- Userspace error.
- Triggered due to a faulty launch.
- Handle using recovery to reset the CE engine and tear down the
  faulty channel.
b. INVALID_CONFIG
- Triggered when an LCE is mapped to a floorswept PCE.
- On iGPU, we use the default PCE2LCE HW mapping.
  The default mapping can be read from the NV_CE_PCE2LCE_CONFIG
  INIT value in the CE refmanual.
- The NvGPU driver configures the mapping on dGPUs (currently only on
  Turing).
- So, this interrupt can only be triggered if there is a
  kernel or HW error.
- Recovery (which kills the context and resets the engine) will
  not help resolve this error.
- Trigger quiesce as part of handling.
c. MTHD_BUFFER_FAULT
- The NvGPU driver allocates fault buffers for all TSGs or contexts,
  maps them in the BAR2 VA space and writes the VA into the channel
  instance block.
- Can be triggered only by a kernel bug.
- Recovery will not help; quiesce is needed.
d. FBUF_CRC_FAIL
- Triggered when the CRC entry read from the method fault buffer
  does not match the CRC computed from the methods contained in
  the buffer.
- This indicates memory corruption and is a fatal interrupt which
  at least requires the LCE to be reset before operations can
  start again, if not the entire GPU.
- Better to quiesce on memory corruption; a CE engine reset
  (via recovery) will not help.
e. FBUF_MAGIC_CHK_FAIL
- Triggered when the MAGIC_NUM entry read from the method fault
  buffer does not match NV_CE_MTHD_BUFFER_GLOBAL_HDR_MAGIC_NUM_VAL.
- This indicates memory corruption and is a fatal interrupt.
- Better to quiesce on memory corruption.
f. STALLING_DEBUG
- Only triggered by a SW write for debug purposes.
- Debug interrupt, currently ignored.
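
The handling policy listed above can be summarized as a lookup from interrupt to action. The following is only a minimal sketch of that mapping, not the driver's code: the enum names, `ce_intr_policy` and `ce_intr_action` are hypothetical identifiers introduced for illustration.

```c
#include <assert.h>

/* Hypothetical identifiers; they do not come from the nvgpu sources. */
enum ce_intr {
	CE_INTR_LAUNCH_ERR,
	CE_INTR_INVALID_CONFIG,
	CE_INTR_MTHD_BUFFER_FAULT,
	CE_INTR_FBUF_CRC_FAIL,
	CE_INTR_FBUF_MAGIC_CHK_FAIL,
	CE_INTR_STALLING_DEBUG,
	CE_INTR_COUNT
};

enum ce_action {
	CE_ACTION_IGNORE,	/* debug interrupt, nothing to do */
	CE_ACTION_RECOVER,	/* reset CE engine, tear down faulty channel */
	CE_ACTION_QUIESCE	/* kernel/HW error or memory corruption */
};

/* One entry per interrupt, following items a. to f. above. */
static const enum ce_action ce_intr_policy[CE_INTR_COUNT] = {
	[CE_INTR_LAUNCH_ERR]          = CE_ACTION_RECOVER,
	[CE_INTR_INVALID_CONFIG]      = CE_ACTION_QUIESCE,
	[CE_INTR_MTHD_BUFFER_FAULT]   = CE_ACTION_QUIESCE,
	[CE_INTR_FBUF_CRC_FAIL]       = CE_ACTION_QUIESCE,
	[CE_INTR_FBUF_MAGIC_CHK_FAIL] = CE_ACTION_QUIESCE,
	[CE_INTR_STALLING_DEBUG]      = CE_ACTION_IGNORE,
};

/* Look up the action for a single pending interrupt. */
static enum ce_action ce_intr_action(enum ce_intr intr)
{
	assert((unsigned int)intr < (unsigned int)CE_INTR_COUNT);
	return ce_intr_policy[intr];
}
```

Only LAUNCH_ERR is a userspace error, so it is the only entry that maps to recovery; every kernel, HW or memory-corruption case maps to quiesce.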
Move launch error handling from GP10b to GV11b HAL because:
1. LAUNCHERR_REPORT errcode METHOD_BUFFER_ACCESS_FAULT is not
   defined on Pascal.
2. We do not support GP10b on dev-main ToT.
JIRA NVGPU-8102
Change-Id: Idc84119bc23b5e85f3479fe62cc8720e98b627a5
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678893
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
160 lines
4.7 KiB
C
/*
 * Copyright (c) 2019-2022, NVIDIA CORPORATION.  All rights reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 * DEALINGS IN THE SOFTWARE.
 */

#ifndef UNIT_NVGPU_CE_H
#define UNIT_NVGPU_CE_H

struct gk20a;
struct unit_module;

/** @addtogroup SWUTS-ce
 * @{
 *
 * Software Unit Test Specification for CE
 */

/**
 * Test specification for: test_ce_setup_env
 *
 * Description: Do basic setup before starting other tests.
 *
 * Test Type: Other (setup)
 *
 * Input: None
 *
 * Steps:
 * - Initialize reg spaces used by tests.
 * - Initialize required data for cg, mc modules.
 *
 * Output:
 * - UNIT_FAIL if an error is encountered while creating a reg space
 * - UNIT_SUCCESS otherwise
 */
int test_ce_setup_env(struct unit_module *m,
			  struct gk20a *g, void *args);

/**
 * Test specification for: test_ce_free_env
 *
 * Description: Do basic cleanup after the other tests have run.
 *
 * Test Type: Other (cleanup)
 *
 * Input: None
 *
 * Steps:
 * - Free reg spaces
 *
 * Output: UNIT_SUCCESS always.
 */
int test_ce_free_env(struct unit_module *m, struct gk20a *g, void *args);

/**
 * Test specification for: test_ce_init_support
 *
 * Description: Validate CE init functionality.
 *
 * Test Type: Feature
 *
 * Targets: gops_ce.ce_init_support, nvgpu_ce_init_support
 *
 * Input: test_ce_setup_env must have been run.
 *
 * Steps:
 * - Setup necessary mock HALs to do nothing and return success as appropriate.
 * - Call nvgpu_ce_init_support and verify success is returned.
 * - Set set_pce2lce_mapping and init_prod_values HAL function pointers to NULL
 *   for branch coverage.
 * - Call nvgpu_ce_init_support and verify success is returned.
 *
 * Output: Returns PASS if expected result is met, FAIL otherwise.
 */
int test_ce_init_support(struct unit_module *m, struct gk20a *g, void *args);

/**
 * Test specification for: test_ce_stall_isr
 *
 * Description: Validate stall interrupt handler functionality.
 *
 * Test Type: Feature
 *
 * Targets: gops_ce.isr_stall, gv11b_ce_stall_isr, gp10b_ce_stall_isr
 *
 * Input: test_ce_setup_env must have been run.
 *
 * Steps:
 * - Set all CE interrupt sources pending in the interrupt status reg for each
 *   instance.
 * - Call gops_ce.isr_stall.
 * - Verify all (and only) the stall interrupts are cleared.
 * - Set no CE interrupt sources pending in the interrupt status reg for each
 *   instance.
 * - Call gops_ce.isr_stall.
 * - Verify no interrupts are cleared.
 *
 * Output: Returns PASS if expected result is met, FAIL otherwise.
 */
int test_ce_stall_isr(struct unit_module *m, struct gk20a *g, void *args);

/**
 * Test specification for: test_get_num_pce
 *
 * Description: Validate function to get number of PCEs.
 *
 * Test Type: Feature
 *
 * Targets: gops_ce.get_num_pce, gv11b_ce_get_num_pce
 *
 * Input: test_ce_setup_env must have been run.
 *
 * Steps:
 * - Loop through all possible 16-bit values for the PCE Map register.
 * - For each value, write to the PCE Map register.
 * - Call gops_ce.get_num_pce and verify the correct number of PCEs is
 *   returned.
 *
 * Output: Returns PASS if expected result is met, FAIL otherwise.
 */
int test_get_num_pce(struct unit_module *m, struct gk20a *g, void *args);

/**
 * Test specification for: test_init_prod_values
 *
 * Description: Validate prod value init functionality.
 *
 * Test Type: Feature
 *
 * Targets: gops_ce.init_prod_values, gv11b_ce_init_prod_values
 *
 * Input: test_ce_setup_env must have been run.
 *
 * Steps:
 * - Clear the LCE Options register for all instances.
 * - Call gops_ce.init_prod_values.
 * - Verify all instances of the LCE Options register are set properly.
 *
 * Output: Returns PASS if expected result is met, FAIL otherwise.
 */
int test_init_prod_values(struct unit_module *m, struct gk20a *g, void *args);

#endif /* UNIT_NVGPU_CE_H */
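
The steps listed for test_get_num_pce in the header above (sweep every 16-bit PCE Map value, write it, check the returned count) can be sketched outside the unit-test framework as follows. This is a rough illustration under stated assumptions, not the actual test: `fake_pce_map_reg`, `fake_write_pce_map`, `fake_get_num_pce`, `count_bits16` and `check_get_num_pce` are hypothetical stand-ins, and the sketch assumes the number of PCEs equals the number of set bits in the map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the mocked PCE Map register. */
static uint32_t fake_pce_map_reg;

static void fake_write_pce_map(uint32_t val)
{
	assert(val <= 0xFFFFU);	/* the sweep only covers 16-bit values */
	fake_pce_map_reg = val;
}

/* Assumption: one PCE per set bit in the map (clear lowest set bit per step). */
static uint32_t fake_get_num_pce(void)
{
	uint32_t v = fake_pce_map_reg;
	uint32_t n = 0U;

	while (v != 0U) {
		v &= v - 1U;
		n++;
	}
	return n;
}

/* Independent reference: test each of the 16 bits individually. */
static uint32_t count_bits16(uint32_t val)
{
	uint32_t n = 0U;
	uint32_t i;

	for (i = 0U; i < 16U; i++) {
		n += (val >> i) & 1U;
	}
	return n;
}

/* Returns 0 if every 16-bit value gives the expected count, -1 otherwise. */
static int check_get_num_pce(void)
{
	uint32_t val;

	for (val = 0U; val <= 0xFFFFU; val++) {
		fake_write_pce_map(val);
		if (fake_get_num_pce() != count_bits16(val)) {
			return -1;
		}
	}
	return 0;
}
```

Comparing against a separately written bit counter, rather than calling the same routine twice, is what gives the exhaustive sweep its value.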