Commit Graph

8 Commits

Author SHA1 Message Date
Tejal Kudav
b80b2bdab8 gpu: nvgpu: Add CE interrupt handling
a. LAUNCH_ERR
    - Userspace error.
    - Triggered due to faulty launch.
    - Handle using recovery to reset CE engine and teardown the
      faulty channel.

b. An INVALID_CONFIG -
    - Triggered when LCE is mapped to floorswept PCE.
    - On iGPU, we use the default PCE 2 LCE  HW mapping.
      The default mapping can be read from NV_CE_PCE2LCE_CONFIG
      INIT value in CE refmanual.
    - NvGPU driver configures the mapping on dGPUs (currently only on
      Turing).
    - So, this interrupt can only be triggered if there is
      kernel or HW error
    - Recovery ( which is killing the context + engine reset) will
      not help resolve this error.
    - Trigger Quiesce as part of handling.

c. A MTHD_BUFFER_FAULT -
    - NvGPU driver allocates fault buffers for all TSGs or contexts,
      maps them in BAR2 VA space and writes the VA into channel
      instance block.
    - Can be triggered only due to kernel bug
    - Recovery will not help, need quiesce

d. FBUF_CRC_FAIL
    - Triggered when the CRC entry read from the method fault buffer
      does not match the computed CRC from the methods contained in
      the buffer.
    - This indicates memory corruption and is a fatal interrupt which
      at least requires the LCE to be reset before operations can
      start again, if not the entire GPU.
    - Better to quiesce on memory corruption
      CE Engine reset (via recovery) will not help.

e. FBUF_MAGIC_CHK_FAIL
    - Triggered when the MAGIC_NUM entry read from the method fault
      buf does not match NV_CE_MTHD_BUFFER_GLOBAL_HDR_MAGIC_NUM_VAL
    - This indicates memory corruption and is a fatal interrupt
    - Better to quiesce on memory corruption

f. STALLING_DEBUG
    - Only triggered with SW write for debug purposes
    - Debug interrupt, currently ignored

Move launch error handling from GP10b to GV11b HAL as -
1. LAUNCHERR_REPORT errcode METHOD_BUFFER_ACCESS_FAULT is not
   defined on Pascal
2. We do not support GP10b on dev-main ToT

JIRA NVGPU-8102

Change-Id: Idc84119bc23b5e85f3479fe62cc8720e98b627a5
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678893
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-14 17:12:14 -07:00
Tejal Kudav
3fe70bf86e gpu: nvgpu: Update CE Intr code as per Orin HSIs
Below CE interrupts do not have any users(usecases) on safety build;
disable them only on safety build.
   1. BLOCKPIPE stall intr: Not used by GFX(VKSC) and CUDA on safety.
   2. NONBLOCK_PIPE nonstall intr: Non-stall intrs are not supported
          on safety build. Also, this one is not used by GFX(VKSC)
          and CUDA.
   3. STALLING_DEBUG intr: Added in Orin tree. It is only needed for
          debugging. Disable on safety build as there is no current
          usage in driver.
   4. POISON_ERROR intr: Poison is a fault containment and not
	  supported on GA10b.
   5. INVALID_CONFIG intr: Floor sweeping not supported on functional
          safety SKU.

Bug 3548082

Change-Id: I8d97ccb38f138b2c04a780e1c255a64d28723405
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671927
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-08 11:41:26 -08:00
Richard Zhao
e81a36e56a gpu: nvgpu: hal: fix compile error of new compile flags
It's preparing to add bellow CFLAGS:
    -Werror -Wall -Wextra \
    -Wmissing-braces -Wpointer-arith -Wundef \
    -Wconversion -Wsign-conversion \
    -Wformat-security \
    -Wmissing-declarations -Wredundant-decls -Wimplicit-fallthrough

Jira GVSCI-11640

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: Ia16ef186da1e97badff9dd0bf8cbd6700dd77b15
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2555057
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: Aparna Das <aparnad@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-01-13 12:36:19 -08:00
tkudav
0526e7eaa9 gpu: nvgpu: Create CIC-mon and CIC-rm subunits
common.cic unit is divided into common.cic.mon and common.cic.rm
based on rm and mon process split.

CIC-mon subunit includes the code which is utilized in critical
interrupt handling path like initialization, error detection and
error reporting path. CIC-rm subunit includes the code corresponding
to rest of interrupt handling(like collecting error debug data from
registers) and ISR status management (status of deferred interrupts).

Split the CIC APIs and data-members into above two subunits.

JIRA NVGPU-6899

Change-Id: I151b59105ff570607c4a62e974785e9c1323ef69
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2551897
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-02 09:57:56 -07:00
Tejal Kudav
9f43914933 gpu: nvgpu: Move Intr handling common code to CIC
CIC (Central Interrupt controller) will be responsible for the
interrupt handling. common.cic unit is the placeholder for all
interrupt related code. Move interrupt related defines and
Public APIs present in common.mc to common.cic.
Note: The common.mc interrupts related struct definitions are
not moved as part of this patch.

Adapt the code to use interrupt handling related defines and public
APIs migrated from common.mc to common.cic

JIRA NVGPU-6899

Change-Id: I747e2b556c0dd66d58d74ee5bb36768b9370d276
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2535618
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-05-31 19:37:31 -07:00
Sagar Kamble
6fe794bc98 gpu: nvgpu: prepare ce_app.h header
In preparation for SWUD of CG unit, separate CE app related APIs
into separate header ce_app.h.

JIRA NVGPU-4143

Change-Id: I9be8a4f2eee3aaf3af71f5843f957052064d9651
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2221660
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:10:29 -06:00
Philip Elcan
9169e8c048 gpu: nvgpu: mc: move mc declarations to mc.h
Move declarations that belong to mc from gk20a.h to mc.h where they
belong.

JIRA NVGPU-2532

Change-Id: I91934ff60e2735c61d16459c04507fed6e1c96d7
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2214421
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:10:29 -06:00
Thomas Fleury
1160f083d4 gpu: nvgpu: move ce code to common/ce and hal/ce
Merged gk20a_ce_delete_context and gk20a_ce_delete_context_priv.

Renamed
- gk20a_init_ce_support -> nvgpu_ce_init_support
- gk20a_ce_destroy -> nvgpu_ce_destroy
- gk20a_ce_suspend -> nvgpu_ce_suspend
- gk20a_ce_create_context -> nvgpu_ce_create_context
- gk20a_ce_delete_context -> nvgpu_ce_delete_context
- gk20a_ce_execute_ops -> nvgpu_ce_execute_ops
- gk20a_ce_prepare_submit -> nvgpu_ce_prepare_submit
- gk20a_ce_put_fences -> nvgpu_ce_put_fences
- gk20a_ce_delete_gpu_context -> nvgpu_ce_delete_gpu_context
- gk20a_ce_get_method_size -> nvgpu_ce_get_method_size
- gk20a_gpu_ctx -> nvgpu_ce_gpu_ctx
- gk20a_gpu_ctx_from_list -> nvgpu_ce_gpu_ctx_from_list
- gk20a_ce_app -> nvgpu_ce_app
- gk20a_ce_debugfs_init -> nvgpu_ce_debugfs_init
- gk20a_get_valid_launch_flags -> nvgpu_ce_get_valid_launch_flags
- gk20a_ce2_isr -> gk20a_ce2_stall_isr
- gp10b_ce_isr -> gp10b_ce_stall_isr
- gv11b_ce_isr -> gv11b_ce_stall_isr

Inlined
- ce*_nonblockpipe_isr
- ce*_blockpipe_isr
- ce*_launcherr_isr

Added ce_priv.h for ce private definitions.

Moved files to common/ce and hal/fifo/ce
- ce2.c -> common/ce2/ce.c
- ce2_gk20a.c -> hal/ce/ce2_gk20a.c
- ce2_gk20a.h -> hal/ce/ce2_gk20a.h
- ce_gp10b.c -> hal/ce/ce_gp10b.c
- ce_gp10b.h -> hal/ce/ce_gp10b.h
- ce_gv11b.c -> hal/ce/ce_gv11b.c
- ce_gv11b.h -> hal/ce/ce_gv11b.h

Updated makefiles and #include directives

Jira NVGPU-1992

Change-Id: Ia6064bf51b7a254085be43a112d056cb6fb6c3b2
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2093503
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-19 13:55:11 -07:00