Commit Graph

89 Commits

Author SHA1 Message Date
Tejal Kudav
724f49f6eb gpu: nvgpu: Remove dependency on DGPU CONFIG
The error injection code was enabled only when CONFIG_NVGPU_DGPU = n
so that the dGPUs do not attempt any error injection callback
function registration. But, this introduced dependency on DGPU
config when needs to be explicitly set to n for error injection to
be enabled.
Remove the dependency by moving the error injection callback
registration and deregistration to a HAL which is enabled only
on GA10b.

Bug 3819160

Change-Id: I4f4eb99189b1af3502d719536a91cc5e5d866bce
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2787202
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-10-06 17:25:52 -07:00
rmylavarapu
30e7a5e5ed gpu: nvgpu: gsp sched: create and enable gsp virtual memory access
Changes
- Initialize virtual memory for gsp. This space is used for creating
  queues for ctrl fifo. Also can be used to ro map sync-pt to this
  instance where gsp firmware can poll the sync-pt with sync-pt id.
- Enabled gsp context interface and written the instance block pointer
  to nxtctx register for the gsp firmware to access created virtual memory.
- Added required gsp registers for this feature.

NVGPU-8730
Bug 3770916

Change-Id: If538f615eca3f9b7840ffe2787826528b4808886
Signed-off-by: rmylavarapu <rmylavarapu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2764649
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-10-06 17:16:21 -07:00
Sagar Kamble
693305c0fd gpu: nvgpu: subcontext add/remove support
Subcontext PDBs and valid mask in the instance blocks of the channels
in various subcontexts has to be updated when new subcontext is
created or a subcontext is removed.

Replayable fault state is cached in the channel structure. Replayable
fault state for subcontext is set based on first channel's bind
parameter. It was earlier programmed in function channel_setup_ramfc.

init_inst_block_core is updated to setup TSG level pdb map and mask.

Added new hal gv11b_channel_bind to enable the subcontext on channel
bind.

Bug 3677982

Change-Id: I58156c5b3ab6309b6a4b8e72b0e798d6a39c1bee
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2719994
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
2022-09-08 21:00:20 -07:00
Debarshi Dutta
143034daab gpu: nvgpu: modify wait_pending
The wait_pending HAL is now modified to simply
check the pending status of a given runlist.
The while loop is removed from this HAL.

A new function nvgpu_runlist_wait_pending_legacy() is
added that emulates the older wait_pending() HAL.

nvgpu_runlist_tick() is modified to accept a 64 bit
"preempt_grace_ns" value.

These changes prepare for upcoming control-fifo parser
changes.

Jira NVGPU-8619

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: If3f288eb6f2181743c53b657219b3b30d56d26bc
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2766100
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-08-30 23:45:43 -07:00
Dinesh T
68976fbd22 gpu: nvgpu: gv11b+: set live pes mask
This change is reading the live pes from the register
"gr_gpc0_gpm_pd_live_physical_pes_r" and set it to
"gr_gpc0_swdx_pes_mask_r".

Every PES needs at least a TPC to work. If any of the TPCs
are floorswept,the live PES mask is read from
"gr_gpc0_gpm_pd_live_physical_pes_r" and  the corresponding
active PES mask is updated in "gr_gpc0_swdx_pes_mask_r".

Bug 3677421

Change-Id: I899ac41c4a82beb3ce75c84ad57dcad262a49ba1
Signed-off-by: Dinesh T <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2736560
(cherry picked from commit 85f2ceb3db6eeef925b49553f445d8cc31ec39da)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2759135
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-08-12 11:05:35 -07:00
atanand
eae4593343 gpu: nvgpu: add ioctl to configure implicit ERRBAR
Add ioctl support to configure implicit ERRBAR by setting/unsetting
NV_PGRAPH_PRI_GPCS_TPCS_SM_SCH_MACRO_SCHED register.

Add gpu characteritics flag: NVGPU_SCHED_EXIT_WAIT_FOR_ERRBAR_SUPPORTED
to allow userspace driver to determine if implicit ERRBAR ioctl is
supported.

Bug: 200782861

Change-Id: I530a4cf73bc5c844e8d73094d3e23949568fe335
Signed-off-by: atanand <atanand@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2718672
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-08-05 23:10:18 -07:00
Dinesh T
fb466b5b25 gpu: nvgpu: Enable ptimer
This is enabling ptimer in mme_config and
mme_fe1_config by setting the corresponding
field.
Debugger is expected to make use of ptimer.
So this is required for nvgpu to enable ptimer
in the register.

Bug 3637441

Change-Id: Id596a87081753bcaf945e54444a8abbd025b3f76
Signed-off-by: Dinesh T <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2710632
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-07-07 07:30:52 -07:00
Antony Clince Alex
cc19bc8ac9 gpu: nvgpu: disable NVGPU_ERRATA_3288192 on non-auto platforms
NVGPU_ERRATA_3288192 is only applicable for automotive platforms, so have
it disabled on other platforms.

Bug 3604690

Change-Id: Id45622a629ebf9e2dfc10365981c3e1bde1e2941
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2700380
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Tested-by: Sagar Kamble <skamble@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-07-06 12:41:53 -07:00
Sagar Kamble
5b55088970 gpu: nvgpu: skip subctx pdb init during as-channel bind
While creating a new channel, ioctls are called in the below sequence:
  1. GPU_IOCTL_OPEN_CHANNEL
  2. AS_IOCTL_BIND_CHANNEL
  3. TSG_IOCTL_BIND_CHANNEL_EX
  4. CHANNEL_ALLOC_GPFIFO_EX
  5. CHANNEL_ALLOC_OBJ_CTX.

subctx pdbs and valid mask are programmed in the channel instance block
in the channel ioctls AS_IOCTL_BIND_CHANNEL & CHANNEL_ALLOC_GPFIFO_EX.

Programming them in the ioctl AS_IOCTL_BIND_CHANNEL is redundant.
Remove related hal g->ops.mm.init_inst_block_for_subctxs.

The hal init_inst_block will program context pdb and big page size.
The hal init_inst_block_core will program context pdb, big page size
and subctx 0 pdb. This is used by h/w units (fecs, pmu, hwpm, bar1,
bar2, sec2, gsp, perfbuf etc.).

For user channels, subctx pdbs are programmed as part of ramfc setup.

Bug 3677982

Change-Id: I6656b002d513404c1fd7c3d349933e80cca7e604
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2680907
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-06-28 23:33:31 -07:00
prsethi
9890a185e0 gpu: nvpgu: update ZBC table values for ga10b+
- Patch updates the ZBC table values as per the POR values for safety
build.
- Fix the color table default values initialization for standard build
which was being done in floating point format for CROP while it should
be in FB format. As per the documentation "CROP ZBC table should be
programmed exactly the way the L2 table is programmed".

Bug 3585766

Change-Id: I47d11b6a230189ee0c818f850d36b93c0aea0e54
Signed-off-by: prsethi <prsethi@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2724935
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-06-16 15:52:42 -07:00
prsethi
697215afd3 gpu: nvpgu: configure static ZBC table
Patch defines a ZBC static table and configure it at sw layer. Later
existing API read this sw configuration and program it to hw.

This is applicable only for ga10b safety build and for other chips/
configuration it will be supported in the legacy way.

Bug 3585766

Change-Id: I00d79162c0b096616e3f555da965e82e47c014d1
Signed-off-by: prsethi <prsethi@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2713821
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-05-29 10:56:58 -07:00
atanand
2ebc0bdf98 gpu: nvgpu: add broadcast to unicast expansion
Add broadcast to unicast expansion for NV_PLTCG_LTCS_MISC_LTC_PM and
PMM*_[GPC|FBP]SROUTER broadcast registers for non-resident regops.

Bug: 3442801

Change-Id: I88dcf00f4f6e910f0342d3968970070e0248a786
Signed-off-by: atanand <atanand@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2704951
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-05-28 08:59:44 -07:00
Dinesh T
6e4c3275bf gpu: nvgpu: Set max_ways_evict_cache to maximum
This is setting evict_max_ways for L2 cache to the maximum
supported value for safety.

In normal build L2 cache MAX_EVICT_LAST is configure via
KMD and RegOps. RegOps is enabled only on standard build
with CONFIG_DEBUGGER flag. This method we cant use it for
safety build. Safety we can make use of the patch buffer
to patch the register while creating the context.

JIRA NVGPU-8227

Change-Id: Iec5d73197239b9cad31c6b593ca2b87c224aad5e
Signed-off-by: Dinesh T <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2708702
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-05-18 22:57:54 -07:00
Richard Zhao
7a39d8e9ce gpu: nvgpu: hal: remove .fb.mem_unlock for ga10b
.fb.mem_unlock is specific to dGPU.

Jira GVSCI-9976

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: Id1a52becaa5c2b0a00899c3bc92aa18058a1f97d
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2708394
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-05-18 00:59:05 -07:00
Sagar Kamble
d3b417ce2c gpu: nvgpu: address priv_ring unit code inspection gaps
1. Hardcoded constants are defined using #define are converted to
   const.
2. set_ppriv_timeout_settings HAL is not applicable from gm20b.
   Hence remove it completely.

JIRA NVGPU-6903

Change-Id: Ic096c5dc87aa45db0aa05482947cd032ae72bdd4
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2552581
(cherry picked from commit c5fb38a54208330f24754fed33d7242903dbac59)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2623635
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-05-17 08:40:46 -07:00
Sagar Kamble
d82400d2b8 gpu: nvgpu: fix MISRA Rule 5.1 violation
BVEC changes for nvgpu_rc_pbdma_fault and nvgpu_rc_mmu_fault
started reporting below MISRA issue.

kernel/nvgpu/drivers/gpu/nvgpu/common/fifo/tsg.c:321:
  1. misra_c_2012_rule_5_1_violation: Declaration with identifier
     "nvgpu_tsg_unbind_channel_check_hw_state", which is ambiguous.
kernel/nvgpu/drivers/gpu/nvgpu/common/fifo/tsg.c:349:
  2. other_declaration: The first 31 characters of identifiers
     "nvgpu_tsg_unbind_channel_check_ctx_reload" and
     "nvgpu_tsg_unbind_channel_check_hw_state" are identical.

Do below renames to fix the issue. Doing both for consistency.

s/nvgpu_tsg_unbind_channel_check_hw_state/nvgpu_tsg_unbind_channel_hw_state_check
s/nvgpu_tsg_unbind_channel_check_ctx_reload/nvgpu_tsg_unbind_channel_ctx_reload_check

JIRA NVGPU-6772

Change-Id: Ib92cabe11c486621351bf15ddb86e20d16d514c4
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2584152
(cherry picked from commit a619f259c6a4ffccb05550767212989af60c2a90)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2706551
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-05-11 04:18:12 -07:00
Antony Clince Alex
83fe3fd35e gpu: nvgpu: add errata NVGPU_ERRATA_3524791
Update PES, ROP exception handling for NVGPU_ERRATA_3524791. Enable the
errata for all Volta+ chips.

ROP, PES exceptions are being reported using the physical-id,
where logical-id should have been used. All ESR status registers are
reported using logical-id, so this matches with the SW expectation.
To address the (1), update ROP, PES exception handler translate from
physical to logical-id before reading the status registers.

Bug 3524791

Change-Id: Ieacbfb306bb0e69cf0113dc92f18e401573722e3
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2680029
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-04-13 02:32:30 -07:00
Antony Clince Alex
62d6f753d2 gpu: nvgpu: add support for PES, ROP floorsweeping
Volta+ chips supports PES floorsweeping and Ampere+(iGPU) chips supports
ROP floorsweeping. At present, the driver isn't aware of PES, ROP
floorsweeping, make the driver PES, ROP floorsweeping aware by introducing the
following fields in nvgpu_gr_config:
- gpc_(rop/pes)_mask: Contains the bit mask of non FSed ROP/PES units per GPC.
- gpc_(rop/pes)_logical_id_map: Translates per GPC ROP/PES physical id to
  logical id.

Introduce the following HAL functions to read PES/ROP FS data:
- gops_fuse.fuse_status_opt_(pes/rop)_gpc: This fuction gets the FS
  config from the fuse.
- gops_top.get_max_(pes/rop)_per_gpc: Gets the maximum number of PES/ROP
  units that can be present in a GPC.

In addition, introduce the enabled flag NVGPU_SUPPORT_PES_FS to identify chips
which support PES floorsweeping, piggyback on NVGPU_SUPPORT_ROP_IN_GPC
enabled flag to identify ROP floorsweeping.

Bug 3524791

Change-Id: I065bab6c02618fe38892c8c890b069c340b85301
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2679570
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-04-13 02:32:14 -07:00
Antony Clince Alex
19a8adeae1 gpu: nvgpu: prof: add new resource type
Add new profiler resource type NVGPU_PROFILER_PM_RESOURCE_TYPE_PC_SAMPLER.
Introduce regops HAL get_hwpm_pc_sampler_register_ranges to get
allowlist for PC_SAMPLER resources. Re-generate allowlist files to include
register ranges for PC_SAMPLER resources.

Update uapi header to advertise new resource type
NVGPU_PROFILER_PM_RESOURCE_ARG_PC_SAMPLER.

Bug 3408536

Change-Id: I7009ef822665771eed727da48ef1e89dcc6b9c4b
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2689057
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-04-12 16:30:52 -07:00
mkumbar
0b9dc3dbc3 gpu: nvgpu: PMU NVRISCV engine HSI support
Below listed HSI are handled with PMU ISR handler and
all these triggers interrupt from individual unit upon issue.

-Add ECC check for IMEM, DMEM, DCLS, REG, and MPU as per
 HSI req
-Add MEMERR check for GPU_PMU_ACCESS_TIMEOUT_UNCORRECTED
 PMU HSI id
-Add IOPMP check for GPU_PMU_ILLEGAL_ACCESS_UNCORRECTED
 PMU HSI id
-Add WDT check for GPU_PMU_WDT_UNCORRECTED PMU HSI id

Bug 3491596
Bug 3366818

Change-Id: I751d653e447017ac62a2459da2c6bb9da506f438
Signed-off-by: mkumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2686566
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-25 13:25:39 -07:00
Antony Clince Alex
c6f07c14a9 Revert "nvgpu: ga10b: disable errata NVGPU_ERRATA_200601972"
This reverts commit 7194fe5f1f.

Bug 3514215

Change-Id: Ief98d8d34d72d233432d0ff0c1997c7f62b8b7d6
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2674181
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-25 13:13:58 -07:00
Divya
201b5c1c7f gpu: nvgpu: add SLCG support for GSP and CTRL unit
Add SLCG register programming for GSP and CTRL units

Bug 3452217

Change-Id: I69e414a82b5c12f26ff3b6626c328b5c0aa9e04c
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678782
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-18 07:54:48 -07:00
Antony Clince Alex
40231858a5 gpu: nvgpu: add enable flag for gpu emulate mode
Introduce enable flag NVGPU_SUPPORT_EMULATE_MODE, and bring emulate mode
feature under this flag. At present, gpu emulate mode is only support on
ga10b.

Jira NVGPU-8120

Change-Id: I85269992926c3cf8f2d1dd70882979e1c4656984
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2681613
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-17 19:48:03 -07:00
mpoojary
c1a995403a gpu: nvgpu: Add ACR error reporting to SDL
-Add check for ECC parity errors in IMEM, DMEM, EMEM, DCLS, REG
for ACR running in GSP engine.
The EXTIRQ3 external interrupt is set from ACR pointing towards host.
-Add function to check error type when ACR or Bootrom  execution fails
and report accordingly to SDL with relevant error codes.

This is a part of HSI safety requirements.

Bug 3564039
Jira NVGPU-8108

Change-Id: I65407371f7a1d1ba50a10bdf443ef6b903eeaa36
Signed-off-by: mpoojary <mpoojary@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678100
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-15 17:33:42 -07:00
Tejal Kudav
b80b2bdab8 gpu: nvgpu: Add CE interrupt handling
a. LAUNCH_ERR
    - Userspace error.
    - Triggered due to faulty launch.
    - Handle using recovery to reset CE engine and teardown the
      faulty channel.

b. An INVALID_CONFIG -
    - Triggered when LCE is mapped to floorswept PCE.
    - On iGPU, we use the default PCE 2 LCE  HW mapping.
      The default mapping can be read from NV_CE_PCE2LCE_CONFIG
      INIT value in CE refmanual.
    - NvGPU driver configures the mapping on dGPUs (currently only on
      Turing).
    - So, this interrupt can only be triggered if there is
      kernel or HW error
    - Recovery ( which is killing the context + engine reset) will
      not help resolve this error.
    - Trigger Quiesce as part of handling.

c. A MTHD_BUFFER_FAULT -
    - NvGPU driver allocates fault buffers for all TSGs or contexts,
      maps them in BAR2 VA space and writes the VA into channel
      instance block.
    - Can be triggered only due to kernel bug
    - Recovery will not help, need quiesce

d. FBUF_CRC_FAIL
    - Triggered when the CRC entry read from the method fault buffer
      does not match the computed CRC from the methods contained in
      the buffer.
    - This indicates memory corruption and is a fatal interrupt which
      at least requires the LCE to be reset before operations can
      start again, if not the entire GPU.
    - Better to quiesce on memory corruption
      CE Engine reset (via recovery) will not help.

e. FBUF_MAGIC_CHK_FAIL
    - Triggered when the MAGIC_NUM entry read from the method fault
      buf does not match NV_CE_MTHD_BUFFER_GLOBAL_HDR_MAGIC_NUM_VAL
    - This indicates memory corruption and is a fatal interrupt
    - Better to quiesce on memory corruption

f. STALLING_DEBUG
    - Only triggered with SW write for debug purposes
    - Debug interrupt, currently ignored

Move launch error handling from GP10b to GV11b HAL as -
1. LAUNCHERR_REPORT errcode METHOD_BUFFER_ACCESS_FAULT is not
   defined on Pascal
2. We do not support GP10b on dev-main ToT

JIRA NVGPU-8102

Change-Id: Idc84119bc23b5e85f3479fe62cc8720e98b627a5
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678893
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-14 17:12:14 -07:00
Deepak Nibade
a1ef716f9d gpu: nvgpu: set graphics specific PRI values for graphics contexts
Add new HAL gops.gr.init.set_default_gfx_regs() to set graphics specific
PRI values for graphics contexts in function nvgpu_gr_obj_ctx_alloc().

Add new HAL gops.gr.init.capture_gfx_regs() to capture and save init
values for the PRIs. Add new struct nvgpu_gr_obj_ctx_gfx_regs to hold the
PRI init values.

Define HAL functions gv11b_gr_init_set_default_gfx_regs() and
gv11b_gr_init_capture_gfx_regs(). Set the HAL functions for
gv11b and ga10b.

Register accessors required to set PRIs are auto-generated.

Bug 3506078

Change-Id: I4c2843a274f3c924e402541e600e104ed0c9ed1c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671598
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: Jonathan Mccaffrey <jmccaffrey@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-14 13:17:05 -07:00
Dinesh T
e4cf52123f gpu: nvgpu: Add ce halt function
This is adding CE halt fuction to reset CE properly
by setting stall req and waiting for stallack.

Bug 200641946

Change-Id: I501ccf68a4f6fe95911e73fa2eb65bde93a9f3e9
Signed-off-by: Dinesh T <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678366
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-11 20:44:38 -08:00
Tejal Kudav
3fe70bf86e gpu: nvgpu: Update CE Intr code as per Orin HSIs
Below CE interrupts do not have any users(usecases) on safety build;
disable them only on safety build.
   1. BLOCKPIPE stall intr: Not used by GFX(VKSC) and CUDA on safety.
   2. NONBLOCK_PIPE nonstall intr: Non-stall intrs are not supported
          on safety build. Also, this one is not used by GFX(VKSC)
          and CUDA.
   3. STALLING_DEBUG intr: Added in Orin tree. It is only needed for
          debugging. Disable on safety build as there is no current
          usage in driver.
   4. POISON_ERROR intr: Poison is a fault containment and not
	  supported on GA10b.
   5. INVALID_CONFIG intr: Floor sweeping not supported on functional
          safety SKU.

Bug 3548082

Change-Id: I8d97ccb38f138b2c04a780e1c255a64d28723405
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671927
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-08 11:41:26 -08:00
mpoojary
e7c082aa66 gpu: nvgpu: Enable falcon debug flag for safety debug
Falcon safety debug flag was previously disabled for safety debug
profile. This patch enables the flag support for safety debug.

copy_from_dmem function is required to copy the debug info from
dmem debug buffer whenever there's an error generated.
Hence, moved copy_from_dmem function to fusa file from non-fusa
and added ifdef condition to only enable when non-fusa or falcon debug
flag is set.

Also, some fixes for type conversion error in falcon_debug.c during
compilation.

Bug 3482988

Change-Id: Ic0ea32b3227b84d4ba0835e6e1aeb40f58ec7327
Signed-off-by: mpoojary <mpoojary@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2673900
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-07 06:25:35 -08:00
Sagar Kamble
2b2beb7fb6 gpu: nvgpu: ga10b: restore the ptimer isr hal
Below commit replaced ga10b_ptimer_isr with gk20a_ptimer_isr.

    commit 1528170f1c ("gpu: nvgpu: ga10b: update pri_hub and
                          ptimer error handling")

However, ga10b needs separate hal as timer_pri_timeout_save_0_addr_v()
definition is different for ga10b.

JIRA NVGPU-7986

Change-Id: I9593c90a41c5abdcad2989eb0867b921288064af
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2676699
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
2022-03-07 02:25:52 -08:00
Sagar Kamble
1528170f1c gpu: nvgpu: ga10b: update pri_hub and ptimer error handling
Replace ga10b_ptimer_isr with gk20a_ptimer_isr.

Remove GPU_PRI_ACCESS_VIOLATION reporting from gp10b hal as only
ga10b should be reporting these errors.

GPU_PRI_TIMEOUT_ERROR was only reported from ptimer ISR. However,
it is to be reported when error code is 0xbadf10xx that can be
seen through priv_ring ISR as well. Hence report this error
from ga10b_priv_ring_decode_error_code called from both bus
and priv_ring isr.

For other error cases GPU_PRI_ACCESS_VIOLATION is reported.

Other updates for priv_ring error handling are given below:

1. Add extra info decode functions for error codes:
 - 0xbad001xx, 0xbad002xx, 0xbad0daxx - decode_host_pri_error
 - 0xbadf13xx - decode_fecs_floorsweep_error
 - 0xbadf24xx, 0xbadf25xx, 0xbadf26xx - decode_gcgpc_error &
					decode_pri_local_decode_error
 - 0xbadf20xx, 0xbadf22xx - decode_fecs_pri_orphan_error
 - 0xbadf52xx - decode_pri_indirect_access_violation
 - 0xbadf60xx - decode_pri_lock_sec_sensor_violation

2. Add more info prints to decode_pri_falcom_mem_violation.

3. Add entry for extra info corresponding to 0x41 to
pri_client_error_extra_4x.

4. Separate extra info decode function for error 0xbadf50xx.

JIRA NVGPU-7986

Change-Id: I519a66e8a7a158de23ced5a092a2ebfd62c305be
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671337
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-01 11:59:25 -08:00
Dinesh T
ef2a2be44f gpu: nvgpu: Add compression support with added contig memory pool
This is adding compression support for Ampere gpus by
the given contig memory pool.

Bug 3426194

Change-Id: I1c2400094296eb5448fe18f76d021a10c33ef861
Signed-off-by: Dinesh T <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2673581
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-27 18:10:41 -08:00
Antony Clince Alex
ca27a7d841 gpu: nvgpu: ga10b: move grmgr.load_timestamp_prod HAL
The timestamp control register in the SMCARB should be configured to have
the NV_PSMCARB_TIMESTAMP_CTRL_DISABLE_TICK field cleared, otherwise the PTIMER
ticks will not be sent to GR engine.  Hence, remove the pre-processor checks
around grmgr.load_timestamp_prod call.

Bug 3510460
Bug 3500065

Change-Id: I223cea1aca28a9215287f540eb961a16e3fe6626
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671021
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-17 05:03:01 -08:00
Antony Clince Alex
94255220f7 gpu: nvgpu: ga10b: add TPC interleaved priv segment support
The ctxsw ucode saves all the ctxsw'ed TPC priv registers in the TPC
priv segment of the ctxsw image. In ga10b, these registers can be stored
in either of the two arrangements:
- INTERLEAVED: means the format is sorted by address first, then by TPC number
- MIGRATION: exact opposite of interleaved.

Update HAL functions gr_ga10b_process_context_buffer_priv_segment,
gr_ga10b_find_priv_offset_in_buffer to detect the register layout and
calculate the register offset accordingly.

Bug 200737000
Bug 3532165

Change-Id: I305509cf89498cb0c2c5bfa1d867272bdf5f42b3
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2665491
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-17 00:07:12 -08:00
Rajesh Devaraj
0699220b85 gpu: nvgpu: compile-out unused apis from safety build
This patch does the following changes:
- Compiles-out unused error reporting APIs and the related
  data structures from safety build. For this purpose, it
  introduces the new flag: CONFIG_NVGPU_INTR_DEBUG
- Updates nvgpu_report_err_to_sdl() API with one more argument,
  hw_unit_id. This aids in finding whether an error to be reported
  is corrected or uncorrected from LUT.
- Triggers SW quiesce, if an uncorrected error is reported to
  Safety_Services, in safety build.
- Renames files in cic folder by replacing gv11b with ga10b,
  since error reporting for gv11b is not supported in dev-main.

JIRA NVGPU-8002

Change-Id: Ic01e73b0208252abba1f615a2c98d770cdf41ca4
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2668466
Reviewed-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-02-14 22:00:33 -08:00
Debarshi Dutta
7db5f0d339 gpu: nvgpu: add perfmon Hals
Add following HALs for Ga100 and Ga10b. These will
be used for calculating chiplet offsets corresponding
to GPC/FBP perf register.

get_pmmgpcrouter_per_chiplet_offset
get_pmmfbprouter_per_chiplet_offset

get_hwpm_fbp_perfmon_regs_base
get_hwpm_gpc_perfmon_regs_base
get_hwpm_fbprouter_perfmon_regs_base
get_hwpm_gpcrouter_perfmon_regs_base

Bug 200712091

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: Iec1a16ef4a3c26dca054c30d95bef991983dc2b7
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2648832
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-11 13:29:02 -08:00
Rajesh Devaraj
7dc013d242 gpu: nvgpu: merge error reporting apis
In DRIVE 6.0, NvGPU is allowed to report only 32-bit metadata to
Safety_Services. So, there is no need to have distinct APIs for
reporting errors from units like GR, MM, FIFO to SDL unit. All
these error reporting APIs will be replaced with a single API. To
meet this objective, this patch does the following changes:
- Replaces nvgpu_report_*_err with nvgpu_report_err_to_sdl.
- Removes the reporting of error messages.
- Replaces nvgpu_log() with nvgpu_err(), for error reporting.
- Removes error reporting to Safety_Services from nvgpu_report_*_err.

However, nvgpu_report_*_err APIs and their related files are not
removed. During the creation of nvgpu-mon, they will be moved under
nvgpu-rm, in debug builds.

Note:
- There will be a follow-up patch to fix error IDs.
- As discussed in https://nvbugs/3491596 (comment #12), the high
level expectation is to report only errors.

JIRA NVGPU-7450

Change-Id: I428f2a9043086462754ac36a15edf6094985316f
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2662590
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-09 00:41:18 -08:00
Ramesh Mylavarapu
2a98d20263 nvgpu: ga10b: gsp: implement runlist submit apis
- implemented device info cmd to send device info to the gsp for
  runlist submission. Currently GSP scheduler support only GR
  engine '0' instance.
- implemented runlist submit cmd. GSP firmware will submit the
  corresponding runlist by writing into submit registers. This
  command is direct replacement of hw_submit ga10b hal for GR engine.

NVGPU-6790

Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Change-Id: I5dc573a6ad698fe20b49a3466a8e50b94cae74df
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2608923
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-09 00:38:56 -08:00
Vedashree Vidwans
9513679796 gpu: nvgpu: modify vab implementation
Currently, VAB implementation is using fixed number of access bits. This
value can be computed using fb_mmu_vidmem_access_bit_size_f() value.
- Modify VAB implementation to compute number of access bits.
- Modify nvgpu_vab structure to hold VAB entry size corresponding to
number of access bits.
- Information given by nvgpu_vab structure is more related to the GPU
than nvgpu_mm structure. Move nvgpu_vab struct element to gk20a struct.
- Add fb.set_vab_buffer_address to update vab buffer address in hw
registers.
- Rename gr.vab_init HAL to gr.vab_reserve to avoid any confusion about
when this HAL should be used.
-Replace gr.vab_release and gr.vab_recover with gr.vab_configure HAL.

Bug 3465734

Change-Id: I1b67bfa9be7728be5bda978c6bb87b196d55ab65
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2659467
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Martin Radev <mradev@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-02-04 05:43:27 -08:00
Antony Clince Alex
40397ac0c4 gpu: nvgpu: update CBC init sequence
At present, for each resume cycle the driver sends the
"nvgpu_cbc_op_clear" command to L2 cache controller, this causes the
contents of the compression bit backing store to be cleared, and results
in corrupting the metadata for all the compressible surfaces already allocated.
Fix this by updating cbc.init function to be aware of resume state and
not clear the compression bit backing store, instead issue
"nvgpu_cbc_op_invalide" command, this should leave the backing store in a
consistent state across suspend/resume cycles.

The updated cbc.init HAL for gv11b is reusable acrosss multiple chips, hence
remove unnecessary chip specific cbc.init HALs.

Bug 3483688

Change-Id: I2de848a083436bc085ee98e438874214cb61261f
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2660075
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-01 06:03:33 -08:00
Richard Zhao
a3f3249c76 nvgpu: move .load_timestamp_prod to NON_FUSA and MIG
.load_timestamp_prod was defined protected by CONFIG_NVGPU_HAL_NON_FUSA
and CONFIG_NVGPU_MIG. This patch moves the implementation of
.load_timestamp_prod to the same macros.

Jira GVSCI-9976

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: I3204f3e7085d4098be2ab73e3b5300214ef04cfa
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2659002
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Aparna Das <aparnad@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-01-27 07:51:51 -08:00
Deepak Nibade
4fd0f11e9c gpu: nvgpu: define gops.gr.init.set_default_compute_regs for Orin safety
gops.gr.init.set_default_compute_regs() HAL configures compute specific
settings in safety build and this eliminates need of using SW methods.

Define this HAL for Orin safety build and configure sked check related
registers from the HAL. Other settings done on gv11b are no more
applicable for ga10b safety.

Bug 3456240

Change-Id: Ic125cdf414a5402511949015e3424b8cb2dab1e0
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2646284
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-12-23 13:14:58 -08:00
Martin Radev
b67a3cd053 gpu: nvgpu: ga10b: Correct VAB implementation
This patch performs the following improvements for VAB:
1) It avoids an infinite loop when collecting VAB information.
   Previously, nvgpu incorrectly assumed that the valid bit would
   be eventually set for the checker when polling. It may not be set
   if a VAB-related fault has occurred.
2) It handles the VAB_ERROR mmu fault which may be caused for various
   reasons: invalid vab buffer address, tracking in protected mode,
   etc. The recovery sequence is to set the vab buffer size to 0 and
   then to the original size. This clears the VAB_ERROR bit. After
   reseting, the old register values are again set in the recovery
   code sequence.
3) Use correct number of VAB buffers. There's only one VAB buffer on
   ga10b, not two.
4) Simplify logic.

Bug 3374805
Bug 3465734
Bug 3473147

Change-Id: I716f460ef37cb848ddc56a64c6f83024c4bb9811
Signed-off-by: Martin Radev <mradev@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2621290
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-12-22 08:22:13 -08:00
Shashank Singh
41c0df629b gpu: nvgpu: enable intr_get_unit_info hal for safety orin
Bug 3456240

Change-Id: Id2f0b8f4954669b063135666613f0ffc7891ec04
Signed-off-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2642611
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-12-18 02:51:44 -08:00
Sagar Kamble
6a6562cd4d gpu: nvgpu: ga10x: fix LTC ecc handling
Notable differences from GV11B are below:
  1. RSTG/TSTG uncorrected errors are supported.
  2. PLTS_INTR doesn't report SEC/DED errors. Instead, PLTS_INTR3 will
     indicate the SEC/DED errors through CORRECTED_ERR_DSTG and
     UNCORRECTED_ERR_DSTG fields respectively.
  3. DSTG_ECC_ADDRESS and DSTG_ECC_REPORT are deprecated.

Bug 3446731

Change-Id: I60018d1b3825adcbb287dea05bc96a87f559c969
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2633959
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-12-17 14:36:51 -08:00
Sagar Kamble
449a4823d4 gpu: nvgpu: compile out non fusa LTC functionality
nvgpu_ltc_sync_enabled functionality is used only in the kernel mode
submit path and for debugging. en_illegal_compstat functionality is
used for debugging .

Compile them out under CONFIG_NVGPU_NON_FUSA.

JIRA NVGPU-6982

Change-Id: I404d4b74b2e60ba4c2173ba0bfb643b1ecb6ba7c
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2605011
(cherry picked from commit f4bcafe73c8f7184b5e125e3ff6e55ceccaf87eb)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2632547
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-12-17 14:36:40 -08:00
Divya
9446cfa320 gpu: nvgpu: update golden image flag for RG seq
The flag pmu->pg->golden_image_initialized is set to
true during initial GPU context creation and is not
cleared while the GPU goes into pm_suspend (during railgate).
Hence, when the GPU resumes after un-railgate it retains
the previous value which can cause ELPG to kick in immediately.
Due to this, when ELPG and Railgating are enabled, IDLE_SNAP
is seen for read access of gr_gpc0_tpc0_sm_arch_r reg.

To resolve this, if golden image is ready set the
pmu->pg->golden_image_initialized to suspend state during railgate,
to delay the early enable of ELPG. Add a new
pmu_init_golden_img_state hal in the NVGPU_INIT_TABLE_ENTRY.
This will be called after all the GR access is done and GPU resumes
completely after un-railgate. This hal will then check if
golden_image_initialized flag is in suspend state, it will set it
to ready state and then re-enable ELPG.

Bug 3431798

Change-Id: I1fee83e66e09b6b78d385bbe60529d0724f79e79
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2639188
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-12-11 14:06:49 -08:00
Mahantesh Kumbar
ce7d589a4d gpu: nvgpu: ga10b: add PMU interrupt check hal
-GA10B PMU IRQ registers are not accessible when NVRISCV PRIV lockdown
 is engaged, so need to skip accessing IRQ registers.

NVGPU-7061

Change-Id: If5233e502a9bef838839376c412582e08d729a99
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2636964
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-12-06 04:37:39 -08:00
Dinesh T
ad09e3e3cc gpu: nvgpu: Enable sm_l1tag_surface_cut_collector
This is enabling sm_l1tag_surface_cut_collector at gpu boot.
This is done with adding new hal "set_sm_l1tag_surface_collector"
that sets l1tag_surface_cut_collector in the sm_l1tag_ctrl
register.

Bug 2557724

Change-Id: I869e3bfa563db204259e7a464657229632f182d9
Signed-off-by: Dinesh T <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2634878
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-12-06 04:36:56 -08:00
dt
e1d6b8af8d gpu: nvgpu: ga10x: compute gnic_stride
GNIC register stride calculation is fixed by adding new hal to compute
the stride by getting the difference of gpc1 and gpc0 xbar_gnic strides
for ga10x GPUs.

Bug 200782045

Change-Id: Iaa84109bd9f1a974ef1af6fee136ca1fcc89bbb1
Signed-off-by: Dinesh T <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2624848
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-12-01 08:40:36 -08:00