Commit Graph

1191 Commits

Author SHA1 Message Date
mpoojary
c1a995403a gpu: nvgpu: Add ACR error reporting to SDL
-Add check for ECC parity errors in IMEM, DMEM, EMEM, DCLS, REG
for ACR running in GSP engine.
The EXTIRQ3 external interrupt is set from ACR pointing towards host.
-Add function to check error type when ACR or Bootrom  execution fails
and report accordingly to SDL with relevant error codes.

This is a part of HSI safety requirements.

Bug 3564039
Jira NVGPU-8108

Change-Id: I65407371f7a1d1ba50a10bdf443ef6b903eeaa36
Signed-off-by: mpoojary <mpoojary@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678100
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-15 17:33:42 -07:00
Dinesh T
358f62a9d7 gpu: nvgpu: Add compression for safety
This is adding compression support for qnx-safety by
- Adding the compression related files under FUSA.
- Adding new posix contig-pool.c for user space compilation.

Bug 3426194

Change-Id: Ib3c8e587409dc12099c1196f55a87858d4dc520e
Signed-off-by: Dinesh T <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2652963
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-15 17:30:57 -07:00
Tejal Kudav
b80b2bdab8 gpu: nvgpu: Add CE interrupt handling
a. LAUNCH_ERR
    - Userspace error.
    - Triggered due to faulty launch.
    - Handle using recovery to reset CE engine and teardown the
      faulty channel.

b. An INVALID_CONFIG -
    - Triggered when LCE is mapped to floorswept PCE.
    - On iGPU, we use the default PCE 2 LCE  HW mapping.
      The default mapping can be read from NV_CE_PCE2LCE_CONFIG
      INIT value in CE refmanual.
    - NvGPU driver configures the mapping on dGPUs (currently only on
      Turing).
    - So, this interrupt can only be triggered if there is
      kernel or HW error
    - Recovery ( which is killing the context + engine reset) will
      not help resolve this error.
    - Trigger Quiesce as part of handling.

c. A MTHD_BUFFER_FAULT -
    - NvGPU driver allocates fault buffers for all TSGs or contexts,
      maps them in BAR2 VA space and writes the VA into channel
      instance block.
    - Can be triggered only due to kernel bug
    - Recovery will not help, need quiesce

d. FBUF_CRC_FAIL
    - Triggered when the CRC entry read from the method fault buffer
      does not match the computed CRC from the methods contained in
      the buffer.
    - This indicates memory corruption and is a fatal interrupt which
      at least requires the LCE to be reset before operations can
      start again, if not the entire GPU.
    - Better to quiesce on memory corruption
      CE Engine reset (via recovery) will not help.

e. FBUF_MAGIC_CHK_FAIL
    - Triggered when the MAGIC_NUM entry read from the method fault
      buf does not match NV_CE_MTHD_BUFFER_GLOBAL_HDR_MAGIC_NUM_VAL
    - This indicates memory corruption and is a fatal interrupt
    - Better to quiesce on memory corruption

f. STALLING_DEBUG
    - Only triggered with SW write for debug purposes
    - Debug interrupt, currently ignored

Move launch error handling from GP10b to GV11b HAL as -
1. LAUNCHERR_REPORT errcode METHOD_BUFFER_ACCESS_FAULT is not
   defined on Pascal
2. We do not support GP10b on dev-main ToT

JIRA NVGPU-8102

Change-Id: Idc84119bc23b5e85f3479fe62cc8720e98b627a5
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678893
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-14 17:12:14 -07:00
Tejal Kudav
15739c52e9 gpu: nvgpu: Fix NULL ptr deref during quiesce
g->fifo.runlists[] has size of g->fifo.max_runlists. During quiesce,
U32_MAX bitmask is passed to g->ops.runlist.write_state() HAL to
disable all the runlist. The Ga10b HAL implementation of
g->ops.runlist.write_state() references into runlists[] structure
for all the bits set in input runlist mask. For mask=U32_MAX,
there is NULL pointer dereference when runlist_id exceeds
g->fifo.max_runlists.
Add runlist_id boundary check before dereferencing the runlists[]
structure.

Update Gk20a HAL too with similar guard to make sure incorrect mask
doesn't get written to the register.

JIRA NVGPU-8102

Change-Id: Ic613aa38361b8b23d953c76d6924aba6bf6d5ea9
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2680847
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-14 17:12:01 -07:00
Deepak Nibade
a1ef716f9d gpu: nvgpu: set graphics specific PRI values for graphics contexts
Add new HAL gops.gr.init.set_default_gfx_regs() to set graphics specific
PRI values for graphics contexts in function nvgpu_gr_obj_ctx_alloc().

Add new HAL gops.gr.init.capture_gfx_regs() to capture and save init
values for the PRIs. Add new struct nvgpu_gr_obj_ctx_gfx_regs to hold the
PRI init values.

Define HAL functions gv11b_gr_init_set_default_gfx_regs() and
gv11b_gr_init_capture_gfx_regs(). Set the HAL functions for
gv11b and ga10b.

Register accessors required to set PRIs are auto-generated.

Bug 3506078

Change-Id: I4c2843a274f3c924e402541e600e104ed0c9ed1c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671598
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: Jonathan Mccaffrey <jmccaffrey@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-14 13:17:05 -07:00
Dinesh T
e4cf52123f gpu: nvgpu: Add ce halt function
This is adding CE halt fuction to reset CE properly
by setting stall req and waiting for stallack.

Bug 200641946

Change-Id: I501ccf68a4f6fe95911e73fa2eb65bde93a9f3e9
Signed-off-by: Dinesh T <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678366
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-11 20:44:38 -08:00
Tejal Kudav
3bfab5df3f gpu: nvgpu: Disable fault mthd buf intrs on safety
Below CE interrupts are disabled on safety build as fault and
switch mechanism is not supported on safety:
NV_CE_LCE_INTR_STATUS_MTHD_BUFFER_FAULT
NV_CE_LCE_INTR_STATUS_FBUF_CRC_FAIL
NV_CE_LCE_INTR_STATUS_FBUF_MAGIC_CHK_FAIL

Bug 3548082

Change-Id: I400cd02a8c9888b7ef0d71bbc1f7d792b48e8227
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2679052
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-10 16:04:37 -08:00
srajum
390df709ca gpu: nvgpu: fixing static analysis violation
- MISRA Rule 17.7
  The value returned by a function having non-void return type
  shall be used

JIRA NVGPU-5955

Change-Id: I59539042d05afa9e74272fc8645b2fe1fa8e42aa
Signed-off-by: srajum <srajum@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2572085
(cherry picked from commit bd75b6196b7fac67fbf7a458e6bed9e3c7076ee8)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678671
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-10 16:03:28 -08:00
srajum
069fe05dca gpu: nvgpu: remove whitelisting for wrongly reported violations by tool
- Earlier we whitelisted wrongly reported static analysis violations
  by tool, raised coverity tool bugs for these cases.

- These bugs are fixed with new version of tool, so no need fo whitelisting.

JIRA NVGPU-7119

Change-Id: Ib2341db0d46fa7fac4c0cc9a6c1bdc8704377ef1
Signed-off-by: srajum <srajum@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2604365
(cherry picked from commit dc2d8ddaa409aefe0e04e0bacb3a8a977f6dbd64)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2677523
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-10 16:01:06 -08:00
Rajesh Devaraj
185dbf9192 gpu: nvgpu: add error ids for pmu, gsp
This patch does the following:
- Adds error IDs for GSP ACR and GSP SCHED.
- Updates error IDs for PMU.
- Removes reporting of DMEM ECC_CORRECTED since DMEM RAMs in PWR is
  protected only with parity mechanism, (ref: T23x_UPROC_Safety_IAS)
- Removes reporting of IMEM ECC_CORRECTED since IMEM RAMs for PROC in
  PWR is protected only with parity mechanism, (ref: T23x_UPROC_Safety_IAS)

JIRA NVGPU-8094

Change-Id: I127e78b1aa76b552758d1fff5bc7a01b5f8f3e54
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2677589
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-09 21:12:17 -08:00
Richard Zhao
cf43371073 nvgpu: vgpu: ga10b: enable compression
- contiguous mempool has been added on server side.
- init cbc support only on compression flag enabled
- enable compression flag only on silicon

Jira GVSCI-12883

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: I339f25b81224b55124928231be65070660e27080
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2676951
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-09 21:08:57 -08:00
Rajesh Devaraj
329807b8f9 gpu: nvgpu: update error ids for pgraph
This patch updates PGRAPH related error IDs for ga10b.
Since sub error type is not supported in Safety_Services 6.0, dedicated
error IDs have been allocated for all sub-errors in PGRAPH.

JIRA NVGPU-8094

Change-Id: Ic8de5815c5ea63e290d11ffca598e58812573603
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678289
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-09 04:42:36 -08:00
Dinesh T
162ad1bebf gpu: nvgpu: Add new errorid for GA10B
This is adding new error ids for GA10B and removing
some unused error ids.

Change-Id: Id5e360b9da9b6e352167575810b460e743cf8eb7
Signed-off-by: Dinesh T <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2676757
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-09 04:41:31 -08:00
Tejal Kudav
9b7c8cdd8c gpu: nvgpu: Update GR intr code as per Orin HSIs
Most SM RAMs are protected with parity (except L1 D-cache TAG mem
which is protected with SEC-DED ECC). The memory corruption errors
reported by these RAMs are therefore uncorrected errors only.
Remove the code to handle corrected errors from GR SM ECC.

The SM RAMS ECC errors currently report error to SDL using ID
GPU_SM_L1_TAG_ECC_(UN)CORRECTED. Update the error reporting to
use the newly created error IDs for Drive 6.0.

JIRA NVGPU-7987

Change-Id: Ic426d45f851d87aafaa7963b937535582cdafadf
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2674389
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-08 11:42:32 -08:00
Tejal Kudav
3fe70bf86e gpu: nvgpu: Update CE Intr code as per Orin HSIs
Below CE interrupts do not have any users(usecases) on safety build;
disable them only on safety build.
   1. BLOCKPIPE stall intr: Not used by GFX(VKSC) and CUDA on safety.
   2. NONBLOCK_PIPE nonstall intr: Non-stall intrs are not supported
          on safety build. Also, this one is not used by GFX(VKSC)
          and CUDA.
   3. STALLING_DEBUG intr: Added in Orin tree. It is only needed for
          debugging. Disable on safety build as there is no current
          usage in driver.
   4. POISON_ERROR intr: Poison is a fault containment and not
	  supported on GA10b.
   5. INVALID_CONFIG intr: Floor sweeping not supported on functional
          safety SKU.

Bug 3548082

Change-Id: I8d97ccb38f138b2c04a780e1c255a64d28723405
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671927
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-08 11:41:26 -08:00
srajum
585c3ab1c1 gpu: nvgpu: fixing MISRA violations
- Rule 4.12
  Dynamic memory allocation shall not be used.

- Rule 8.6
  "gp10b_device_info_parse_data" is declared but never defined

- Rule 5.7
  A tag name shall be a unique identifier

JIRA NVGPU-6536

Change-Id: I2f234d4aadd217f13b51e4dcadfa13d284a3750f
Signed-off-by: srajum <srajum@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2582076
(cherry picked from commit 7394eedcdfd606a4687adba1ce82e96b5d6e23f8)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2677542
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-08 05:31:42 -08:00
Antony Clince Alex
c0f4723339 gpu: nvgpu: perbuf: update PMA buffer mapping
The PMA unit can only access GPU VAs within a 4GB window, hence both
the user allocated PMA buffer and the kernel allocated bytes available
buffer should lie in the same 4GB window. This is accomplished by
carving out and reserving a 4GB VA space in perbuf.vm and using fixed
GPU VAs to ensure that both buffers are bound within the same 4GB window.

In addition, update ALLOC_PMA_STREAM to use pma_buffer_offset,
pma_buffer_map_size fields correctly.

Bug 3503708

Change-Id: Ic5297a22c2db42b18ff5e676d565d3be3c1cd780
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671637
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-07 15:17:35 -08:00
mpoojary
e7c082aa66 gpu: nvgpu: Enable falcon debug flag for safety debug
Falcon safety debug flag was previously disabled for safety debug
profile. This patch enables the flag support for safety debug.

copy_from_dmem function is required to copy the debug info from
dmem debug buffer whenever there's an error generated.
Hence, moved copy_from_dmem function to fusa file from non-fusa
and added ifdef condition to only enable when non-fusa or falcon debug
flag is set.

Also, some fixes for type conversion error in falcon_debug.c during
compilation.

Bug 3482988

Change-Id: Ic0ea32b3227b84d4ba0835e6e1aeb40f58ec7327
Signed-off-by: mpoojary <mpoojary@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2673900
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-07 06:25:35 -08:00
Sagar Kamble
2b2beb7fb6 gpu: nvgpu: ga10b: restore the ptimer isr hal
Below commit replaced ga10b_ptimer_isr with gk20a_ptimer_isr.

    commit 1528170f1c ("gpu: nvgpu: ga10b: update pri_hub and
                          ptimer error handling")

However, ga10b needs separate hal as timer_pri_timeout_save_0_addr_v()
definition is different for ga10b.

JIRA NVGPU-7986

Change-Id: I9593c90a41c5abdcad2989eb0867b921288064af
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2676699
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
2022-03-07 02:25:52 -08:00
Sagar Kamble
a776f7b1d9 gpu: nvgpu: make global allowlist and range arrays static const
Allowlist and register ranges are declared global. Sparse throws warning
for them as:

 - allowlist_ga100.c:351:5: warning: symbol
   'ga100_cau_register_offset_allowlist' was not declared.
   Should it be static?
 - allowlist_ga100.c:389:47: warning: symbol
   'ga100_hwpm_pma_trigger_register_ranges' was not declared.
   Should it be static?

Make these arrays static global const.

Bug 3528472

Change-Id: I319f36c1579c630632b994295677c5831c1bff6b
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2676591
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-03 01:41:25 -08:00
Sagar Kamble
1528170f1c gpu: nvgpu: ga10b: update pri_hub and ptimer error handling
Replace ga10b_ptimer_isr with gk20a_ptimer_isr.

Remove GPU_PRI_ACCESS_VIOLATION reporting from gp10b hal as only
ga10b should be reporting these errors.

GPU_PRI_TIMEOUT_ERROR was only reported from ptimer ISR. However,
it is to be reported when error code is 0xbadf10xx that can be
seen through priv_ring ISR as well. Hence report this error
from ga10b_priv_ring_decode_error_code called from both bus
and priv_ring isr.

For other error cases GPU_PRI_ACCESS_VIOLATION is reported.

Other updates for priv_ring error handling are given below:

1. Add extra info decode functions for error codes:
 - 0xbad001xx, 0xbad002xx, 0xbad0daxx - decode_host_pri_error
 - 0xbadf13xx - decode_fecs_floorsweep_error
 - 0xbadf24xx, 0xbadf25xx, 0xbadf26xx - decode_gcgpc_error &
					decode_pri_local_decode_error
 - 0xbadf20xx, 0xbadf22xx - decode_fecs_pri_orphan_error
 - 0xbadf52xx - decode_pri_indirect_access_violation
 - 0xbadf60xx - decode_pri_lock_sec_sensor_violation

2. Add more info prints to decode_pri_falcom_mem_violation.

3. Add entry for extra info corresponding to 0x41 to
pri_client_error_extra_4x.

4. Separate extra info decode function for error 0xbadf50xx.

JIRA NVGPU-7986

Change-Id: I519a66e8a7a158de23ced5a092a2ebfd62c305be
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671337
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-01 11:59:25 -08:00
srajum
d19fd554b2 gpu: nvgpu: fixing MISRA 8.6 violation
- misra_c_2012_rule_8_6_violation:
 "gp10b_ltc_set_enabled" is declared but never defined.

JIRA NVGPU-7057

Change-Id: I981e9bbf1c9dcc864ea2404110567c28593880d3
Signed-off-by: srajum <srajum@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2610727
(cherry picked from commit 0aabd261f24846d8da7b90afbf7f2363368a0b82)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2673695
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-01 06:08:06 -08:00
srajum
41a1212744 gpu: nvgpu: fixing MISRA 10.1 and 10.3 violations
- MISRA Rule 10.1
  The expression "g->syncpt_size" of non-boolean essential type is
  being interpreted as a boolean value for the operator "? :".

- MISRA Rule 10.3
  Implicit conversion of "(tmp <= 4294967295UL) ? tmp : 4294967295UL"
  from essential type "unsigned 64-bit int" to different or narrower
  essential type "unsigned 32-bit int"

JIRA NVGPU-6536

Change-Id: I56f01a13f3a8877317213d6fc846330ff3dfd700
Signed-off-by: srajum <srajum@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2582289
(cherry picked from commit 4a51cad9b016a17ddec00cd6b35ec6c931a3c5c4)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2674865
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-01 06:07:53 -08:00
srajum
a1ad3ccc83 gpu: nvgpu: remove unused function declarations
- "gv11b_ltc_get_err_desc", "gv11b_ltc_inject_ecc_error" API's
  are declared but not defined, so removing these

JIRA NVGPU-7119

Change-Id: Id2ef6bffbaf62c7e41be4bdc8b7f6b2354bc58b3
Signed-off-by: srajum <srajum@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2603822
(cherry picked from commit ae01e11e087a8a7fde18be765330ed90a8db3ae8)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2673520
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-01 06:07:10 -08:00
Sagar Kamble
79b37d7832 gpu: nvgpu: dump debug info on semaphore acquire timeout
Channel RAMFC has details about the semaphore operation the channel
is performing. Getting this can be helpful in debugging the semaph-
ore acquire timeout.

Add gk20a_debug_dump to pbdma interrupt handler for this case.

Bug 3430929

Change-Id: Ia5e3b191a77a7e54d02f45ed2d1beb266905b564
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2675344
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-02-28 17:38:19 -08:00
Dinesh T
ef2a2be44f gpu: nvgpu: Add compression support with added contig memory pool
This is adding compression support for Ampere gpus by
the given contig memory pool.

Bug 3426194

Change-Id: I1c2400094296eb5448fe18f76d021a10c33ef861
Signed-off-by: Dinesh T <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2673581
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-27 18:10:41 -08:00
shashank singh
29019dff6e gpu: nvgpu: remove round_up usage in safety build
- In function gv11b_tsg_init_eng_method_buffers() PAGE_ALIGN can be used
  instead of round_up macro.
- In function nvgpu_posix_find_next_bit() rounding up of start does not
  seem to serve any purpose.

JIRA NVGPU-7057

Change-Id: I4a3a21e95a0f3aa38f7007de1f6959f1d878e511
Signed-off-by: shashank singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2614326
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2672107
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-02-23 11:08:31 -08:00
mkumbar
930c218810 gpu: nvgpu: ga10b: fix priv error for nvriscv bcr reg read
Read nvriscv bcr regsiter only if priv lockdown is released.

Reading bcr during priv lockdown triggers priv violation error.

Bug 3541062

Change-Id: Ib63f1ad634a945e0f9c573b4703217dbf887a776
Signed-off-by: mkumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2672196
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-02-22 05:57:31 -08:00
Shashank Singh
5ec241a1d8 gpu: nvgpu: remove non stall intr from top handler for safety
On safety nonstall interrupt is not used and should be compiled out to
rule out any chance of interference with safety code. Remove top handler
support of nonstall interrupt for safety which is currently not
applicable to linux.

Jira NVGPU-7066
Jira NVGPU-4078

Change-Id: I278efc8da6ddd0f22c128af6630cfd1b20ba4784
Signed-off-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2589006
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671586
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-21 02:31:38 -08:00
Antony Clince Alex
ca27a7d841 gpu: nvgpu: ga10b: move grmgr.load_timestamp_prod HAL
The timestamp control register in the SMCARB should be configured to have
the NV_PSMCARB_TIMESTAMP_CTRL_DISABLE_TICK field cleared, otherwise the PTIMER
ticks will not be sent to GR engine.  Hence, remove the pre-processor checks
around grmgr.load_timestamp_prod call.

Bug 3510460
Bug 3500065

Change-Id: I223cea1aca28a9215287f540eb961a16e3fe6626
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671021
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-17 05:03:01 -08:00
Shashank Singh
19a3b86f06 gpu: nvgpu: remove unused code from common.nvgpu on safety build
- remove unused code from common.nvgpu unit on safety build. Also,
remove the code which uses them in other places.
- document use of compiler intrinsics as mandated in code inspection
  checklist.

Jira NVGPU-6876

Change-Id: Ifd16dd197d297f56a517ca155da4ed145015204c
Signed-off-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2561584
(cherry picked from commit 900391071e9a7d0448cbc1bb6ed57677459712a4)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2561583
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-17 04:58:32 -08:00
Antony Clince Alex
94255220f7 gpu: nvgpu: ga10b: add TPC interleaved priv segment support
The ctxsw ucode saves all the ctxsw'ed TPC priv registers in the TPC
priv segment of the ctxsw image. In ga10b, these registers can be stored
in either of the two arrangements:
- INTERLEAVED: means the format is sorted by address first, then by TPC number
- MIGRATION: exact opposite of interleaved.

Update HAL functions gr_ga10b_process_context_buffer_priv_segment,
gr_ga10b_find_priv_offset_in_buffer to detect the register layout and
calculate the register offset accordingly.

Bug 200737000
Bug 3532165

Change-Id: I305509cf89498cb0c2c5bfa1d867272bdf5f42b3
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2665491
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-17 00:07:12 -08:00
Antony Clince Alex
39db69a2dc gpu: nvgpu: ga10b: update final netlist to NETC
Update final netlist name to NETC for the ctxsw ucode with
HWCL: 52777872

Bug 200737000

Change-Id: I5699426c498235d33f7106b85fbffee30c35defc
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2654052
GVS: Gerrit_Virtual_Submit
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
2022-02-17 00:06:35 -08:00
Rajesh Devaraj
0699220b85 gpu: nvgpu: compile-out unused apis from safety build
This patch does the following changes:
- Compiles-out unused error reporting APIs and the related
  data structures from safety build. For this purpose, it
  introduces the new flag: CONFIG_NVGPU_INTR_DEBUG
- Updates nvgpu_report_err_to_sdl() API with one more argument,
  hw_unit_id. This aids in finding whether an error to be reported
  is corrected or uncorrected from LUT.
- Triggers SW quiesce, if an uncorrected error is reported to
  Safety_Services, in safety build.
- Renames files in cic folder by replacing gv11b with ga10b,
  since error reporting for gv11b is not supported in dev-main.

JIRA NVGPU-8002

Change-Id: Ic01e73b0208252abba1f615a2c98d770cdf41ca4
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2668466
Reviewed-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-02-14 22:00:33 -08:00
Debarshi Dutta
10c3c0ddbb gpu: nvgpu: add FBP index conversion infra for MIG
Add a mapping between local ids and logical ids for FBPs.
This is enabled to support conversion for FBP local ids to
logical ids when memory partition is enabled for SMC.

Bug 200712091

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: Iba33327a98bf427b21f37cbf7f2d5ee5619e7ae5
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2651964
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-02-11 13:29:14 -08:00
Debarshi Dutta
7db5f0d339 gpu: nvgpu: add perfmon Hals
Add following HALs for Ga100 and Ga10b. These will
be used for calculating chiplet offsets corresponding
to GPC/FBP perf register.

get_pmmgpcrouter_per_chiplet_offset
get_pmmfbprouter_per_chiplet_offset

get_hwpm_fbp_perfmon_regs_base
get_hwpm_gpc_perfmon_regs_base
get_hwpm_fbprouter_perfmon_regs_base
get_hwpm_gpcrouter_perfmon_regs_base

Bug 200712091

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: Iec1a16ef4a3c26dca054c30d95bef991983dc2b7
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2648832
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-11 13:29:02 -08:00
Debarshi Dutta
3d01b89e68 gpu: nvgpu: expose physical masks for GPCS/FBPs for MIG
Following changes are added
1) nvgpu_gr_config->gpc_tpc_mask_physical is now indexed by physical
gpc id instead of logical id.
2) Removed the conversion of logical fbp ids and replace them with
physical ids.
3) nvgpu_gpu_instance->fbp_en_mask now contains the mask of physical fbp ids.
4) gk20a_ctrl_ioctl_gpu_characteristics returns gpu.gpc_mask returns mask
of physical ids.

Bug 200712091

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I0e066df76e07203ff4a5be5bfff2cef8566b425d
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2648831
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-02-11 13:28:50 -08:00
Martin Radev
3e4fb49270 gpu: nvgpu: Exit early on VAB_ERROR MMU fault
This patch updates the interaction between the VAB
packet polling code and the VAB_ERROR MMU fault handling
code. A shared atomic flag is used to determine if a
VAB_ERROR MMU fault has happened while polling, which will
result in polling be terminated immediately instead of
waiting on a timeout to happen. This allows testing VAB_ERROR
MMU fault handling in environments where a timeout may never
happen or happen very slowly.

The sequence for this to work is the following:
1) before requesting a VAB dump, which may trigger a fault,
   the atomic flag is atomically reset to 0.
2) polling eventually starts which atomically checks the flag
   in the loop. If flag is set, polling exits because the VAB
   result will never be available.
3) If a VAB_ERROR MMU fault is raised, this sets the flag to 1
   atomically.

Note that while there could be a race in this sequence if the
VAB_ERROR MMU fault handling is somehow delayed, the chance is
extremely slim because:
1) the race could only happen if the VAB dump code is re-entered
   before the earlier VAB_ERROR MMU fault is still pending.
2) the polling code has a large timeout
3) re-entering means a new ioctl/devctl

Bug 3425981

Change-Id: I422b15b581b0c3417abd4c66fbcdde9a0ff8cd9b
Signed-off-by: Martin Radev <mradev@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2664103
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-02-10 20:31:10 -08:00
Rajesh Devaraj
7dc013d242 gpu: nvgpu: merge error reporting apis
In DRIVE 6.0, NvGPU is allowed to report only 32-bit metadata to
Safety_Services. So, there is no need to have distinct APIs for
reporting errors from units like GR, MM, FIFO to SDL unit. All
these error reporting APIs will be replaced with a single API. To
meet this objective, this patch does the following changes:
- Replaces nvgpu_report_*_err with nvgpu_report_err_to_sdl.
- Removes the reporting of error messages.
- Replaces nvgpu_log() with nvgpu_err(), for error reporting.
- Removes error reporting to Safety_Services from nvgpu_report_*_err.

However, nvgpu_report_*_err APIs and their related files are not
removed. During the creation of nvgpu-mon, they will be moved under
nvgpu-rm, in debug builds.

Note:
- There will be a follow-up patch to fix error IDs.
- As discussed in https://nvbugs/3491596 (comment #12), the high
level expectation is to report only errors.

JIRA NVGPU-7450

Change-Id: I428f2a9043086462754ac36a15edf6094985316f
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2662590
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-09 00:41:18 -08:00
Ramesh Mylavarapu
2a98d20263 nvgpu: ga10b: gsp: implement runlist submit apis
- implemented device info cmd to send device info to the gsp for
  runlist submission. Currently GSP scheduler support only GR
  engine '0' instance.
- implemented runlist submit cmd. GSP firmware will submit the
  corresponding runlist by writing into submit registers. This
  command is direct replacement of hw_submit ga10b hal for GR engine.

NVGPU-6790

Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Change-Id: I5dc573a6ad698fe20b49a3466a8e50b94cae74df
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2608923
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-09 00:38:56 -08:00
Ramesh Mylavarapu
9302b2efee gpu: nvgpu: gsp units separation
Separated gsp unit into three unit:
- GSP unit which holds the core functionality of GSP RISCV core,
  bootstrap, interrupt, etc.
- GSP Scheduler to hold the cmd/msg management, IPC, etc.
- GSP Test to hold stress test ucode specific support.

NVGPU-7492

Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Change-Id: I12340dc776d610502f28c8574843afc7481c0871
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2660619
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-09 00:38:21 -08:00
Vedashree Vidwans
9513679796 gpu: nvgpu: modify vab implementation
Currently, VAB implementation is using fixed number of access bits. This
value can be computed using fb_mmu_vidmem_access_bit_size_f() value.
- Modify VAB implementation to compute number of access bits.
- Modify nvgpu_vab structure to hold VAB entry size corresponding to
number of access bits.
- Information given by nvgpu_vab structure is more related to the GPU
than nvgpu_mm structure. Move nvgpu_vab struct element to gk20a struct.
- Add fb.set_vab_buffer_address to update vab buffer address in hw
registers.
- Rename gr.vab_init HAL to gr.vab_reserve to avoid any confusion about
when this HAL should be used.
-Replace gr.vab_release and gr.vab_recover with gr.vab_configure HAL.

Bug 3465734

Change-Id: I1b67bfa9be7728be5bda978c6bb87b196d55ab65
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2659467
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Martin Radev <mradev@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-02-04 05:43:27 -08:00
Antony Clince Alex
40397ac0c4 gpu: nvgpu: update CBC init sequence
At present, for each resume cycle the driver sends the
"nvgpu_cbc_op_clear" command to L2 cache controller, this causes the
contents of the compression bit backing store to be cleared, and results
in corrupting the metadata for all the compressible surfaces already allocated.
Fix this by updating cbc.init function to be aware of resume state and
not clear the compression bit backing store, instead issue
"nvgpu_cbc_op_invalide" command, this should leave the backing store in a
consistent state across suspend/resume cycles.

The updated cbc.init HAL for gv11b is reusable acrosss multiple chips, hence
remove unnecessary chip specific cbc.init HALs.

Bug 3483688

Change-Id: I2de848a083436bc085ee98e438874214cb61261f
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2660075
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-01 06:03:33 -08:00
Sagar Kamble
29a0a146ac gpu: nvgpu: fix coverity defects
Fix following coverity defects:
  ioctl_prof.c resource leak
  ioctl_dbg.c logically dead code
  global_ctx.c identical code for branches
  therm_dev.c resource leak
  pmu_pstate.c unused value
  nvgpu_mem.c dead default in switch
  tsg.c Dereference before null check
  nvlink_gv100.c logically dead code
  nvlink.c Out-of-bounds write
  fifo_vgpu.c Dereference null return value
  pmu_pg.c Dereference before null check
  fw_ver_ops.c Identical code for different branches
  boardobjgrp.c Dereference after null check
  boardobjgrp.c Dereference before null check
  boardobjgrp.c Dereference after null check
  engines.c Dereference before null check
  nvgpu_init.c Unused value

CID 10127875
CID 10127820
CID 10063535
CID 10059311
CID 10127863
CID 9875900
CID 9865875
CID 9858045
CID 9852644
CID 9852635
CID 9852232
CID 9847593
CID 9847051
CID 9846056
CID 9846055
CID 9846054
CID 9842821

Bug 3460991

Change-Id: I91c215a545d07eb0e5b236849d5a8440ed6fe18d
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2657444
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-01-28 04:50:12 -08:00
Richard Zhao
a3f3249c76 nvgpu: move .load_timestamp_prod to NON_FUSA and MIG
.load_timestamp_prod was defined protected by CONFIG_NVGPU_HAL_NON_FUSA
and CONFIG_NVGPU_MIG. This patch moves the implementation of
.load_timestamp_prod to the same macros.

Jira GVSCI-9976

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: I3204f3e7085d4098be2ab73e3b5300214ef04cfa
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2659002
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Aparna Das <aparnad@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-01-27 07:51:51 -08:00
Rajesh Devaraj
878235e914 gpu: nvgpu: remove report error callback
In DRIVE 6.0, NvGPU needs to support error reporting in QNX-Safety,
QNX-Standard, and Linux. To support error reporting in all these
platform variants, SDL unit will be moved from QNX to common code.
As part of this refactoring activity, this patch removes ops assignment
for report error. Also, it removes API calls that are used to take
time-stamp for stall interrupt thread. This time-stamp APIs will be
brought back later, if required to support periodic diagnostics.

JIRA NVGPU-7353

Change-Id: I38536019dc7165e6a97674863b37d009854af948
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2655958
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-01-24 02:06:24 -08:00
Antony Clince Alex
6bfa11c327 gpu: nvgpu: ga10b: update regops allowlist
Update regops allowlist using the latest hw headers.

Bug 3455929
Jira NVGPU-7365

Change-Id: I4f866b81de2a7d689f1b633a498a8c0c9a26a226
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2651169
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-01-17 05:35:57 -08:00
Chris Johnson
c78998f99b gpu: nvgpu: leave LTC evicted_cb intr disabled
The evicted_cb interrupt is occurring more frequently than
expected and has no SW action that can be taken to avoid it.

This interrupt is being disabled which is consistent with
the HW POR value and the setting used on previous chips.

Bug 3464717

Signed-off-by: Chris Johnson <cwj@nvidia.com>
Change-Id: Ibc87f4bf287eeef158e46126a5e7f8a3cc575390
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2654678
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-01-16 23:13:31 -08:00
Seshendra Gadagottu
6935867a5e Revert "gpu: nvgpu: t234: update gating registers to avoid priv errors"
This reverts commit 319f4f6fe1.

Reason for revert: <priv errors are fixed by lowering PLM mask for
ltc registers NV_PLTCG_LTCS_CGATE_PRIV_LEVEL_MASK and
NV_PLTCG_LTCS_LTSS_CGATE_PRIV_LEVEL_MASK in acr-firmware>

Bug 3469873
Bug 3423549
Bug 3452217

Change-Id: I1237f65bfeab07d2287465b40798043d9edb209a
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2645803
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-01-15 06:08:26 -08:00
mpoojary
4e98b53944 gpu: nvgpu: ga10b: Update ga10b_is_pmu_supported
Update ga10b_is_pmu_supported function to add support
for pre-si platforms along with silicon.

JIRA NVGPU-4701

Change-Id: If7eec7753c01135c9c9c20d49278b3f1fe9332ae
Signed-off-by: mpoojary <mpoojary@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2652871
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-01-13 19:30:46 -08:00