Commit Graph

8569 Commits

Author SHA1 Message Date
Vedashree Vidwans
a3e2283cf2 gpu: nvgpu: ga10b: Use active ltcs count for cbc init
This patch fixes a bug in the cbc initialization code for ga10b,
where it was erroneously assumed that a fixed ltc count of only one
should be used for historical reasons. For volta and later, the full
ltc count should be used in cbc-related computation.
Ensure
- CBC base address is 64K aligned
- CBC start address lies within CBC allocated memory

Check CBC is marked safe only for silicon platform.

Bug 3353418

Change-Id: I5edee2a78dc9e8c149e111a9f088a57e0154f5c2
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2585778
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-10 16:00:25 -07:00
deepak goyal
cc7b048641 gpu: nvgpu: non-zero blob size for rail-gating.
Ucode blob size 0 is passed currently for rail-gating.
Ucode blob size 0 is not supported by ACR yet.
ACR will copy UCODE blob again
to SYSMEM for GPU Rail-gating cycles.

Bug 3361416

Change-Id: I1fdb3993cda7e5d62507d83f9c0a8645dc5f7fc7
Signed-off-by: deepak goyal <dgoyal@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2588207
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-09-09 09:16:37 -07:00
David Li
b27524916a gpu: nvgpu: ga10b fix zcull sm_num_rcp_conservative
-calculate sm_num_rcp_conservative correctly using TPC total from all GPCs
-register manual says use SM count but it's actually TPC count

bug 3370219

Change-Id: I4422fb09d3a59879394e0e1abc5513efc6355b5b
Signed-off-by: David Li <davli@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2586399
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Gangzheng Tong <gtong@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Gangzheng Tong <gtong@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-09-09 09:13:31 -07:00
Debarshi Dutta
a53ebf02d1 gpu: nvgpu: update error message to info.
These errors are now actually expected from code that counts number of
sys/gpc/fbp perfmons after first context creation. Nvgpu tries to count
them by register offset lookup in context image and counts perfmons until
invalid offset is found.

nvgpu_gr_hwmp_map_find_priv_offset no longer prints an error message.
The correct error condition is moved to gr_exec_reg_ops

Bug 200755537

Change-Id: Ib5c6ccd39275b2b06e3f8bce4878a3234478a780
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2586228
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-09 09:13:03 -07:00
Antony Clince Alex
ab4aa0afba gpu: nvgpu: remove incorrect usage of CONFIG_NVGPU_NEXT
Remove incorrect usage of CONFIG_NVGPU_NEXT introuduced in patch:
https://git-master.nvidia.com/r/#/c/linux-nvgpu/+/2499571/

JIRA NVGPU-6574

Change-Id: I9bf0f0ee5d9762b79dd7913402678b0dd87f21ee
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2567353
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-08 06:50:49 -07:00
Sagar Kadamati
dd9b4364aa gpu: nvgpu: add nvgpu-next infrastructure
* As of now, working on multiple chip bringup in nvgpu-next repo has
   an issue because we end with losing control on source code (hard to
   find which part of the code belongs to which chip) and it's valuable
   history this affects chip migration on release.

 * To support multiple chip bringup simultaneously, we need new
   guidelines to avoid losing control on source code and make migration
   easier. This change adds links to nvgpu-next repo.

 * Updated return code to ENODEV for consistency
 * Updated ACR unittest to work with ENODEV return code

NOTE:
     These are the initial set of infrastructure changes, guidelines
     will evolve, and source code will get updated accordingly.

     Based on future chip features, Which part of the source code falls
     under nvgpu-next repo is decided.

JIRA NVGPU-6574

Change-Id: I81827e35d189c55554df00e255b527a4473e0338
Signed-off-by: Sagar Kadamati <skadamati@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2556793
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-09-08 06:50:38 -07:00
Konsta Hölttä
9ffcb0fade gpu: nvgpu: log submit error reasons
For each common error that may happen in the submit path, log the
failure reason at info level if not already logged. Various mistakes may
cause -EINVAL, and getting to know what is wrong is helpful when writing
tests.

Change-Id: I8ac2a40441e0bf3d8afdb40526b607537eb5105c
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2587360
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-07 16:00:50 -07:00
Divya Singhatwaria
b6ab227016 gpu: nvgpu: Enable pmu interrupt
- For secure RISCV boot, enable pmu interrupt
  during pmu_rtos_init
- As interrupts are enabled, PMU intr can be received
  before driver has changed the pmu firmware state. This
  can cause the RISCV boot to fail.
- To resolve this, first change the pmu firmware state
  from off to PMU_FW_STATE_STARTING and then wait
  for pmu priv lockdown release.

Change-Id: Ib2e8b033fec6320bf9ccff02696192a48172464b
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2586325
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-09-07 16:00:05 -07:00
dt
152d7c9edd gpu: nvgpu: Fix for pes_tpc_mask programming
After CONFIG_UBSAN kernel compilation flag to know any shifting
cause overflow or not enablement ,this is identified.
The register "gr_fe_tpc_fs_r(gpc_index)" is read only after
Volta. The gops where we are computing the index is not needed.

Bug 200727116

Change-Id: Ib2306103389ba9df77fd59d012ec70e775104989
Signed-off-by: dt <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2573296
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-07 15:59:48 -07:00
dt
4034de5756 gpu: nvgpu: Fix for smid programming
As number of available tpc/gpc is more than 4 in new dgpu,
this fix is needed for correct sm_id config programming.

After CONFIG_UBSAN kernel compilation flag to know any shifting
cause overflow or not enablement , this is identified where the
shift is overflowing u32 when number of available TPCs is more
than four.

Bug 200727116

Change-Id: I9169a00614e4a648afe4a2d2f8e76c178e8c19eb
Signed-off-by: dt <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2571823
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-09-07 10:29:03 -07:00
dt
9355345610 gpu: nvgpu: Add IPA-PA cache to increase the performance
When GPU need to programmed with PA(physical address),
given IPA need to be converted to PA by querying Hypervisor.
As this is an IPC between OSes, the call will reduce the
performance badly. So this is adding a IPA-PA cache to improve
the performance. This will be more helpful in passthr config.

Bug 3277194

Change-Id: I6a3230d858977313a0ed0f33068055a3b516330a
Signed-off-by: dt <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2571814
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-07 10:28:58 -07:00
Ramesh Mylavarapu
ffd0d3962f gpu: nvgpu: gsp: gsp isr and debug trace support
- Created GSP NVRISCV interrupt handle and
  respective functions and register reads.
- Created Debug trace support for GSP firmware.

NVGPU-7084

Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Change-Id: I2728150c4db00403aa6e3c043bc19c51677dd9cf
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2589430
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-07 05:37:51 -07:00
Antony Clince Alex
2afd601a40 gpu: nvgpu: update FS mask sysfs entries to RDONLY
Repurpose (gpc,fbp,tpc)_fs_mask sysfs nodes to only report active
physical chiplets after floorsweeping.

StaticPG'ing of chiplets will be handled by (gpc,fbp,tpc)_pg_mask sysfs
nodes. The user will be able to the write valid PG masks for respective
chiplets prior to poweron, which can then be verified using
(gpc,fbp_tpc)_fs_mask nodes.

Bug 3364907

Change-Id: Ia4132f9c1939b2cb4a8f55f9d99a2b0a5b02184c
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2587926
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Chris Dragan <kdragan@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-09-07 05:35:09 -07:00
Rajesh Devaraj
4d46e9e07a gpu: nvgpu: update doxygen for SDL
This patch updates doxygen for the following functions in SDL:
- nvgpu_report_ctxsw_err()
- nvgpu_report_ecc_err()
- nvgpu_report_host_err()
- nvgpu_report_pmu_err()
- nvgpu_report_pri_err()
- gr_intr_report_ctxsw_err()
- nvgpu_report_mmu_err()
- nvgpu_report_gr_err()

JIRA NVGPU-7001

Change-Id: Ie21908cacaf4add1143d68d9f9a4d2d1315dfdd8
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
(cherry picked from commit c1dc3e7c35d585faed8ed3b9c61f6afe044f7263)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2588991
Reviewed-by: V M S Seeta Rama Raju Mudundi <srajum@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-09-03 14:40:32 -07:00
Debarshi Dutta
33740b41b6 gpu: nvgpu: free memory during module removal
Following pointers(allocated via Kmalloc/DMA) aren't freed during
module removal.

struct nvgpu_gr_config -> gpc_tpc_mask_physical
struct nvgpu_netlist_vars -> ctxsw_regs.etpc.l
struct mm_gk20a -> sysmem_flush
struct nvgpu_pmu_pg -> pg_buf
SGTable corresponding to VPR secure buffer.

Added appropriate free calls.

Bug 3364181

Change-Id: I2105c1f3256b1910f0f514d98f0ee3ae2e34aff7
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2586244
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-09-02 15:43:07 -07:00
Sagar Kamble
79fb97100d gpu: nvgpu: implement GET_BUFFER_INFO ioctl
Userspace applications will need to query buffer information such as
size, comptags allocation status, user associated metadata etc. for
enabling newer IPC mechanisms. Add support for this new ioctl.

Bug 200586313

Change-Id: I87607eb306afa0cce1bec7a1fb2925ec3bc33e50
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2480763
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-09-02 11:42:13 -07:00
Sagar Kamble
ed16377983 gpu: nvgpu: allocate comptags and store metadata in REGISTER_BUFFER ioctl
To enable userspace query about comptags allocation status of a buffer,
comptags are to be allocated only during buffer registration done by
nvrm_gpu. Earlier, they were allocated during map.

nvrm_gpu will be sending metadata blob to be associated with the buffer.
This will have to be stored in the dmabuf privdata for all the buffers
registered by nvrm_gpu.

This patch moves the privdata allocation to buffer registration ioctl.

Remove g->mm.priv_lock as it is not needed now. This lock was added
to protect dmabuf private data setup. That private data is now
handled through dmabuf->ops and setup of dmabuf->ops is done
under dmabuf->lock.

To support legacy userspace, this patch still allocates comptags on
demand on map calls for unregistered buffers.

Bug 200586313

Change-Id: I88b2ca04c733dd02a84bcbf05060bddc00147790
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2480761
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-02 11:42:08 -07:00
Jon Hunter
8a4b72a4aa gpu: nvgpu: Fix crash when reading CE_APP debugfs
The CE_APP debugfs nodes are created when the NVGPU driver is probed,
however, the 'ce_app' structure which contains the variables exposed
via the debugfs, is not allocated until nvgpu_finalize_poweron() is
called. Therefore, if the user attempts to access the CE_APP debugfs
nodes before the NVGPU has been powered on, for example, right after
Linux has booted, then this results in a NULL pointer dereference crash.
Fix this by moving the creation of the CE_APP debugfs nodes to
nvgpu_finalize_poweron_linux() which is called after
nvgpu_finalize_poweron().

Bug 200747304

Change-Id: Icd28952112f86887a1d6b6f8beb382f5189461a9
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2572106
(cherry picked from commit 35a0c18d93e97265611c3bbfae41b39d9cd183e3)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2587367
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-09-02 07:23:53 -07:00
Jon Hunter
d1b34e50e2 gpu: nvgpu: Fix build for Linux v5.14-rc2
Upstream Linux kernel commits b7eb335e26a9 ("Makefile: Enable
-Wimplicit-fallthrough for Clang") and d936eb238744 ('Revert "Makefile:
Enable -Wimplicit-fallthrough for Clang"') have the net effect of
updating the compiler flag -Wimplicit-fallthrough from
-Wimplicit-fallthrough= to -Wimplicit-fallthrough=5. This causes the
following build error to be seen ...

 nvgpu/drivers/gpu/nvgpu/common/pmu/clk/clk_prog.c:1042:15: error:
 	this statement may fall through [-Werror=implicit-fallthrough=]
 1042 |    step_count = (freq_step_size_mhz == 0U) ? 0U :
      |    ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1043 |      (u8)(p1xmaster->super.freq_max_mhz -
      |      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1044 |       *pfreqmaxlastmhz - 1U) /
      |       ~~~~~~~~~~~~~~~~~~~~~~~~
 1045 |      freq_step_size_mhz;
      |      ~~~~~~~~~~~~~~~~~~
 nvgpu/drivers/gpu/nvgpu/common/pmu/clk/clk_prog.c:1048:3: note: here
 1048 |   case CTRL_CLK_PROG_1X_SOURCE_ONE_SOURCE:
      |   ^~~~
 cc1: all warnings being treated as errors
 scripts/Makefile.build:271: recipe for target
 	'nvgpu/drivers/gpu/nvgpu/common/pmu/clk/clk_prog.o' failed

Per commit d936eb238744 ('Revert "Makefile: Enable -Wimplicit-fallthrough
for Clang"'), by setting -Wimplicit-fallthrough=5 [0], the explicit
'fall-through' comments in the code are not recognised by the compiler
and cause the above error to be seen. This could be fixed by simply
replacing the 'fall-through' comment with the 'fallthrough;' statement.
However, this requires newer versions of GCC that support it. The
simplest way to fix this error is by ensuring that
-Wimplicit-fallthrough=3 for NVGPU so that fallthrough comments are
recognised by the compiler. Note that we still need to check that GCC
supports this option because older versions do not. It should be noted
that -Wimplicit-fallthrough=3 is the default set by -Wextra. See the
GCC warnings options document [0] for more details.

Bug 3340525

Link: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html [0]
Change-Id: Ia56e4343143185460a37f8a7b0dd229f005acbb9
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2567440
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2582509
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Rohit Khanna <rokhanna@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Tested-by: Rohit Khanna <rokhanna@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-09-01 18:49:59 -07:00
Sagar Kamble
7410784b0b gpu: nvgpu: fix clk_arb completion file private data access race
clk_arb completion file descriptor can get closed immediately after
poll finishes in the work item gp10b_clk_arb_run_arbiter_cb. In
that case, the refcount for nvgpu_clk_dev can become zero in
the work item and can lead to invalid access while removing
nvgpu_clk_dev from the lists.

Remove nvgpu_clk_dev from the list before dropping the reference to
it.

Also, delete the nvgpu_clk_dev in completion file release handler
within the session and requests spinlocks to avoid race with
gp10b_clk_arb_run_arbiter_cb using it.

bug 200757277

Change-Id: I054eee547f2a6fa633d7ef55df216ec36647a826
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2569522
(cherry picked from commit ce8548ec05)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2587070
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-01 09:50:11 -07:00
Debarshi Dutta
6fc27766ed gpu: nvgpu: fix issues due to a previous patch
608decf gpu: nvgpu: add support for powering off gpu
The above commit accidentally removed nvgpu_quiesce from
nvgpu_pci_remove path. Add that back.

Bug 3365659

Change-Id: I287972c426738a950ace2907610e02b774ab1eff
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2586240
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-01 01:37:17 -07:00
deepak goyal
77d1e765f5 gpu: nvgpu: ga10b: Fix logic for BROM pass status
Current code assumes riscv brom passed if it does not times out.
This patch explicitly checks for brom pass/fail or timeout.

Bug 3361416

Change-Id: I399a6cf9d32be92b24990532f81892642513ba54
Signed-off-by: deepak goyal <dgoyal@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2585786
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-08-31 08:54:35 -07:00
Seshendra Gadagottu
d255c64f50 gpu: nvgpu: ga10x: update pdiv_duration for thermal
To keep pdiv_duration at 15usec between steps at 102MHz
utilsclk, update stepping duration value from 0xBF4 to
0x5FA for ga10x.

Bug 200757274

Change-Id: I333a5b0b35307402a734a7eafc4ab13d20316cd1
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2584539
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-08-30 19:35:54 -07:00
Ramesh Mylavarapu
88293ee42d gpu: nvgpu: read temperature from therm_i2cs_sensor_00_r
Currently reading temperature value depeads on therm pstate
board objects. In absence of pstate reading temperature
from therm get status will be failed which will cause GVS
failure in NvRmGpuTest_Device_GetTemperature test.
This change will add support to read temperature from
therm sensor_00 register but this will have following
limitation:
 - NV_THERM_I2CS_SENSOR_00 doesn't support fractional
   precision.
 - It doesn't support negative temperatures.

BUG-200736830

Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Change-Id: I25e577dac9029fcd787a6f71957dbeefd6fe43dd
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2584269
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-28 06:56:24 -07:00
Ramesh Mylavarapu
a96c04d097 gpu: nvgpu: disable pstate support for tu104
Disabling pstates on TU104 which is no more a POR.

BUG-200736830

Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Change-Id: I36a0d5fac5d1294802e5150dcebd5dcb54ad5f2e
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2584268
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-28 06:56:19 -07:00
Seshendra Gadagottu
135e056e9e gpu: nvgpu: ga10b: set can_slcg/blcg/elcg to true
Add capability to enable/disable clock gating power
features by setting can_xxcg capabilities to
true. The cg features are disabled on tot and will be
enabled once verification is done.

Jira NVGPU-7033
Bug 200766930

Change-Id: I2d2aa25b7c84f3c4de0b12fd6d845a8f792bfd2d
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2584540
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-08-27 20:45:58 -07:00
Antony Clince Alex
bb5bffe571 gpu: nvgpu: enhance CE error reporting documentation
Update documentation for function nvgpu_report_ce_err to include
fine granular implemenation details. In additiona, remove redundant
descrptions from error reporting functions.

Jira NVGPU-6948

Change-Id: Ie1675b0260809bfbc6fdeab6748c48347b5f3d7d
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554573
(cherry picked from commit a5f84edde5943358549534b8f736ee931a28c1ad)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2555909
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-08-27 04:17:15 -07:00
Deepak Nibade
3c97f3b932 gpu: nvgpu: disallow binding more channels than MAX channels supported per TSG
There is HW specific limit on number of channel entries that can be
added for each TSG entry in runlist. Right now there is no checking
to enforce this from SW and hence if User binds more than supported
channels to same TSG, invalid TSG formation error interrupts are
generated.

Fix this by adding appropriate checks in below steps :

- Add new field ch_count to struct nvgpu_tsg to keep track of
  channels bound to TSG.
- Define new hal gops.runlist.get_max_channels_per_tsg() to retrieve
  HW specific maximum channel count per TSG.
- Implement the HAL for gk20a and gv11b chips, and assign new HALs for
  all chips appropriately.
- Increment ch_count while binding the channel to TSG and decrement it
  while unbinding.
- While binding channel to TSG, Check if current channel count is
  already equal to max channel count. If yes, print an error and bail
  out.

Bug 200763991

Change-Id: Ic5f17a52e0fb171d1c020bf4f085f57cdb95f923
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2582095
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-25 09:47:47 -07:00
Debarshi Dutta
608decf1e6 gpu: nvgpu: add support for powering off gpu
Add support for powering off IGPU for switching between
legacy to SMC mode/vice-versa or changing SMC configuration.
The power off can be issued as follows

echo 0 > /dev/nvgpu/igpu0/power

The following steps are done during a poweroff.
1) Deterministic channel idle
2) Acquire write_lock on l->busy semaphore.
3) Wait till power_usage decrements to indicate 0 active jobs.
4) Invoke pm_runtime_put_sync_suspend()
5) Invoke nvgpu_gr_remove_support() to clear existing GR memory.
6) Release write_lock on l->busy
7) Deterministic channel unidle.

Part of the sequence matches that of the gk20a_do_idle code.
The common parts are extracted into new functions
gk20a_block_new_jobs_and_idle() and gk20a_unblock_jobs()

For joint-rail case, the current implementation, does a railgate
and then sets pm_runtime_set_autosuspend_delay(-1) to disable
regular runtime resume/suspend.

Remove clearing of NVGPU_SUPPORT_MIG status during state change
ias it leads to inconsistencies.

Jira NVGPU-6920

Change-Id: I0b3eb3278176122ac061c1e8a94ebfb3c17c3925
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2578501
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Antony Clince Alex <aalex@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-23 05:27:50 -07:00
Debarshi Dutta
2e3c3aada6 gpu: nvgpu: fix deinit of GR
Existing implementation of GR de-init doesn't account for multiple
instances of struct nvgpu_gr. As a fix, below changes are added.

1) nvgpu_gr_free is unified for VGPU as well as native.
2) All the GR instances are freed.
3) Appropriate NULL checks are added when freeing GR memories.
4) 2D, 3D, I2M and ZBC etc are explicitely disabled when MIG is set.
5) In ioctl_ctrl, checks are added to not return error when zbc is NULL
   for VGPU as requests are rerouted to RMserver.

Jira NVGPU-6920

Change-Id: Icaa40f88f523c2cdbfe3a4fd6a55681ea7a83d12
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2578500
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Antony Clince Alex <aalex@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-23 05:27:45 -07:00
Mahantesh Kumbar
b9696ee643 gpu: nvgpu: ga10b: update NVRISCV LSPMU
- Set NVRISCV LSPMU app version to 0.
- Setting app version to 0 helps to load and boot
  multiple LSPMU ucode's without modifying the
  NVGPU driver.
- Add support for PMU NVRISCV prod and dbg bin's.
- This is corresponding change to LSPMU MPSK CL
  https://git-master.nvidia.com/r/c/tegra/kernel-firmware-t18x/+/2576049

JIRA NVGPU-7061

Change-Id: I800953ca97af3badde1983aa99e09b4fe7453203
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Signed-off-by: mkumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2575341
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-08-22 11:05:03 -07:00
Seshendra Gadagottu
a743596697 gpu: nvgpu: ga10b: handle floor-swept gpc clock gracefully
If a GPC is floor-swept, then gpcclk enable for that GPC will
return error. For gpu booting, ignore this error and continue
with other clocks enable. More robust mechanism with floor-sweeping
check before enabling clocks will be added in follow-up patches.

Bug 3362403

Change-Id: I0b64c94918a1c00086a146408e6c4913788249ec
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2579569
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-20 14:56:30 -07:00
Seshendra Gadagottu
8ed1487860 gpu: nvgpu: ga10b: Enable clock arb support
Enable clock arbitration support for silicon.

Bug 200764879

Change-Id: I40d47f7f15197a8dd55ca0866e177fd42b8c4e9d
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2579556
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-08-19 14:07:47 -07:00
Richard Zhao
d8e847c90d gpu: nvgpu: vgpu: fix force preemption from debugfs
check whether there's any force_preemption_gfxp or force_preemption_cilp
set in debugfs when alloc obj_ctx.

Jira GVSCI-4658

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: I87fc7e195c9b0f7ed29ec6c37c8f46b456625fea
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2579218
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-08-19 14:06:44 -07:00
Richard Zhao
eaa508b2e6 gpu: nvgpu: vgpu: set .set_long_timeslice for ga10b
Jira GVSCI-4658

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: I99c2f43504b68b8616b4327edcd1389b29912900
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2578287
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Aparna Das <aparnad@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-08-18 14:34:45 -07:00
Vedashree Vidwans
e13ab1f9ea gpu: nvgpu: pmu: remove hw access from remove_pmu_support
GPU HW registers are locked before remove_pmu_support.
Remove functions accessing HW registers.

Bug 3357477

Change-Id: I34a1923bfdb3afacd462f2646e2821569573a81a
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2577627
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-08-17 09:45:42 -07:00
Richard Zhao
d7a8ef3285 gpu: nvgpu: vgpu: set flag NVGPU_CLK_ARB_ENABLED
NVGPU_CLK_ARB_ENABLED was not set correctly. The flag will only be set
when the guest has clock control.

Jira GVSCI-4658

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: I3046be53e6d58cb2e8c6130cdbb89fad2d8e6d13
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2576941
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Aparna Das <aparnad@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-17 09:14:45 -07:00
Vedashree Vidwans
913a2d519f gpu: nvgpu: ga10b: correct cbc base/top reg value
CBC base and top values need to be left shifted by cbc_alignment factor
to store in the CBC_BASE and CBC_TOP registers respectively. Fix cbc
calculations accordingly.
Update cbc information debug prints to print with gpu_dbg_info flag.

Bug 3353418

Change-Id: I858c46a9dab1e5f810cabb327ba1797f15a2960e
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2574119
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Tested-by: Bitan Biswas <bbiswas@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-08-14 06:24:40 -07:00
Seshendra Gadagottu
342da8158a gpu: nvgpu: ga10b: disable frequency scaling
To disable frequency scaling for gpc clocks,
set both devfreq_governor and qos_notify to NULL
in platform data.

Jira NVGPU-7059

Change-Id: I2142195d89758d21f2c6e070196645aca9bc0a24
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2573476
Tested-by: Mark Mendez <mmendez@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-12 18:16:01 -07:00
Seshendra Gadagottu
78d7a7fdde gpu: nvgpu: ga10b: add errata for disable CBU ECC
Add NVGPU_ERRATA_200761358 errata for CBU ECC disable
in nvgpu driver.

Bug 200761358

Change-Id: I51fcddb47946e84b1cdf39ab908e2185bc112c83
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2574530
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Antoine Chauveau <achauveau@nvidia.com>
Tested-by: V M S Seeta Rama Raju Mudundi <srajum@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-12 02:25:43 -07:00
Sagar Kamble
f4571194b0 gpu: nvgpu: stop ELPG init thread during unload
ELPG initialization thread creation can fail when the process is killed.
That leads to driver resume failure.

That thread was stopped on suspend and re-created on resume. To avoid
the issue above, don't stop the ELPG thread in suspend and let the
first created thread handle the ELPG state transitions always.
And stop the ELPG thread during unload.

Also fix couple of instances of config flag as:

s/CONFIG_PMU_POWER_PG/CONFIG_NVGPU_POWER_PG

bug 3345977

Change-Id: I8952edf8d1664ed258f238e265002e716d1bf5c2
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2573763
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Ashish Mhetre <amhetre@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-11 01:55:46 -07:00
Sagar Kamble
40064ef1ec gpu: nvgpu: fix ecc counter free
ECC counter structures are freed without removing the node from the
stats_list. This can lead to invalid access due to dangling pointers.

Update the ecc counter free logic to set them to NULL upon free, to
remove them from stats_list and free them by validation.

Also updated some of the ecc init paths where error was not propa-
gated to callers and full ecc counters deallocation was not done.

Now, calling unit ecc_free from any context (with counters alloc-
ated or not) is harmless as requisite checks are in place.

bug 3326612
bug 3345977

Change-Id: I05eb6ed226cff9197ad37776912da9dcb7e0716d
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2565264
Tested-by: Ashish Mhetre <amhetre@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-11 01:55:08 -07:00
Tejal Kudav
2887d06e3b gpu: nvgpu: Disable debugger print
Debugger not being attached is not fatal condition, so convert the
print to indicate that debugger is not attached into a conditional
print using log_mask bit(gpu_dbg_gpu_dbg).

Change-Id: I8fcf58f320b5459cb206d25dfac21b1dbe5e4bb2
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2573123
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Mark Mendez <mmendez@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-09 13:05:28 -07:00
Tejal Kudav
524d222149 gpu: nvgpu: Print error msg on GR ECC errors
Currently, the ECC errors go unnoticed. ECC errors in GR are handled
by incrementing the ecc counters and clearing the intr registers. No
action is taken bassed on the ECC counters. The ECC intr prints are
also not printed by default.
Ideally, the ECC errors should trigger recovery if they cross the
error threshold. While we wait for the ideal ECC handling, convert
the conditional ECC prints to error messages.

JIRA NVGPU-7058

Change-Id: I2e54aeb5aef1d76dd6d4162eedd21cd529089b54
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2573029
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: V M S Seeta Rama Raju Mudundi <srajum@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-08-09 10:04:34 -07:00
mkumbar
de267c034c gpu: nvgpu: ga10b: Enable PKC support
-Enable PKC support in ACR and LS-PMU
-Update the PMU f/w version.
-Enable PMU support by default.

Change-Id: I42bbe1b64ddc6ead9641c97d1ed27a9f4020510a
Signed-off-by: mkumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2568609
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Deepak Goyal <dgoyal@nvidia.com>
Tested-by: Krishna Reddy <vdumpa@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-08-08 14:23:36 -07:00
Seshendra Gadagottu
13a77ce843 gpu: nvgpu: ga10b: don't wait for ctxsw wdt ack
Currently, ctxsw is not sending watchdog timeout ack that results in
GPU timeout and failure on silicon.

Bug 3354738

Change-Id: Idc8fbe3bcc8c539a8b391f19c5bfa3207d1a3e45
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2570595
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-08-07 21:23:13 -07:00
Vedashree Vidwans
a7a2e1e263 gpu: nvgpu: ga10b: update cbc divisor and top reg
Currently, cbc init and compression tests are failing because MMU marks
cbc to be not safe.
- Modify cbc.get_base_divisor hal to use ltc_count = 1 for Tegra devices
- Update fb.cbc_configure to write compbit_backing_size value to
fb_mmu_cbc_top register.
- After config confirm that CBC is marked safe.

Bug 3353418

Change-Id: I1e9c27f47f7bfcf476f2499231951382e2e8653d
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2570550
Reviewed-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: V M S Seeta Rama Raju Mudundi <srajum@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-05 22:33:56 -07:00
Seshendra Gadagottu
00e67e0798 gpu: nvgpu: ga10b: disable elpg
Engine Level Power Gating(ELPG) for ga10b is enabled
on tot for silicon. elpg needs to be enabled
only after verification on silicon and after stress
testing the feature. To avoid issues during ga10b bring-up
with unverified ELPG feature, disable it by setting both
can_elpg_init and elpg_enable to false in ga10b platform
data.

Jira NVGPU-7033

Change-Id: I664d6e031339aa912b78769bd58a4e6d77dca1d0
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2564197
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Krishna Reddy <vdumpa@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-28 12:03:48 -07:00
Sagar Kamble
0f59efb2cd gpu: nvgpu: return tpc exceptions error properly
It is observed that recovery on receiving the ESR MMU NACK exception
does not get triggered as the error returned by tpc level handler
is masked.

NACK is marked handled but recovery is not done and subsequent fb
intr handler does not trigger recovery since NACK is handled.

This leaves the HW engines in bad state.

Fix the tpc error return logic to trigger recovery during ESR MMU
NACK exception.

Bug 3318939

Change-Id: I475826f734e4366e853607e1e0338290ee28249b
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2564764
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-07-26 05:13:41 -07:00
Vedashree Vidwans
5eec60510b gpu: nvgpu: ga10b: gr vab addr reg config
Configure VAB range checker registers in GR.

Bug 2999621

Change-Id: Ice00ad98ec575f74b098c1ac3a6c0dbcdbe677e8
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2564261
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-25 12:13:18 -07:00