Commit Graph

189 Commits

Author SHA1 Message Date
Deepak Nibade
3c97f3b932 gpu: nvgpu: disallow binding more channels than MAX channels supported per TSG
There is a HW-specific limit on the number of channel entries that can
be added for each TSG entry in the runlist. Right now there is no check
to enforce this from SW, so if the user binds more channels than
supported to the same TSG, invalid TSG formation error interrupts are
generated.

Fix this by adding appropriate checks in the below steps:

- Add new field ch_count to struct nvgpu_tsg to keep track of
  channels bound to TSG.
- Define new hal gops.runlist.get_max_channels_per_tsg() to retrieve
  HW specific maximum channel count per TSG.
- Implement the HAL for gk20a and gv11b chips, and assign new HALs for
  all chips appropriately.
- Increment ch_count while binding the channel to TSG and decrement it
  while unbinding.
- While binding a channel to the TSG, check if the current channel count
  is already equal to the max channel count. If yes, print an error and
  bail out.
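
A minimal sketch of the bind-time check described above; the field and
HAL names follow the commit text, while the helper function, its exact
signature and the error handling around it are illustrative assumptions:

/* Illustrative sketch only: reject the bind if the TSG is already full. */
static int nvgpu_tsg_check_ch_count(struct gk20a *g, struct nvgpu_tsg *tsg)
{
    u32 max_ch = g->ops.runlist.get_max_channels_per_tsg();

    if (tsg->ch_count >= max_ch) {
        nvgpu_err(g, "TSG %u already has max (%u) channels bound",
                  tsg->tsgid, max_ch);
        return -EINVAL;
    }
    tsg->ch_count++;   /* decremented again on unbind */
    return 0;
}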

Bug 200763991

Change-Id: Ic5f17a52e0fb171d1c020bf4f085f57cdb95f923
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2582095
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-25 09:47:47 -07:00
Sagar Kamble
40064ef1ec gpu: nvgpu: fix ecc counter free
ECC counter structures are freed without removing the node from the
stats_list. This can lead to invalid access due to dangling pointers.

Update the ECC counter free logic to remove the counters from
stats_list, validate them before freeing, and set the pointers to NULL
after free.

Also update some of the ECC init paths where errors were not propagated
to callers and full ECC counter deallocation was not done.

Now, calling a unit's ecc_free from any context (with counters
allocated or not) is harmless, as the requisite checks are in place.
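
A rough sketch of the safer free pattern the commit describes; the
list helpers are nvgpu's generic list API, but the exact struct layout
and field names are assumptions for illustration:

/* Illustrative: unlink before free, NULL after free, and tolerate being
 * called when the counter was never allocated. */
static void ecc_counter_free(struct gk20a *g, struct nvgpu_ecc_stat **stat)
{
    if (*stat == NULL) {
        return;                        /* nothing allocated; harmless */
    }
    nvgpu_list_del(&(*stat)->node);    /* remove from stats_list first */
    nvgpu_kfree(g, *stat);
    *stat = NULL;                      /* avoid a dangling pointer */
}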

bug 3326612
bug 3345977

Change-Id: I05eb6ed226cff9197ad37776912da9dcb7e0716d
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2565264
Tested-by: Ashish Mhetre <amhetre@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-11 01:55:08 -07:00
Divya Singhatwaria
842bef7124 gpu: nvgpu: Support GPC and FBP Floorsweeping
- Add gops_fbp_fs and gops_gpc_pg struct
- Add HALs to write to NV_FUSE_CTRL_OPT_FBP and
  NV_FUSE_CTRL_OPT_GPC fuses needed for floorsweeping
- Add set_fbp_mask and set_gpc_mask to probe FBP and GPC mask
  respectively during gpu probe
- Add sysfs nodes fbp_fs_mask and gpc_fs_mask to store the
  FBP and GPC floorsweeping masks sent from userspace
- Move the floorsweeping programming earlier in NVGPU's GPU init
  function and then issue a PRI init.

JIRA NVGPU-6433

Change-Id: I84764d625c69914c107e1e8c7f29c476c2f64f78
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2499571
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-19 06:17:25 -07:00
Divya Singhatwaria
9f30609550 gpu: nvgpu: Rename TPC powergating mutex
Rename tpc_pg_lock to static_pg_lock and
have_tpc_pg_lock to have_static_pg_lock, as these locks are
used for tpc/gpc/fbp power gating.

JIRA NVGPU-6433

Change-Id: I4c56b9710e303ad9e872bad4b5ed9a167acb9dd6
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2537489
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-18 02:46:25 -07:00
mkumbar
87984ea344 gpu: nvgpu: support nvriscv debug feature
Enable the nvriscv debug buffer feature in NVGPU.
The debug buffer is a feature that prints the debug log from the ucode
onto the console in real time.
The debug buffer feature uses DMEM, a queue and the SWGEN1 interrupt to
share ucode debug data with NVGPU.
The ucode writes a debug message to DMEM and updates the offset in the
queue to trigger an interrupt to NVGPU.
NVGPU copies the debug message from DMEM to a local buffer to process
and print onto the console.

The debug buffer feature is added under the falcon unit, and an engine
that needs it can utilize the feature by providing the required
parameters through public functions.

Currently the GA10B NVRISCV NS/LS PMU ucode supports this feature, and
the required changes enable it on the NVGPU side. With this feature
enabled, it is now possible to see ucode prints in real time.
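
A simplified sketch of the NVGPU-side handling described above; the
handler name, the dbg_buf struct and the read_queue_head/bytes_available
helpers are illustrative, only the DMEM copy helper is an existing
falcon API:

/* Illustrative: on SWGEN1, drain new ucode log bytes from DMEM. */
static void falcon_dbg_buf_isr(struct gk20a *g, struct nvgpu_falcon *flcn,
                               struct dbg_buf *buf)
{
    u32 head = read_queue_head(flcn);   /* offset written by the ucode */

    while (buf->tail != head) {
        u32 n = bytes_available(buf, head);

        /* copy the debug text from DMEM into a local buffer */
        nvgpu_falcon_copy_from_dmem(flcn, buf->dmem_base + buf->tail,
                                    buf->local, n, 0);
        nvgpu_info(g, "ucode: %.*s", (int)n, buf->local);
        buf->tail = (buf->tail + n) % buf->size;
    }
}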

JIRA NVGPU-6959

Change-Id: I9d46020470285b490b6bc876204f62698055b1ec
Signed-off-by: mkumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2548951
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-17 12:45:00 -07:00
Divya Singhatwaria
77e3a8c5e4 gpu: nvgpu: ga10b: Add request_idle ce ops
Issue observed:
- In GA10B, it was observed that after recovery happens
  ELPG does not engage.
- This was because, after CE reset, when the nvgpu_submit_twod test
  was run to engage ELPG, the IDLE_FLIPPED_PWR_OFF signal was asserted.
- This means that when ELPG was engaged (engine is in PWR_OFF),
  some idle signal flips (becomes non-idle) and this causes
  IDLE_SNAP. After IDLE_SNAP is hit, ELPG will not engage further.
- After debugging from WAVES, it was observed that
  LCE0/LCE1 are not done with the reset sequence.
- The state of these LCEs is RESET0. A pri request (a pri read
  of the NV_CE_PCE_MAP register in CE) is seen that kicks them out of
  RESET0. After this, they go through a few states to update
  some internal state (RESET1/RESET2/PCE_MAP etc.) and then
  eventually settle down to the IDLE state.

Solution:
- Read the ce_pce_map_r register in the recovery sequence (after CE reset).
- It is observed that when this read is added, recovery is complete,
  and post that, when the nvgpu_submit_twod test is executed, ELPG engages.
- This means that a pri read is needed after CE reset so that the CE
  settles to the idle state properly and ELPG can then engage properly.
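
A sketch of the fix as described; the register accessor follows the
commit text, but the op name and where it is called in the recovery
path are assumptions:

/* Illustrative: a dummy pri read after CE reset kicks the LCEs out of
 * RESET0 so they settle to IDLE and ELPG can engage again. */
void ga10b_ce_request_idle(struct gk20a *g)
{
    (void)nvgpu_readl(g, ce_pce_map_r());
}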

Bug 200734258

Change-Id: I5bb84921ca62a740fde81ffe6c29ccde4ebb341b
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554493
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-15 10:05:02 -07:00
Deepak Nibade
4edf952e3e gpu: nvgpu: fix rule 5.1 misra violations in common.gr
Fix rule 5.1 MISRA violations in common.gr by renaming the below functions:

nvgpu_gr_config_get_gpc_tpc_mask_base ->
  nvgpu_gr_config_get_base_mask_gpc_tpc

nvgpu_gr_config_get_gpc_tpc_count_base ->
  nvgpu_gr_config_get_base_count_gpc_tpc

gm20b_ctxsw_prog_set_priv_access_map_config_mode ->
  gm20b_ctxsw_prog_set_config_mode_priv_access_map

gm20b_ctxsw_prog_set_priv_access_map_addr ->
  gm20b_ctxsw_prog_set_addr_priv_access_map

gm20b_gr_falcon_read_fecs_ctxsw_mailbox ->
  gm20b_gr_falcon_read_mailbox_fecs_ctxsw

gm20b_gr_falcon_read_fecs_ctxsw_status0 ->
  gm20b_gr_falcon_read_status0_fecs_ctxsw

gm20b_gr_falcon_read_fecs_ctxsw_status1 ->
  gm20b_gr_falcon_read_status1_fecs_ctxsw

gv11b_gr_intr_get_sm_hww_warp_esr_pc ->
  gv11b_gr_intr_get_warp_esr_pc_sm_hww

gv11b_gr_intr_get_sm_hww_warp_esr ->
  gv11b_gr_intr_get_warp_esr_sm_hww

Jira NVGPU-6779

Change-Id: Icbe23a7b022373785968fc417ee247e2d80cfcc6
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554521
(cherry picked from commit 1432650774506f2a7e45f70b084f498736d0d0c5)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2555330
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-13 09:20:41 -07:00
Antony Clince Alex
f51a43b579 gpu: nvgpu: ga10b: fix fetching of FBP_L2 FS mask
On all chips except ga10b, the numbers of ROP and L2 units per FBP
were in sync, hence their FS masks could be represented by a single
fuse register, NV_FUSE_STATUS_OPT_ROP_L2_FBP. However, on ga10b, the ROP
unit was moved out of the FBP into the GPC and it no longer matches the
number of L2 units, so the previous fuse register was split into two:
NV_FUSE_CTRL_OPT_LTC_FBP and NV_FUSE_CTRL_OPT_ROP_GPC.

At present, the driver reads the NV_FUSE_CTRL_OPT_ROP_GPC register
and reports an incorrect L2 mask. Introduce the HAL function
ga10b_fuse_status_opt_l2_fbp to fix this.

In addition, rename fields and functions to exclusively fetch L2 masks;
this should help accommodate ga10b and future chips in which the L2 and
ROP unit counts are not the same. As part of this, the following
functions and fields have been renamed.
- nvgpu_fbp_get_rop_l2_en_mask => nvgpu_fbp_get_l2_en_mask
- fuse.fuse_status_opt_rop_l2_fbp => fuse.fuse_status_opt_l2_fbp
- nvgpu_fbp.fbp_rop_l2_en_mask => nvgpu_fbp.fbp_l2_en_mask

The HAL ga10b_fuse_status_opt_rop_gpc is removed as the ROP mask is not
used anywhere in the driver, nor exposed to userspace.

Bug 200737717
Bug 200747149

Change-Id: If40fe7ecd1f47c23f7683369a60d8dd686590ca4
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2551998
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-07 05:48:56 -07:00
tkudav
0526e7eaa9 gpu: nvgpu: Create CIC-mon and CIC-rm subunits
The common.cic unit is divided into common.cic.mon and common.cic.rm
based on the rm and mon process split.

The CIC-mon subunit includes the code utilized in the critical
interrupt handling path, such as initialization, error detection and
the error reporting path. The CIC-rm subunit includes the code
corresponding to the rest of interrupt handling (like collecting error
debug data from registers) and ISR status management (status of
deferred interrupts).

Split the CIC APIs and data members into the above two subunits.

JIRA NVGPU-6899

Change-Id: I151b59105ff570607c4a62e974785e9c1323ef69
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2551897
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-02 09:57:56 -07:00
Deepak Nibade
9034b1676e gpu: nvgpu: compile out GFxP support in safety
GFxP preemption for graphics contexts is not supported in safety.
But the support was enabled along with CONFIG_NVGPU_GRAPHICS since GFxP
preemption was protected under the same config.

Add a separate config, CONFIG_NVGPU_GFXP, to protect all GFxP specific
code, enum values, and HALs.

Disable the config in the safety profile.

Jira NVGPU-6893

Change-Id: Iebb5f754a1025dfa6e05a94704bdb8a7123b599a
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2534986
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-05-28 15:17:36 -07:00
Tejal Kudav
e0a1fcf5f5 gpu: nvgpu: Add Central Intr Controller unit
Add a new Central Interrupt Controller (CIC) unit in common code.
Interrupt handling is currently done in a distributed manner: the
error handling policy for different errors resides in each unit's
ISR code. The goal is to converge this data under one central place -
the CIC unit.

This patch creates the framework for the CIC unit and moves the gv11b
QNX safety LUT to the CIC unit. All the error reporting APIs from
different units are also moved to CIC.

New APIs are exposed by the CIC unit to access its internal data, like:
  1. Struct err_desc - the static error handling/injection data per
                       error id
  2. Num_hw_modules  - the number of error reporting HW units
                       supported by CIC

Init and deinit of the CIC unit:
  1. The CIC unit should be initialized early on during boot so that it
     is available for any interrupt handling.
  2. Initialize CIC just before the interrupts are enabled during
     boot.
  3. Similarly, CIC is disabled late in the deinit cycle, right
     after the interrupts are masked.

LUT:
  1. The LUT is currently used only for reporting errors to safety
     services in the gv11b QNX safety build.
  2. This error handling policy LUT currently has only two levels
     of handling - correctable and quiesce.
  3. Once the error handling policy decision is moved from leaf
     unit nodes to CIC, the LUT will be updated to have additional
     levels like fast recovery and full recovery.
  4. Also, a separate LUT will then be added for each platform/build.
  5. In the current framework, the LUT is set to NULL for all
     configurations except gv11b.

A report_err() op is added to report errors to safety services.
This op is only effective for the gv11b QNX build and is set to NULL
for other configurations.

NVGPU-6521
NVGPU-6523
NVGPU-6750
NVGPU-6758
NVGPU-6760
NVGPU-6754

Change-Id: I24be7836a96d787741e37b732e19863ed8014635
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2518683
Reviewed-by: Ajesh K V <akv@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-05-25 14:28:04 -07:00
Vedashree Vidwans
aba26fa082 gpu: nvgpu: handle chip specific erratas
Currently, there are a few chip-specific errata present in the nvgpu
code. For better traceability of the errata and the corresponding
fixes, introduce flags to indicate the existing errata on a chip.
These flags decide whether a corresponding solution is applied to the
chip(s).

This patch introduces the below functions to handle errata flags:
- nvgpu_init_errata_flags
- nvgpu_set_errata
- nvgpu_is_errata_present
- nvgpu_print_errata_flags
- nvgpu_free_errata_flags

nvgpu_print_errata_flags: prints the below details of the errata present
in the chip
1. errata flag name
2. chip where the errata was first discovered
3. short description of the errata

Flags corresponding to the errata present in a chip are set during the
chip HAL init sequence.
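
A sketch of how these helpers would typically be used; the function
names come from the commit, but their signatures and the errata flag
name below are made up for illustration:

/* Illustrative usage; NVGPU_ERRATA_FOO is a hypothetical flag name. */
void chip_hal_init_errata(struct gk20a *g)
{
    nvgpu_init_errata_flags(g);
    nvgpu_set_errata(g, NVGPU_ERRATA_FOO);   /* this chip has the errata */
}

void unit_apply_workaround(struct gk20a *g)
{
    if (nvgpu_is_errata_present(g, NVGPU_ERRATA_FOO)) {
        /* apply the chip-specific fix here */
    }
}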

JIRA NVGPU-6510

Change-Id: Id5a8fb627222ac0a585aba071af052950f4de965
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2498095
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-04-28 19:14:44 -07:00
absalam
3ec369d60a gpu: nvgpu: Disable Clock Arbiter for TU104
This patch disables the clock arbiter for TU104.
TU104 is not a POR for Drive 6.0, so disable it to ease migration
of the clk arb to GA100.
As a first step, all the NVRM clock tests will be skipped by setting
NVGPU_SUPPORT_CLOCK_CONTROLS to false for TU104.
The clk arbiter will then be rewritten for GA100 and enabled back.
This patch implements this by adding a new flag, NVGPU_CLK_ARB_ENABLED,
which holds the status of the clk arbiter for each platform, and
disables it for TU104.
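
A sketch of the flag usage described above; nvgpu_set_enabled and
nvgpu_is_enabled are the existing enabled-flag helpers, while the exact
call sites shown are assumptions:

/* Illustrative: in the TU104 HAL init, mark the clk arb as unsupported. */
nvgpu_set_enabled(g, NVGPU_CLK_ARB_ENABLED, false);
nvgpu_set_enabled(g, NVGPU_SUPPORT_CLOCK_CONTROLS, false);

/* Illustrative: clk arb init then becomes a no-op when disabled. */
if (!nvgpu_is_enabled(g, NVGPU_CLK_ARB_ENABLED)) {
    return 0;
}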

Bug 200699763

Change-Id: I51cd5c7821bdc0b48080c17a70735925b278ddf5
Signed-off-by: absalam <absalam@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2515086
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-04-20 07:47:38 -07:00
Mayur Poojary
6277d57936 gpu: nvgpu: Add new api for setting longer timeslice on dbg node
Add a new ioctl API for setting a longer timeslice and getting the
timeslice inside the 'dbg' dev node.
Update the gpu_get_characteristic ioctl to pass the max timeslice value.
Add a debugfs node to access and change the max timeslice value.

Bug 1842244

Change-Id: I7e80f59162cf5d90496f9752fc128f5fa8dcc7d2
Signed-off-by: Mayur Poojary <mpoojary@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2471569
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-04-06 04:37:38 -07:00
Alex Waterman
77c0b9ffdc gpu: nvgpu: Update runlist_update() to take runlist ptr
Update the nvgpu_runlist_update_for_channel() function:

  - Rename it to nvgpu_runlist_update()
  - Have it take a pointer to the runlist to update instead
    of a runlist ID. For the most part this makes the code
    better but there are a few places where it's worse (for
    now).

This starts the slow and painful process of moving away from
the non-runlist code using runlist IDs in many places where it
should not.

Most of this patch is just fixing compilation problems with
the minor header updates.

JIRA NVGPU-6425

Change-Id: Id9885fe655d1d750625a1c8aceda9e67a2cbdb7a
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2470304
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-01-29 09:51:44 -08:00
Sagar Kamble
cf287a4ef5 gpu: nvgpu: retry tsg unbind if NEXT is set
The NEXT bit can remain set for the channel if the timeslice expires
before the scheduler clears it. Due to this, nvgpu fails the TSG unbind
and, in turn, nvrm_gpu fails the channel close. In this case, checking
the channel HW state after some time can help see the NEXT bit cleared
by the scheduler.

Re-enable the TSG and return -EAGAIN to nvrm_gpu for it to retry.
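
A rough sketch of the retry behaviour described; apart from the -EAGAIN
return, the state field and enable op shown are assumptions about the
surrounding unbind code:

/* Illustrative: unbind path when the channel still has NEXT set. */
if (hw_state.next) {
    g->ops.tsg.enable(tsg);   /* re-enable the TSG before bailing out */
    return -EAGAIN;           /* nvrm_gpu retries the unbind later */
}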

Bug 3144960

Change-Id: I35f417f02270e371a4e632986b73a00f8a4f921a
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2468391
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-01-18 23:11:57 -08:00
mkumbar
c62cfa2efb gpu: nvgpu: get PMU NEXT core irqmask
-Add new PMU ops to get NEXT core irq mask
-Add support to handle NEXT core interrupt request.

Bug 200659053
Bug 3199589

Change-Id: I78738f074a425f8934bbba28bf6996eeec7ab05a
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2457077
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:48 -06:00
Joshua Widen
60f44506a3 Revert "gpu: nvgpu: get PMU NEXT core irqmask"
This reverts commit 4ff427c51619cecdcc74fdbb388d82421cf45655.

Reason for revert: Testing for regression seen in GVS.

Bug 3198736

Change-Id: If12da341c3e13907bdcbb778c8fb4118cd5e3803
Signed-off-by: jwiden <jwiden@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2456791
Reviewed-by: svcguardwords <svcguardwords@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:48 -06:00
mkumbar
8284832300 gpu: nvgpu: get PMU NEXT core irqmask
-Add new PMU ops to get NEXT core irq mask
-Add support to handle NEXT core interrupt request.

Bug 200659053

Change-Id: I8b1c9b9d74ed59b4130fea712f970b4a31a8b4fe
Signed-off-by: mkumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2429042
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:48 -06:00
tkudav
e962ec3fa0 gpu: nvgpu: Set PC sampling HAL to NULL for GP10b+
Pascal+ chips do not support updating PC sampling using the register
NV_CTXSW_MAIN_IMAGE_PM (unlike GM20B, bit 6 = PC_SAMPLING is not
present on GP10b, GV11b and TU104). To correct this in NVGPU, we
are setting the set_pc_sampling HAL to NULL.

We need to make sure devtools also does not call into
these APIs. Until the devtools team updates their code, we return
success (0) from the update_pc_sampling API even if the HAL is
set to NULL. Filed http://nvbugs/200671026 for the devtools team.
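
A sketch of the interim behaviour; the HAL name follows the commit text,
while the argument list and surrounding wrapper are illustrative:

/* Illustrative: keep devtools callers working while the HAL is NULL. */
if (g->ops.gr.set_pc_sampling == NULL) {
    return 0;   /* report success until devtools stops calling this */
}
return g->ops.gr.set_pc_sampling(g, ch, enable);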

Bug 200604892
Bug 200671026

Change-Id: I6334d4b2a84d7a0f676d7e2faad4befde5f76310
Signed-off-by: tkudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2437002
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
94bc3a8135 gpu: nvgpu: rearch zbc code and update hals
Update nvgpu_gr_zbc as:
struct nvgpu_gr_zbc {
   struct nvgpu_mutex zbc_lock;	/* Lock to access zbc table */
   struct zbc_color_table *zbc_col_tbl; /* SW zbc color table pointer */
   struct zbc_depth_table *zbc_dep_tbl; /* SW zbc depth table pointer */
   struct zbc_stencil_table *zbc_s_tbl; /* SW zbc stencil table pointer */
   u32 min_color_index;	/* Minimum valid color table index */
   u32 min_depth_index;	/* Minimum valid depth table index */
   u32 min_stencil_index;	/* Minimum valid stencil table index */
   u32 max_color_index;	/* Maximum valid color table index */
   u32 max_depth_index;	/* Maximum valid depth table index */
   u32 max_stencil_index;	/* Maximum valid stencil table index */
   u32 max_used_color_index; /* Max used color table index */
   u32 max_used_depth_index; /* Max used depth table index */
   u32 max_used_stencil_index; /* Max used stencil table index */
};

Add global struct nvgpu_gr_zbc_table_indices
struct nvgpu_gr_zbc_table_indices {
       u32 min_color_index;
       u32 min_depth_index;
       u32 min_stencil_index;
       u32 max_color_index;
       u32 max_depth_index;
       u32 max_stencil_index;
};

Currently, hw zbc table registers are written during both
gr_init_setup_sw() and gr_init_setup_hw().
- Modify nvgpu_gr_zbc_load_default_table() to
nvgpu_gr_zbc_load_default_sw_table() to only update the SW copy of the
zbc table during gr_init_setup_sw().
- Modify nvgpu_gr_zbc_load_table() to write the zbc values stored in the
SW zbc table to the HW registers.

Re-structure the zbc functions per zbc type, i.e. color, depth and stencil.

Add gr.zbc.init_table_indices() hal to initialize zbc indices. Valid ZBC
table indices start from 1. HW indices start from 0 for color, depth and
stencil tables. Note that the corresponding format registers follow ZBC
index range starting at 1.
- void (*init_table_indices)(struct gk20a *g,
	struct nvgpu_gr_zbc_table_indices *zbc_indices);
- Add corresponding functions for legacy chips
- Add zbc color, depth and stencil table size hw defines
- Remove ltc.zbc_table_size() hal
- Update ltc.set_zbc_s_entry(), ltc.set_zbc_color_entry and
ltc.set_zbc_depth_entry() accordingly.

Bug 3122410
Bug 3122649

Change-Id: Ib799991ad35c6613534c0a6eb07f3bf24e600dc5
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2417620
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
673cd507a8 gpu: nvgpu: add mm gops to get default va size
Currently, the default VA aperture size, user size and kernel size are
defined as fixed macros. However, the max VA bits can be chip specific.
Add the below mm gops API to obtain the default aperture, user and/or
kernel virtual memory sizes.
void (*get_default_va_sizes)(u64 *aperture_size,
		u64 *user_size, u64 *kernel_size);
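
A sketch of what a chip-specific implementation of this gops might look
like; the sizes shown are placeholders, not actual chip values:

/* Illustrative only: report this chip's default VA split. */
void gm20b_mm_get_default_va_sizes(u64 *aperture_size,
		u64 *user_size, u64 *kernel_size)
{
	if (aperture_size != NULL) {
		*aperture_size = 1ULL << 37;   /* placeholder: 128 GB */
	}
	if (user_size != NULL) {
		*user_size = 1ULL << 37;       /* placeholder */
	}
	if (kernel_size != NULL) {
		*kernel_size = 1ULL << 32;     /* placeholder: 4 GB */
	}
}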

JIRA NVGPU-5302

Change-Id: Ie0c60ca08ecff6613ce44184153bda066803d7d9
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2414840
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
dd9298c959 gpu: nvgpu: move perf unit accesses to common.perf unit
The below HALs are implemented in the common.gr unit, but they really
belong to the common.perf unit since they access registers of the perf unit.
gops.gr.init_hwpm_pmm_register()
gops.gr.get_num_hwpm_perfmon()
gops.gr.set_pmm_register()
gops.gr.reset_hwpm_pmm_registers()

Move them to the common.perf unit, and update all the code accordingly:
gops.perf.init_hwpm_pmm_register()
gops.perf.get_num_hwpm_perfmon()
gops.perf.set_pmm_register()
gops.perf.reset_hwpm_pmm_registers()

Add new HAL gops.gr.get_pm_ctx_buffer_offsets() and set it to
gr_gk20a_get_pm_ctx_buffer_offsets() for all chips.

Bug 2510974
Jira NVGPU-5360

Change-Id: Ib5e84ed5c8b6e72cc6923161e55fc2c3a6a4070e
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2418306
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Antony Clince Alex
7b7f42bd33 gpu: nvgpu: add gr ops find_priv_offset_in_buffer
Convert gr_gk20a_find_priv_offset_in_buffer into hal function
gops.gr.find_priv_offset_in_buffer. This is done in-order to facilitate
nvgpu-next to transition into a new ctxsw buffer layout.

Bug 2761598
Jira NVGPU-6008

Change-Id: Id294be628944daad7f9afa68214d98d87bbbf68c
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2403708
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
db20451d0d gpu: nvgpu: fix pmm chiplet offsets
gr_gv100_init_hwpm_pmm_register() and gr_gv100_set_pmm_register() right
now assume a common chiplet stride for all sys/fbp/gpc and use the common
API g->ops.perf.get_pmm_per_chiplet_offset() to get the stride.

The chiplet strides are the same for all partitions only by chance, and
a future chip might change that.

Hence add and use the below 3 separate HALs to get the appropriate strides.
g->ops.perf.get_pmmsys_per_chiplet_offset()
g->ops.perf.get_pmmgpc_per_chiplet_offset()
g->ops.perf.get_pmmfbp_per_chiplet_offset()

Also store the sys/fbp/gpc perfmon counts in struct gk20a after the first
query instead of querying them again and again. Querying the counts from
HW is time-consuming.
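
For illustration, a per-chiplet register address would then be computed
with the domain-specific stride; the base register name below is a
placeholder:

/* Illustrative: offset of the i-th GPC chiplet's PMM register. */
u32 gpc_reg = pmmgpc_base_r() +
		i * g->ops.perf.get_pmmgpc_per_chiplet_offset();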

Bug 2510974
Jira NVGPU-5360

Change-Id: I186009221009780d561617c0cd6f535854db585f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2413108
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
e0dd79cd43 gpu: nvgpu: rearch mc reset and enable hals
Remove current mc hals
- mc.reset()
- mc.enable()
- mc.disable()
- mc.reset_mask()
- mc.reset_engine()
- mc.reset_engine_enable()

Add new mc hals
- mc.enable_units(g, units, enable)
  > enable/disable given unit(s)
- mc.enable_dev(g, dev, enable)
  > enable/disable engine represented by given device pointer
- mc.enable_devtype(g, devtype)
  > enable/disable all engines of given devtype

Move the common mc intr functions to common/mc/mc_intr.c.
Add the below common mc functions:
- nvgpu_mc_reset_units(g, units)
  > reset given logical OR of nvgpu unit bitmap
- nvgpu_mc_reset_dev(g, dev)
  > reset given single engine via dev
  > if engine is graphics, reset gpcs for nvgpu_next
- nvgpu_mc_reset_devtype(g, devtype)
  > reset all engines of given devtype
  > if devtype is graphics, reset gpcs for nvgpu_next

Bug 200648985
Bug 3109773

Change-Id: Idc67a14a0a7cde83de44fbfbec13007fead3ed5c
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2408523
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
3b746dce0c gpu: nvgpu: use a falcon flag instead of enabled bit
The common.gr unit right now makes use of a capability bit,
NVGPU_PMU_FECS_BOOTSTRAP_DONE, to ensure the recovery path hits a
different routine. This is actually needless, and a common check
cannot be used for all GR instances anyway.

Delete this capability bit. Add and use a new flag,
coldboot_bootstrap_done, under struct nvgpu_gr_falcon.

Jira NVGPU-5648

Change-Id: I46faea6f07cf054f17a3215d4cbbe0fc8a6382ae
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2409533
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Debarshi Dutta
cdacc2e2b2 gpu: nvgpu: reorganize HAL for gm20b, gp10b, tu104
Designated initializers with nested structs should not be used, to
avoid a known problem in the QNX compiler that results in incorrect
values being used for some fields.

Remove the nested struct initialization and instead perform
runtime initialization for the GM20B, GP10B and TU104 HAL assignments.
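
To illustrate the change: instead of nested designated initializers in
a static gpu_ops table, the fields are assigned at runtime during HAL
init. The function name follows the nvgpu HAL init convention; the
"some_op" field is a placeholder, not a real gpu_ops member:

/*
 * Avoided (problematic with the QNX compiler):
 *
 *   static const struct gpu_ops gm20b_ops = {
 *           .gr = { .some_op = gm20b_gr_some_op, ... },
 *   };
 *
 * Used instead: plain runtime assignments during HAL init.
 */
int gm20b_init_hal(struct gk20a *g)
{
	struct gpu_ops *gops = &g->ops;

	gops->gr.some_op = gm20b_gr_some_op;   /* one assignment per entry */
	/* ... remaining HAL assignments ... */
	return 0;
}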

Change-Id: I6c94f85c7d6f7e279206bff7bd3535f56a377494
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2401399
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
a2809088eb gpu: nvgpu: remove unnecessary hal gops.gr.gr_enable_hw()
gops.gr.gr_enable_hw() is a common function and is not referenced on vGPU.
Remove the HAL pointer and directly use nvgpu_gr_enable_hw() instead.

Jira NVGPU-5648

Change-Id: Id031024ed01f9d890cffb5902cc433800810b219
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2403548
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
8cccb49bd2 gpu: nvgpu: collapse nvgpu_gr_prepare_sw into nvgpu_gr_alloc
The common.gr unit exports a separate API, nvgpu_gr_prepare_sw, to
initialize some SW pieces required for nvgpu_gr_enable_hw().
A separate API is really unnecessary since the same initialization
can be performed in nvgpu_gr_alloc().

Remove nvgpu_gr_prepare_sw() and the HAL gops.gr.gr_prepare_sw().
Initialize the falcon and interrupt structures in a loop from
nvgpu_gr_alloc().

Move nvgpu_netlist_init_ctx_vars() from nvgpu_gr_prepare_sw() to the
common init path since netlist parsing need not be done from the
common.gr unit. It just needs to happen before nvgpu_gr_enable_hw().

Also, trigger nvgpu_gr_free() from gr_remove_support() instead
of OS-specific paths. Also remove the nvgpu_gr_free() calls from
the probe error paths since nvgpu_gr_alloc is no longer called in
the probe path.

Move the interrupt and falcon data structure free calls to nvgpu_gr_free().

Also remove the corresponding unit testing code that tests
nvgpu_gr_prepare_sw() specifically.
Update some unit tests to initialize the ECC counters and netlist.
Disable some unit tests that fail for unknown reasons.

Jira NVGPU-5648

Change-Id: I82ec8160f76530bc40e0c11a9f26ba1c8f9cf643
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2400166
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
fba96fdc09 gpu: nvgpu: Replace nvgpu_engine_info with nvgpu_device
Delete the struct nvgpu_engine_info as it's essentially identical to
struct nvgpu_device. Duplicating data structures is not ideal as it's
terribly confusing what does what.

Update all uses of nvgpu_engine_info to use struct nvgpu_device. This
is often a fairly straightforward replacement. A couple of places,
though, where things got interesting:

  - The enum_type that engine_info uses is defined in engines.h and
    has a bit of SW abstraction - in particular the GRCE type. In the
    only place this seemed to be actually relevant (the IOCTL providing
    device info to userspace), the GRCE engines can be worked out by
    comparing runlist IDs.
  - Addition of masks based on intr_id and reset_id; those can be
    computed easily enough using BIT32() but this is an area that
    could be improved on.

This reaches into a lot of extraneous code that traverses the fifo
active engines list and dramatically simplifies it. Now, instead of
having to go through a table of engine IDs that point to the list of
all host engines, the active engine list is just a list of pointers to
valid engines. It's now trivial to do a for-all-active-engines type
loop. This could even be turned into a generic macro or otherwise
abstracted in the future.
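
A sketch of the simpler traversal this enables; the field names on
struct nvgpu_fifo are assumptions for illustration:

/* Illustrative sketch: iterate the active engines as device pointers. */
static void for_each_active_engine(struct gk20a *g)
{
	struct nvgpu_fifo *f = &g->fifo;
	u32 i;

	for (i = 0U; i < f->num_engines; i++) {
		const struct nvgpu_device *dev = f->active_engines[i];

		/* per-engine work goes here, e.g. reset or intr setup */
		(void)dev;
	}
}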

JIRA NVGPU-5421

Change-Id: I3a810deb55a7dd8c09836fd2dae85d3e28eb23cf
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319895
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Lakshmanan M
48f1da4dde gpu: nvgpu: Add bundle skip sequence in MIG mode
In MIG mode, the 2D, 3D, I2M and ZBC classes are not supported
by the GR engine, so skip those bundle programming sequences in MIG mode.

JIRA NVGPU-5648

Change-Id: I7ac28a40367e19a3e31e63f3e25991c0ed4d2d8b
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2397912
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Lakshmanan M
2a6fcec078 gpu: nvgpu: add gr manager ops-2 and mig infra-2
This CL covers the code changes related to following support,
 - Enabled gr manager ops.
 - Added gr manager init/remove support.
 - Refactor in gpu instance config infra.
 - Refactor in gr syspipe gpcs config infra.

JIRA NVGPU-5645
JIRA NVGPU-5646

Change-Id: Ib2fab2796d76fe105fc5a08f2c5f9bfa36317f7c
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2393550
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Seema Khowala
9ea21459b4 gpu: nvgpu: pascal+: trigger_suspend, wait_for/resume_from _pause set to NULL
- NvRmGpuDeviceSetSmDebugMode uses the regops interface.
- NvRmGpuDeviceTriggerSuspend, NvRmGpuDeviceWaitForPause,
  and NvRmGpuDeviceResumeFromPause should return an error on Pascal+.
  Use the regops interface to suspend/resume.
- On non-CILP devices (Maxwell), NvRmGpuDeviceTriggerSuspend,
  NvRmGpuDeviceWaitForPause, NvRmGpuDeviceResumeFromPause and
  NvRmGpuDeviceSetSmDebugMode are used when a debugger (including
  coredump, memcheck) is attached or when a CUDA application uses a
  syscall that requires the trap handler (assert, cnp).

Bug 2558022
Bug 2559631
Bug 2706068
JIRA NVGPU-5502

Change-Id: I9eb2ab0c8c75c50f53523d8bf39c75f98b34f3f0
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2376159
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Lakshmanan M
c99afa1766 gpu: nvgpu: add gr manager and mig infra
This CL covers the code changes related to the following support:
 - Added gr manager infra.
 - Added grmgr_gops infra.
 - Added mig infra.
 - Added log mask for MIG verbose support.

JIRA NVGPU-5645
JIRA NVGPU-5646

Change-Id: Iec356e08e6cfee86ad9f59fdf6cfee9c38231359
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2385111
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
9d723a5f1f gpu: nvgpu: add knob to control fecs_trace feature
Currently, the NVGPU_SUPPORT_FECS_CTXSW_TRACE enabled flag is set to true
when the fecs_trace s/w setup is executed successfully. Sometimes,
fecs_trace needs to be disabled for debugging. This change helps
disable/enable the fecs_trace feature by modifying one of the enabled
flags.
Enable NVGPU_SUPPORT_FECS_CTXSW_TRACE during chip-specific HAL init.
Control fecs_trace init and ctxsw dev open depending on the
NVGPU_SUPPORT_FECS_CTXSW_TRACE flag status.

JIRA NVGPU-5616

Change-Id: Id0754a5af7cd95a67a1f0ae5de36115d44e1111b
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2357501
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
f34711d3de gpu: nvgpu: split perfbuf initialization
gk20a_perfbuf_map() allocates the perfbuf VM, maps the user buffer into
the new VM, and then triggers gops.perfbuf.perfbuf_enable(). This HAL
then does the following:
- Allocate perfbuf instance block
- Initialize perfbuf instance block
- Reset stream buffer
- Program instance block address in PMA registers
- Program user buffer address into PMA registers

The new profiler interface will have its own API to set up the PMA
stream, and it requires the above setup to be done in two phases:
perfbuf initialization and then user buffer setup.

Split the above functionality into the below functions:
- nvgpu_perfbuf_init_vm()
  - Allocate perfbuf VM
  - Call gops.perfbuf.init_inst_block() to initialize perfbuf instance
    block

- gops.perfbuf.init_inst_block()
  - Allocate perfbuf instance block
  - Initialize perfbuf instance block
  - Program instance block address in PMA registers using
    gops.perf.init_inst_block()
  - In case of vGPU, trigger TEGRA_VGPU_CMD_PERFBUF_INST_BLOCK_MGT
    command to gpu server

- gops.perf.init_inst_block()
  - Reset stream buffer
  - Program user buffer address into PMA registers

Also add the corresponding cleanup functions as below:
gops.perf.deinit_inst_block()
gops.perfbuf.deinit_inst_block()
nvgpu_perfbuf_deinit_vm()

Bug 2510974
Jira NVGPU-5360

Change-Id: I486370f21012cbb7fea84fe46fb16db95bc16790
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2372984
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
smadhavan
3b560b5757 gpu: nvgpu: set gr.falcon.bind_instblk ops to NULL
While booting the LS falcons, the gr.falcon.bind_instblk gops is
used to bind the WPR VA to the GR falcon. Only FECS_METHOD must be
used to bind instblks, but at this point the FECS falcon is not loaded
and running. Hence FECS_METHOD cannot be used to bind this instblk.

Besides that, this code is not required
for successful falcon boot and functioning on chips other
than gm20b.

JIRA NVGPU-5323

Change-Id: I148ccc77d65d5f01adbba6261369e7a292dccfc3
Signed-off-by: smadhavan <smadhavan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2369736
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
359fc24aaf gpu: nvgpu: Rework engine management to work with vGPU
Currently the vGPU engine management rewrites a lot of the common
device agnostic engine management code.

With the new top HAL parsing one device at a time, it is now more
easily possible to tie the vGPU into the new common device framework
by implementing the top HAL but with the vGPU engine list backend.

This lets the vGPU inherit all the common engine and device
management code. By doing so, the vGPU need only implement a
trivial and simple HAL.

This also gets us a step closer to merging all of the CE init
code: logically it just iterates through all CE engines whatever
they may be. The only reason this differs between chips is because
of the swap from CE0-2 to LCEs in the Pascal generation. This could
be abstracted by the unit code easily enough.

Also, the pbdma_id for each engine has to be added to the device
struct. Eventually this was going to happen anyway, since the
device struct will soon replace the nvgpu_engine_info struct.
It's a little bit of an abuse but might be worth it long term. If
not, it should not be difficult to replace uses of dev->pbdma_id
with a proper lookup of PBDMA ID based on the device info.

JIRA NVGPU-5421

Change-Id: Ie8dcd3b0150184d58ca0f78940c2e7ca72994e64
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2351877
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
04a179a161 gpu: nvgpu: del gr.get_lrf_tex_ltc_dram_override
Delete unused gr gops get_lrf_tex_ltc_dram_override().

Jira NVGPU-5755

Change-Id: Ic8f8e8de8066325109c0284f0f620accdd81db7b
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2368974
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
08308bc936 gpu: nvgpu: rework pm resource reservation system
The current PM resource reservation system is limited to HWPM resources
only, and reservation tracking is done using boolean variables.

The upcoming profiler support requires reservations for all the PM
resources like SMPC and the PMA stream. Using boolean variables is
not scalable and is confusing. Plus, the variables have to be replicated
on the GPU server in the case of virtualization.

Remove the flag tracking mechanism and use a list-based approach to track
all PM reservations. Also, the current HALs are defined on the debugger
object. Implement new HALs in a new pm_reservation object since this is
really an independent functionality.
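
A simplified sketch of the list-based tracking; the struct and field
names here are illustrative, not the actual implementation:

/* Illustrative: one node per active reservation of a PM resource. */
struct pm_reservation_entry {
	u32 id;                        /* owning profiler/session id */
	u32 vmid;                      /* guest id, for the vGPU/server case */
	struct nvgpu_list_node entry;  /* linked into a per-resource list */
};

/* acquire(): scan the resource's list for a conflicting entry; if none,
 * allocate a node and append it. release(): find the caller's node,
 * unlink and free it. release_all_per_vmid(): drop every node owned by
 * a given vmid. */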

Add a new source file, common/profiler/pm_reservation.c, which implements
functions to reserve/release resources and to check whether any resource
is reserved or not.
Add common/vgpu/pm_reservation_vgpu.c for vGPU, which simply forwards
the request to the GPU server.

Define a new HAL object, gops.pm_reservation, and assign the above
functions to the below respective HALs:
g->ops.pm_reservation.acquire()
g->ops.pm_reservation.release()
g->ops.pm_reservation.release_all_per_vmid()

The last HAL above is only used for GPU server cleanup of a guest OS.

Add the below new common profiler functions that act as APIs to reserve/
release resources for the rest of the units in nvgpu.
nvgpu_profiler_pm_resource_reserve()
nvgpu_profiler_pm_resource_release()

Initialize the metadata required for the reservation system in
nvgpu_pm_reservation_init() and call it during nvgpu_finalize_poweron.
Clean up the metadata before releasing struct gk20a.

Delete the below HALs:
g->ops.debugger.check_and_set_global_reservation()
g->ops.debugger.check_and_set_context_reservation()
g->ops.debugger.release_profiler_reservation()

Bug 2510974
Jira NVGPU-5360

Change-Id: I4d9f89c58c791b3b2e63099a8a603462e5319222
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2367224
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
1ff79b1d2c gpu: nvgpu: remove support for quad reg_op
Quad-type reg_ops were only needed on Kepler, and not for any other chip
beginning with Maxwell.

The HAL g->ops.gr.access_smpc_reg() was incorrectly set for Volta and
Turing whereas it was only applicable to Kepler. Delete it.

There is no register in the quad-type whitelist since the type itself is
no longer supported. Remove the empty whitelists for all chips and
also delete the below HALs:
g->ops.regops.get_qctl_whitelist()
g->ops.regops.get_qctl_whitelist_count()

The hal/regops/regops_gv100.* files are not used anymore. Delete the
files instead of just deleting the quad HALs in these files.

Bug 200628391

Change-Id: I4dcc04bef5c24eb4d63d913f492a8c00543163a2
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2366035
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
fbb6a5bc1c gpu: nvgpu: Remove fifo->pbdma_map
The FIFO pbdma map is an array of bit maps that link PBDMAs to runlists.
This array allows other software to query what PBDMA(s) serves a given
runlist. The PBDMA map is read verbatim from an array of host registers.
These registers are stored in a kmalloc()'ed array.

This causes a problem for the device management code. The device
management initialization executes well before the rest of the FIFO
PBDMA initialization occurs. Thus, if the device management code
queries the PBDMA mapping for a given device/runlist, the mapping has
yet to be populated.

In the next patches in this series the engine management code is subsumed
into the device management code. In other words the device struct is
reused by the engine management and all host SW does is pull pointers to
the host managed devices from the device manager. This means that all
engine initialization that used to be done on top of the device
management needs to move to the device code.

So, long story short, the PBDMA map needs to be read from the registers
directly, instead of from an array that gets allocated long after the
device code has run.

This patch removes the pbdma map array, deletes two HALs that managed
that, and instead provides a new HAL to query this map directly from
the registers so that the device code can use it.

JIRA NVGPU-5421

Change-Id: I5966d440903faee640e3b41494d2caf4cd177b6d
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2361134
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Alex Waterman
194fac7f3c gpu: nvgpu: Remove clutter in engine code
Remove the get_mask_on_id() HAL and replace its usage with the
global nvgpu_engine_get_mask_on_id() function. There's no need
to have this function as a HAL.

JIRA NVGPU-5420

Change-Id: I4fc843beff8e65806da26a0addc83fa218d390ac
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2361315
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
fb1433811c gpu: nvgpu: modify gr.falcon.dump_stats
- Add gm20b_gr_falcon_gpccs_dump_stats() to print gpccs context switch
mailbox register values for all gpcs.
- Make gm20b_gr_falcon_fecs_dump_stats() a static function
- Add gm20b_gr_falcon_dump_stats() to trigger
gm20b_gr_falcon_fecs_dump_stats() and gm20b_gr_falcon_gpccs_dump_stats()
- Update legacy chips gr.falcon.dump_stats() to
gm20b_gr_falcon_dump_stats().

JIRA NVGPU-5597

Change-Id: I992c6432f3c2e3049bacc953f9b53ff6c4aa2f36
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2357470
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Seema Khowala <seemaj@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
319520ff57 gpu: nvgpu: Add a new device manager unit
This adds a new device management unit in the common code responsible
for facilitating the parsing of the GPU top device list and providing
that info to other units in nvgpu.

The basic idea is to read this list once from HW and store it in a
set of lists corresponding to each device type (graphics, LCE, etc).
Many of the HALs in top can be deleted and instead implemented using
common code parsing the SW representation.

Every time the driver queries the device list it does so using a
device type and instance ID. This is common code. The HAL is responsible
for populating the device list in such a way that the driver can
query it in a chip agnostic manner.

Also delete some of the unit tests for functions that no longer
exist. This code will require new unit tests in time; those should be
quite simple to write once unit testing is needed.
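
A sketch of the query pattern this enables; the lookup function, device
type constant and device fields shown are illustrative names based on
the commit description:

/* Illustrative: fetch the inst_id-th device of a given type. */
const struct nvgpu_device *dev;

dev = nvgpu_device_get(g, NVGPU_DEVTYPE_LCE, inst_id);
if (dev == NULL) {
	nvgpu_err(g, "no LCE instance %u on this chip", inst_id);
	return -ENODEV;
}
/* dev->engine_id, dev->runlist_id, etc. are then available */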

JIRA NVGPU-5421

Change-Id: Ie41cd255404b90ae0376098a2d6e9f9abdd3f5ea
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319649
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Debarshi Dutta
f6298157bc gpu: nvgpu: update HAL function name
get_ecc_override_val is renamed to gp10b_gr_get_ecc_override_val to
follow the naming convention of the HAL functions correctly.

Change-Id: I80e495e8bec3483651e325b792708382e5327de0
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2357925
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Sami Kiminki
d44960d424 gpu: nvgpu: add PDI reporting for GP10B (Linux)
Read the T186 SoC PDI fuse registers to retrieve the per-device
identifier for GP10B.

Bug 2957580

Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Change-Id: Ie5031a005ca251636614d27c2dc77bddfce0ea21
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2350930
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
2d94863cae gpu: nvgpu: move is_tpc_addr and get_tpc_num to common
gr.is_tpc_addr() and gr.get_tpc_num() are chip-agnostic HALs. Move these
HALs to common code.

Jira NVGPU-5504

Change-Id: I50fa7ac876c8667de42df1830bd412b412538508
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2349272
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Konsta Hölttä
2077df9b1a gpu: nvgpu: use set_syncpt only with nvhost
nvgpu_channel_set_syncpt() is not useful if nvhost and thus syncpts are
missing and semaphores are used for synchronization. Require
CONFIG_TEGRA_GK20A_NVHOST to be set for the set_syncpt hal.

Jira NVGPU-5496

Change-Id: Ief8b4a0fb29af631817aba55c04181b1a360ce56
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2344064
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00