Commit Graph

175 Commits

Author SHA1 Message Date
Divya
ae2d561c48 gpu: nvgpu: add platform support for Static PG
- Add support for taking static PG config values
  from DT nodes
- Check those values against valid set of values
  for GPC, TPC and FBP
- Store valid values in g->gpc_pg_mask, g->fbp_pg_mask
  and g->tpc_pg_mask[] array.

Bug 200768322
JIRA NVGPU-6433

Change-Id: Ifc87e7d369034b1daa13866bc16a970602514bf6
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2594802
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-25 15:47:25 -07:00
Divya
9266da636b gpu: nvgpu: update static pg support for pre-si
- On pre-silicon platform, static pg will be
  done by nvgpu driver. For this, retain structs
  and HALs of static pg.
- Add the static pg support under pre-silicon code.
- On silicon, the static pg will be done by BPMP.
- Rename variables used in static pg for better
  readability and consistency

Bug 200768322
JIRA NVGPU-6433

Change-Id: Ib31c0f83b751c2b1563a36bd51af78a0bd12a117
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2594801
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-21 13:40:11 -07:00
Debarshi Dutta
9328f057a7 gpu: nvgpu: fix use-after-free use case of CE APP.
The following issue is reported when running sudo modprobe -r nvgpu

[  134.066392] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058
[  134.066428] Mem abort info:
[  134.066431]   ESR = 0x96000004
[  134.066434]   EC = 0x25: DABT (current EL), IL = 32 bit
[  134.066450] [0000000000000058] pgd=0000000000000000, p4d=0000000000000000
[  134.066459] Internal error: Oops: 96000004 [#1] PREEMPT_RT SMP

[  134.066639] pc : nvgpu_cic_rm_wait_for_stall_interrupts+0x78/0xd0 [nvgpu]
[  134.066847] lr : nvgpu_cic_rm_wait_for_stall_interrupts+0x74/0xd0 [nvgpu]
[  134.067043] sp : ffff80001971ba80
[  134.067046] x29: ffff80001971ba80 x28: ffff000093b0da00
[  134.067054] x27: 0000000000000000 x26: ffff80001c28b990
[  134.067061] x25: ffff00008cd01000 x24: 0000000000000bb8
[  134.067067] x23: 0000000000000000 x22: ffff0000915b0000
[  134.067073] x21: ffff000093b0da00 x20: ffff0000915b0000
[  134.067079] x19: ffff0000915b0000 x18: 0000000000000036
[  134.067085] x17: 0000000000000000 x16: 0000000000000000
[  134.067091] x15: ffff8000126b5fd8 x14: 7373616c633d4d45
[  134.067097] x13: ffff8000098abef0 x12: 0000000000000000
[  134.067102] x11: ffff8000098ab5a0 x10: ffff8000098abef8
[  134.067108] x9 : ffff80001010e844 x8 : ffff80001971ba48
[  134.067115] x7 : 2222222222222222 x6 : ffff000093b0da00
[  134.067122] x5 : ffff8000098b1fd8 x4 : 0000000000000000
[  134.067127] x3 : 0000000000000000 x2 : 0000000000000000
[  134.067133] x1 : 0000000000000000 x0 : 0000000000000000
[  134.067138] Call trace:
[  134.067140]  nvgpu_cic_rm_wait_for_stall_interrupts+0x78/0xd0 [nvgpu]
[  134.067328]  nvgpu_cic_rm_wait_for_deferred_interrupts+0x20/0xb0 [nvgpu]
[  134.067517]  nvgpu_channel_deferred_reset_engines+0x29c/0x920 [nvgpu]
[  134.067714]  nvgpu_channel_close+0x18/0x20 [nvgpu]
[  134.067904]  nvgpu_init_pramin+0x2ac/0x350 [nvgpu]
[  134.068092]  nvgpu_ce_app_destroy+0x94/0xe0 [nvgpu]
[  134.068279]  nvgpu_put+0x90/0x120 [nvgpu]
[  134.068465]  nvgpu_pci_shutdown+0x29c/0x18a0 [nvgpu]
[  134.068655]  pci_device_remove+0x44/0xe0
[  134.068665]  device_release_driver_internal+0x114/0x1f0
[  134.068701]  driver_detach+0x54/0xe0
[  134.068709]  bus_remove_driver+0x70/0x120
[  134.068733]  driver_unregister+0x34/0x60

The above issue occurs due to freeing of CIC resources earlier than
dependent users of interrupts e.g. CDE, CE etc.

As a solution, move CIC deinit sequence to end of nvgpu_put.
This handles deinit properly for VGPU/IGPU/DGPU.

Bug 200763510

Change-Id: I696e31d5e03a9468cccfe710048000dbf7cf0269
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2592063
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-16 21:45:43 -07:00
Tejal Kudav
5a94007725 gpu: nvgpu: Remove redundant HAL from common.fbp
common.fbp has two interfaces to initialize FBP:
1. Public API nvgpu_fbp_init_support
2. HAL fbp.fbp_init_support

nvgpu_fbp_init_support() is only used to initialize HAL
fbp.fbp_init_support. Remove the HAL and use the API directly.

JIRA NVGPU-6644

Change-Id: I2c455e09dbcf5e4fb1dc370b284e4f0d5c678b40
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2592047
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-16 05:59:00 -07:00
ajeshkv
118f8c1280 gpu: nvgpu: add support for gsp stress test
Add debugfs entries to support GSP stress test and other
functionalities to enable the test.

JIRA CORERM-3382

Change-Id: Iab20fcfe78807e76e91c64716502a2f036ed4d18
Signed-off-by: ajeshkv <akv@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2589390
Reviewed-by: Amit Pabalkar <apabalkar@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-10 16:02:43 -07:00
dt
9355345610 gpu: nvgpu: Add IPA-PA cache to increase the performance
When GPU need to programmed with PA(physical address),
given IPA need to be converted to PA by querying Hypervisor.
As this is an IPC between OSes, the call will reduce the
performance badly. So this is adding a IPA-PA cache to improve
the performance. This will be more helpful in passthr config.

Bug 3277194

Change-Id: I6a3230d858977313a0ed0f33068055a3b516330a
Signed-off-by: dt <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2571814
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-07 10:28:58 -07:00
Ramesh Mylavarapu
ffd0d3962f gpu: nvgpu: gsp: gsp isr and debug trace support
- Created GSP NVRISCV interrupt handle and
  respective functions and register reads.
- Created Debug trace support for GSP firmware.

NVGPU-7084

Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Change-Id: I2728150c4db00403aa6e3c043bc19c51677dd9cf
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2589430
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-07 05:37:51 -07:00
Debarshi Dutta
608decf1e6 gpu: nvgpu: add support for powering off gpu
Add support for powering off IGPU for switching between
legacy to SMC mode/vice-versa or changing SMC configuration.
The power off can be issued as follows

echo 0 > /dev/nvgpu/igpu0/power

The following steps are done during a poweroff.
1) Deterministic channel idle
2) Acquire write_lock on l->busy semaphore.
3) Wait till power_usage decrements to indicate 0 active jobs.
4) Invoke pm_runtime_put_sync_suspend()
5) Invoke nvgpu_gr_remove_support() to clear existing GR memory.
6) Release write_lock on l->busy
7) Deterministic channel unidle.

Part of the sequence matches that of the gk20a_do_idle code.
The common parts are extracted into new functions
gk20a_block_new_jobs_and_idle() and gk20a_unblock_jobs()

For joint-rail case, the current implementation, does a railgate
and then sets pm_runtime_set_autosuspend_delay(-1) to disable
regular runtime resume/suspend.

Remove clearing of NVGPU_SUPPORT_MIG status during state change
ias it leads to inconsistencies.

Jira NVGPU-6920

Change-Id: I0b3eb3278176122ac061c1e8a94ebfb3c17c3925
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2578501
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Antony Clince Alex <aalex@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-23 05:27:50 -07:00
Debarshi Dutta
2e3c3aada6 gpu: nvgpu: fix deinit of GR
Existing implementation of GR de-init doesn't account for multiple
instances of struct nvgpu_gr. As a fix, below changes are added.

1) nvgpu_gr_free is unified for VGPU as well as native.
2) All the GR instances are freed.
3) Appropriate NULL checks are added when freeing GR memories.
4) 2D, 3D, I2M and ZBC etc are explicitely disabled when MIG is set.
5) In ioctl_ctrl, checks are added to not return error when zbc is NULL
   for VGPU as requests are rerouted to RMserver.

Jira NVGPU-6920

Change-Id: Icaa40f88f523c2cdbfe3a4fd6a55681ea7a83d12
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2578500
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Antony Clince Alex <aalex@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-23 05:27:45 -07:00
Tejal Kudav
b33079d47e gpu: nvgpu: Move intr data members from MC to CIC
Move interrupt specific data-members from common.mc to common.cic
Some of these data members like sw_irq_stall_last_handled_cond need
To be initialized much earlier during the OS specific init/probe stage.
Also, some more members from struct nvgpu_interrupts(like stall_size,
stall_lines[]), which will soon be moved to CIC will also need to be
initialized early during the OS specific probe stage.
However, the chip specific LUT can only be initialized after the
hal_init stage where the HALs are all initialized.
Split the CIC init to accommodate the above initialization requirements.

JIRA NVGPU-6899

Change-Id: I9333db4cde59bb0aa8f6eb9f8472f00369817a5d
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2552535
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-19 18:06:28 -07:00
Divya Singhatwaria
842bef7124 gpu: nvgpu: Support GPC and FBP Floorsweeping
- Add gops_fbp_fs and gops_gpc_pg struct
- Add HALs to write to NV_FUSE_CTRL_OPT_FBP and
  NV_FUSE_CTRL_OPT_GPC fuses needed for floorsweeping
- Add set_fbp_mask and set_gpc_mask to probe FBP and GPC mask
  respectively during gpu probe
- Add sysfs node: fbp_fs_mask and gpc_fs_mask to store
  FBP and GPC floorsweeping mask sent from userspace
- Move the floorsweeping programming early in NVGPU’s GPU init
  function and then issue a PRI init.

JIRA NVGPU-6433

Change-Id: I84764d625c69914c107e1e8c7f29c476c2f64f78
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2499571
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-19 06:17:25 -07:00
Divya Singhatwaria
9f30609550 gpu: nvgpu: Rename TPC powergating mutex
Rename tpc_pg_lock to static_pg_lock and
have_tpc_pg_lock to have_static_pg_lock as it
is used for tpc/gpc/fbp power gating.

JIRA NVGPU-6433

Change-Id: I4c56b9710e303ad9e872bad4b5ed9a167acb9dd6
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2537489
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-18 02:46:25 -07:00
Ramesh Mylavarapu
d328bff79e gpu: nvgpu: gsp NVRISCV load and bootstrap
Changes:
- This change will only init gsp software
  state, nvgpu_gsp_bootstrap need to be called.
- CONFIG_NVGPU_GSP_SCHEDULER flag is created to
  compile out the gsp scheduler code when needed.
- Created GSP engine reset which is needed when
  ACR completed execution and need to load gsp fw.

NVGPU-6783

Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Change-Id: I2ce43e512b01df59443559eab621ed39868ad158
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554267
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-15 17:21:03 -07:00
Vedashree Vidwans
43980bfe06 gpu: nvgpu: remove nvgpu_is_bpmp_running usage
BPMP driver doesn't support any API to check whether bpmp is running.
Remove use of nvgpu_is_bpmp_running.

Bug 200720732

Change-Id: Id266e65d4af598dd056cbdbaa219d0d53b7b3fb3
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2556448
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-15 10:06:42 -07:00
Pekka Jylhä-Ollila
8a72068508 Revert "gpu: nvgpu: gsp NVRISCV load and bootstrap"
This reverts commit aef4b80acb.

Change-Id: I47e02bf97e6a3aaa9acdd7f5eec41518b31ee5dc
Signed-off-by: Pekka Jylhä-Ollila <pjylhaollila@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554105
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
2021-07-05 06:01:52 -07:00
Ramesh Mylavarapu
aef4b80acb gpu: nvgpu: gsp NVRISCV load and bootstrap
Changes:
- This change will only init gsp software
  state, nvgpu_gsp_bootstrap need to be called.
- CONFIG_NVGPU_GSP_SCHEDULER flag is created to
  compile out the gsp scheduler code when needed.
- Created GSP engine reset which is needed when
  ACR completed execution and need to load gsp fw.

NVGPU-6783

Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Change-Id: I26263ee5bae07de056f676ed0fddc1193b5af82d
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2530438
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-04 13:34:51 -07:00
tkudav
0526e7eaa9 gpu: nvgpu: Create CIC-mon and CIC-rm subunits
common.cic unit is divided into common.cic.mon and common.cic.rm
based on rm and mon process split.

CIC-mon subunit includes the code which is utilized in critical
interrupt handling path like initialization, error detection and
error reporting path. CIC-rm subunit includes the code corresponding
to rest of interrupt handling(like collecting error debug data from
registers) and ISR status management (status of deferred interrupts).

Split the CIC APIs and data-members into above two subunits.

JIRA NVGPU-6899

Change-Id: I151b59105ff570607c4a62e974785e9c1323ef69
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2551897
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-02 09:57:56 -07:00
Richard Zhao
ff75647d59 gpu: nvgpu: unify power state management code
The management code of g->power_on_state on different OS are almost
same, so moved the code to the common place.

Jira GVSCI-10882

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: I890015867b7bbdf3f749ab275ffd085ef76dfec2
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2542846
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-06-23 09:26:49 -07:00
Lakshmanan M
19186c8a02 gpu: nvgpu: select map access type from dmabuf permission and user request
Add api to translate dmabuf's fmode_t to gk20a_mem_rw_flag
for read only/read write mapping selection.

By default dmabuf fd mapping permission should be a maximum
access permission associated to a particual dmabuf fd.

Remove bit flag MAP_ACCESS_NO_WRITE and add 2 bit values for
user access requests NVGPU_VM_MAP_ACCESS_DEFAULT|READ_ONLY|
READ_WRITE.

To unify map access type handling in Linux and QNX move the
parameter NVGPU_VM_MAP_ACCESS_* check to common function
nvgpu_vm_map.

Set MAP_ACCESS_TYPE enabled flag in common characteristics
init function as it is supported for Linux and QNX.

Bug 200717195
Bug 3250920

Change-Id: I1a249f7c52bda099390dd4f371b005e1a7cef62f
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2507150
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-06-21 14:48:32 -07:00
Lakshmanan M
7d473f4dcc gpu: nvgpu: Expose logical mask for MIG
1) Expose logical mask instead of physical mask when MIG is enabled.
   For legacy, NvGpu expose physical mask.
2) Added fb related info in struct nvgpu_gpu_instance().
4) Added utility api to get the logical id for a given local id
   nvgpu_grmgr_get_gr_gpc_logical_id()
5) Added grmgr api to get max_gpc_count
   nvgpu_grmgr_get_max_gpc_count().
5) Added grmgr's fbp api to get num_fbps and its enable masks.
   nvgpu_grmgr_get_num_fbps()
   nvgpu_grmgr_get_fbp_en_mask()
   nvgpu_grmgr_get_fbp_rop_l2_en_mask()
6) Used grmgr's fbp apis in ioctl_ctrl.c
7) Moved fbp_init_support() in nvgpu_early_init()
8) Added nvgpu_assert handling in grmgr.c
9) Added vgpu hal for get_max_gpc_count().

JIRA NVGPU-5656

Change-Id: I90ac2ad99be608001e7d5d754f6242ad26c70cdb
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2538508
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-06-10 03:05:21 -07:00
Tejal Kudav
9f43914933 gpu: nvgpu: Move Intr handling common code to CIC
CIC (Central Interrupt controller) will be responsible for the
interrupt handling. common.cic unit is the placeholder for all
interrupt related code. Move interrupt related defines and
Public APIs present in common.mc to common.cic.
Note: The common.mc interrupts related struct definitions are
not moved as part of this patch.

Adapt the code to use interrupt handling related defines and public
APIs migrated from common.mc to common.cic

JIRA NVGPU-6899

Change-Id: I747e2b556c0dd66d58d74ee5bb36768b9370d276
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2535618
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-05-31 19:37:31 -07:00
Tejal Kudav
e0a1fcf5f5 gpu: nvgpu: Add Central Intr Controller unit
Add a new Central Interrupt Controller(CIC) unit in common code.
The interrupt handling is done in a distributed manner currently.
The error handling policy for different errors resides in each unit's
ISR code. The goal is to converge this data under one central place -
the CIC unit.

This patch creates framework for CIC unit and moves the gv11b QNX
safety LUT to CIC unit. All the error reporting APIs from different
units are also moved to CIC.

New APIs are exposed by CIC unit to access its internal data like:
  1. Struct err_desc - the static err handling /injection data per
                       error id
  2. Num_hw_modules  - the number of error reporting HW units
                       supported by CIC

Init and deinit of CIC unit:
  1. CIC unit should be initialized earlyon during boot so that it
     is available for any interrupt handling.
  2. Initialize CIC just before the interrupts are enabled during
     boot.
  3. Similarly, CIC is disabled late during deinit cycle; right
     after the interrupts are masked.

LUT:
  1. LUT is currently used only for reporting error to safety
     services in gv11b QNX safety build.
  2. This error handling policy LUT currently has only two levels
     of handing - correctable and quiecse.
  3. Once, the error handling policy decision is moved from leaf
     unit nodes to CIC, LUT will be updated to have additional levels
     like fast recovery and full recovery.
  4. Also, then a separate LUT will be added for each platform/build.
  5. In current framework, the LUT is set to NULL for all
     configurations except gv11b.

report_err() ops is added to report error to safety services.
This ops is only effective for gv11b qnx build; and set to NULL for
other configurations.

NVGPU-6521
NVGPU-6523
NVGPU-6750
NVGPU-6758
NVGPU-6760
NVGPU-6754

Change-Id: I24be7836a96d787741e37b732e19863ed8014635
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2518683
Reviewed-by: Ajesh K V <akv@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-05-25 14:28:04 -07:00
Seshendra Gadagottu
85efe929ca gpu: nvgpu: prod programming for slcg timer unit
Added init function for common.ptimer unit and called
this init function during nvgpu early init.
int nvgpu_ptimer_init(struct gk20a *g);

Added following helper function for programming
prod values for slcg timer unit:
void nvgpu_cg_slcg_timer_load_enable(struct gk20a *g);

Invoked prod programming for slcg timer unit from
nvgpu_ptimer_init.

Jira NVGPU-6026

Change-Id: I29e32380a4d05ec8276d7ebe59bc2733917f8184
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2524037
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-05-19 04:06:43 -07:00
ajesh
b15bd97c08 gpu: nvgpu: fix misra violation in bug unit
Modify the callback interface from bug to quiesce unit to remove
a possible cyclic dependency in the bug unit. Make the list of
callbacks from bug unit, UT specific. The quiesce callback function
and argument are kept in separate variables, and in a normal run the
only callback that bug unit would invoke will be the quiesce specific
function. These changes will fix the violation of Rule 17.2 in bug unit.

JIRA NVGPU-6537

Change-Id: Icb6bc92077f8d26c87425768b09a7194a98e015d
Signed-off-by: ajesh <akv@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2527207
(cherry picked from commit 7696565648c5dd573a03be19ba9525856b781ea6)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2530900
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-05-18 18:20:18 -07:00
Lakshmanan M
d956938d3f gpu: nvgpu: Add load_timestamp_prod in grmgr init
1) Moved load_timestamp_prod handling in nvgpu_init_gr_manager().
2) Moved fifo.reset_enable_hw in nvgpu_early_init() -
   In simulation/emulation/GPU standalone platform,
   XBAR, L2 and HUB are enabled during g->ops.fifo.reset_enable_hw().
   This introduces a dependency to get the MIG map conf information.
   (if nvgpu_is_bpmp_running() == false treated as
   simulation/emulation/GPU standalone platform).

Bug 3307879
JIRA NVGPU-6633

Change-Id: I4cba3a527de4723a6500f9658ec1dcadc23b37e3
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2528174
Tested-by: Antony Clince Alex <aalex@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-05-12 16:09:52 -07:00
Lakshmanan M
c041ad5b4b gpu: nvgpu: split nvgpu power on sequence into 2 stages
1) nvgpu poweron sequence split into two stages:
    - nvgpu_early_init() - Initializes the sub units
      which are required to be initialized before the grgmr init.
      For creating dev node, grmgr init and its dependency unit
      needs to move to early stage of GPU power on.
      After successful nvgpu_early_init() sequence,
      NvGpu can indetify the number of MIG instance required
      for each physical GPU.
    - nvgpu_finalize_poweron() - Initializes the sub units which
      can be initialized at the later stage of GPU power on sequence.

    - grmgr init depends on the following HAL sub units,
      * device - To get the device caps.
      * priv_ring - To get the gpc count and other
        MIG config programming.
      * fb - MIG config programming.
      * ltc - MIG config programming.
      * bios, bus, ecc and clk - dependent module of
        priv_ring/fb/ltc.

2) g->ops.xve.reset_gpu() should be called before GPU sub unit
   initialization. Hence, added g->ops.xve.reset_gpu() HAL in the
   early stage of dGPU power on sequence.

3) Increased xve_reset timeout from 100ms to 200ms.

4) Added nvgpu_assert() for gpc_count, gpc_mask and
   max_veid_count_per_tsg for identify the GPU boot
   device probe failure during nvgpu_init_gr_manager().

JIRA NVGPU-6633

Change-Id: I5d43bf711198e6b3f8eebcec3027ba17c15fc692
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2521894
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-04-29 14:23:48 -07:00
Lakshmanan M
3f8c562004 gpu: nvgpu: Add nvgpu_early_poweron() support
1) NvGpu dev node needs to be created in gpu power on
early stage to avoid latency introduced by udevd.
For creating dev node, device and grmgr init
needs to move to early stage of GPU power on.
After grmgr init, NvGpu can identify the number of MIG
instance required for each physical GPU.
For that, added a new API nvgpu_early_poweron() to handle
early init which is required for before dev node creation.

2) Removed fifo dependency in nvgpu_init_gr_manager()

3) Used get_max_subctx_count() directly to query
the veid/subctx count.

JIRA NVGPU-6633

Change-Id: Ib9d7c3e184c71237b0da9305515ccd8ceda1d5ad
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2517173
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-04-22 15:00:54 -07:00
Seshendra Gadagottu
21e1328ea1 gpu: nvgpu: add fb gops for set_atomic_mode
Separated set_atomic_mode functionality from
init_fs_state/enable_nvlink and created new
fb gops for set_atomic_mode.

In gpu init sequence, set_atomic_mode is
called after acr_construct_execute to take care
of design changes required for nvgpu-next
architectures.

Updated fb_gv11b_init_test to use set_atomic_mode
gops along with init_fs_state.

Bug 3268664

Change-Id: I1ab9eb21cc4cce77f3325c4e8821a75b6e85fba2
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2508095
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-04-22 14:58:36 -07:00
absalam
3ec369d60a gpu: nvgpu: Disable Clock Arbitor for TU104
This patch is to disable the clock arbitor for TU104.
TU104 is not a POR for Drive 6.0 so disabling it to easy migration
of clk arb for GA100.
As a first step all the NVRM Clock tests will be skipped by setting
NVGPU_SUPPORT_CLOCK_CONTROLS to false for TU104.
Then clk arbitor will be rewritten for GA100 and enabled back.
This patch implements by adding a new flag NVGPU_CLK_ARB_ENABLED which
holds the status of clk arbitor for each platform and disables them for
TU104

Bug 200699763

Change-Id: I51cd5c7821bdc0b48080c17a70735925b278ddf5
Signed-off-by: absalam <absalam@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2515086
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-04-20 07:47:38 -07:00
Lili Sang
3f0ea98b73 gpu: nvgpu: Add get_gr_context support for Linux.
Implement the feature of retrieving gr context contents for all chips.
Two IOCTLs, NVGPU_DBG_GPU_IOCTL_GET_GR_CONTEXT_SIZE and _GET_GR_CONTEXT,
are added.

Bug 3102903

Change-Id: If11006f4e294f190785a2c3159ca491b9f3b5187
Signed-off-by: Lili Sang <lilis@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2449183
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Chris Johnson <cwj@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:48 -06:00
Antony Clince Alex
c36752fe3d gpu: nvgpu: sim: make ring buffer independent of PAGE_SIZE
The simulator ring buffer DMA interface supports buffers of the following sizes:
4, 8, 12 and 16K. At present, it is configured to 4K and it  happens to match
with the kernel PAGE_SIZE, which is used to wrap back the GET/PUT pointers once
4K is reached. However, this is not always true; for instance, take 64K pages.
Hence, replace PAGE_SIZE with SIM_BFR_SIZE.

Introduce macro NVGPU_CPU_PAGE_SIZE which aliases to PAGE_SIZE and replace
latter with former.

Bug 200658101
Jira NVGPU-6018

Change-Id: I83cc62b87291734015c51f3e5a98173549e065de
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2420728
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Prateek sethi
223baa5883 gpu: nvgpu: add support for ACB SLCG on gv11b
Register list for ACB SLCG is auto generated with scripts.
Add HAL operations to enable/disable ACB clock gating.

Bug 200647909

Change-Id: I4be4c14cc072fcccd91031a5a40321f5ff11f549
Signed-off-by: Prateek sethi <prsethi@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2420355
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Lakshmanan M
2ecb5feaad gpu: nvgpu: Skip graphics CB programming for MIG
Added logic to skip the following graphics CB allocation, map and
programming sequence when MIG is enabled.

Global CB:
1) NVGPU_GR_GLOBAL_CTX_CIRCULAR
2) NVGPU_GR_GLOBAL_CTX_PAGEPOOL
3) NVGPU_GR_GLOBAL_CTX_ATTRIBUTE
4) NVGPU_GR_GLOBAL_CTX_CIRCULAR_VPR
5) NVGPU_GR_GLOBAL_CTX_PAGEPOOL_VPR
6) NVGPU_GR_GLOBAL_CTX_ATTRIBUTE_VPR
7) NVGPU_GR_GLOBAL_CTX_RTV_CIRCULAR_BUFFER

CTX CB:
1) NVGPU_GR_CTX_CIRCULAR_VA
2) NVGPU_GR_CTX_PAGEPOOL_VA
3) NVGPU_GR_CTX_ATTRIBUTE_VA
4) NVGPU_GR_CTX_RTV_CIRCULAR_BUFFER_VA

JIRA NVGPU-5650

Change-Id: I38c2859ce57ad76c58a772fdf9f589f2106149af
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2423450
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Debarshi Dutta
38ce6fa717 gpu: nvgpu: change unnamed structs to named structs
Following changes are made in this patch.
1) Change unnamed structs within gpu_ops to named structs
with the prefix gops_*.

2) Each named struct gops_ are moved into a separate gops specific file
under include/nvgpu/gops/

3) struct gpu_ops is moved into a separate file include/nvgpu/gpu_ops.h
and all other dependent struct gops_* are included in this header.

4) Direct references to include/nvgpu/gops are removed from files as its enough
to include gk20a.h.

Change-Id: Ieb22cb853be567e3bef14f5f8a04674eebd902ea
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2398776
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Deepak Nibade
a2809088eb gpu: nvgpu: remove unnecessary hal gops.gr.gr_enable_hw()
gops.gr.gr_enable_hw() is a common function and not referred on vGPU.
Remove HAL pointer and directly use nvgpu_gr_enable_hw() instead.

Jira NVGPU-5648

Change-Id: Id031024ed01f9d890cffb5902cc433800810b219
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2403548
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
8cccb49bd2 gpu: nvgpu: collapse nvgpu_gr_prepare_sw into nvgpu_gr_alloc
common.gr unit exports a separate API nvgpu_gr_prepare_sw to
initialize some SW pieces required for nvgpu_gr_enable_hw().
A separate API is really unnecessary since same initialization
can be performed in nvgpu_gr_alloc().

Remove nvgpu_gr_prepare_sw() and HAL gops.gr.gr_prepare_sw().
Initialize falcon and interrupt structures in loop from
nvgpu_gr_alloc().

Move nvgpu_netlist_init_ctx_vars() from nvgpu_gr_prepare_sw() to
common init path since netlist parsing need not be done from
common.gr unit. It just needs to happen before nvgpu_gr_enable_hw().

Also, trigger nvgpu_gr_free() from gr_remove_support() instead
of OS specific paths. Also remove nvgpu_gr_free() calls from
probe error paths since nvgpu_gr_alloc is no longer called in
probe path.

Move interrupt and falcon data structure free calls to nvgpu_gr_free().

Also remove corresponding unit testing code that tests
nvgpu_gr_prepare_sw() specifically.
Update some unit tests to initialize ecc counters and netlist.
Disable some unit tests that fail for reasons unknown.

Jira NVGPU-5648

Change-Id: I82ec8160f76530bc40e0c11a9f26ba1c8f9cf643
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2400166
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
fba96fdc09 gpu: nvgpu: Replace nvgpu_engine_info with nvgpu_device
Delete the struct nvgpu_engine_info as it's essentially identical to
struct nvgpu_device. Duplicating data structures is not ideal as it's
terribly confusing what does what.

Update all uses of nvgpu_engine_info to use struct nvgpu_device. This
is often a fairly straight forward replacement. Couple of places though
where things got interesting:

  - The enum_type that engine_info uses is defined in engines.h and
    has a bit of SW abstraction - in particular the GRCE type. The only
    place this seemed to be actually relevant (the IOCTL providing device
    info to userspace) the GRCE engines can be worked out by comparing
    runlist ID.
  - Addition of masks based on intr_id and reset_id; those can be
    computed easily enough using BIT32() but this is an area that
    could be improved on.

This reaches into a lot of extraneous code that traverses the fifo
active engines list and dramtically simplifies this. Now, instead of
having to go through a table of engine IDs that point to the list of
all host engines, the active engine list is just a list of pointers to
valid engines. It's now trivial to do a for-all-active-engines type
loop. This could even be turned into a generic macro or otherwise
abstracted in the future.

JIRA NVGPU-5421

Change-Id: I3a810deb55a7dd8c09836fd2dae85d3e28eb23cf
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319895
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Lakshmanan M
48f1da4dde gpu: nvgpu: Add bundle skip sequence in MIG mode
In MIG mode, 2D, 3D, I2M and ZBC classes are not supported
by GR engine. So skip those bundle programming sequence in MIG mode.

JIRA NVGPU-5648

Change-Id: I7ac28a40367e19a3e31e63f3e25991c0ed4d2d8b
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2397912
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Tejal Kudav
71b005c1ef gpu: nvgpu: Enter Quiesce if GPU drops off the bus
Currently, we reboot the entire system using kernel_restart() if
the GPU registers become inaccessible due to GPU disappearing
from the bus. GPU hitting high temperatures is one of the reasons
we might end up in above scenario.
Replace kernel_restart() with quiesce call as a more graceful way
of notifying about GPU's unavailability. While entering quiesce
state, make sure we do not trigger any register accesses which are
bound to fail in this case.

Bug 2919899

Change-Id: Ia9d413e04c7d205752414ff3e892f055c4363cce
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2398801
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
ae25924393 gpu: nvgpu: print enabled_flags after poweron
GPU enabled_flags indicate features supported by nvgpu.
Add nvgpu_print_enabled() to print GPU enabled_flags. Print flag value
after poweron complete to help during debug.
Add verbose function to print flag name and status if gpu_dbg_info is
set.

JIRA NVGPU-5838

Change-Id: I3b0ddb8c6872f4f3b6101050da087ff553c16f84
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2383531
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
010f818596 gpu: nvgpu: initialize gr struct in poweron path
struct nvgpu_gr is right now initialized during probe and from OS
specific code. To support multiple instances of graphics engine,
nvgpu needs to initialize nvgpu_gr after number of engine instances
have been enumerated in poweron path.
Hence move nvgpu_gr_alloc() to poweron path and after gr manager has
been initialized.

Some of the members of nvgpu_gr are initialized in probe path and they
too are in OS specific code. Move them to common code in
nvgpu_gr_alloc()

Add field fecs_feature_override_ecc_val to struct gk20a to store the
override flag read from device tree. This flag is later copied to
nvgpu_gr in poweron path.

Update tpc_pg_mask_store() to check for g->gr being NULL before
accessing golden image pointer.
Update tpc_fs_mask_store() to return error if g->gr is not initialized.
This path needs nvgpu_gr struct initialized. Also fix the incorrect
NULL pointer check in tpc_fs_mask_store() which breaks the write path
to this sysfs.

Jira NVGPU-5648

Change-Id: Ifa2f66f3663dc2f7c8891cb03b25e997e148ab06
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2397259
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Lakshmanan M
2a6fcec078 gpu: nvgpu: add gr manager ops-2 and mig infra-2
This CL covers the code changes related to following support,
 - Enabled gr manager ops.
 - Added gr manager init/remove support.
 - Refactor in gpu instance config infra.
 - Refactor in gr syspipe gpcs config infra.

JIRA NVGPU-5645
JIRA NVGPU-5646

Change-Id: Ib2fab2796d76fe105fc5a08f2c5f9bfa36317f7c
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2393550
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Deepak Nibade
08308bc936 gpu: nvgpu: rework pm resource reservation system
Current PM resource reservation system is limited to HWPM resources
only. And reservation tracking is done using boolean variables.

New upcoming profiler support requires reservation for all the PM
resources like SMPC and PMA stream. Using boolean variables is
not scalable and confusing. Plus the variables have to be replicated
on gpu server in case of virtualization.

Remove flag tracking mechanism and use list based approach to track
all PM reservations. Also, current HALs are defined on debugger object.
Implement new HALs in new pm_reservation object since it is really an
independent functionality.

Add new source file common/profiler/pm_reservation.c which implements
functions to reserve/release resources and to check if any resource
is reserved or not.
Add common/vgpu/pm_reservation_vgpu.c for vGPU which simply forwards
the request to gpu server.

Define new HAL object gops.pm_reservation and assign above functions
to below respective HALs :
g->ops.pm_reservation.acquire()
g->ops.pm_reservation.release()
g->ops.pm_reservation.release_all_per_vmid()

Last HAL above is only used for gpu server cleanup of guest OS.

Add below new common profiler functions that act as APIs to reserve/
release resources for rest of the units in nvgpu.
nvgpu_profiler_pm_resource_reserve()
nvgpu_profiler_pm_resource_release()

Initialize the meta data required for reservtion system in
nvgpu_pm_reservation_init() and call it during nvgpu_finalize_poweron.
Clean up the meta data before releasing struct gk20a.

Delete below HALs :
g->ops.debugger.check_and_set_global_reservation()
g->ops.debugger.check_and_set_context_reservation()
g->ops.debugger.release_profiler_reservation()

Bug 2510974
Jira NVGPU-5360

Change-Id: I4d9f89c58c791b3b2e63099a8a603462e5319222
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2367224
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
lm
83cb8be984 nvgpu: linux: uapi: Add MIG new caps
1) In MIG mode, 2D, 3D, I2M and ZBC classes are not supported by
GR engine. NvGpu shall expose the HWCaps through
"struct nvgpu_gpu_characteristics".

2) NvGpu shall expose the following MIG related new caps through
"struct nvgpu_gpu_characteristics".
 * mig_enabled - Flag to indicate whether MIG is enabled/disabled.
 * gpu_instance_id - GPU instaces Id.
 * gr_instance_id - graphics execution unit id.
 * gr_sys_pipe_id - Sys pipe id of GR engine.

3) populate num_ppc_per_gpc - Pixel Processing cluster per GPC

4) populate max_veid_count_per_tsg - Maximum veid count per TSG

5) populate num_sub_partition_per_fbpa - Sub partition per FBPA.

JIRA NVGPU-5762

Change-Id: I06b5bcd3f568eb0b9c78c8fc6ce155b39aaeaba5
Signed-off-by: lm <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2352100
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
8c5972ac7f gpu: nvgpu: Move device de-init call
Move the device de-init call to when the gk20a struct is being freed;
the device list can live for as long as the gk20a struct does. This
will be a problem later, since the current location causes the device
structs to get freed and allocatoed over and over. That'll cause gross
corruption in the FIFO code when the engine_info struct is replaced
with pointers to the device structs.

JIRA NVGPU-5421

Change-Id: If4e08ea88dbcae7acd599e3fad29f72ece63b8e0
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2361269
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
319520ff57 gpu: nvgpu: Add a new device manager unit
This adds a new device management unit in the common code responsible
for facilitating the parsing of the GPU top device list and providing
that info to other units in nvgpu.

The basic idea is to read this list once from HW and store it in a
set of lists corresponding to each device type (graphics, LCE, etc).
Many of the HALs in top can be deleted and instead implemented using
common code parsing the SW representation.

Every time the driver queries the device list it does so using a
device type and instance ID. This is common code. The HAL is responsible
for populating the device list in such a way that the driver can
query it in a chip agnostic manner.

Also delete some of the unit tests for functions that no longer
exist. This code will require new unit tests in time; those should be
quite simple to write once unit testing is needed.

JIRA NVGPU-5421

Change-Id: Ie41cd255404b90ae0376098a2d6e9f9abdd3f5ea
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319649
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Tejal Kudav
4dcfbc19de gpu: nvgpu: Trigger quiesce on spurious FBPA intr
In Bug 200588835, the spurious FBPA interrupts are seen on couple
of boards. These interrupts were found to be EDC (Error detection
and Correction) interrupts which are triggered due to ECC errors.
The EDC registers are not exposed to the driver, so the interrupt
status register cannot be cleared; resulting in interrupt storm.
Also, it was concluded that only bad HW can cause this failure
scenario. So, in the ISR for FBPA interrupts, get the GPU into
quiesce state as we don't expect the GPU to be in usable state post
such unrecoverable errors.

Adapt the quiesce code for Linux build too.
1. On Linux, we cannot exit the nvgpu process after quiesce like we
   do on QNX. So, add nvgpu_disable_irqs() call to quiesce
   implementation which is done as part of process exit handler on
   QNX. Masking interrupts which is already done as part of quiesce
   would be sufficient in most cases, but to be fail-safe
   disable_irqs too.
3. Also, the IOCTL code looks at g->sw_ready, hence add
   nvgpu_start_gpu_idle() to set g->sw_ready to false along with
   setting NVGPU_DRIVER_IS_DYING = true.

We expect the nvgpu_sw_quiesce() call to finish before quiesce thread
wakes up from 50ms sleep. Hence, critical step like
nvgpu_start_gpu_idle() is added to nvgpu_sw_quiesce(), whereas the
somewhat redundant disable IRQs call is added to quiesce thread.

nvgpu_fifo_quiesce() was called twice by mistake; remove one of the
them.

Bug 2919899
Bug 200588835

Change-Id: I9beec688c2e1c0d8dfc1327ddf122684576f8684
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2354537
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
fc5b45ea83 gpu: nvgpu: move init_ltc_support sequence
Currently, ltc fs_state is initialized during ltc init support. However,
ltc cbc_param and cbc_param2 registers do not seem to be providing
correct data if ltc.init_fs_state is called before fb.init_fs_state.
- Create fb.init_fb_support hal to initialize fb.
- Trigger init_fb_support before init_ltc_support.

Bug 2969956
Bug 2957808
JIRA NVGPU-4666

Change-Id: I54d697d27b9d9c6318c4ef459d215b6f82cd5571
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2345673
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
tkudav
957b19092f gpu: nvgpu: Enable Quiesce on all builds
Make Recovery and quiesce co-exist to support quiesce state
on unrecoverrable errors. Currently, the quiesce code is wrapped
under ifndef CONFIG_NVGPU_RECOVERY. Isolate the quiesce code from
recovery config, thereby enabling it on all builds.

On Linux, the hung_task checker(check_hung_uninterruptible_tasks()
in kernel/hung_task.c) complains that quiesce thread is stuck for
more than 120 seconds.

INFO: task sw-quiesce:1068 blocked for more than 120 seconds.

The wait time of more than 120 seconds is expected as quiesce
thread will wait until quiesce call is triggered on fatal
unrecoverable errors. However, the INFO print upsets the
kernel_warning_test(KWT) on Linux builds. To fix the failing
KWT, change the quiesce task to interruptible instead of
uninterruptible as checker only looks at uninterruptible tasks.

Bug 2919899
JIRA NVGPU-5479

Change-Id: Ibd1023506859d8371998b785e881ace52cb5f030
Signed-off-by: tkudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2342774
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Sami Kiminki
23cda4f4a9 gpu: nvgpu: add PDI for TU104 (Linux)
Add reporting for the per-device identifier (PDI) in the Linux GPU
characteristics. Implement PDI read for TU104.

Bug 2957580

Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Change-Id: I6ac0e4f74378564d82955b431d4c1fd6c0daeb13
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2346933
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00