Commit Graph

983 Commits

Author SHA1 Message Date
Deepak Nibade
1a914b3699 gpu: nvgpu: support preemption mode API for specific GR instance
Get current GR instance pointer with nvgpu_gr_get_cur_instance_ptr() in
nvgpu_gr_setup_set_preemption_mode() and refer to other GR engine
specific data structures using this pointer.

Add/update debug prints to include gpu_dbg_gr flag.

Jira NVGPU-5648

Change-Id: I38f49b80c4969e9ae20ba1516898fa152786a984
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2419035
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Lakshmanan M
c0e2dc5b74 gpu: nvgpu: Add subctx programming for MIG
This CL covers the following code changes,
1) Added api to init inst_block for more than one subctxs.
2) Added logic to limit the subctx bind based on
   max. VEID count allocated to a gr instance.
3) Renamed nvgpu_grmgr_get_gr_runlist_id.

JIRA NVGPU-5647

Change-Id: Ifec8164a9e5f46fbd0538c3dd50e19ee63667a54
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2418463
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Deepak Nibade
dd9298c959 gpu: nvgpu: move perf unit accesses to common.perf unit
Below HALs are implemented in common.gr unit, but they really belong
to common.perf unit since they access registers from perf unit.
gops.gr.init_hwpm_pmm_register()
gops.gr.get_num_hwpm_perfmon()
gops.gr.set_pmm_register()
gops.gr.reset_hwpm_pmm_registers()

Move them to common.perf unit, and update all the code accordingly
gops.perf.init_hwpm_pmm_register()
gops.perf.get_num_hwpm_perfmon()
gops.perf.set_pmm_register()
gops.perf.reset_hwpm_pmm_registers()

Add new HAL gops.gr.get_pm_ctx_buffer_offsets() and set it to
gr_gk20a_get_pm_ctx_buffer_offsets() for all chips.

Bug 2510974
Jira NVGPU-5360

Change-Id: Ib5e84ed5c8b6e72cc6923161e55fc2c3a6a4070e
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2418306
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
9652764b65 gpu: nvgpu: reset HWPM regs while binding HWPM in global mode
Add new HAL g->ops.gr.reset_hwpm_pmm_registers() to reset all HWPM regs
while binding HWPM in global mode in nvgpu_profiler_bind_hwpm()

Add below new HALs to get sys/gpc/fbp register list and count
g->ops.perf.get_hwpm_sys_perfmon_regs()
g->ops.perf.get_hwpm_gpc_perfmon_regs()
g->ops.perf.get_hwpm_fbp_perfmon_regs()

Auto generate all the HWPM regs in below arrays for gv11b/tu104
static const u32 hwpm_sys_perfmon_regs[]
static const u32 hwpm_gpc_perfmon_regs[]
static const u32 hwpm_fbp_perfmon_regs[]

Bug 2510974
Jira NVGPU-5360

Change-Id: I2ca5c04ed75c7b30ae942807bf018a24551d7ba0
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2414934
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Lakshmanan M
054fcf5635 gpu: nvgpu: Add gr VEID programming for MIG
This CL covers the following code changes,
1) Added api to get max VEID count per gpu/gr instance.
2) Added logic to limit the SW VEID bundle programming
   based on max. VEID count allocated to a gr instance.

JIRA NVGPU-5647

Change-Id: I5cbe98c505f81eaf29cc96707782f6350694e4c3
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2417800
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Deepak Nibade
96dc116eed gpu: nvgpu: support context creation for specific GR instance
Get current GR instance pointer with nvgpu_gr_get_cur_instance_ptr() in
nvgpu_gr_setup_alloc_obj_ctx() and update all the code in this function
to use this GR instance pointer instead of globally accessing g->gr->*
data structures.

Add lots of GR engine specific debug prints in context creation path.

Jira NVGPU-5648

Change-Id: Ia8681d115ee88c5848621854f23e1cce4ff3deb2
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2415239
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Lakshmanan M <lm@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Sagar Kamble
df4d5f3c57 gpu: nvgpu: replace CONFIG_NVGPU_SUPPORT_TURING usage with CONFIG_NVGPU_DGPU
Also update the config check in pci_power.c for definitions of stubs
for pcie_attach|detach_controller callbacks.

Bug 200658918
Bug 200609273

Change-Id: Ie3f3b4de4cbcd520e54a3eb0590699c1a433e82d
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2414959
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Antony Clince Alex
7b7f42bd33 gpu: nvgpu: add gr ops find_priv_offset_in_buffer
Convert gr_gk20a_find_priv_offset_in_buffer into hal function
gops.gr.find_priv_offset_in_buffer. This is done in-order to facilitate
nvgpu-next to transition into a new ctxsw buffer layout.

Bug 2761598
Jira NVGPU-6008

Change-Id: Id294be628944daad7f9afa68214d98d87bbbf68c
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2403708
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
221475f753 gpu: nvgpu: add profiler apis to manage PMA stream
Support new IOCTL to manage PMA stream meta data by adding below API
nvgpu_prof_ioctl_pma_stream_update_get_put()

Add nvgpu_perfbuf_update_get_put() to handle all the updates coming
from userspace and to pass all required information.

Add gops.perf.update_get_put() to handle all HW accesses required in
perf HW unit.

Add gops.perf.bind_mem_bytes_buffer_addr() to bind the available bytes
buffer while binding HWPM streamout.

Bug 2510974
Jira NVGPU-5360

Change-Id: Ibacc2299b845e47776babc081759dfc4afde34fe
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2406484
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
49c9f0c137 gpu: nvgpu: accept user vma size in vm init
Modify nvgpu_vm_init to accept low_hole, user_reserved and
kernel_reserved. This will simplify argument limit checks and make code
more legible.

JIRA NVGPU-5302

Change-Id: I62773dd7b06264a3b6cb8896239b24c49fa69f9b
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2394901
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
db20451d0d gpu: nvgpu: fix pmm chiplet offsets
gr_gv100_init_hwpm_pmm_register() and gr_gv100_set_pmm_register() right
now assume common chiplet stride for all sys/fbp/gpc and use common API
g->ops.perf.get_pmm_per_chiplet_offset() to get the stride.

Chiplet strides are same for all partitions only by chance, and future
chip might change that.

Hence add and use below 3 separate HALs to get appropriate strides.
g->ops.perf.get_pmmsys_per_chiplet_offset()
g->ops.perf.get_pmmgpc_per_chiplet_offset()
g->ops.perf.get_pmmfbp_per_chiplet_offset()

Also store sys/fbp/gpc perfmon count in struct gk20a after first query
instead of querying them again and again. Querying the counts from HW
is time consuming.

Bug 2510974
Jira NVGPU-5360

Change-Id: I186009221009780d561617c0cd6f535854db585f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2413108
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
6a69ea235e gpu: nvgpu: disable graphics specific init functions in MIG mode
MIG mode does not support graphics, ELPG, and use cases like TPC
floorsweeping. Skip all such initialization functions in common.gr
unit if MIG mode is enabled.

Set can_elpg to false if MIG mode is enabled.

Jira NVGPU-5648

Change-Id: I03656dc6289e49a21ec7783430db9c8564c6bf1f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2411741
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
7a937a6190 gpu: nvgpu: add debug logs for common.gr debugging
Add separate flag gpu_dbg_gr to enable common.gr specific debugging.
Add this flag to all the existing debug logs that use gpu_dbg_fn or
gpu_dbg_info for debugging. Also add many other debugging logs that
might be helpful in debugging.

Removing debug log in gv11b_gr_init_get_nonpes_aware_tpc() as it dumps
too much data that does not seem useful.

Batch all interrupt enable functions in gr_init_setup_hw() together for
readability.

Jira NVGPU-5648

Change-Id: I0b857650122cdb1f974b452d28c26e7f142baf61
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2411740
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Sagar Kadamati
19e9b38385 gpu: nvgpu: added new argument to nvhost function
* Added struct gk20a as input argument, which help in chip detection
   nvgpu_nvhost_syncpt_unit_interface_get_byte_offset

Jira NVGPU-6068

Change-Id: I76f342b1c9f51632f72bc00f0328c7cac5991956
Signed-off-by: Sagar Kadamati <skadamati@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2410872
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
e0dd79cd43 gpu: nvgpu: rearch mc reset and enable hals
Remove current mc hals
- mc.reset()
- mc.enable()
- mc.disable()
- mc.reset_mask()
- mc.reset_engine()
- mc.reset_engine_enable()

Add new mc hals
- mc.enable_units(g, units, enable)
  > enable/disable given unit(s)
- mc.enable_dev(g, dev, enable)
  > enable/disable engine represented by given device pointer
- mc.enable_devtype(g, devtype)
  > enable/disable all engines of given devtype

Move common mc intr functions to common/mc/mc_intr.c.
Add below common mc functions
- nvgpu_mc_reset_units(g, units)
  > reset given logical OR of nvgpu unit bitmap
- nvgpu_mc_reset_dev(g, dev)
  > reset given single engine via dev
  > if engine is graphics, reset gpcs for nvgpu_next
- nvgpu_mc_reset_devtype(g, devtype)
  > reset all engines of given devtype
  > if devtype is graphics, reset gpcs for nvgpu_next

Bug 200648985
Bug 3109773

Change-Id: Idc67a14a0a7cde83de44fbfbec13007fead3ed5c
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2408523
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
bafeea3530 gpu: nvgpu: setup HW for each GR instance
Get number of SMs from GR instance specific nvgpu_gr_config pointer
instead of global SM count in below functions :
nvgpu_gr_fs_state_init()
gv11b_gr_init_sm_id_config()

Update nvgpu_gr_config_get_gpc_skip_mask() to return 0 in case gpc_index
is greater than available gpc_count. This is not MIG specific, but based
on code review possible even today for existing chips.
See gm20b_gr_init_pd_skip_table_gpc()

Update nvgpu_gr_get_override_ecc_val() to return GR instance specific
value.

Execute gr_init_setup_hw() for each GR instance.

Disable below failing unit tests:
nvgpu_gr_fs_state.test_gr_fs_state_error_injection
nvgpu_gr_init.test_gr_init_hal_config_error_injection

Jira NVGPU-5648

Change-Id: Ie8f1c0c304c634756786d85facf336a5c9ae8195
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2410702
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Deepak Nibade
3b746dce0c gpu: nvgpu: use a falcon flag instead of enabled bit
common.gr unit right now makes use of a capability bit
NVGPU_PMU_FECS_BOOTSTRAP_DONE to ensure the recovery path hits a
different routine. This is actually needless and a common check
cannot be used for all GR instances anyways.

Delete this capability bit. Add and use a new flag
coldboot_bootstrap_done added under struct nvgpu_gr_falcon

Jira NVGPU-5648

Change-Id: I46faea6f07cf054f17a3215d4cbbe0fc8a6382ae
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2409533
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Debarshi Dutta
db3764511b gpu: nvgpu: reorganize HAL for VGPU for GP10B and GV11B
Designated initializers with nested structs should not be used to
avoid a known problem in the qnx compiler that results in incorrect
values used for some fields.

Remove nested structs initialization and instead perform
runtime initialization for GP10B and GV11B VGPU HAL assignments.

Change-Id: I51e83aec16840abbddb542386d179a060d9521c9
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2407794
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Debarshi Dutta
cdacc2e2b2 gpu: nvgpu: reorganize HAL for gm20b, gp10b, tu104
Designated initializers with nested structs should not be used to
avoid a known problem in the qnx compiler that results in incorrect
values used for some fields.

Remove nested structs initialization and instead perform
runtime initialization for GM20B, GP10B and TU104 HAL assignments.

Change-Id: I6c94f85c7d6f7e279206bff7bd3535f56a377494
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2401399
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Debarshi Dutta
9cb1fa8429 gpu: nvgpu: update HAL file for gv11b
Designated initializers with nested structs should not be used to
avoid a known problem in the qnx compiler that results in incorrect
values used for some fields.

5.1 Disclosure ID: NVGPU_RM-CODE-OIL-06

Remove nested structs initialization and instead perform
runtime initialization for GV11B's HAL.

Change-Id: Idd964c4e974db8707fc6cc8b1195a1365079c213
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2401398
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Richard Zhao
78c45e889e gpu: nvgpu: vgpu: add gpu next hal & platform
- added compatible string and platform data
- added hal init
- mark gv11b_vgpu_probe global

Jira GVSCI-4645

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: If04261bf9421f23df065e26ffe998218a3ba5b73
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2342377
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Sagar Kadamati <skadamati@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Debarshi Dutta
38ce6fa717 gpu: nvgpu: change unnamed structs to named structs
Following changes are made in this patch.
1) Change unnamed structs within gpu_ops to named structs
with the prefix gops_*.

2) Each named struct gops_ are moved into a separate gops specific file
under include/nvgpu/gops/

3) struct gpu_ops is moved into a separate file include/nvgpu/gpu_ops.h
and all other dependent struct gops_* are included in this header.

4) Direct references to include/nvgpu/gops are removed from files as its enough
to include gk20a.h.

Change-Id: Ieb22cb853be567e3bef14f5f8a04674eebd902ea
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2398776
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Richard Zhao
2dfa05ba50 gpu: nvgpu: fixes for tu104 usermode register write
- correct user register base l->usermode_regs. It should be bar0 address
plus .usermode.bus_base(). .bus_base() returns user register base offset
relative to bar0.
- correct .usermode.base for tu104. .base should be user register base
relative to virtual function base.
- use nvgpu_usermode_writel for tu104 ring doorbell.

Jira GVSCI-4650

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: Iba98063c4a5cc007459319b0311e546ff10604a4
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2403813
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
a2809088eb gpu: nvgpu: remove unnecessary hal gops.gr.gr_enable_hw()
gops.gr.gr_enable_hw() is a common function and not referred on vGPU.
Remove HAL pointer and directly use nvgpu_gr_enable_hw() instead.

Jira NVGPU-5648

Change-Id: Id031024ed01f9d890cffb5902cc433800810b219
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2403548
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
smadhavan
c79a3dbc3a gpu: nvgpu: check only priv_sec_en fuse in fmodel
- On simulation, use --gpu_brom_args to set priv_sec_en fuse.
  wpr and auto_fetch_disable are set to expected values with
  "-vpr_load_from_pri_reg" during simulation launch

Bug 200638707

Change-Id: Ia440326a77a800bb739103bb0f0dbe06c3c741f2
Signed-off-by: smadhavan <smadhavan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2397510
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Seema Khowala <seemaj@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
8cccb49bd2 gpu: nvgpu: collapse nvgpu_gr_prepare_sw into nvgpu_gr_alloc
common.gr unit exports a separate API nvgpu_gr_prepare_sw to
initialize some SW pieces required for nvgpu_gr_enable_hw().
A separate API is really unnecessary since same initialization
can be performed in nvgpu_gr_alloc().

Remove nvgpu_gr_prepare_sw() and HAL gops.gr.gr_prepare_sw().
Initialize falcon and interrupt structures in loop from
nvgpu_gr_alloc().

Move nvgpu_netlist_init_ctx_vars() from nvgpu_gr_prepare_sw() to
common init path since netlist parsing need not be done from
common.gr unit. It just needs to happen before nvgpu_gr_enable_hw().

Also, trigger nvgpu_gr_free() from gr_remove_support() instead
of OS specific paths. Also remove nvgpu_gr_free() calls from
probe error paths since nvgpu_gr_alloc is no longer called in
probe path.

Move interrupt and falcon data structure free calls to nvgpu_gr_free().

Also remove corresponding unit testing code that tests
nvgpu_gr_prepare_sw() specifically.
Update some unit tests to initialize ecc counters and netlist.
Disable some unit tests that fail for reasons unknown.

Jira NVGPU-5648

Change-Id: I82ec8160f76530bc40e0c11a9f26ba1c8f9cf643
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2400166
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
d020778c55 gpu: nvgpu: reserve pma stream for legacy profiler
Legacy profiler does not reserve PMA stream resource with PM
reservation system. Also, HWPM system reset is separately implemented
in membuf disable path. And it does not even restore perf unit SLCG
prod values.

Allcoate a dummy profiler object for debug session in perfbuf map
path. Free it in perfbuf unmap path.

This has advantage of synchronizing PMA stream reservation with new
profiler stack. And this also leverages HWPM system reset and SLCG
handling code during resource reservation.

Remove explicit HWPM reset from gops.perf.membuf_reset_streaming()
HALs

Bug 2510974
Jira NVGPU-5360

Change-Id: I54c5202b6251dea3d80a4dfc011e8a296339e07f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2399595
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Konsta Hölttä
dfd9feace6 gpu: nvgpu: recover pbdma errors before ack
When a pbdma fault needs a channel teardown, do the recovery/teardown
process before acking the pbdma interrupt status back. Acking it causes
the hardware to proceed which could release fences too early before the
involved channel(s) have been found to be broken.

With these host copyengine interrupts, the teardown sequence is light
and proceeds even with the pbdma intr flag still set; there are no
engines to reset when these pbdma launch check interrupts happen. The
bad tsg is just disabled and the channels in it aborted.

A few unit tests are so heavily affected by this refactor that they
would need to be rewritten. They're not strictly needed at the moment,
so do only half of the rewrite: just delete them.

Bug 200611198

Change-Id: Id126fb158b6d05e46ba124cd426389046eedc053
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2392669
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
fba96fdc09 gpu: nvgpu: Replace nvgpu_engine_info with nvgpu_device
Delete the struct nvgpu_engine_info as it's essentially identical to
struct nvgpu_device. Duplicating data structures is not ideal as it's
terribly confusing what does what.

Update all uses of nvgpu_engine_info to use struct nvgpu_device. This
is often a fairly straight forward replacement. Couple of places though
where things got interesting:

  - The enum_type that engine_info uses is defined in engines.h and
    has a bit of SW abstraction - in particular the GRCE type. The only
    place this seemed to be actually relevant (the IOCTL providing device
    info to userspace) the GRCE engines can be worked out by comparing
    runlist ID.
  - Addition of masks based on intr_id and reset_id; those can be
    computed easily enough using BIT32() but this is an area that
    could be improved on.

This reaches into a lot of extraneous code that traverses the fifo
active engines list and dramtically simplifies this. Now, instead of
having to go through a table of engine IDs that point to the list of
all host engines, the active engine list is just a list of pointers to
valid engines. It's now trivial to do a for-all-active-engines type
loop. This could even be turned into a generic macro or otherwise
abstracted in the future.

JIRA NVGPU-5421

Change-Id: I3a810deb55a7dd8c09836fd2dae85d3e28eb23cf
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319895
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Lakshmanan M
48f1da4dde gpu: nvgpu: Add bundle skip sequence in MIG mode
In MIG mode, 2D, 3D, I2M and ZBC classes are not supported
by GR engine. So skip those bundle programming sequence in MIG mode.

JIRA NVGPU-5648

Change-Id: I7ac28a40367e19a3e31e63f3e25991c0ed4d2d8b
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2397912
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Deepak Nibade
2012a6b558 gpu: nvgpu: add profiler api to execute regops
Implement new API nvgpu_prof_ioctl_exec_reg_ops() to support regops on
new profiler objects.

Add two new staging buffers to hold regops copied from userspace, and
to convert and execute regops in common code.
Buffers are allocated and released along with the profiler object.

New API will implements this :
-  copy regops data in chunks of 4K from userspace
- store them in staging buffer
- convert the new regop struct into common regop struct and also
  copy the content into second staging buffer
- trigger gops.regops.exec_regops() with second staging buffer as
  operation pointer
- convert common regop struct back into new regop struct and copy
  back to userspace

Export bunch of helper functions from ioctl_dbg.h. e.g.
nvgpu_get_regops_op_values_common()

Update regop execution code to skip regop execution if regop status
is not valid. This is only possible when userspace requests for
CONTINUE_ON_ERROR mode.

Add more documentation to some of the fields in UAPI header.

Note that maximum atomic operations reported by new API are same
as legacy API and are incorrect. This will be fixed up in upcoming
patches.

Bug 2510974
Jira NVGPU-5360

Change-Id: I9f82052b22143aec33f6e778c0784386744b699e
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2394208
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Lakshmanan M
2a6fcec078 gpu: nvgpu: add gr manager ops-2 and mig infra-2
This CL covers the code changes related to following support,
 - Enabled gr manager ops.
 - Added gr manager init/remove support.
 - Refactor in gpu instance config infra.
 - Refactor in gr syspipe gpcs config infra.

JIRA NVGPU-5645
JIRA NVGPU-5646

Change-Id: Ib2fab2796d76fe105fc5a08f2c5f9bfa36317f7c
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2393550
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
6df58938ad gpu: nvgpu: gp10b: add beta cb default size define
Currently, get_attrib_cb_default_size() return value is hardcoded with
recommended beta cb default size value. Add a macro for the fixed
buffer size and add description.

JIRA NVGPU-5302

Change-Id: If415e8bc6bc15b2d2ed6875a49a1a23bbe3c740a
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2375623
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Richard Zhao
3f81f1952d gpu: nvgpu: vgpu: fix NVGPU_GPU_IOCTL_CLEAR_SM_ERRORS crash
vgpu currently does not support suspend gpu context and stall
the whole gpu, because of safety concerns. So vgpu does not set
HALs that are related to on-gpu context.

This change unset gops.gr.clear_sm_errors. And the ioctl
NVGPU_GPU_IOCTL_CLEAR_SM_ERRORS will return -ENOSYS.

Bug 200469468

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: Ie578495e175ad898994fe1c4184a0243d5541cd3
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2395598
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Antony Clince Alex
e02ea5456b gpu: nvgpu: tu104: update offset calculation of gpccs ctxsw'ed priregs
The ctxsw'ed registers have been moved to a separate list starting from
nvgpu_next chip onwards. Hence, update gr_tu104_get_offset_in_gpccs_segment
function to account for ctxsw'ed registers in nvgpu_next.

Introduce functions: nvgpu_netlist_get_gpc_ctxsw_regs_count to compute the
number of ctxsw'ed gpc registers.

Bug 2916121

Change-Id: I69fcd8df883af62999d0fa8d1f9a398f8f5d7454
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2394684
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Alex Waterman
7e99a68e34 gpu: nvgpu: Add basic recovery debugging messages
Add basic recovery messages that describe what's happening during
the recovery process. Hide this under a new recovery specific GPU
debug log flag. The logs look like:

[  276.000733] nvgpu: 17000000.gv11b                gv11b_fifo_recover:162  [DBG]  REC | Recovery starting
[  276.000737] nvgpu: 17000000.gv11b                gv11b_fifo_recover:163  [DBG]  REC |   ID      = 0
[  276.000741] nvgpu: 17000000.gv11b                gv11b_fifo_recover:164  [DBG]  REC |   id_type = TSG
[  276.000745] nvgpu: 17000000.gv11b                gv11b_fifo_recover:165  [DBG]  REC |   rc_type = MMU fault
[  276.000748] nvgpu: 17000000.gv11b                gv11b_fifo_recover:166  [DBG]  REC |   Engine bitmask: 0x0
[  276.000753] nvgpu: 17000000.gv11b                gv11b_fifo_recover:170  [DBG]  REC | Acquiring engines_reset_mutex
[  276.000756] nvgpu: 17000000.gv11b                gv11b_fifo_recover:174  [DBG]  REC | Acquiring runlist_lock for active runlists
[  276.000764] nvgpu: 17000000.gv11b                gv11b_fifo_recover:185  [DBG]  REC | Channels bound to this TSG:
[  276.000767] nvgpu: 17000000.gv11b                gv11b_fifo_recover:190  [DBG]  REC |   0 | chid 511
[  276.001098] nvgpu: 17000000.gv11b                gv11b_fifo_recover:222  [DBG]  REC | PBDMA   Bitmask: 0x1
[  276.001102] nvgpu: 17000000.gv11b                gv11b_fifo_recover:228  [DBG]  REC | Runlist Bitmask: 0x1
[  276.001106] nvgpu: 17000000.gv11b                gv11b_fifo_recover:240  [DBG]  REC | Disabling RL scheduler now
[  276.001126] nvgpu: 17000000.gv11b                gv11b_fifo_recover:246  [DBG]  REC | Disabling CG/PG now
[  276.189348] nvgpu: 17000000.gv11b                gv11b_fifo_recover:259  [DBG]  REC | Clearing PBDMA_FAULTED, ENG_FAULTED in CCSR register
[  276.191972] nvgpu: 17000000.gv11b                gv11b_fifo_recover:264  [DBG]  REC | Disabling TSG
[  276.191983] nvgpu: 17000000.gv11b                gv11b_fifo_recover:279  [DBG]  REC | Preempting runlists for RC
[  276.192001] nvgpu: 17000000.gv11b                gv11b_fifo_recover:288  [DBG]  REC | Polling for TSG to be off PBDMA
[  276.192012] nvgpu: 17000000.gv11b                gv11b_fifo_recover:296  [DBG]  REC |   Done!
[  276.192016] nvgpu: 17000000.gv11b                gv11b_fifo_recover:306  [DBG]  REC | Resetting relevant engines
[  276.192020] nvgpu: 17000000.gv11b                gv11b_fifo_recover:318  [DBG]  REC |   Engine bitmask for RL 0: 0xd
[  276.192024] nvgpu: 17000000.gv11b                gv11b_fifo_recover:323  [DBG]  REC |   > Restting engine: ID=0
[  276.209567] nvgpu: 17000000.gv11b                gv11b_fifo_recover:347  [DBG]  REC |     Done!
[  276.209572] nvgpu: 17000000.gv11b                gv11b_fifo_recover:323  [DBG]  REC |   > Restting engine: ID=2
[  276.214290] nvgpu: 17000000.gv11b                gv11b_fifo_recover:347  [DBG]  REC |     Done!
[  276.214295] nvgpu: 17000000.gv11b                gv11b_fifo_recover:323  [DBG]  REC |   > Restting engine: ID=3
[  276.224986] nvgpu: 17000000.gv11b                gv11b_fifo_recover:347  [DBG]  REC |     Done!
[  276.225013] nvgpu: 17000000.gv11b                gv11b_fifo_recover:377  [DBG]  REC | Re-enabling runlists
[  276.225034] nvgpu: 17000000.gv11b                gv11b_fifo_recover:383  [DBG]  REC | Re-enabling CG/PG
[  276.225134] nvgpu: 17000000.gv11b                gv11b_fifo_recover:394  [DBG]  REC | Releasing engines reset mutex

Note the "REC |" which lets one easily do:

  $ dmesg | grep "REC |"

To get a clear ubobstrructed view of the recovery progress in the dmesg
log.

JIRA NVGPU-5606

Change-Id: I183f2b5ac54edc60ee894a82111723e27aa5c46b
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2392991
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
mkumbar
918fa1a658 gpu: nvgpu: PMU NS ucode boot update
Removed gpmu_ucode.bin usage by fetching PMU ucode descriptor
and image from respective files for NS boot.

JIRA NVGPU-5183

Change-Id: I597c5dd17b4a58603f550b32980d7d0ca9624aed
Signed-off-by: mkumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2376448
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
7c1c533a4a gpu: nvgpu: Don't disable coalesce for gv11b+
Stop enabling LG and SU coalesce on gv11b and tu104. This is
no longer required.

Bug 1951653
Bug 1801194

Change-Id: I412be2caae6b841d5387ae5a153d38e49d3d61bc
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2392901
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
6daa0636d1 gpu: nvgpu: rework regops execution API
Rework regops execution API to accomodate below updates for new
profiler design

- gops.regops.exec_regops() should accept TSG pointer instead of
  channel pointer.
- Remove individual boolean parameters and add one flag field.

Below new flags are added to this API :
NVGPU_REG_OP_FLAG_MODE_ALL_OR_NONE
NVGPU_REG_OP_FLAG_MODE_CONTINUE_ON_ERROR
NVGPU_REG_OP_FLAG_ALL_PASSED
NVGPU_REG_OP_FLAG_DIRECT_OPS

Update other APIs, e.g. gr_gk20a_exec_ctx_ops() and validate_reg_ops()
as per new API changes.

Add new API gk20a_is_tsg_ctx_resident() to check context residency
from TSG pointer.

Convert gr_gk20a_ctx_patch_smpc() to a HAL gops.gr.ctx_patch_smpc().
Set this HAL only for gm20b since it is not required for later chips.
Also, remove subcontext code from this function since gm20b does not
support subcontext.

Remove stale comment about missing vGPU support in exec_regops_gk20a()

Bug 2510974
Jira NVGPU-5360

Change-Id: I3c25c34277b5ca88484da1e20d459118f15da102
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2389733
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
a73b5d3c6f gpu: nvgpu: use smpc global mode capability check
In nvgpu_dbg_gpu_ioctl_smpc_ctxsw_mode(), check if SMPC global mode
capability is supported instead of checking for the function pointer.

Enable the capability only for Turing since pre-Turing GPUs don't
support it.

Bug 2510974
Jira NVGPU-5360

Change-Id: I352fb2a91b836cd8ef727966a53a28255d8ea834
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2389653
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Seema Khowala
9ea21459b4 gpu: nvgpu: pascal+: trigger_suspend, wait_for/resume_from _pause set to NULL
- NvRmGpuDeviceSetSmDebugMode uses regops interface.
- NvRmGpuDeviceTriggerSuspend, NvRmGpuDeviceWaitForPause,
  and  NvRmGpuDeviceResumeFromPause should return error on Pascal+. Use
  regops interface to suspend/resume.
- On non-cilp devices(Maxwell), NvRmGpuDeviceTriggerSuspend,
  NvRmGpuDeviceWaitForPause, NvRmGpuDeviceResumeFromPause and
  NvRmGpuDeviceSetSmDebugMode are used when debugger(including coredump,
  memcheck) is attached or when CUDA application uses a syscall that
  requires traphandler(assert, cnp).

Bug 2558022
Bug 2559631
Bug 2706068
JIRA NVGPU-5502

Change-Id: I9eb2ab0c8c75c50f53523d8bf39c75f98b34f3f0
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2376159
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Lakshmanan M
c99afa1766 gpu: nvgpu: add gr manager and mig infra
This CL covers the code changes related to following support,
 - Added gr manager infra.
 - Added grmgr_gops infra.
 - Added mig infra.
 - Added log mask for MIG verbose support.

JIRA NVGPU-5645
JIRA NVGPU-5646

Change-Id: Iec356e08e6cfee86ad9f59fdf6cfee9c38231359
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2385111
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Deepak Nibade
969b901999 gpu: nvgpu: create device/context profiler dev nodes
Create new dev nodes for device and context profilers. Example of dev
nodes on iGPU
/dev/nvhost-prof-dev-gpu - device scope profiler
/dev/nvhost-prof-ctx-gpu - context scope profiler

Add below APIs to open/close above dev nodes :
nvgpu_prof_dev_fops_open()
nvgpu_prof_ctx_fops_open()
nvgpu_prof_fops_release()

Add common API nvgpu_prof_fops_ioctl() to handle IOCTL call on these
dev nodes. Add IOCTL NVGPU_PROFILER_IOCTL_BIND_CONTEXT to bind the TSG
to profiler objects.

Add nvgpu_tsg_get_from_file() to retrieve TSG struct pointer from
file descriptor. Also store profiler object pointer into TSG struct.

Enable NVGPU_SUPPORT_PROFILER_V2_DEVICE capability on gv11b and tu104.
Note that this is not yet enabled for vGPU.
Keep NVGPU_SUPPORT_PROFILER_V2_CONTEXT capabiity disabled since this
will take longer to support.

Add new IOCTL NVGPU_PROFILER_IOCTL_UNBIND_CONTEXT so that userspace can
explicitly unbind the context and release the resources before closing
the profiler descriptor.

Add context_init flag to profiler object for book keeping.

Bug 2510974
Jira NVGPU-5360

Change-Id: Ie07e0cfd5a9da9d80008f79c955c7ef93b4bc60f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2384354
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
fb95b7efa7 gpu: nvgpu: move nvgpu_func io functions to common
- Move nvgpu_func_writel and nvgpu_func_readl to common io file.
- Add func.get_full_phys_offset() hal to gk20a_gops structure.
- Add tu104_func_get_full_phys_offset() for tu104.

JIRA NVGPU-5363

Change-Id: I2aa13862a37f48321510882053256e16ef3f7377
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2383483
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Antony Clince Alex
dd82cdca97 gpu: nvgpu: introduce new ctxsw_addr_type LTS_MAIN
The LTS_MAIN will be used by nvgpu-next chips.

In addition, update gops_ltc.h to include nvgpu_next_gops_ltc.h and
nvgpu_next_gops_ltc_intr.h

Jira NVGPU-5352
Bug 200605474
Bug 200608785

Change-Id: Id77ddfc4c1aa2f93e98e05cfd8645f7ffb8f41c8
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2366350
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
0f5818b89e gpu: nvgpu: Condition debug dump on recovery profiling
If recovery sequence profiling is enabled skip the debug dump that
happens during an MMU fault. This prevents the debug dump from
dominating the time spent by the recovery sequence. The debug dump
is severly limited in speed by the (lack of) UART bandwidth.

JIRA NVGPU-5606

Change-Id: Ifc7c326d33d9115d58b13c0fa42ec4bb7acb3075
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2382591
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
1bcdc306a0 gpu: nvgpu: Add gv11b recovery profiling
Add some basic profiling to the gv11b recovery sequence. This captures
the high level events. Subsequent patches start to dig into the
subsections in more detail.

JIRA NVGPU-5606

Change-Id: I488a448ca1cbf961651588e24685e2a5b4420c44
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2368302
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Tejal Kudav
ab2b0b5949 gpu: nvgpu: Set unserviceable flag early during RC
During recovery, we set ch->unserviceable at the end after we preempt
the TSG and reset the engines. It might be too late and user-space
might submit more work to the broken channel which is not desirable.
Move setting this unserviceable flag right at the start
of recovery sequence.
Another thread doing a submit can still read the unserviceable flag
just before it is set here, leaving that submit stuck if recovery
completes before the submit thread advances enough to set up a post
fence visible for other threads. This could be fixed with a big lock
or with a double check at the end of the submit code after the job
data has been made visible.
We still release the fences, semaphore and error notifier wait queues
at the end; so user-space would not trigger channel unbind while
channel is being recovered.

Also, change the handle_mmu_fault APIs to return void as the
debug_dump return value is not used in any of the caller APIs.

JIRA NVGPU-5843

Change-Id: Ib42c2816dd1dca542e4f630805411cab75fad90e
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2385256
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
shashank singh
650ce63466 gpu: nvgpu: make iommu bit getting hal NULL for turing
For dgpu iommu bit is causing smmu fault when sysmem is accessed
via pcie. Since pcie is always having iommu enabled on linux that
creates issue for linux. So don't set the iommu bit for dgpu in any
case.

Bug 200640033

Change-Id: I38556779db94289b0656cdb53d417e4ff83ed426
Signed-off-by: shashank singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2384653
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Dinesh
d0087f3ad8 gpu: nvgpu: Support for runlist_max_supported
nvgpu_next needs support for max_runlist_supported by litter
value. So the function is changed to support.

JIRA NVGPU-5534

Change-Id: I097f6343295049532c46904316314dc82092a46b
Signed-off-by: Dinesh <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2382882
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00