Commit Graph

141 Commits

Author SHA1 Message Date
Debarshi Dutta
8cb147aa88 gpu: nvgpu: add a soft dependency on podgov module
The present implementation of podgov driver doesn't
export any symbols and as a result, the dependency
between NVGPU driver and podgov is not established
by depmod. Fix that by adding a soft dependency.

MODULE_SOFTDEP("pre: governor_pod_scaling");

This allows loading the podgov governor before
nvgpu driver.

Bug 3674235

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: Id1959639399042f488cdaa30372feb65d8f21aaa
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2740446
(cherry picked from commit e4b3499850)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2741188
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Tested-by: Jonathan Hunter <jonathanh@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-08-12 11:01:32 -07:00
Sagar Kamble
c99819ffd8 gpu: nvgpu: acquire platforms clocks on floorsweeping gpc
bpmp will floorsweep GPCs as per parameters to tpc_pg_mask sysfs.
While doing that corresponding GPC clocks are also disabled.
nvgpu should re-initialize the clocks every time the
GPC/TPC pg_masks are passed to bpmp mrq.

Also print error when clk_prepare_enable fails.

Introduce platform->clks_lock to protect access to platform->clks
and platform->num_clks done from unrailgate/railgate and bpmp
mrq set calls from sysfs.

Acquire static_pg_lock in railgate path to synchronize railgate
with sysfs.

Bug 3688506

Change-Id: I3203d78b87289e7a847d78b3117e2d3119be3425
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2738920
(cherry picked from commit 28ddb0996f)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2741029
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-07-08 06:08:20 -07:00
Krishna Reddy
961925be02 Revert "gpu: nvgpu: correct usage for gk20a_busy_noresume"
This reverts commit c1ea9e3955.

Reason for revert: ap_vulkan, ap_opengles, ap_mods tests failures
Bug 3661058
Bug 3661080 
Bug 3659004 

Change-Id: I929b5675a4fb0ddc8cbf3eeefc982b4ba04ddc59
Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2718996
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
2022-05-27 14:49:26 -07:00
Debarshi Dutta
c1ea9e3955 gpu: nvgpu: correct usage for gk20a_busy_noresume
Background: In case of a deferred suspend implemented by gk20a_idle,
the device waits for a delay before suspending and invoking
power gating callbacks. This helps minimize resume latency for any
resume calls(gk20a_busy) that occur before the delay.

Now, some APIs spread across the driver requires that if the device
is powered on, then they can proceed with register writes, but if its
powered off, then it must return. Examples of such APIs include
l2_flush, fb_flush and even nvs_thread. We have relied on
some hacks to ensure the device is kept powered on to prevent any such
delayed suspension to proceed. However, this still raced for some calls
like ioctl l2_flush, so gk20a_busy() was added (Refer to commit Id
dd341e7ecbaf65843cb8059f9d57a8be58952f63)

Upstream linux kernel has introduced the API pm_runtime_get_if_active
specifically to handle the corner case for locking the state during the
event of a deferred suspend.

According to the Linux kernel docs, invoking the API with
ign_usage_count parameter set to true, prevents an incoming suspend
if it has not already suspended.

With this, there is no longer a need to check whether
nvgpu_is_powered_off(). Changed the behavior of gk20a_busy_noresume()
to return bool. It returns true, iff it managed to prevent
an imminent suspend, else returns false. For cases where
PM runtime is disabled, the code follows the existing implementation.

Added missing gk20a_busy_noresume() calls to tlb_invalidate.

Also, moved gk20a_pm_deinit to after nvgpu_quiesce() in
the module removal path. This is done to prevent regs access
after registers are locked out at the end of nvgpu_quiesce. This
can happen as some free function calls post quiesce  might still
have l2_flush, fb_flush deep inside their stack, hence invoke
gk20a_pm_deinit to disable pm_runtime immediately after quiesce.

Kept the legacy implementation same for VGPU and
older kernels

Jira NVGPU-8487

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I972f9afe577b670c44fc09e3177a5ce8a44ca338
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2715654
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-05-25 04:59:46 -07:00
Rajesh Devaraj
4652f96a6f gpu: nvgpu: add polling for back-to-back error reporting in av+l
When an error is reported to Safety_Services, it will be cleared
at FSI and reported to SEH (System Error Handler). Since MISC_EC
interface provides only one register for error reporting, there
is a need to poll the status of previously reported error before
reporting the next error. For this purpose, this patch adds logic
to perform polling using epl_get_misc_ec_err_status(), in AV+L.

JIRA NVGPU-8094
Bug 200729736

Change-Id: Ia01a2fc42a7ce586b7965a82c90027a9a2dd252b
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2684141
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-21 10:51:39 -07:00
Debarshi Dutta
cb70e86ac1 gpu: nvgpu: Allow SC7 suspend/resume
Allow SC7 suspend/resume for platforms even if runtime pm
is disabled.

Currently, nvgpu can disable runtime pm by setting railgate_init
field to false for platform_{gk20a/gv11b/ga10b) files. This is done by
taking extra reference count in the PM Framework.

However, device suspend would still fail. Fix this by checking
for NVGPU_CAN_RAILGATE and removing the additional reference count
taken as mentioned above. Take the extra refcount back at
the end of the resume path.

Bug 3458643

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I413e09e2f9f380d78c0ce30196591e9c5b7544f3
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2668567
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-09 21:04:59 -08:00
Ramesh Mylavarapu
9302b2efee gpu: nvgpu: gsp units separation
Separated gsp unit into three unit:
- GSP unit which holds the core functionality of GSP RISCV core,
  bootstrap, interrupt, etc.
- GSP Scheduler to hold the cmd/msg management, IPC, etc.
- GSP Test to hold stress test ucode specific support.

NVGPU-7492

Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Change-Id: I12340dc776d610502f28c8574843afc7481c0871
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2660619
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-09 00:38:21 -08:00
Sagar Kamble
535a27411a gpu: nvgpu: fix allocator debugfs deinit
Allocator (bitmap, buddy, page) debugfs files are not cleaned up when
the allocators are destroyed. This leads to warning logs from nvgpu
like below:

[21073.493000] debugfs: File 'gk20a_as_17' in directory 'allocators' already present!
[21073.493026] debugfs: File 'gk20a_as_17-sys' in directory 'allocators' already present!

Remove the per-allocator debugfs node when destroying an allocator in
runtime.

While at this, add missing nvgpu_allocator locking to the function
nvgpu_bitmap_alloc_destroy. And create nop functions for the
functions nvgpu_init_alloc_debug and nvgpu_fini_alloc_debug
when CONFIG_DEBUG_FS is not defined to avoid adding the
CONFIG checks at multiple places.

Move gk20a_debug_deinit to the end of gk20a_free_cb called in nvgpu_put
as that tears down all debugfs entries. Allocator destroy happens as
part of nvgpu_put call and it can lead to invalid debugfs dentry
access if gk20a_debug_deinit is called before it.

Bug 3481097

Change-Id: I8a66bcf6ade7e5707f9207c78a54d12d7bd94c02
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2648012
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-01-07 06:28:53 -08:00
Sagar Kamble
a2f4fdf190 gpu: nvgpu: enable CONFIG_NVGPU_VPR for all kernels
VPR functionality is split up as static VPR and VPR resize. Static VPR
is supported on all kernels. VPR resize is enabled only on 4.9 kernel.

Enable CONFIG_NVGPU_VPR unconditionally in Linux Makefile. Compile
VPR resize related functionality in nvgpu under the check for
Linux kernel version using new define NVGPU_VPR_RESIZE_SUPPORTED.

JIRA LS-458
Bug 200754700

Change-Id: Ib92f7f1b95afc6c69fbdf33354459c147337350c
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2647619
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-01-05 20:57:32 -08:00
Konsta Hölttä
d086c678fd gpu: nvgpu: add domain scheduler worker
Move away from the prototype call in channel wdt worker and create a
separate worker thread for the domain scheduler. The details of runlist
domains are still encapsulated in the runlist code; the domain scheduler
controls when to switch domains. Switching happens based on domain
timeslices or when the current domain is deleted.

The worker thread is paused on railgate and spun back on poweron. The
scheduler data was also left dangling, so fix that by deinitializing all
nvs-related when gk20a_remove_support() is called. The runlist domains
already get freed as part of fifo removal.

Jira NVGPU-6427

Change-Id: I64f42498f8789448d9becdd209b7878ef0fdb124
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2632579
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-12-14 06:26:16 -08:00
Tejal Kudav
3b1bbc7259 gpu: nvgpu: Remove GP10b support
Starting 6.0.2.0, deprecate support for GP10b. Delete GP10b specific
things such as platform data, ucodes, regops allowlist, cg/pg register
list. Per unit specific gp10b code cleanup will be done later.

Bug 3431142

Change-Id: I4d5fd9ad8c6ee53845df3b6b2298af64d76e86c3
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2630946
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-12-03 08:43:52 -08:00
Sagar Kamble
41df3e17a7 gpu: nvgpu: fix nvgpu remove sequence
While removing the nvgpu module, all gpu unmaps should happen before
removing the PMU support as ELPG_MS accesses pmu pg structure and
ELPG_MS is disabled/enabled while accessing TLB or cache flush.

nvgpu_fb_vab_teardown_hal and mmu_fault.info_mem_destroy do gpu
unmaps. They were executed post removal of PMU support. Fix the
sequence.

Bug 3448630

Change-Id: I44925c313c625a2d0f297d1367d69069b3deacef
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2632490
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-11-26 08:47:52 -08:00
Sagar Kamble
83dbb711bb gpu: nvgpu: make buffer metadata support independent of compression
Earlier, buffer metadata support was made dependent on compression.
However that is not required.

Update the enabled flag NVGPU_SUPPORT_BUFFER_METADATA setup for
various hals. Enable it for all from linux characteristics init.

Update REGISTER_BUFFER and GET_BUFFER_INFO ioctls to seggregate
the compile/runtime compression functionality.

If compression is disabled, return error in case comptags are
required else don't fail the REGISTER_BUFFER ioctl.

Bug 200767700

Change-Id: I3850ccc879f180c97b830fb3d652c094b9d28a5b
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2614378
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-11-12 07:30:33 -08:00
Jon Hunter
aa44d0e041 gpu: nvgpu: Fix build for Linux v5.16-rc1
Building NVGPU against the current upstream mainline kernel is failing
and errors such as the following are seen.

ERROR: modpost: module nvgpu uses symbol dma_buf_map_attachment from
	namespace DMA_BUF, but does not import it.
ERROR: modpost: module nvgpu uses symbol dma_buf_detach from namespace
	DMA_BUF, but does not import it.
ERROR: modpost: module nvgpu uses symbol dma_buf_vmap from namespace
	DMA_BUF, but does not import it.

Following upstream commit 16b0314aa746 ("dma-buf: move dma-buf symbols
into the DMA_BUF module namespace"), it is now necessary to import the
DMA_BUF module namespace into the NVGPU driver to fix this.

Change-Id: I901b74cea692a5e0d66a190d01fe74a55aaf4431
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2621641
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-11-12 02:51:30 -08:00
Konsta Hölttä
f4ec400d5f gpu: nvgpu: simplify nvgpu_timeout_init
nvgpu_timeout_init() returns an error code only when the flags parameter
is invalid. There are very few possible values for flags, so extract the
two most common cases - cpu clock based and a retry based timeout - to
functions that cannot fail and thus return nothing. Adjust all callers
to use those, simplfying error handling quite a bit.

Change-Id: I985fe7fa988ebbae25601d15cf57fd48eda0c677
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2613833
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-10-26 13:47:32 -07:00
Tejal Kudav
243e52a771 gpu: nvgpu: ga10b: Disable compression on Av+L/Q
GPU HW expects physically contiguous addresses when clearing
the compression bit store in memory. Currently on hypervisor setup,
the DMA_ATTR_FORCE_CONTIGUOUS flag ensures contiguous IPA, but it
is not possible to ensure contiguous physical memory.Disable
compression on virtualized environments until physically contiguous
memory is feasible.

Buffer Metadata support is dependent on compression support.
Move the initialization of NVGPU_SUPPORT_BUFFER_METADATA flag to
common code where NVGPU_SUPPORT_COMPRESSION is initialized.

Bug 200780546

Change-Id: Id94bffc878e275a80948880f0475162d0bb4ddae
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2607830
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-10-11 17:01:06 -07:00
Divya
ae2d561c48 gpu: nvgpu: add platform support for Static PG
- Add support for taking static PG config values
  from DT nodes
- Check those values against valid set of values
  for GPC, TPC and FBP
- Store valid values in g->gpc_pg_mask, g->fbp_pg_mask
  and g->tpc_pg_mask[] array.

Bug 200768322
JIRA NVGPU-6433

Change-Id: Ifc87e7d369034b1daa13866bc16a970602514bf6
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2594802
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-25 15:47:25 -07:00
Debarshi Dutta
9328f057a7 gpu: nvgpu: fix use-after-free use case of CE APP.
The following issue is reported when running sudo modprobe -r nvgpu

[  134.066392] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058
[  134.066428] Mem abort info:
[  134.066431]   ESR = 0x96000004
[  134.066434]   EC = 0x25: DABT (current EL), IL = 32 bit
[  134.066450] [0000000000000058] pgd=0000000000000000, p4d=0000000000000000
[  134.066459] Internal error: Oops: 96000004 [#1] PREEMPT_RT SMP

[  134.066639] pc : nvgpu_cic_rm_wait_for_stall_interrupts+0x78/0xd0 [nvgpu]
[  134.066847] lr : nvgpu_cic_rm_wait_for_stall_interrupts+0x74/0xd0 [nvgpu]
[  134.067043] sp : ffff80001971ba80
[  134.067046] x29: ffff80001971ba80 x28: ffff000093b0da00
[  134.067054] x27: 0000000000000000 x26: ffff80001c28b990
[  134.067061] x25: ffff00008cd01000 x24: 0000000000000bb8
[  134.067067] x23: 0000000000000000 x22: ffff0000915b0000
[  134.067073] x21: ffff000093b0da00 x20: ffff0000915b0000
[  134.067079] x19: ffff0000915b0000 x18: 0000000000000036
[  134.067085] x17: 0000000000000000 x16: 0000000000000000
[  134.067091] x15: ffff8000126b5fd8 x14: 7373616c633d4d45
[  134.067097] x13: ffff8000098abef0 x12: 0000000000000000
[  134.067102] x11: ffff8000098ab5a0 x10: ffff8000098abef8
[  134.067108] x9 : ffff80001010e844 x8 : ffff80001971ba48
[  134.067115] x7 : 2222222222222222 x6 : ffff000093b0da00
[  134.067122] x5 : ffff8000098b1fd8 x4 : 0000000000000000
[  134.067127] x3 : 0000000000000000 x2 : 0000000000000000
[  134.067133] x1 : 0000000000000000 x0 : 0000000000000000
[  134.067138] Call trace:
[  134.067140]  nvgpu_cic_rm_wait_for_stall_interrupts+0x78/0xd0 [nvgpu]
[  134.067328]  nvgpu_cic_rm_wait_for_deferred_interrupts+0x20/0xb0 [nvgpu]
[  134.067517]  nvgpu_channel_deferred_reset_engines+0x29c/0x920 [nvgpu]
[  134.067714]  nvgpu_channel_close+0x18/0x20 [nvgpu]
[  134.067904]  nvgpu_init_pramin+0x2ac/0x350 [nvgpu]
[  134.068092]  nvgpu_ce_app_destroy+0x94/0xe0 [nvgpu]
[  134.068279]  nvgpu_put+0x90/0x120 [nvgpu]
[  134.068465]  nvgpu_pci_shutdown+0x29c/0x18a0 [nvgpu]
[  134.068655]  pci_device_remove+0x44/0xe0
[  134.068665]  device_release_driver_internal+0x114/0x1f0
[  134.068701]  driver_detach+0x54/0xe0
[  134.068709]  bus_remove_driver+0x70/0x120
[  134.068733]  driver_unregister+0x34/0x60

The above issue occurs due to freeing of CIC resources earlier than
dependent users of interrupts e.g. CDE, CE etc.

As a solution, move CIC deinit sequence to end of nvgpu_put.
This handles deinit properly for VGPU/IGPU/DGPU.

Bug 200763510

Change-Id: I696e31d5e03a9468cccfe710048000dbf7cf0269
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2592063
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-16 21:45:43 -07:00
Sahil Mukund Patki
794d1edbe4 gpu: nvgpu: Fix debugfs compilation errors
The function "nvgpu_ce_debugfs_init" is declared in "debug_ce.h".
This file is only compiled when CONFIG_DEBUG_FS is enabled. So
any accesses to this function result in compilation errors when
CONFIG_DEBUG_FS is disabled.

This patch fixes the errors by guarding all accesses to the above
mentioned function by CONFIG_DEBUG_FS.

Bug 200755555

Change-Id: Ie566413913c4a72b10b87c3285d1263d1c811074
Signed-off-by: Sahil Mukund Patki <spatki@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2591304
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-15 09:16:22 -07:00
Debarshi Dutta
79ab0ba6c4 gpu: nvgpu: remove sudo restrictions on gpu nodes.
When SMC modes are enabled, devices are created with sudo-only
access permissions. Those permissions are relaxed to allow non-sudo
processes to allow job submission.

Also, allow only root users to poweroff explicitely via the device
power node.

Bug 3374078

Change-Id: Ieb869399c3ada3588708cf2bc99a580414023cb7
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2590584
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-15 09:15:49 -07:00
Sagar Kadamati
dd9b4364aa gpu: nvgpu: add nvgpu-next infrastructure
* As of now, working on multiple chip bringup in nvgpu-next repo has
   an issue because we end with losing control on source code (hard to
   find which part of the code belongs to which chip) and it's valuable
   history this affects chip migration on release.

 * To support multiple chip bringup simultaneously, we need new
   guidelines to avoid losing control on source code and make migration
   easier. This change adds links to nvgpu-next repo.

 * Updated return code to ENODEV for consistency
 * Updated ACR unittest to work with ENODEV return code

NOTE:
     These are the initial set of infrastructure changes, guidelines
     will evolve, and source code will get updated accordingly.

     Based on future chip features, Which part of the source code falls
     under nvgpu-next repo is decided.

JIRA NVGPU-6574

Change-Id: I81827e35d189c55554df00e255b527a4473e0338
Signed-off-by: Sagar Kadamati <skadamati@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2556793
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-09-08 06:50:38 -07:00
Sagar Kamble
ed16377983 gpu: nvgpu: allocate comptags and store metadata in REGISTER_BUFFER ioctl
To enable userspace query about comptags allocation status of a buffer,
comptags are to be allocated only during buffer registration done by
nvrm_gpu. Earlier, they were allocated during map.

nvrm_gpu will be sending metadata blob to be associated with the buffer.
This will have to be stored in the dmabuf privdata for all the buffers
registered by nvrm_gpu.

This patch moves the privdata allocation to buffer registration ioctl.

Remove g->mm.priv_lock as it is not needed now. This lock was added
to protect dmabuf private data setup. That private data is now
handled through dmabuf->ops and setup of dmabuf->ops is done
under dmabuf->lock.

To support legacy userspace, this patch still allocates comptags on
demand on map calls for unregistered buffers.

Bug 200586313

Change-Id: I88b2ca04c733dd02a84bcbf05060bddc00147790
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2480761
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-02 11:42:08 -07:00
Jon Hunter
8a4b72a4aa gpu: nvgpu: Fix crash when reading CE_APP debugfs
The CE_APP debugfs nodes are created when the NVGPU driver is probed,
however, the 'ce_app' structure which contains the variables exposed
via the debugfs, is not allocated until nvgpu_finalize_poweron() is
called. Therefore, if the user attempts to access the CE_APP debugfs
nodes before the NVGPU has been powered on, for example, right after
Linux has booted, then this results in a NULL pointer dereference crash.
Fix this by moving the creation of the CE_APP debugfs nodes to
nvgpu_finalize_poweron_linux() which is called after
nvgpu_finalize_poweron().

Bug 200747304

Change-Id: Icd28952112f86887a1d6b6f8beb382f5189461a9
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2572106
(cherry picked from commit 35a0c18d93e97265611c3bbfae41b39d9cd183e3)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2587367
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-09-02 07:23:53 -07:00
Debarshi Dutta
608decf1e6 gpu: nvgpu: add support for powering off gpu
Add support for powering off IGPU for switching between
legacy to SMC mode/vice-versa or changing SMC configuration.
The power off can be issued as follows

echo 0 > /dev/nvgpu/igpu0/power

The following steps are done during a poweroff.
1) Deterministic channel idle
2) Acquire write_lock on l->busy semaphore.
3) Wait till power_usage decrements to indicate 0 active jobs.
4) Invoke pm_runtime_put_sync_suspend()
5) Invoke nvgpu_gr_remove_support() to clear existing GR memory.
6) Release write_lock on l->busy
7) Deterministic channel unidle.

Part of the sequence matches that of the gk20a_do_idle code.
The common parts are extracted into new functions
gk20a_block_new_jobs_and_idle() and gk20a_unblock_jobs()

For joint-rail case, the current implementation, does a railgate
and then sets pm_runtime_set_autosuspend_delay(-1) to disable
regular runtime resume/suspend.

Remove clearing of NVGPU_SUPPORT_MIG status during state change
ias it leads to inconsistencies.

Jira NVGPU-6920

Change-Id: I0b3eb3278176122ac061c1e8a94ebfb3c17c3925
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2578501
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Antony Clince Alex <aalex@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-23 05:27:50 -07:00
Tejal Kudav
b33079d47e gpu: nvgpu: Move intr data members from MC to CIC
Move interrupt specific data-members from common.mc to common.cic
Some of these data members like sw_irq_stall_last_handled_cond need
To be initialized much earlier during the OS specific init/probe stage.
Also, some more members from struct nvgpu_interrupts(like stall_size,
stall_lines[]), which will soon be moved to CIC will also need to be
initialized early during the OS specific probe stage.
However, the chip specific LUT can only be initialized after the
hal_init stage where the HALs are all initialized.
Split the CIC init to accommodate the above initialization requirements.

JIRA NVGPU-6899

Change-Id: I9333db4cde59bb0aa8f6eb9f8472f00369817a5d
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2552535
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-19 18:06:28 -07:00
Divya Singhatwaria
842bef7124 gpu: nvgpu: Support GPC and FBP Floorsweeping
- Add gops_fbp_fs and gops_gpc_pg struct
- Add HALs to write to NV_FUSE_CTRL_OPT_FBP and
  NV_FUSE_CTRL_OPT_GPC fuses needed for floorsweeping
- Add set_fbp_mask and set_gpc_mask to probe FBP and GPC mask
  respectively during gpu probe
- Add sysfs node: fbp_fs_mask and gpc_fs_mask to store
  FBP and GPC floorsweeping mask sent from userspace
- Move the floorsweeping programming early in NVGPU’s GPU init
  function and then issue a PRI init.

JIRA NVGPU-6433

Change-Id: I84764d625c69914c107e1e8c7f29c476c2f64f78
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2499571
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-19 06:17:25 -07:00
Ramesh Mylavarapu
d328bff79e gpu: nvgpu: gsp NVRISCV load and bootstrap
Changes:
- This change will only init gsp software
  state, nvgpu_gsp_bootstrap need to be called.
- CONFIG_NVGPU_GSP_SCHEDULER flag is created to
  compile out the gsp scheduler code when needed.
- Created GSP engine reset which is needed when
  ACR completed execution and need to load gsp fw.

NVGPU-6783

Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Change-Id: I2ce43e512b01df59443559eab621ed39868ad158
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554267
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-15 17:21:03 -07:00
Lakshmanan M
e9872a0d91 gpu: nvgpu: Skip graphics unit access when MIG is enabled
This CL covers the following modifications,
1) Added logic to skip the graphics unit specific sw context load
   register write during context creation when MIG is enabled.
2) Added logic to skip the graphics unit specific sw method
   register write when MIG is enabled.
3) Added logic to skip the graphics unit specific slcg and blcg gr
   register write when MIG is enabled.
4) Fixed some priv errors observed during MIG boot.
5) Added MIG Physical support for GPU count < 1.
6) Host clk register access is not allowed for GA100.
   So skipped to access host clk register.
7) Added utiliy api - nvgpu_gr_exec_with_ret_for_all_instances()
8) Added gr_pri_mme_shadow_ram_index_nvclass_v() reg field
   to identify the sw method class number.

Bug 200649233

Change-Id: Ie434226f007ee5df75a506fedeeb10c3d6e227a3
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2549811
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-02 16:41:51 -07:00
tkudav
0526e7eaa9 gpu: nvgpu: Create CIC-mon and CIC-rm subunits
common.cic unit is divided into common.cic.mon and common.cic.rm
based on rm and mon process split.

CIC-mon subunit includes the code which is utilized in critical
interrupt handling path like initialization, error detection and
error reporting path. CIC-rm subunit includes the code corresponding
to rest of interrupt handling(like collecting error debug data from
registers) and ISR status management (status of deferred interrupts).

Split the CIC APIs and data-members into above two subunits.

JIRA NVGPU-6899

Change-Id: I151b59105ff570607c4a62e974785e9c1323ef69
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2551897
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-02 09:57:56 -07:00
scottl
3cd256b344 gpu: nvgpu: add linux REMAP support
Add REMAP ioctl and accompanying support to the linux nvgpu driver.

REMAP support provides per-page control over sparse VM areas using the
concept of a virtual memory pool.

The REMAP ioctl accepts a list of operations (each a map or unmap) that
modify the VM area pages tracked by the virtual mmemory pool.

Inclusion of REMAP support in the nvgpu build is controlled by the new
CONFIG_NVGPU_REMAP flag.  This flag is enabled by default for linux builds.
A new NVGPU_GPU_FLAGS_SUPPORT_REMAP characteristics flag is added for use
in detecting when REMAP support is available.

When a VM allocation tagged with NVGPU_VM_AREA_ALLOC_SPARSE is made the
base virtual memory pool resources are allocated.  Per-page resources are
later allocated when the NVGPU_AS_IOCTL_REMAP ioctl is issued.  All REMAP
resources are released when the corresponding VM area is freed.

Jira NVGPU-6804

Change-Id: I1f2cdc0c06c1698a62640c1c6fbcb2f9db24a0bc
Signed-off-by: scottl <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2542178
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-06-28 22:39:06 -07:00
Antony Clince Alex
68e11c8bd3 gpu: nvgpu: remove nvgpu_next_gpuid.h
Replace all usages of NVGPU_NEXT_GPUID and NVGPU_NEXT_DGPU_GPUID
with NVGPU_GPUID_GA10B and NVGPU_GPUID_GA100.

Remove nvgpu_next_gpuid.h and update yaml.

Jira NVGPU-4771

Change-Id: I3baf0de4eb5266b79aabd5c6ddf8442bf8f73419
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2547735
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-06-27 05:03:09 -07:00
Antony Clince Alex
c7d43f5292 gpu: nvgpu: remove usage of CONFIG_NVGPU_NEXT
The CONFIG_NVGPU_NEXT config is no longer required now that ga10b and
ga100 sources have been collapsed. However, the ga100, ga10b sources
are not safety certified, so mark them as NON_FUSA by replacing
CONFIG_NVGPU_NEXT with CONFIG_NVGPU_NON_FUSA.

Move CONFIG_NVGPU_MIG to Makefile.linux.config and enable MIG support
by default on standard build.

Jira NVGPU-4771

Change-Id: Idc5861fe71d9d510766cf242c6858e2faf97d7d0
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2547092
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-06-27 05:02:47 -07:00
Richard Zhao
ff75647d59 gpu: nvgpu: unify power state management code
The management code of g->power_on_state on different OS are almost
same, so moved the code to the common place.

Jira GVSCI-10882

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: I890015867b7bbdf3f749ab275ffd085ef76dfec2
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2542846
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-06-23 09:26:49 -07:00
Lakshmanan M
19186c8a02 gpu: nvgpu: select map access type from dmabuf permission and user request
Add api to translate dmabuf's fmode_t to gk20a_mem_rw_flag
for read only/read write mapping selection.

By default dmabuf fd mapping permission should be a maximum
access permission associated to a particual dmabuf fd.

Remove bit flag MAP_ACCESS_NO_WRITE and add 2 bit values for
user access requests NVGPU_VM_MAP_ACCESS_DEFAULT|READ_ONLY|
READ_WRITE.

To unify map access type handling in Linux and QNX move the
parameter NVGPU_VM_MAP_ACCESS_* check to common function
nvgpu_vm_map.

Set MAP_ACCESS_TYPE enabled flag in common characteristics
init function as it is supported for Linux and QNX.

Bug 200717195
Bug 3250920

Change-Id: I1a249f7c52bda099390dd4f371b005e1a7cef62f
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2507150
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-06-21 14:48:32 -07:00
Sagar Kamble
e0e337fb83 gpu: nvgpu: set nvgpu power state to POWERED_OFF on poweron fail
When force closing the app, poweron needed in channel close path will
fail as pg_task kthread creation fails with -EINTR (process is
SIGKILL'd so threads don't get created).

Upon poweron failure, device nodes are removed and the nvgpu power
state is not reset to NVGPU_STATE_POWERED_OFF. Hence on further
gk20a_busy attempts, poweron is not attempted and gpu remains
unusable from thereon.

Change the state to POWERED_OFF from POWERING_ON on poweron fail.

Bug 3308828

Change-Id: I2360f11a4937dfe93eb7933b30c13748fb570898
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2543797
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-06-15 04:58:28 -07:00
Debarshi Dutta
8f9ac1dea9 gpu: nvgpu: split away power node removal
Presently, gk20a_user_deinit is used to remove all device nodes
including "power" node as well.

Split removal of power node into a separate function
gk20a_power_node_deinit to enable other device removal during the
normal runtime_suspend path to facilitate the fast path for MIG
reconfiguration. Powernode can be removed only during a call to
Rmmod. This also enables separately powering off the device nodes
in the unlikely case of a poweron failure.

Bug 3308828
Jira NVGPU-6920

Change-Id: Ib045a09a992a63c468492a837b273cca41e20f15
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2543014
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-06-15 04:57:57 -07:00
Tejal Kudav
9f43914933 gpu: nvgpu: Move Intr handling common code to CIC
CIC (Central Interrupt controller) will be responsible for the
interrupt handling. common.cic unit is the placeholder for all
interrupt related code. Move interrupt related defines and
Public APIs present in common.mc to common.cic.
Note: The common.mc interrupts related struct definitions are
not moved as part of this patch.

Adapt the code to use interrupt handling related defines and public
APIs migrated from common.mc to common.cic

JIRA NVGPU-6899

Change-Id: I747e2b556c0dd66d58d74ee5bb36768b9370d276
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2535618
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-05-31 19:37:31 -07:00
Vedashree Vidwans
aba26fa082 gpu: nvgpu: handle chip specific erratas
Currently, there are few chip specific erratas present in nvgpu code.
For better traceability of the erratas and corresponding fixes,
introduce flags to indicate existing erratas on a chip. These flags
decide if a corresponding solution is applied to the chip(s).

This patch introduces below functions to handle errata flags:
- nvgpu_init_errata_flags
- nvgpu_set_errata
- nvgpu_is_errata_present
- nvgpu_print_errata_flags
- nvgpu_free_errata_flags

nvgpu_print_errata_flags: print below details of erratas present in chip
1. errata flag name
2. chip where the errata was first discovered
3. short description of the errata

Flags corresponding to erratas present in a chip are set during chip hal
init sequence.

JIRA NVGPU-6510

Change-Id: Id5a8fb627222ac0a585aba071af052950f4de965
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2498095
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-04-28 19:14:44 -07:00
Lakshmanan M
3f8c562004 gpu: nvgpu: Add nvgpu_early_poweron() support
1) NvGpu dev node needs to be created in gpu power on
early stage to avoid latency introduced by udevd.
For creating dev node, device and grmgr init
needs to move to early stage of GPU power on.
After grmgr init, NvGpu can identify the number of MIG
instance required for each physical GPU.
For that, added a new API nvgpu_early_poweron() to handle
early init which is required for before dev node creation.

2) Removed fifo dependency in nvgpu_init_gr_manager()

3) Used get_max_subctx_count() directly to query
the veid/subctx count.

JIRA NVGPU-6633

Change-Id: Ib9d7c3e184c71237b0da9305515ccd8ceda1d5ad
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2517173
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-04-22 15:00:54 -07:00
dt
639ca4edfb gpu: nvgpu: mig: Defer dev_nodes creation and create new power node to support MIG
- This is deferring the dev_nodes creation after power_on to
select the MIG config and to create the dev_nodes as per the
selected MIG config.

- The patch is adding a device node to issue power on. The
nodes are:
 for igpu :/dev/nvgpu/igpu0/power
 for dgpu:/dev/nvgpu/dgpu-0001:01:00.0/power

To issue power on :
echo "1" > /dev/nvgpu/igpu0/power
echo "1" > /dev/nvgpu/dgpu-0001:01:00.0/power

JIRA NVGPU-6633

Change-Id: Ic4f1f3e42724cc788dcfaf0e881d188fd3bd1ce1
Signed-off-by: dt <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2512647
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-04-21 10:15:20 -07:00
Richard Zhao
cfc1281223 gpu: nvgpu: vgpu: remove gp10b support
gp10b vgpu won't be supported on future releases.

- removed gp10b vgpu hal code
- removed vgpu bar1 related code
- removed gp10b vgpu linux platform code

Jira GVSCI-10202

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: Ic1bfeb12c854df3808a0c7e67f5c52bc1e80ab2d
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2517273
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-04-21 06:06:22 -07:00
Richard Zhao
643eb158a3 gpu: nvgpu: move mapped regs to gk20a
- moved reg fields to gk20a
- added os abstract register accessor in nvgpu/io.h
- defined linux register access abstract implementation
- hook up with posix. posix implementation of the register accessor uses
  the high 4 bit of address to identify register apertures then call the
  according callbacks.

It helps to unify code across OSes.

Bug 2999617

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: Ifcb737e4b4d5b1d8bae310ae50b1ce0aa04f750c
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2497937
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-04-19 19:45:24 -07:00
Richard Zhao
a56d93aa2f gpu: nvgpu: linux: remove definition of ecc_sysfs_stats_htable
No one uses it anymore.

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: I0ea7f62e4e4e53d8da66bc00dcbe08a1f94e19a8
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2497936
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-03-25 14:08:18 -07:00
Vedashree Vidwans
e445b57b04 gpu: nvgpu: Move interrupt ISR code to common
This is one of the steps in restructuring of interrupt code.
- Move ISR logic to common code. This will allow us to add mixed ASIL
error handling levels.
- Modify nonstall ISR to use threaded interrupts. Bottom half of
nonstall ISR will run nonstall operations instead of adding work to
workqueues.
- Remove nonstall workqueue implementation.

JIRA NVGPU-6351

Change-Id: I5f891b0de4b0c34f6ac05522a5da08dc36221aa6
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2467713
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-03-25 02:34:57 -07:00
Sagar Kamble
ff8fbf1004 Revert "gpu: nvgpu: disable DGPU_THERMAL_ALERT for k5.9 temporarily"
Bug 200669739

This reverts commit d3f5905a0c.

Change-Id: I76a4ca4ec5be316c24410860a722a13383a3cea3
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2501427
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-03-19 14:37:00 -07:00
scottl
75d98f55d7 gpu: nvgpu: add SUPPORT_MAPPING_MODIFY flags
Add new NVGPU_SUPPORT_MAPPING_MODIFY enable flag that is used to
control the value of the exported NVGPU_GPU_FLAGS_SUPPORT_MAPPING_MODIFY
flag.

These flags are currently only enabled on linux in non-virtualized
environments.

Jira NVGPU-6374

Change-Id: Ia85c353b767b4f7d0aebc04838f44996bc38c61f
Signed-off-by: scottl <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2490986
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-03-16 14:07:21 -07:00
Sagar Kamble
bd7bda4f98 gpu: nvgpu: do_idle/unidle handling with runtime PM after probe
Extend the runtime suspend/resume based idle/unidle logic in the
probe case to handling done in gk20a_do_idle/unidle for nvgpu
after the probe completion.

If the railgating is disabled, setting autosuspend_delay to 0 will
enable the suspend. If railgating is enabled, autosuspend delay
will be > 0. Setting it to 0 will enable the immediate suspend.

With this approach based on RPM, forced_reset logic is removed.
force_reset_in_do_idle is also removed as railgating is
supported.

Bug 200602747
JIRA NVGPU-5356

Change-Id: Iaf6d5ab651b8200f0547b45d90f812110cf63c0e
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2375941
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:48 -06:00
Sagar Kamble
4012a97640 gpu: nvgpu: enable runtime PM before secure_alloc init during probe
With genpd based runtime PM, the device railgating is managed by the PM
core and the nvgpu manages the clocks. To suspend/resume the device for
idling/unidling while initializing secure alloc, runtime PM is to be
enabled during probe.

nvgpu platform railgate handlers will be only managing the clocks.
During probe, the nvgpu driver poweroff/poweron are not to be
invoked as part of driver runtime suspend/resume hence probe
state is added.

After platform probe initializes the clock, explicit runtime resume of
the device is required to sanely suspend it during gk20a_do_idle.

Runtime PM configuration differs based on the NVGPU_CAN_RAILGATE
capability, hence the runtime PM is enabled ("truly") only for
the duration of nvgpu_probe and then the state is reverted at
the beginning of gk20a_pm_late_init.

Bug 200602747
JIRA NVGPU-5356

Change-Id: I1fbd03d3f49da07ccbee9714387e00ffc688864e
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2375939
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:48 -06:00
Sagar Kamble
d3f5905a0c gpu: nvgpu: disable DGPU_THERMAL_ALERT for k5.9 temporarily
GPIOs are not working currently in k5.9.

Bug 200669739

Change-Id: Ia7848d55d95f7986cf0fa73a34b15b7546d62e0f
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2437640
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Deepak Nibade
2a6c473fe6 gpu: nvgpu: remove interface names and static classes to create dev nodes
Remove static class definition and registration for iGPU and dGPU.
Create the class dynamically in gk20a_user_init() and setup the callback
function to create devnode name based on GPU type.

For now add nvgpu_pci_devnode() callback for dGPU that sets correct
dev node path for dGPUs. For iGPU, Android apparently does not honor dev
node path set in callback and hence override the device name for iGPU
with function nvgpu_devnode().

Destroy the class in gk20a_user_deinit().

This will overall be helpful in adding multiple classes and dev nodes
for each GPU instance in MIG mode.

Set GPU device pointer as the parent of new devices created with
device_create(). This is helpful in getting GPU device name in
callback function nvgpu_pci_devnode().

Update functions to not pass class structure and interface names :
nvgpu_probe()
gk20a_user_init()
gk20a_user_deinit()
nvgpu_remove()

Remove static interface name format like INTERFACE_NAME since it is no
longer needed.

Update GK20A_NUM_CDEVS to 10 since there are 10 dev nodes per GPU right
now.

Jira NVGPU-5648

Change-Id: I5d41db5a0f87fa4a558297fb4135a9fbfcd51080
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2423492
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00