Commit Graph

552 Commits

Author SHA1 Message Date
rmylavarapu
65a7896987 nvgpu: gpu: Implement PMU therm channel get status
Currently nvgpu reads the temperature by reading the
NV_THERM_I2CS_SENSOR_00 register. Below are the issues
with current approach
    1) NV_THERM_I2CS_SENSOR_00 doesn't support
       fractional precision which is POR.
    2) It doesn't support negative temperatures which
       is required for Auto.
    3) It doesn't take into account the right POR
       sensor in VFE VBIOS tables.

From therm channel get status interface we can read the
current temperature from PMU.

NVBUG - 200549047

Change-Id: I2fb21926208876f3d3bebe3f2dee08edafedbc7d
Signed-off-by: rmylavarapu <rmylavarapu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2196224
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:10:29 -06:00
Philip Elcan
9169e8c048 gpu: nvgpu: mc: move mc declarations to mc.h
Move declarations that belong to mc from gk20a.h to mc.h where they
belong.

JIRA NVGPU-2532

Change-Id: I91934ff60e2735c61d16459c04507fed6e1c96d7
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2214421
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:10:29 -06:00
Peter Daifuku
05c892f3f1 nvgpu: fix get_maxrate when no dvfs
In nvgpu_linux_get_maxrate, if tegra_dvfs_get_maxrate
returns 0 (a sign that there is no dvfs support), call
nvgpu_clk_arb_get_arbiter_clk_range to get the max
gpu frequency.

Bug 200543218

Change-Id: I4f9bc0acaef98cd9dfa22f709656f4bb7e9fd349
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2215161
(cherry picked from commit 12202fbdcf)
Reviewed-on: https://git-master.nvidia.com/r/2217945
GVS: Gerrit_Virtual_Submit
Reviewed-by: Luis Dib <ldib@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:10:29 -06:00
Philip Elcan
06fd513e1e gpu: nvgpu: move common.unit into common.mc
nvgpu.common.unit was just an enum used for passing to nvgpu.common.mc
APIs. So, move the enum into mc.h, and replace the include of unit.h
with mc.h where appropriate. And update the yaml arch.

JIRA NVGPU-4144

Change-Id: I210ea4d3b49cd494e43add1b52f3fbcdb020a1e3
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2216106
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:10:29 -06:00
Peter Daifuku
77e3704d3d nvgpu: vgpu: no debugfs entries that rely on PMU
When virtualized, the guest OS has no direct access to
PMU functionality:

- Don't create debugfs entries that rely on PMU access
- Clean up PMU vgpu HAL entries that imply that PMU access
  is supported

Bug 200543218

Change-Id: I12730b600802448a240f3de042760041d3ae7d29
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2213650
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Vedashree Vidwans
7c98fbba42 gpu: nvgpu: fix MISRA 17.1 in logging functions
MISRA Rule 17.1 forbids use of stdarg.h features which are defined for
variable arguments.
This patch modifies logging macros to use slogf function for QNX builds.
This avoids use of variable argument functions used for formatting log
message.

Jira NVGPU-4075

Change-Id: I5b6bb1107a7e431afaa960003858193a477b2ee6
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2192016
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Sagar Kamble
7a62265dde gpu: nvgpu: enable irqs before nvgpu_finalize_poweron
IRQs were not enabled before nvgpu_finalize_poweron, so debugging early
init issues such as MMU fault, invalid PRIV ring or bus access etc.
triggered during nvgpu power-on was cumbersome. Hence, Enable the
IRQs before nvgpu_finalize_poweron is called.

In HUB (MMU fault) ISR, MMU fault handling is only limited to snapped
in priv reg in case of fault during nvgpu power-on.

In HUB (MMU fault) ISR, access to fault buffers is synchronized as
nvgpu driver reads the fault buffer registers before proceeding
with fault handling. However, additional MMU fault handling
needs to be synchronized with GR/FIFO/quiesce/recovery setup
through nvgpu power-on state.

JIRA NVGPU-1592

Change-Id: I8a5f2fcd79cb7ad8e215359e7a9fad50bfd46d67
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2203861
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Philip Elcan <pelcan@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Sagar Kamble
6c3c360462 gpu: nvgpu: protect nvgpu power state access using spinlock
IRQs can get triggered during nvgpu power-on due to MMU fault, invalid
PRIV ring or bus access etc. Handlers for those IRQs can't access the
full state related to the IRQ unless nvgpu is fully powered on.

In order to let the IRQ handlers know about the nvgpu power-on state
gk20a.power_on_state variable has to be protected through spinlock
to avoid the deadlock due to usage of earlier power_lock mutex.

Further the IRQs need to be disabled on local CPU while updating the
power state variable hence use spin_lock_irqsave and spin_unlock_-
irqrestore APIs for protecting the access.

JIRA NVGPU-1592

Change-Id: If5d1b5e2617ad90a68faa56ff47f62bb3f0b232b
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2203860
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Sagar Kamble
1cd6ae945c gpu: nvgpu: introduce nvgpu_enable_irqs
Prepare function to enable the stall and non-stall kernel interrupts.
Update the type of irq state irqs_enabled to bool.

JIRA NVGPU-1592

Change-Id: I758794e0f230814a0bea2f3c035562e9a5c7e0ea
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2203859
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Philip Elcan <pelcan@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Philip Elcan
065f98f669 gpu: nvgpu: init: add return for all init APIs
This adds return values for all init APIs. This make all the init APIs
have the same signature. This is a prerequisite to making a table of
init functions.

JIRA NVGPU-3980

Change-Id: I5b71fd06ad248092af133ffe908e2930acb6d2b0
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2202973
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Shashank Singh
6fd0d972ae nvgpu: gpu: include qnx_init unit in doxygen documentation
-Include qnx_init unit in doxygen documentation.
-Add documentation for gk20a_busy/idle and similar functions.
-Remove must_check return value as misra already reports violation for
 that.

Jira NVGPU-2571

Change-Id: I9573cb61865677944809dcc494d92f63cc6e0f58
Signed-off-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2176755
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Abdul Salam
65ecd7a181 gpu: nvgpu: Remove fixed wait time for change seq completion
Currently after sending change seq RPC, nvgpu waits for a fixed time
of 20ms.
This CL replaces this with pmu_wait_message_cond, which will return
immediately after getting change seq completion event.
Also added debug fs node to get the change seq execution time.

Bug 200545366

Change-Id: Iba283f65d4949858be9cbff88de4d21a8c92ff81
Signed-off-by: Abdul Salam <absalam@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2202423
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Vedashree Vidwans
920b704ec7 gpu: nvgpu: put memory ref count
Put dma buffer ref count for all vm buffer mapping fail conditions.

Bug 200531152

Change-Id: I6bfad867eb9bd636a48b5ceb3a4417a80994a3ec
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Original Author: Bruce Xu <brucex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2194025
(cherry picked from commit f85504ae46d65d5346d9e2a5cc84ffb960ba9fb7)
Reviewed-on: https://git-master.nvidia.com/r/2195439
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Sunny Li
516023e1e4 gpu: nvgpu: sysfs adding NULL pointer check
golden image size will be set when memory allocated.
See function:
- nvgpu_gr_obj_ctx_init

If golden image size is 0, gr_golden_image should be a NULL
pointer in most cases. So add NULL pointer checking in
tpc_pg_mask_store to avoid NULL pointer exception.

Bug 2403210

Change-Id: I14df5cd94d7a4418c3089c5f84b6eab93c485ba6
Signed-off-by: Sunny Li <sunnyl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2161280
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Mahantesh Kumbar
525ff83910 gpu: nvgpu: Cleanup PMU unit header file pmu.h
Moved PMU subunits specific defines from pmu.h to
respective subunits header file by renaming properly
as needed

JIRA NVGPU-2457

Change-Id: Id29a2d5cb028fc69049738c735c5585b6276b115
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2199547
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Rajesh Devaraj
935c5f6578 gpu: nvgpu: fix misra violations in SDL
This patch addresses misra violations due to SDL error reporting
callbacks. In particular, it addresses the following misra violation:

- misra_c_2012_directive_4_7_violation: Calling function
  "nvgpu_report_*_err()" which returns error information without testing
  the error information.

JIRA NVGPU-4025

Change-Id: Ia10b6b3fd9c127a8c5189c3b6ba316f243cedf04
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2196895
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Sagar Kamble
e53d24d6d2 gpu: nvgpu: fix MISRA Rule 8.6 violations
ifdef function prototypes with CONFIG_* defines. This fixes MISRA rule
8.6 violations which complain about undefined functions.
Also moved nvgpu_channel_get_from_file prototype to ioctl_channel.h &
nvgpu_probe to driver_common.h as those are linux specific. Define
nvgpu_init_soc_vars in posix/soc.c as it is implemented in QNX.

JIRA NVGPU-3873

Change-Id: I5d2b238e1b5d1318867cd2416ac5f03cc6ab7c6a
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2196794
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Jeremy Ho
6118009b84 gpu: nvgpu: remove reversed ordering for deadlock
In some cases, we would get deadlock issue due to there are two locks
acquisition on common clk driver's lock and nvgpu driver's locks. At
the bug, inconsistent lock ordering problem will come with one thread
gets "nvgpu lock -> clk lock" and the other thread gets "clk lock ->
nvgpu lock".

Slove the latter path with one-time initializing clk_parent entry
and use cached data afterward.

Bug 2555115

Change-Id: I31c5c2728f406307e7cfd4e555f4db0c163234d8
Signed-off-by: Jeremy Ho <jeremyh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2146727
(cherry picked from commit 42c2bdfb9f)
Reviewed-on: https://git-master.nvidia.com/r/2160290
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Thomas Fleury
62d7c5641f gpu: nvgpu: rename recovery capability
Rename "recovery" capability to more specific "fault recovery":
- NVGPU_SUPPORT_FAULT_RECOVERY in UAPI
- NVGPU_GPU_FLAGS_SUPPORT_FAULT_RECOVERY in enabled flags.

Jira NVGPU-3896

Change-Id: I2a60601a7c73ce15e08b65f377e8a27a526d5eb2
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2197427
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Preetham Chandru Ramchandra
1c1fd99faf gpu: nvgpu: Enable big pages if PAGE_SIZE >= 64k
Disable big pages only if iommu is not supported for the platform and
if kernel page size is less then 64k

Bug 2500080
Bug 2508793
Bug 2508677
Bug 2507041

Change-Id: I77dad7e54825e2cb36b5ca29e5d038a9bee293ff
Signed-off-by: Preetham Chandru Ramchandra <pchandru@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2195084
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Debarshi Dutta
06949c508f gpu: nvgpu: Add support for XPU rail split
Check if CPU/GPU rails are joint, disable railgating if they are.
Add the DT support for T194 and T186 platforms.

Disable railgate_enable sysfs node update in the above condition.

Bug 200546450
Bug 200545711

Change-Id: I002488f6418805569b0ef0fc3032b58297adeafb
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2185221
(cherry picked from commit 1d532589b0
in rel-32)
Reviewed-on: https://git-master.nvidia.com/r/2190402
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:01:38 -06:00
Thomas Fleury
9f0dff4a03 gpu: nvgpu: add recovery capability
Add NVGPU_SUPPORT_RECOVERY and NVGPU_FLAGS_GPU_SUPPORT_RECOVERY,
to indicate if recovery is supported.

When true, an engine reset is performed in order to recover from an
uncorrectable error. When false, the driver enters SW quiesce state.

Jira NVGPU-3896

Change-Id: Iea809c13a844641e31ce6306fbd1630ef622bfe9
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2175447
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:01:38 -06:00
Thomas Fleury
f422aee393 gpu: nvgpu: use refcnt for ch mmu_debug_mode
Replaced ch->mmu_debug_mode_enabled with ch->mmu_debug_mode_refcnt.
If channel is enabled multiple times by userspace, then ref count is
updated accordingly. There is an expectation that enable/disable
calls are balanced for setting channel's mmu debug mode.
When unbinding the channel, decrease refcnt for the channel until it
reaches 0.
Also, removed tsg parameter from nvgpu_tsg_set_mmu_debug_mode as it
can be retrieved from ch.

Bug 2515097

Change-Id: If334e374a55bd14ae219edbfd3b1fce5ff25c226
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2184702
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-28 16:54:51 -07:00
Thomas Fleury
8057514a9f gpu: nvgpu: set FB/HSMMU debug mode
Set NV_PFB_HSMMU_PRI_MMU_DEBUG_CTRL and NV_PFB_PRI_MMU_DEBUG_CTRL
in addition to NV_PGRAPH_PRI_GPCS_MMU_DEBUG_CTRL, in
NVGPU_DBG_GPU_IOCTL_SET_CTX_MMU_DEBUG_MODE

Bug 2515097

Change-Id: I1763b43e79fac3edb68a35980683d58bfa89519f
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2115785
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-28 16:54:26 -07:00
Vedashree Vidwans
7bc3cdcf95 gpu: nvgpu: use vpr resize enabled API
This patch adds nvgpu API in linux and posix to query vpr resize.
The new API nvgpu_is_vpr_resize_enabled() is used in
nvgpu_submit_channel_gpfifo().
Previously, if non-deterministic channel has timeout disabled and
GPU cannot railgate on some platform, then channel doesn't power ref
count and results in video freeze. To resolve non-determinstic channel
job tracking needs to be enabled if vpr resize is supported or if GPU
can railgate.

Bug 200532122

Change-Id: Icfbff6253762b195b2f5955749343974b1a7a269
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2171093
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-28 14:24:19 -07:00
Thomas Fleury
95bb19827e gpu: nvgpu: add sw quiesce
For safety build, nvgpu driver should enter SW quiesce state
in case an uncorrectable error has occurred. In this state, any
activity on the GPU should be prevented, without powering off the GPU.
Also, a minimal set of operations should be used to enter SW quiesce
state.

Entering SW quiesce state does the following:
- set sw_quiesce_pending: when this flag is set, interrupt
  handlers exit after masking interrupts. This should help mitigate
  an interrupt storm.
- wake up thread to complete quiescing.

The thread performs the following:
- set NVGPU_DRIVER_IS_DYING to prevent allocation of new resources
- disable interrupts
- disable fifo scheduling
- preempt all runlists
- set error notifier for all active channels

Note: for channels with usermode submit enabled, userspace can
still ring doorbell, but this will not trigger any work on
engines since fifo scheduling is disabled.

Jira NVGPU-3493

Change-Id: I639a32da754d8833f54dcec1fa23135721d8d89a
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2172391
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-27 10:37:21 -07:00
Thomas Fleury
36fbd3bf40 gpu: nvgpu: check Board ID and VBIOS version
Check that current VBIOS meets minimal version requirement.
Read VBIOS Board ID to identify the board SKU.
Warn if VBIOS version is lower than expected version for this SKU.
Warn if Board ID is unknown.

Bug 200544064

Change-Id: I83176ab1342c9b8c8f5d273dd5ac00e6e26a0e7d
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2176974
(cherry picked from commit 621a10c123b9ba25e3cb89dee340741c4ad2cd8e)
Reviewed-on: https://git-master.nvidia.com/r/2176931
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-23 04:18:13 -07:00
Shashank Singh
c4e29841e5 nvgpu: gpu: Fix misra rule 10.3 in vm unit
For getting mapping kind is passed as signed 32 bit whereas it is stored
as unsigned 32 bit. So, change the kind type to s16 in struct
nvgpu_mapped_buf and also in the declaration from int to s16 to address
that. This is a dependent change for qnx
https://git-master.nvidia.com/r/#/c/2174451/.

Jira NVGPU-3891

Change-Id: I0578409313442ad0e2f09c8019d2701b4da53ec9
Signed-off-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2176497
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-22 14:07:25 -07:00
vinodg
087d4d3df4 gpu: nvgpu: rmmod support in dgpu simulation
Changes added to support "rmmod nvgpu" in dgpu simulation after gpu
poweron.

nvgpu_engine-wait_for_idle got stuck in busy mode for nvdec and nvec
engines in simulation as simulation doesnt support timeout.
These engines are not valid engines in nvgpu engine list.
Add nvgpu_engine_check_valid_id before checking engine status.

Simulation crash on accessing 0xb81604 top interrupt register.
Add func_priv_cpu_intr_top__size_1_v() function to get the supported
size than using default MAX_INTR_TOP_REGS.

nvlink is not supprted in dgpu simulation. Avoid warning for
-ENODEV return.

Avoid register read following gpu power off completion.

Bug 2498574

Change-Id: I9f9f1cf1ac4620242bda1d2cc0f29f51f81a6711
Signed-off-by: vinodg <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2179930
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-21 23:38:56 -07:00
Sagar Kamble
2f95efd8d1 gpu: nvgpu: move CE app logic under CONFIG_NVGPU_DGPU
CE app functionality from nvgpu is non-safe for igpu. CE engines init
/reset/cg related functionality is required in safety. Hence move the
CE app logic under CONFIG_NVGPU_DGPU flag and update the sources
accordingly.

JIRA NVGPU-3814

Change-Id: I37aa00b1184baccd5fe569ec315be60ac42dac9b
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2168956
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-19 07:55:57 -07:00
Konsta Holtta
6e2e4d0658 gpu: nvgpu: delete value tracking in syncpt wait API
QNX nvhost_syncpt_wait_timeout_ext() no longer supports reporting the
current syncpoint value (which nvgpu does not use either).

Jira HOSTX-1347

Change-Id: I5108f19a53802df63df014dd0ec3a103e0c6531f
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2170180
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-19 07:07:18 -07:00
Konsta Holtta
4658ba6952 gpu: nvgpu: delete timestamp in legacy syncpt wait path
QNX nvhost_syncpt_wait_timeout_ext() no longer supports the completion
timestamp (which nvgpu does not use either).

Jira HOSTX-1347

Change-Id: Ib822fe1d549e42aaf3415f7a1ce5557b30b8430c
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2170179
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-19 07:07:09 -07:00
Preetham Chandru R
7963a40661 gpu: nvgpu: init: skip failing probe if therm DT entry is absent
For dGPU with PCIE interface do not have a thermal alert pin.
Only platforms where dGPU is used with SXM interface have the
thermal alert pin.
This change makes sure that if nvgpu-therm-gpio DT entry is
is missing we don't fail probe but continue with GPU
initialization without enabling thermal alert feature.

Bug 200542024

Change-Id: Iaf3aec9b66695a45daf86ecfdeec398b66f96bfd
Signed-off-by: Preetham Chandru R <pchandru@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2173495
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-16 05:32:58 -07:00
Scott Long
a139172130 gpu: nvgpu: mm: fix misra 2.7 violation
Advisory Rule 2.7 states that there should be no unused
parameters in functions.

This patch removes the unused struct gk20a pointer from
the nvgpu_aperture_str() function.

Jira NVGPU-3178

Change-Id: Ied7fed13e44f1083e7477a5d6fb9facafca838de
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2174883
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-14 15:17:14 -07:00
Divya Singhatwaria
2916a2067d gpu: nvgpu: Use TPC_PG_MASK to powergate the TPC
- In GV11B, read fuse_status_opt_tpc_gpc register
  to read which TPCs are floorswept.
- The driver will also read sysfs node: tpc_pg_mask
- Based on these two values "can_tpc_powergate" will
  be set to true or false and mask will be used to write to
  fuse_ctrl_opt_tpc_gpc register to powergate the TPC.
- can_tpc_powergate = true indicates that the mask value
  sent from userspace is valid and can be used to power gate
  the desired TPC
- can_tpc_powergate = false indicates that the mask value
  sent from userspace is not valid and cannot  be used to
  power gate the desired TPC.

Bug 200532639

Change-Id: Ib0806e4c96305a13b3574e8063ad8e16770aa7cd
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2170736
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-12 00:47:55 -07:00
Vedashree Vidwans
19c80f89be gpu: nvgpu; fix MISRA errors in nvgpu.common.mm
Rule 2.2 doesn't allow unused variable assignments. The reason is
presence of unused variable assignments may indicate error in program's
logic.
Rule 21.x doesn't allow reserved identifier or macro names starting with
'_' to be reused or defined.

Jira NVGPU-3864

Change-Id: I8ee31c0ee522cd4de00b317b0b4463868ac958ef
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2163723
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-01 21:57:18 -07:00
Debarshi Dutta
48c00bbea9 gpu: nvgpu: rename channel functions
This patch makes the following changes

1) rename public channel functions to use nvgpu_channel prefix
2) rename static channel functions to use channel prefix

Jira NVGPU-3248

Change-Id: Ib556a0d6ac24dc0882bfd3b8c68b9d2854834030
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2150729
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-01 04:37:31 -07:00
Mahantesh Kumbar
6290d92926 gpu: nvgpu: PCIE table update for TU104-QS
-Added PCIE device info for TU104-QS chip & marked as FUSA SKU
 using device flag
-is_fusa_sku flag will be set if device flag has FUSA SKU flag set
 & this will be checked in driver to execute functionality
 specific to FUSA SKU

JIRA NVGPU-3727

Change-Id: I49ea357133ce0b9bbf52dae72afcf8139ab01346
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2161163
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-29 07:48:33 -07:00
rmylavarapu
e786c3fc35 gpu: nvgpu: Read current_volt from vol_rail_get_status
-Latest ucode doesn't support get_voltage RPC, the data
can be extracted from data obtained by volt_rail_get_status
board_obj cmd. Updating the debugfs node to read the data from
volt_rail_get_status.

JIRA NVGPU-3815

Change-Id: I85f84a757425411725773802c20f05063b222afc
Signed-off-by: rmylavarapu <rmylavarapu@nvidia.com>
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2153387
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-29 05:30:28 -07:00
Abdul Salam
e58e00b0fb gpu: nvgpu: Initialize clk counters for dGPU clocks
Initialize the clock counters for GPCCLK, XBARCLK, SYSCLK.
This INIT was done in PMU before, but now disabled from TU10A profile.
Hence the initialization is moved into nvgpu.

This patch does the following.
1. Move clock files from GV100 to TU104.
2. Add the Counter HW Registers.
3. Initialize the counter registers for gpc, xbar and sysclk.
4. Change the debug fs node from gv100 to tu104.
5. Update in yaml file with new file names.

Bug 200536091

Change-Id: I436019a18f5c4c73979977666d0c04ce4c569047
Signed-off-by: Abdul Salam <absalam@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2155298
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-26 04:07:01 -07:00
Scott Long
3c7cf8b75a gpu: nvgpu: fix MISRA 10.5 issue in timeout code
This change switches nvgpu_timeout_peek_expired() to return a bool
instead of an int to remove advisory rule MISRA 10.5 violations.

MISRA 10.5 states that the value of an expression should not be
cast to an inappropriate essential type.

JIRA NVGPU-3798

Change-Id: I5cf9badaf07493e11a639e47ae4cf221700134ff
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2155617
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-24 17:04:38 -07:00
Adeel Raza
59ac65d8d7 gpu: nvgpu: rename error notifier APIs
There was a name clash between the nvgpu_set_error_notifier*() APIs and
the SET_ERROR_NOTIFIER IOCTL. Therefore, the APIs were renamed from
nvgpu_set_error_notifier*() to nvgpu_set_err_notifier*(). This rename
was done to fix MISRA 5.x errors.

JIRA NVGPU-1633

Change-Id: I06af551a664b0706f106e853f1ea8733894f11bd
Signed-off-by: Adeel Raza <araza@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2159813
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-24 15:57:07 -07:00
Scott Long
93a74d6700 gpu: nvgpu: fix MISRA 10.5 issue in syncpt code
This change switches nvgpu_nvhost_syncpt_is_expired_ext()
to return a bool instead of an int to remove advisory rule
MISRA 10.5 violations.

MISRA 10.5 states that the value of an expression should not be
cast to an inappropriate essential type.

JIRA NVGPU-3798

Change-Id: Ie0772ac7167a3c999129de0dc7f22cd96faa28fc
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2159801
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Adeel Raza <araza@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-24 15:56:43 -07:00
Vaibhav Kachore
e8c53b4e81 Revert "Revert "gpu: nvgpu: Improve accuracy of dGPU clk measurement""
This reverts commit ffda24df36.

Bug 2637525
Bug 200530176

Change-Id: I542e51ea340f344768f9a3a090164964372fb5d2
Signed-off-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2148174
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-24 10:16:30 -07:00
Philip Elcan
b0ad7c0ad2 gpu: nvgpu: init: move out linux-specific APIs
The functions nvgpu_warn_on_no_regs() and nvgpu_wait_for_idle() are only
used by linux, so move them out of nvgpu.common.init into linux-specific
driver code.

JIRA NVGPU-2385

Change-Id: Iea38cdb16f9e513d8242c1b07b80171b8b68db5b
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2156459
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-23 13:27:29 -07:00
Philip Elcan
91187b6db2 gpu: nvgpu: init: rename init functions
Rename init functions that still carry the gk20a moniker to use the more
appropriate nvgpu name instead.

JIRA NVGPU-2385

Change-Id: I5d40cd72943272c8b5f16b97d9a786d9c41496d4
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2156220
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-23 13:27:18 -07:00
Philip Elcan
9705c86b98 gpu: nvgpu: init: move functions from gk20a.h to own header
This moves the nvgpu.common.init function prototypes from gk20a.h to a
new unit-specific header nvgpu_init.h

JIRA NVGPU-2385

Change-Id: I48c0b0e02a8064be0eda89f26cf55189ffd55803
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2133845
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-23 13:26:12 -07:00
Rajesh Devaraj
2d8791e866 gpu: nvgpu: SWUD for SDL unit
This patch adds SWUD (SW Unit Design) document for SDL unit. In addition,
it re-names err_type to err_id in error reporting APIs related to ECC, GR,
PRI and MMU, to keep the name consistent with other APIs.

JIRA NVGPU-3758

Change-Id: I968218574aa78144497fc12bd6dab20d1be7aa1c
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2151092
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-19 00:05:40 -07:00
Vedashree Vidwans
280dceb864 gpu: nvgpu: fix MISRA issues nvgpu.common.clk_arb
MISRA Rule 5.7 doesn't allow reuse of variable or tag name.
MISRA Rule 21.x forbids use of identifiers beginning with an underscore.

This patch resolves MISRA violations in nvgpu.common.clk_arb for above
mentioned rules.

Jira NVGPU-3740

Change-Id: I73234d1a9e1c98812620dd1c3b9a80426742e747
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2151248
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-12 15:55:49 -07:00
ajesh
eaf1048111 gpu: nvgpu: fix MISRA violations in utils unit
MISRA rule 11.6 states that a cast shall not be performed between
pointer to void and an arithmetic type.  Fix violations of rule 11.6
in utils unit.

Jira NVGPU-3300

Change-Id: I9513baf326be9618bae9bcfed597bfe27a5a2f47
Signed-off-by: ajesh <akv@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2137305
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-11 05:42:54 -07:00