Commit Graph

40 Commits

Author SHA1 Message Date
Lakshmanan M
883c12529a gpu: nvgpu: Add multi GR reset support for MIG
* Added multi GR reset/recovery support for MIG.
* Added a api to get the gr engine id using gr instance id.

JIRA NVGPU-5650
JIRA NVGPU-5653

Change-Id: I12ece75a4c33f0944f404121b54879e814dda6df
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2443644
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:48 -06:00
Konsta Hölttä
e8201d6ce3 gpu: nvgpu: decouple channel watchdog dependencies
The channel code needs the watchdog code and vice versa. Cut this
circular dependency with a few simplifications so that the watchdog
wouldn't depend on so much.

When calling watchdog APIs that cause stores or comparisons of channel
progress, provide a snapshot of the current progress instead of a whole
channel pointer. struct nvgpu_channel_wdt_state is added as an interface
for this to track gp_get and pb_get.

When periodically checking the watchdog state, make the channel code ask
whether a hang has been detected and abort the channel from within
channel code instead of asking the watchdog to abort the channel. The
debug dump verbosity flag is also moved back to the channel data.

Move the functionality to restart all channels' watchdogs to channel
code from watchdog code. Looping over active channels is not a good
feature for the watchdog; it's better for the channel handling to just
use the watchdog as a tracking tool.

Move a few unserviceable checks up in the stack to the callers of the
wdt code. They're a kludge but this will do for now and demonstrates
what needs to be eventually fixed.

This does not leave much code in the watchdog unit. Now the purpose of
the watchdog is to only isolate the logic to couple a timer and progress
snapshots with careful locking to start and stop the tracking.

Jira NVGPU-5582

Change-Id: I7c728542ff30d88b1414500210be3fbaf61e6e8a
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2369820
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
359fc24aaf gpu: nvgpu: Rework engine management to work with vGPU
Currently the vGPU engine management rewrites a lot of the common
device agnostic engine management code.

With the new top HAL parsing one device at a time, it is now more
easily possible to tie the vGPU into the new common device framework
by implementing the top HAL but with the vGPU engine list backend.

This lets the vGPU inherit all the common engine and device
management code. By doing so the vGPU HAL need only implement a
trivial and simple HAL.

This also gets us a step closer to merging all of the CE init
code: logically it just iterates through all CE engines whatever
they may be. The only reason this differs between chips is because
of the swap from CE0-2 to LCEs in the Pascal generation. This could
be abstracted by the unit code easily enough.

Also, the pbdma_id for each engine has to be added to the device
struct. Eventually this was going to happen anyway, since the
device struct will soon replace the nvgpu_engine_info struct.
It's a little bit of an abuse but might be worth it long term. If
not, it should not be difficult to replace uses of dev->pbdma_id
with a proper lookup of PBDMA ID based on the device info.

JIRA NVGPU-5421

Change-Id: Ie8dcd3b0150184d58ca0f78940c2e7ca72994e64
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2351877
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
194fac7f3c gpu: nvgpu: Remove clutter in engine code
Remove the get_mask_on_id() HAL and replace it's usage with the
global nvgpu_engine_get_mask_on_id() function. There's no need
to have this function as a HAL.

JIRA NVGPU-5420

Change-Id: I4fc843beff8e65806da26a0addc83fa218d390ac
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2361315
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Konsta Hölttä
6cbc174fc2 gpu: nvgpu: avoid channel wdt ifdefs
Implement empty stubs of the channel watchdog functions for when
watchdog is disabled from build. Add some forward declarations that were
missing. Now most call sites don't need #idefs for the build flag.

Add error checks for the wdt alloc failure.

Jira NVGPU-5494
Jira NVGPU-5493

Change-Id: I2d42e8ab4c5e045cd280b2e1f254396127bd154b
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2352050
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Konsta Hölttä
21e02878f4 gpu: nvgpu: move wdt code out of channel.c
Cut and paste the existing channel watchdog functions to another file
for better isolation of units.

Jira NVGPU-5494

Change-Id: Id437f0939e69a4a8b495eaee164c4d7a9f283fa9
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2345934
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Scott Long
5ee9a446b5 gpu: nvgpu: misra 12.1 fixes
MISRA Advisory Rule states that the precedence of operators within
expressions should be made explicit.

This change removes the Advisory Rule 12.1 violations from various
common units.

Jira NVGPU-3178

Change-Id: I4b77238afdb929c81320efa93ac105f9e69af9cd
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2277480
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:10:29 -06:00
Thomas Fleury
6b62e0f79a gpu: nvgpu: engine preempt timeout in safety
Preempt TSG occurs in non-mission mode, when unbinding channel
from TSG, or aborting TSG. Should a preempt not complete on
engine, we expect other HW safety mechanisms such as FECS
watchdog to detect issues that prevented saving current context.
Add BUG_ON when attempting to recover from preempt timeout,
to make sure we got such error, and sw_quiesce has been
requested.

Jira NVGPU-4230

Change-Id: Ia26a61e703f74eb28d29e72e75664ca4ec97a586
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2265082
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:10:29 -06:00
Thomas Fleury
5e9bdbc80d gpu: nvgpu: runlist update timeout in safety
Runlist update occurs in non-mission mode, when
adding/removing channel/TSGs. The pending bit
is a debug only feature. As a result logging a
warning is sufficient.

We expect other HW safety mechanisms such as
PBDMA timeout to detect issues that caused pending
to not clear. It's possible bad base address could
cause some MMU faults too.

Worst case we rely on the application level task
monitor to detect the GPU tasks are not completing
on time.

Jira NVGPU-4322

Change-Id: I7233770349db5dfad6904170a1e9a2d5eada70b2
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2265094
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:10:29 -06:00
Thomas Fleury
5e688c35f8 gpu: nvgpu: set error notifier in SW quiesce
For MMU and PBDMA faults, error notifier needs to be set
before entering SW quiesce. Otherwise it ends up with
default NVGPU_ERR_NOTIFIER_FIFO_ERROR_IDLE_TIMEOUT.

Added nvgpu_rc_mmu_fault to:
- call g->ops.fifo.recover when recovery is enabled
- set MMU error when recovery is disabled

Updated nvgpu_rc_pbdma_fault to set PBDMA error when
recovery is disabled as well.

Wait for deferred interrupts to complete before actually
entering SW quiesce state, to make sure error notifier has
been set.

Jira NVGPU-4127

Change-Id: Ia84c723e021e397391c6c609d4bb96c06afdcc47
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2210909
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:10:29 -06:00
Thomas Fleury
b8465d479d gpu: nvgpu: sw quiesce when recovery is disabled
When CONFIG_NVGPU_RECOVERY is disabled, warn if recovery function
is entered with sw_quiesce_pending false.

Jira NVGPU-3871

Change-Id: Ic8e878ff6637c07f80b1a3542355ec51f729fe12
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2175446
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:01:38 -06:00
Scott Long
4277f65834 gpu: nvgpu: fix misra 2.7 violations
Advisory Rule 2.7 states that there should be no unused
parameters in functions.

This patch removes unused function parameters from the following:

 * nvgpu_channel_ctxsw_timeout_debug_dump_state()
 * nvgpu_channel_destroy()
 * nvgpu_tsg_destroy()
 * nvgpu_rc_pdbma_fault()

Jira NVGPU-3178

Change-Id: I12ad0d287fd7980533663a9776428ef5d4fd1fb9
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2176066
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-16 16:06:04 -07:00
Thomas Fleury
4899875f00 gpu: nvgpu: fix missing exports when disabling recovery
Some recovery functions are currently exported in libnvgpu_safe.export.
Once CONFIG_NVGPU_RECOVERY is permanently disabled for safety build,
we can remove those functions from the export file.
Until we can disable it, make sure that related functions do exist,
even when CONFIG_NVGPU_RECOVERY is disabled, by using #ifdefs inside
the functions, instead of redeclaring functions as static inline.

Jira NVGPU-3871

Change-Id: Ib682ae81268b35cd1050a55cc73653fb6637b87c
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2170433
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-08 16:06:30 -07:00
Debarshi Dutta
0ef96e4b1a gpu: nvgpu: correct handling of pbdma rc
nvgpu_rc_pbdma_fault just checks for the id and id_type from struct
nvgpu_pbdma_status_info. These contain invalid values during chsw_load
and chsw_switch. This patch corrects the above bug by checking for the
chsw status and then loading the values for id and type.

The current code reads the pbdma_status info after clearing the
interrupt. Other interrupts can cause enough delay between clearing the
interrupt and pbdma switching the channel leading to invalid channel/tsg
ID. Correct that by reading the pbdma_status info register before
clearing of the pbdma interrupt to correctly read the context
information before the pbdma can switch out the context.

Bug 2648298

Change-Id: Ic2f0682526e00d14ad58f0411472f34388183f2b
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2165047
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-05 02:56:16 -07:00
Deepak Nibade
0755b25231 gpu: nvgpu: remove reset and enable/disable ctxsw hals
Remove below hals since the corresponding functions are same on all
platforms and they are h/w independent
g->ops.gr.enable_ctxsw()
g->ops.gr.disable_ctxsw()
g->ops.gr.reset()

Call the functions directly at all places

Remove CONFIG_NVGPU_DEBUGGER from places where these functions are
called since they are not debugger dependent
This also helps to disable CONFIG_NVGPU_DEBUGGER and to keep recovery
sequence intact

Jira NVGPU-3506

Change-Id: Id2b208ca23dc4667e78edcd8ad242a8558e0ff64
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2137255
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-06-18 01:39:20 -07:00
Deepak Nibade
10fae67c21 gpu: nvgpu: add flag for debugger fields in struct gk20a
Add CONFIG_NVGPU_DEBUGGER flag for debugger specific fields in struct
gk20a

Jira NVGPU-3506

Change-Id: Icfae87e16e0079a2c5f16714b8a8ced7c6572cd4
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2137254
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-06-18 01:39:10 -07:00
Sagar Kamble
3f08cf8a48 gpu: nvgpu: rename feature Make and C flags
Name the Make and C flag variables consistently wih syntax:
CONFIG_NVGPU_<feature name>

s/NVGPU_DEBUGGER/CONFIG_NVGPU_DEBUGGER
s/NVGPU_CYCLESTATS/CONFIG_NVGPU_CYCLESTATS
s/NVGPU_USERD/CONFIG_NVGPU_USERD
s/NVGPU_CHANNEL_WDT/CONFIG_NVGPU_CHANNEL_WDT
s/NVGPU_FEATURE_CE/CONFIG_NVGPU_CE
s/NVGPU_GRAPHICS/CONFIG_NVGPU_GRAPHICS
s/NVGPU_ENGINE/CONFIG_NVGPU_FIFO_ENGINE_ACTIVITY
s/NVGPU_FEATURE_CHANNEL_TSG_SCHED/CONFIG_NVGPU_CHANNEL_TSG_SCHED
s/NVGPU_FEATURE_CHANNEL_TSG_CONTROL/CONFIG_NVGPU_CHANNEL_TSG_CONTROL
s/NVGPU_FEATURE_ENGINE_QUEUE/CONFIG_NVGPU_ENGINE_QUEUE
s/GK20A_CTXSW_TRACE/CONFIG_NVGPU_FECS_TRACE
s/IGPU_VIRT_SUPPORT/CONFIG_NVGPU_IGPU_VIRT
s/CONFIG_TEGRA_NVLINK/CONFIG_NVGPU_NVLINK
s/NVGPU_DGPU_SUPPORT/CONFIG_NVGPU_DGPU
s/NVGPU_VPR/CONFIG_NVGPU_VPR
s/NVGPU_REPLAYABLE_FAULT/CONFIG_NVGPU_REPLAYABLE_FAULT
s/NVGPU_FEATURE_LS_PMU/CONFIG_NVGPU_LS_PMU
s/NVGPU_FEATURE_POWER_PG/CONFIG_NVGPU_POWER_PG

JIRA NVGPU-3624

Change-Id: I8b2492b085095fc6ee95926d8f8c3929702a1773
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2130290
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-06-11 09:46:24 -07:00
Deepak Nibade
649a2b57a8 gpu: nvgpu: add debugger flag for hal.gr.gr unit
Add NVGPU_DEBUGGER flag for common.hal.gr.gr unit and corresponding
hals.

Also add this flag for deferred reset functionality

Jira NVGPU-3506

Change-Id: Iee4fbc1305346bb4d779cd69e8fd5539cb07206b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2130149
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-06-06 16:28:44 -07:00
Debarshi Dutta
4c30bd599f gpu: nvgpu: rename tsg_gk20a*/gk20a_tsg* functions.
rename the functions with the prefixes tsg_gk20a*/gk20a_tsg*
to nvgpu_tsg_*

Jira NVGPU-3248

Change-Id: I9f5f601040d994cd7798fe76813cc86c8df126dc
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2120165
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-17 01:49:27 -07:00
Debarshi Dutta
1dea88c6c7 gpu: nvgpu: Add NVGPU_CHANNEL_WDT flag
NVGPU_CHANNEL_WDT feature is embedded within the NVGPU_CHANNEL_WDT flag
to allow it to be compiled out for safety builds.

Jira NVGPU-3012

Change-Id: I0ca54af9d7b1b8e01f4090442341eaaadca8e339
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2114480
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-16 23:28:13 -07:00
Seema Khowala
671f1c8a36 gpu: nvgpu: channel MISRA fix for Rule 21.2
Rename
_gk20a_channel_get -> nvgpu_channel_get__func
gk20a_channel_get -> nvgpu_channel_get
_gk20a_channel_put -> nvgpu_channel_put__func
gk20a_channel_put -> nvgpu_channel_put
trace_gk20a_channel_get -> trace_nvgpu_channel_get
trace_gk20a_channel_put -> trace_nvgpu_channel_put

JIRA NVGPU-3388

Change-Id: I4e37adddbb5ce14aa18132722719ca2f73f1ba52
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2114118
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-09 04:39:34 -07:00
Seema Khowala
26d13b3b6b gpu: nvgpu: channel MISRA fix for Rule 21.2
Rename functions starting with '_' and '__'.
__gk20a_channel_kill -> nvgpu_channel_kill
_gk20a_channel_from_id -> nvgpu_channel_from_id__func
gk20a_channel_from_id -> nvgpu_channel_from_id

JIRA NVGPU-3388

Change-Id: I3b5f63bf214c5c5e49bc84ba8ef79bd49831c56e
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2114037
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-09 04:39:08 -07:00
Debarshi Dutta
17486ec1f6 gpu: nvgpu: rename tsg_gk20a and channel_gk20a structs
rename struct tsg_gk20a to struct nvgpu_tsg and rename struct
channel_gk20a to struct nvgpu_channel

Jira NVGPU-3248

Change-Id: I2a227347d249f9eea59223d82f09eae23dfc1306
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2112424
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-06 02:56:53 -07:00
Seema Khowala
cfb4ff0bfb gpu: nvgpu: rename struct fifo_gk20a
Rename
struct fifo_gk20a -> nvgpu_fifo

JIRA NVGPU-2012

Change-Id: Ifb5854592c88894ecd830da092ada27c7f05380d
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2109625
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Adeel Raza <araza@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-03 16:25:43 -07:00
Seema Khowala
39070c653f gpu: nvgpu: move FIFO_INVAL_* out of fifo_gk20a.h
Move and rename
FIFO_INVAL_ENGINE_ID -> NVGPU_INVALID_ENG_ID
FIFO_INVAL_TSG_ID -> NVGPU_INVALID_TSG_ID
FIFO_INVAL_RUNLIST_ID -> NVGPU_INVALID_RUNLIST_ID
FIFO_INVAL_SYNCPT_ID -> NVGPU_INVALID_SYNCPT_ID
FIFO_INVAL_CHANNEL_ID -> NVGPU_INVALID_CHANNEL_ID

JIRA NVGPU-2012

Change-Id: Ic4cc16ece64d85e22f16e4d28dcfd0c187bb65f3
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2109011
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-02 23:40:26 -07:00
Debarshi Dutta
965062c2bc gpu: nvgpu: remove direct tsg retrieval from fifo
Added
- nvgpu_tsg_check_and_get_from_id
- nvgpu_tsg_get_from_id

And removed direct accesses to f->tsg array.

Jira NVGPU-3156

Change-Id: I8610e19c1a6e06521c16a1ec0c3a7a011978d0b7
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2101251
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-26 14:16:47 -07:00
Vinod G
344b164eea gpu: nvgpu: remove gr_gk20a.h from gk20a.h
Remove gr_gk20a.h from gk20a.h
Add gr_gk20a.h in all gr hal files

Removed ununsed gr_priv.h from two files

Jira NVGPU-3217
Jira NVGPU-3218

Change-Id: Ic74c068782432e99ddba168f65a5cf42e1405305
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2104569
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-25 16:27:11 -07:00
Seema Khowala
60633ca551 gpu: nvgpu: move gv11b rc code to rc_gv11b.c
Move chip specific recovery code for volta onwards
architecture to hal/rc/rc_gv11b.c

Rename
fifo.teardown_ch_tsg -> fifo.recover
gk20a_runlist_update_locked -> nvgpu_runlist_update_locked

Remove
Unused h/w headers from fifo_gv11b.c

Use local variable f instead of g->fifo

JIRA NVGPU-1314

Change-Id: Ia535bbe4780e7241fdd911a8f577c6b98cf0fe53
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2102897
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-24 20:23:06 -07:00
Seshendra Gadagottu
a91535e3a3 gpu: nvgpu: avoid gr_falcon dependency outside gr
Basic units like fifo, rc are having dependency on
gr_falcon. Avoided outside gr units dependency on gr_falcon
by moving following functions to gr:

int nvgpu_gr_falcon_disable_ctxsw(struct gk20a *g,
			struct nvgpu_gr_falcon *falcon); ->
int nvgpu_gr_disable_ctxsw(struct gk20a *g);

int nvgpu_gr_falcon_enable_ctxsw(struct gk20a *g,
			struct nvgpu_gr_falcon *falcon); ->
int nvgpu_gr_enable_ctxsw(struct gk20a *g);
int nvgpu_gr_falcon_halt_pipe(struct gk20a *g); ->
		int nvgpu_gr_halt_pipe(struct gk20a *g);

HALs also moved accordingly and updated code to reflect this.

Also moved following data back to gr from gr_falcon:
struct nvgpu_mutex ctxsw_disable_mutex;
int ctxsw_disable_count;

JIRA NVGPU-3168

Change-Id: I2bdd4a646b6f87df4c835638fc83c061acf4051e
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2100009
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-23 05:04:44 -07:00
Vinod G
dc82262b99 gpu: nvgpu: Add gr_priv header file
Move nvgpu_gr structure to private file gr_priv.h
Include the private file where gr variables are used.

JIRA NVGPU-3132
JIRA NVGPU-3079

Change-Id: Ib26ca5c5cb25fd8dd013a7c643278efc34aa55d4
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2098021
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-22 03:15:09 -07:00
Vinod G
556e139077 gpu: nvgpu: Cleanup for gr_gk20a header
Removed unused struct from gr_gk20a.h
Change static allocation for struct gr_gk20a to dynamic type.
Change all the files that being affected by that change.

Call gr allocation from corresponding init_support functions, which
are part of the probe functions.
nvgpu_pci_init_support in pci.c
vgpu_init_support in vgpu_linux.c
gk20a_init_support in module.c

Call gr free before the gk20a free call in nvgpu_free_gk20a.

Rename struct gr_gk20a to struct nvgpu_gr

JIRA NVGPU-3132

Change-Id: Ief5e664521f141c7378c4044ed0df5f03ba06fca
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2095798
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-19 00:04:00 -07:00
Seema Khowala
ca628dfd6e gpu: nvgpu: move engine functions to engines.c
Removed
fifo.runlist_busy_engines ops

Moved to engines.c and renamed
gk20a_fifo_get_failing_engine_data -> nvgpu_engine_find_busy_doing_ctxsw
gk20a_fifo_get_faulty_id_type -> nvgpu_engine_get_id_and_type
gk20a_fifo_runlist_busy_engines -> nvgpu_engine_get_runlist_busy_engines

JIRA NVGPU-1314

Change-Id: I89c81f331321d47a616a785082d66f9b4a51ff71
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2093788
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-18 15:55:24 -07:00
Seema Khowala
92e26141d5 gpu: nvgpu: move gk20a_fifo_recover to common/rc
Move gk20a_fifo_recover from gk20a/fifo_gk20a.c to
common/rc/rc.c
Rename gk20a_fifo_recover -> nvgpu_rc_fifo_recover

JIRA NVGPU-1314

Change-Id: I5155a73cda9a60275dacd2568423386cd0f808ee
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2093719
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-16 17:06:08 -07:00
Seema Khowala
03b521d9d7 gpu: nvgpu: move nvgpu_tsg_recover to common/rc
Moved from common/tsg to common/rc and renamed
nvgpu_tsg_recover -> nvgpu_rc_tsg_and_related_engines

JIRA NVGPU-1314

Change-Id: I887d5fcdb15def13cc74e2993312b3b36119c97c
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2095622
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-16 17:05:59 -07:00
Seema Khowala
c570ba99ed gpu: nvgpu: move sched error bad tsg recovery
Move sched error bad tsg recovery from fifo_intr_gv11b.c
to common/rc/rc.c

JIRA NVGPU-1314

Change-Id: Ic731a3162cad2fe184d764f0b3ad98acc1f382cb
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2095621
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-16 17:05:49 -07:00
Seema Khowala
8c5c9de72a gpu: nvgpu: move gr recovery from gr_gk20a.c to common/rc
Move gr fault recovery from gr_gk20a.c to common/rc/rc.c

JIRA NVGPU-1314

Change-Id: I0d924975f0397ae2417e5a43b2d048f3ae9c4f79
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2093706
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-16 17:05:40 -07:00
Seema Khowala
2f00275584 gpu: nvgpu: move preempt timeout rc from fifo to rc
Move preempt timeout recovery related function to common/rc.
Remove nvgpu_channel_recover as bare channels are not recovered.
Recover channels bound to tsg.

JIRA NVGPU-1314

Change-Id: Ic1f94b321d0404eea86dd6d6d990529b2f3a8d57
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2093682
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-16 17:05:25 -07:00
Seema Khowala
1882a7413d gpu: nvgpu: move runlist update timeout rc to common/rc
Move runlist update timeout recovery from runlist.c to
rc.c
Move RC_TYPE defines from fifo.h to rc.h

JIRA NVGPU-1314

Change-Id: I66925ca9fba904c523be69ad99808e3de33a7d46
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2093666
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-16 17:05:10 -07:00
Debarshi Dutta
c48bfdd0d6 gpu: nvgpu: move gk20a_fifo_pbdma_fault_rc to common.rc unit
gk20a_fifo_pbdma_fault_rc is moved to common.rc unit and renamed to
nvgpu_rc_pbdma_fault.

The function is modified such that when the pbdma id is a channel,
recovery is issued only when the channel is part of a valid tsg.

Jira NVGPU-2950

Change-Id: I5e975cf79810479f83ffd50581c214a64d1619a6
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2083749
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-03-29 04:47:28 -07:00
Seema Khowala
dfafddcc21 gpu: nvgpu: move common and chip specific ctxsw timeout
Delete apply_ctxsw_timeout_intr ops and add
ctxsw_timeout_enable ops

Move chip specific sched_error and ctxsw_timeout
functions to hal/fifo/fifo_intr_* and hal/fifo/ctxsw_timeout_*

Add nvgpu_rc_ctxsw_timeout function under common/rc/rc.c

Do not check ctxsw timeout for channels that are no more
bound to tsg.

JIRA NVGPU-1312

Change-Id: Ide977fb60b3b72a27d9f22873f7a416c3bd1181d
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2075734
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-03-25 22:47:45 -07:00