Commit Graph

4595 Commits

Author SHA1 Message Date
Debarshi Dutta
c81cc032c4 gpu: nvgpu: add cg and pg function
Add new power/clock gating functions that can be called by
other units.

New clock gating functions will reside in cg.c under the
common/power_features/cg unit.

New power gating functions will reside in pg.c under the
common/power_features/pg unit.

Use nvgpu_pg_elpg_disable and nvgpu_pg_elpg_enable to disable/enable
elpg, including in the gr_gk20a_elpg_protected macro that is used to
access gr registers.

Add cg_pg_lock to make elpg_enabled, elcg_enabled, blcg_enabled
and slcg_enabled thread safe.
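A minimal user-space sketch of the locking pattern described above, assuming a pthread mutex as a stand-in for the kernel lock. The struct gating_state, the helpers pg_elpg_disable(), pg_elpg_enable() and pg_elpg_is_enabled(), and the elpg_protected() wrapper are hypothetical; only the flag names and cg_pg_lock come from this commit, and the real nvgpu functions and macro may differ.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical container for the gating flags named in the commit. */
struct gating_state {
	pthread_mutex_t cg_pg_lock;	/* serializes every flag access */
	bool elpg_enabled;
	bool elcg_enabled;
	bool blcg_enabled;
	bool slcg_enabled;
};

/* Hypothetical stand-in for nvgpu_pg_elpg_disable(). */
static int pg_elpg_disable(struct gating_state *s)
{
	pthread_mutex_lock(&s->cg_pg_lock);
	s->elpg_enabled = false;
	/* A real driver would ask the PMU to disable ELPG here. */
	pthread_mutex_unlock(&s->cg_pg_lock);
	return 0;
}

/* Hypothetical stand-in for nvgpu_pg_elpg_enable(). */
static int pg_elpg_enable(struct gating_state *s)
{
	pthread_mutex_lock(&s->cg_pg_lock);
	s->elpg_enabled = true;
	pthread_mutex_unlock(&s->cg_pg_lock);
	return 0;
}

/* Readers take the same lock, so they never see a torn update. */
static bool pg_elpg_is_enabled(struct gating_state *s)
{
	bool enabled;

	pthread_mutex_lock(&s->cg_pg_lock);
	enabled = s->elpg_enabled;
	pthread_mutex_unlock(&s->cg_pg_lock);
	return enabled;
}

/* Run fn(arg) with ELPG disabled, then re-enable ELPG, in the spirit
 * of elpg-protected gr register access. */
static int elpg_protected(struct gating_state *s, int (*fn)(void *), void *arg)
{
	int err = pg_elpg_disable(s);

	if (err != 0)
		return err;
	err = fn(arg);
	(void)pg_elpg_enable(s);
	return err;
}
```

The wrapper always re-enables ELPG after the protected call; a real driver may also need to handle nesting or ELPG having been off beforehand.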

JIRA NVGPU-2014

Change-Id: I00d124c2ee16242c9a3ef82e7620fbb7f1297aff
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2025493
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from c905858565 in
dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2108406
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-09 14:41:30 -07:00
Anuj Gangwar
f495f52c70 nvgpu: Change the path in the dependent files
Change the include paths because the nvhost Linux user interface
moves from include/linux/ to include/uapi/linux.

depends on I2e116dc8f6c33f53c03fb56b923931b6e600b534

Bug 2062672

Change-Id: If2e165852432d5795cf6680cfeb5d4b661fdee74
Signed-off-by: Anuj Gangwar <anujg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1953731
(cherry picked from commit 4e7333967d)
Reviewed-on: https://git-master.nvidia.com/r/2110254
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-03 13:43:59 -07:00
Seema Khowala
889271dc04 gpu: nvgpu: change err to info print if failing eng id is -1
In handle_sched_error, change the err print to an info print when the
failing engine id is returned as -1, i.e. FIFO_INVAL_ENGINE_ID, because
no engine was found busy doing a ctxsw. The ctxsw may already have
finished for the context for which the ctxsw timeout intr was triggered.

Possible Causes:
a)
On hitting engine reset, h/w drops the ctxsw_status to INVALID in the
fifo_engine_status register. Also, while the engine is held in reset,
h/w passes busy/idle straight through. The fifo_engine_status registers
are correct in that there is no context switch outstanding, as the
CTXSW is aborted when reset is asserted.
This is just a side effect of how ctxsw_timeout behaves on gv100 and
earlier chips.
With gv10b and later, h/w snaps the context at the point of error
so that s/w can see the tsg_id which caused the HW timeout.
b)
If engines are not busy and the ctxsw state is valid, then the intr
occurred in the past; if the ctxsw state has moved on to VALID from LOAD
or SAVE, whatever timed out eventually finished anyway. The problem
with this is that s/w cannot conclude which context caused the problem,
as more switches may have occurred before the intr was handled.
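A small sketch of the print-level decision, assuming FIFO_INVAL_ENGINE_ID is the all-ones u32 (which is what -1 becomes when stored in an unsigned 32-bit engine id). The enum and the function name sched_error_log_level() are hypothetical and only illustrate the err-versus-info choice.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding of the invalid engine id: -1 stored as a u32. */
#define FIFO_INVAL_ENGINE_ID ((uint32_t)~0U)

enum sched_err_log_level { SCHED_LOG_INFO, SCHED_LOG_ERR };

/* Pick the print level for a ctxsw timeout. An invalid id means no
 * engine was busy doing a ctxsw, so the switch has likely finished
 * already and the event is only informational. */
static enum sched_err_log_level sched_error_log_level(uint32_t failing_engine_id)
{
	if (failing_engine_id == FIFO_INVAL_ENGINE_ID)
		return SCHED_LOG_INFO;
	return SCHED_LOG_ERR;
}
```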

Bug 2092051
Bug 2429295
Bug 2484211
Bug 1890287

Change-Id: Ia79bee6e860fb179ee39024c963671d4f8245227
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2030866
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from d27f875d2c
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2076126
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-02 02:43:42 -07:00
Seema Khowala
dd282e229a gpu: nvgpu: do not do timeout_debug_dump for non fifo_error_idle_timeout
Any recovery that goes through the gk20a_fifo_recover path (e.g. gr
error, mmu fault, or any recovery that also involves engine recovery)
will still produce the full debug dump. This change only avoids the
debug dump for force-reset channels and pbdma intrs that do not involve
engine recovery. For FIFO_ERROR_IDLE_TIMEOUT error notifiers that
involve tsg recovery only, the debug dump happens only if
timeout_debug_dump is set. timeout_debug_dump is true by default
but can be changed using NVGPU_IOCTL_CHANNEL_SET_TIMEOUT_EX.
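One reading of the dump rules above, sketched in C. The enum err_notifier, struct recovery_event and should_debug_dump() are hypothetical; NOTIFIER_IDLE_TIMEOUT merely stands in for FIFO_ERROR_IDLE_TIMEOUT, and the real decision is made inside the recovery paths named above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the error notifier being raised. */
enum err_notifier {
	NOTIFIER_IDLE_TIMEOUT,	/* stands in for FIFO_ERROR_IDLE_TIMEOUT */
	NOTIFIER_OTHER,
};

/* Hypothetical summary of one recovery request. */
struct recovery_event {
	bool engine_recovery;		/* e.g. gr error or mmu fault */
	enum err_notifier notifier;
	bool timeout_debug_dump;	/* per-channel flag, true by default */
};

static bool should_debug_dump(const struct recovery_event *ev)
{
	if (ev->engine_recovery)
		return true;	/* full recover path: always dump */
	if (ev->notifier == NOTIFIER_IDLE_TIMEOUT)
		return ev->timeout_debug_dump;	/* tsg-only recovery */
	return false;	/* force reset / pbdma intr without engine recovery */
}
```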

Bug 2092051

Change-Id: Ibbf3cd2c44c586d9deb9e61ffbf37945b8d9e428
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2033068
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 5222d0ff4f
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2076117
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-02 02:43:11 -07:00
Seema Khowala
ef69df6dae gpu: nvgpu: add hal to mask/unmask intr during teardown
The ctxsw timeout error prevents recovery because it can be triggered
periodically. Disable the ctxsw timeout interrupt to allow recovery.

Bug 2092051
Bug 2429295
Bug 2484211
Bug 1890287

Change-Id: I47470e13968d8b26cdaf519b62fd510bc7ea05d9
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2019645
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 68c13e2f04
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2024899
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-02 02:43:02 -07:00
Seema Khowala
9bde6f8950 gpu: nvgpu: gv11b: add missing tsg_mark_error
nvgpu_tsg_mark_error is missing in the teardown path for aborting a tsg.
Without it, channels belonging to the tsg being aborted are not set
to timedout (unserviceable), and notifier_wq and semaphore_wq are
not woken up.

Bug 2092051
Bug 2429295
Bug 2484211

Change-Id: Ie71c9a3b7a7fd1aa8cb9ec5d0dc30ccaeadfeae5
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1999026
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 7fed0c1937
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2086594
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-16 03:57:43 -07:00
Peng Liu
3a11883f7f gpu: nvgpu: using pmu counters for load estimate
PMU counters #0 and #4 are used to count total cycles and busy cycles.
These counts are used by podgov to estimate GPU load.

The PMU idle intr status register is used to monitor overflow. Overflow
rarely occurs because the frequency governor reads and resets the
counters at a high cadence. When overflow occurs, a 100% workload is
reported to the frequency governor.
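A sketch of the load arithmetic, assuming the governor consumes load as a per-mille value (0 to 1000); that unit, struct pmu_load_sample and estimate_load_permille() are hypothetical. The busy/total ratio is computed in 64 bits so the multiplication cannot overflow, and an overflow flag from the idle intr status is reported as full load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One hypothetical sample of the two PMU counters plus the overflow
 * indication read from the idle intr status register. */
struct pmu_load_sample {
	uint32_t total_cycles;
	uint32_t busy_cycles;
	bool overflow;
};

/* Load in per-mille (0..1000); overflow is reported as 100% load. */
static uint32_t estimate_load_permille(const struct pmu_load_sample *s)
{
	if (s->overflow)
		return 1000U;
	if (s->total_cycles == 0U)
		return 0U;	/* nothing counted yet */
	if (s->busy_cycles >= s->total_cycles)
		return 1000U;
	/* Widen before multiplying so busy * 1000 cannot wrap. */
	return (uint32_t)(((uint64_t)s->busy_cycles * 1000U) / s->total_cycles);
}
```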

Bug 1963732

Change-Id: I046480ebde162e6eda24577932b96cfd91b77c69
Signed-off-by: Peng Liu <pengliu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1939547
(cherry picked from commit 34df003519)
Reviewed-on: https://git-master.nvidia.com/r/1979495
Reviewed-by: Aaron Tian <atian@nvidia.com>
Tested-by: Aaron Tian <atian@nvidia.com>
Reviewed-by: Rajkumar Kasirajan <rkasirajan@nvidia.com>
Tested-by: Rajkumar Kasirajan <rkasirajan@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-01 15:27:17 -07:00
Deepak Nibade
f1be222687 gpu: nvgpu: fix invalid TSG pointer
In gr_gp10b_set_cilp_preempt_pending() we already extract the TSG pointer
by calling tsg_gk20a_from_ch(), which safely returns the correct TSG, or
NULL in the error case.

But before calling g->ops.fifo.post_event_id() we extract the TSG again
by directly accessing the g->fifo.tsg array, which could result in
getting an invalid TSG pointer.

Fix this by removing the direct TSG extraction through g->fifo.tsg.

Bug 2444819
Jira NVGPU-1601

Change-Id: I9d49b5309c74e162828e7cb7d97556aae939a07c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1984954
(cherry picked from commit dcd3778b5e)
Reviewed-on: https://git-master.nvidia.com/r/2077313
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-01 09:12:33 -07:00
Deepak Nibade
8282b72a04 gpu: nvgpu: fix channel reference leak in error case
In gr_gp10b_get_cilp_preempt_pending_chid(), we leak the channel
reference if tsg_gk20a_from_ch() returns NULL.
Fix this by calling gk20a_channel_put() in the error case.
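A minimal model of the leak and its fix, using a plain integer refcount. struct channel, channel_get(), channel_put(), tsg_from_ch() and pending_tsg_id() are hypothetical stand-ins for the nvgpu helpers; unlike the real get helper, this channel_get() cannot fail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal channel/TSG model. */
struct tsg { int id; };
struct channel { int refcount; struct tsg *tsg; };

static struct channel *channel_get(struct channel *ch) { ch->refcount++; return ch; }
static void channel_put(struct channel *ch) { ch->refcount--; }

/* Returns NULL when the channel is not bound to a TSG. */
static struct tsg *tsg_from_ch(struct channel *ch) { return ch->tsg; }

/* Return the TSG id for a channel, or -1. Every exit path drops the
 * reference taken at the top, including the NULL-TSG path. */
static int pending_tsg_id(struct channel *ch)
{
	struct channel *ref = channel_get(ch);
	struct tsg *tsg = tsg_from_ch(ref);
	int id;

	if (tsg == NULL) {
		channel_put(ref);	/* the fix: release before bailing out */
		return -1;
	}
	id = tsg->id;
	channel_put(ref);
	return id;
}
```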

Bug 2444819
Jira NVGPU-1601

Change-Id: Ic5d036c6d043b0b95dd2a564afcc0add67c1ca02
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1984953
(cherry picked from commit 2322cb131c)
Reviewed-on: https://git-master.nvidia.com/r/2077312
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-01 09:12:25 -07:00
Peter Daifuku
9e329ca39b gpu: nvgpu: tsg: ensure unbound channel is disabled
Multiple threads could be unbinding different channels from
the same tsg at the same time. At the point where we
remove the channel from the tsg's channel list, call
disable_channel again, in case another thread had
re-enabled the channel after we had disabled it.

Bug 200404549

Change-Id: I9abbc08dc11fe1f7a0abada88376c0ef96b56610
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2083337
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Satish Arora <satisha@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-03-29 03:57:26 -07:00
Seema Khowala
e00804594b gpu: nvgpu: remove gk20a_is_channel_marked_as_tsg
Use tsg_gk20a_from_ch to get the tsg pointer for a channel's tsgid. For
an invalid tsgid, the tsg pointer will be NULL.

Bug 2092051
Bug 2429295
Bug 2484211

Change-Id: I82cd6a2dc5fab4acb147202af667ca97a2842a73
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2006722
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 13f37f9c70
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2025507
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-03-18 11:30:16 -07:00
Preetham Chandru R
77ee4144ce gpu: nvgpu: add compatibility version
Add compatibility version to page table and dma mapping structure.

Bug 200438879

Change-Id: I04b4601f71ae2b3e75843f39f5445ecca2b16677
Signed-off-by: Preetham Chandru R <pchandru@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2029086
(cherry picked from commit 8bbbd09caa)
Reviewed-on: https://git-master.nvidia.com/r/2071427
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-03-13 14:43:56 -07:00
Dmitry Pervushin
4269d56d02 nvgpu: more changes to clean loading/unloading
Bug 200487652

Change-Id: Ib52cc6a85a19ea0396c8ab584c5ce9970f93085a
Signed-off-by: Dmitry Pervushin <dpervushin@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2020386
(cherry picked from commit 617dff478c3687a08ed5b77f4ac2073b290c57ea)
Reviewed-on: https://git-master.nvidia.com/r/2035720
GVS: Gerrit_Virtual_Submit
Reviewed-by: Rahul Jain (SW-TEGRA) <rahuljain@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-03-11 11:00:46 -07:00
Dmitry Pervushin
7c8d212b50 gpu: do not release managed resource
l->bar is a managed resource; it will be released automatically.
Therefore, there is no need to explicitly unmap it.

Bug 200487652

Change-Id: Ic543baa770d9cbcf7e7319281c4a27fab4b4b4df
Signed-off-by: dmitry pervushin <dpervushin@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2012324
GVS: Gerrit_Virtual_Submit
Reviewed-by: Rahul Jain (SW-TEGRA) <rahuljain@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-03-11 10:58:11 -07:00
Seema Khowala
c9d4df288d gpu: nvgpu: remove code for ch not bound to tsg
- Remove handling for channels that are no longer bound to a tsg,
  as a channel could be referenceable but no longer part of a tsg
- Use tsg_gk20a_from_ch to get a pointer to the tsg for a given channel
- Clear unhandled gr interrupts

Bug 2429295
JIRA NVGPU-1580

Change-Id: I9da43a2bc9a0282c793b9f301eaf8e8604f91d70
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1972492
(cherry picked from commit 013ca60edd
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2018262
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Tested-by: Debarshi Dutta <ddutta@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-22 18:59:18 -08:00
Seema Khowala
465aff5f0d gpu: nvgpu: do not use raw spinlock for ch->timeout.lock
With a PREEMPT_RT kernel, regular spinlocks are mapped onto sleeping
spinlocks (rt_mutex locks), while raw spinlocks retain their behaviour.

A schedule-while-atomic bug can occur in gk20a_channel_timeout_start,
as it acquires the ch->timeout.lock raw spinlock and then calls
functions that acquire the ch->ch_timedout_lock regular spinlock.

Bug 200484795

Change-Id: Iacc63195d8ee6a2d571c998da1b4b5d396f49439
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2004100
(cherry picked from commit aacc33bb47
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2017923
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Tested-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-18 06:02:00 -08:00
Konsta Holtta
5e440e63d6 gpu: nvgpu: abstract out timeout rewinding
The channel timeout briefly ends up in a strange state during timeout
handling: it can be stopped and started again, and the timeout lock is
released in the middle. Add a more explicit rewind function that resets
the timeout to start if it is active. The active check allows this to
be used from gk20a_channel_timeout_restart_all_channels(), so that
function is also modified.

Also replace the return statements with more readable control flow in
gk20a_channel_timeout_handler().
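A sketch of the rewind idea under stated assumptions: struct ch_timeout, timeout_rewind() and timeout_restart_all() are hypothetical, a pthread mutex stands in for the kernel timeout lock, and time is passed in as milliseconds. Doing the active check and the reset within a single lock hold is what lets the restart-all loop call it on every channel.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-channel timeout state guarded by its own lock. */
struct ch_timeout {
	pthread_mutex_t lock;
	bool running;
	uint64_t start_ms;	/* when the current timeout period began */
};

/* Reset the timeout to start now, but only if it is active. The check
 * and the reset happen under one lock hold, so there is no window in
 * which the timeout is stopped and then restarted. */
static void timeout_rewind(struct ch_timeout *t, uint64_t now_ms)
{
	pthread_mutex_lock(&t->lock);
	if (t->running)
		t->start_ms = now_ms;
	pthread_mutex_unlock(&t->lock);
}

/* Safe to call on every channel: stopped timeouts are left alone. */
static void timeout_restart_all(struct ch_timeout *ts, size_t n, uint64_t now_ms)
{
	for (size_t i = 0; i < n; i++)
		timeout_rewind(&ts[i], now_ms);
}
```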

Bug 200484795

Change-Id: Ia7d67242dfc149ace1f4f841a837e90b6c985308
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1989327
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
(cherry picked from commit 8979a97af3
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2017922
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Tested-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-18 06:01:57 -08:00
Preetham Chandru R
b5d13e16ae gpu: nvgpu: rename dma map/umap interfaces
On the desktop version, map is called nvidia_p2p_dma_map_pages and umap
is called nvidia_p2p_dma_umap_pages, so rename these two APIs to match
the desktop version.

Bug 200438879

Change-Id: I66301c48b832dfed8c3950678f473c2f82b8761a
Signed-off-by: Preetham Chandru R <pchandru@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2014940
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-14 06:00:16 -08:00
Seema Khowala
cbf6394482 gpu: nvgpu: check ch_timedout for poll/restart
poll_timeouts and timeout_restart_all_channels should
only handle channels that have not been recovered/aborted.
Check the ch_timedout status of the channel to make sure the
channel is still alive and usable. A channel reference
could still be available even if the channel is recovered but not
closed.

Bug 2404865

Change-Id: I016c8b9952ef1d4c349c2a2a2ca55cb81326d380
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1929339
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit def687d4df
in rel-32)
Reviewed-on: https://git-master.nvidia.com/r/2016995
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-13 13:19:45 -08:00
Seema Khowala
f78918fd6c gpu: nvgpu: do not suspend/resume recovered channel
Already torn-down channels should not be suspended or
resumed. A channel reference could still be available
even if the channel is recovered but not closed. Use the ch_timedout
status to check whether the channel is already recovered/aborted.

Bug 2404865

Change-Id: I718eab6032ee94a9322da7a239a978b388de2b01
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1929338
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 88cff206ae
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2016994
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-13 13:19:41 -08:00
Seema Khowala
220860d043 gpu: nvgpu: rename has_timedout and make it thread safe
Currently the has_timedout variable is protected by a wmb at the places
where it is set, but there is no corresponding rmb wherever
has_timedout is read. This is prone to errors under concurrent
execution, and this change is intended to fix that issue.
Rename the has_timedout variable of the channel struct to ch_timedout.
Also, to avoid an rmb every time ch_timedout is read, add
ch_timedout_spinlock to protect the ch_timedout variable
against concurrent execution.
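A sketch of lock-protected access to the flag instead of paired barriers, assuming a pthread mutex as a user-space stand-in for ch_timedout_spinlock. struct channel_state and the accessor names ch_set_timedout() and ch_check_timedout() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical channel fragment; the mutex stands in for the kernel
 * spinlock that guards ch_timedout. */
struct channel_state {
	pthread_mutex_t ch_timedout_lock;
	bool ch_timedout;
};

/* Mark the channel as timed out (recovered/aborted). */
static void ch_set_timedout(struct channel_state *ch)
{
	pthread_mutex_lock(&ch->ch_timedout_lock);
	ch->ch_timedout = true;
	pthread_mutex_unlock(&ch->ch_timedout_lock);
}

/* Read the flag under the same lock, so readers need no separate rmb. */
static bool ch_check_timedout(struct channel_state *ch)
{
	bool timedout;

	pthread_mutex_lock(&ch->ch_timedout_lock);
	timedout = ch->ch_timedout;
	pthread_mutex_unlock(&ch->ch_timedout_lock);
	return timedout;
}
```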

Bug 2404865
Bug 2092051

Change-Id: I0bee9f50af0a48720aa8b54cbc3af97ef9f6df00
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1930935
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 1f54ea09e3
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2016975
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-13 13:19:37 -08:00
Debarshi Dutta
18643ac135 gpu: nvgpu: replace input param chid with pointer to channel
preempt_channel needs the channel in order to pass it to other
public functions, access its tsg, etc. It should therefore take a
pointer to a channel as an input parameter instead of a chid.

Increment the channel ref counter using gk20a_channel_from_id in
functions that get the chid directly from the h/w registers. Once the
preempt_channel call is done, call gk20a_channel_put on the
referenced channel.

Jira NVGPU-1461

Change-Id: I6c87c8104cfcb418d468c8c590087fd4aeabf4bd
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1963200
(cherry picked from commit 9abe9fe062
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2013728
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-11 08:18:47 -08:00
Debarshi Dutta
a8f0cb89f4 gpu: nvgpu: replace input param chid with pointer to channel
gk20a_fifo_recover_channel takes a reference to the channel via its
chid before passing the channel pointer to other public functions such
as gk20a_channel_abort and gk20a_fifo_error_ch. It is therefore better
for gk20a_fifo_recover_channel to take a pointer to a channel instead
of only a chid.

Jira NVGPU-1461

Change-Id: I338a12a05e5ccee785a202fea7848db5201a3a39
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1963199
(cherry picked from commit 99acb8011a
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2013727
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-11 08:18:44 -08:00
Debarshi Dutta
d9efcd5871 gpu: nvgpu: replace input parameter tsgid with pointer to struct tsg_gk20a
The function gk20a_fifo_recover_tsg has to pass a valid struct tsg to
other functions internally, so it should take a pointer to
struct tsg_gk20a as an input parameter.

The TSG-specific parts of gk20a_fifo_preempt_timeout_rc are now moved
into another function, gk20a_fifo_preempt_timeout_rc_tsg,
that takes a tsg as input and passes it to gk20a_fifo_recover_tsg.
The tsg pointer is also used to enumerate its channels.

The function gk20a_fifo_preempt_timeout_rc now contains only
channel-specific code.

Jira NVGPU-1461

Change-Id: Ice0a9921567841fb5586a7e4e010c442ca6cf172
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1961675
(cherry picked from commit e19cea7ab3
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2013726
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-11 08:18:40 -08:00
Debarshi Dutta
ef9de9e992 gpu: nvgpu: replace input parameter tsgid with pointer to struct tsg_gk20a
gv11b_fifo_preempt_tsg needs to access the runlist_id of the tsg as
well as pass the tsg pointer to other public functions such as
gk20a_fifo_disable_tsg_sched. Therefore preempt_tsg should take a
pointer to a struct tsg_gk20a instead of just the tsgid.

Jira NVGPU-1461

Change-Id: I01fbd2370b5746c2a597a0351e0301b0f7d25175
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1959068
(cherry picked from commit 1e78d47f15
in rel-32)
Reviewed-on: https://git-master.nvidia.com/r/2013725
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-11 08:18:36 -08:00
Debarshi Dutta
5b8ecbc51f gpu: nvgpu: replace tsgid input variable with pointer to a struct tsg_gk20a
Replace tsgid with a pointer to a struct tsg_gk20a in the function
gk20a_fifo_tsg_abort(). gk20a_fifo_tsg_abort needs to enumerate
all the channels within the tsg and pass the tsg pointer to
other functions, so it needs a pointer as an input parameter
instead.

Jira NVGPU-1461

Change-Id: I59cec05d5d778f733d0c3e9ffadf46e74e249080
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1956567
(cherry picked from commit e5bebd880f
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2013724
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-11 08:18:33 -08:00
Konsta Holtta
7e8ba851a8 gpu: nvgpu: delete raw chid lookup
This (dangerous) array lookup with no channel references is now unused.

Jira NVGPU-1460

Change-Id: Ic6bdbcf19fc8996bc6ff02a40afe3224bdd5bc27
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1955402
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 4a53854a92 in
dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2008517
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-05 09:04:33 -08:00
Konsta Holtta
301a9d2426 gpu: nvgpu: store ch ptr in gr isr data
Store a channel pointer that is either NULL or a referenced channel to
avoid confusion about channel ownership. A pure channel ID is dangerous.

Jira NVGPU-1460

Change-Id: I6f7b4f80cf39abc290ce9153ec6bf5b62918da97
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1955401
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 4e6d9afab8 in
dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2008516
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-05 09:04:24 -08:00
Konsta Holtta
3794afbeb1 gpu: nvgpu: add safe channel id lookup
Add gk20a_channel_from_id() to retrieve a channel, given a raw channel
ID, with a reference taken (or NULL if the channel was dead). This makes
it harder to mistakenly use a channel that's dead and thus uncovers bugs
sooner. Convert code to use the new lookup when applicable; work remains
to convert complex uses where a ref should have been taken but hasn't.

The channel ID is also validated against FIFO_INVAL_CHANNEL_ID; NULL is
returned for such IDs. This is often useful and does not hurt when
unnecessary.

However, this does not prevent the case where a channel is closed
and reopened while someone holds a stale channel number. In
all such conditions the caller should already hold a reference.

The only conditions where a channel can be safely looked up by an id and
used without taking a ref are when initializing or deinitializing the
list of channels.
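A sketch of a reference-taking lookup by id, using C11 atomics for a "get unless zero" refcount. The channel table, NUM_CHANNELS, the value of FIFO_INVAL_CHANNEL_ID and the names channel_from_id() and channel_put() are assumptions for illustration, not the nvgpu implementation of gk20a_channel_from_id().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_INVAL_CHANNEL_ID ((uint32_t)~0U)	/* assumed value */
#define NUM_CHANNELS 4U				/* hypothetical table size */

/* Hypothetical channel: refcount == 0 means dead/closed. */
struct channel {
	uint32_t chid;
	_Atomic int refcount;
};

static struct channel channels[NUM_CHANNELS];

/* Return the channel with a reference taken, or NULL if the id is
 * invalid, out of range, or the channel is dead. The CAS loop only
 * increments a nonzero count, so a dying channel is never revived. */
static struct channel *channel_from_id(uint32_t chid)
{
	struct channel *ch;
	int old;

	if (chid == FIFO_INVAL_CHANNEL_ID || chid >= NUM_CHANNELS)
		return NULL;
	ch = &channels[chid];
	old = atomic_load(&ch->refcount);
	while (old > 0) {
		/* On failure, old is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&ch->refcount, &old, old + 1))
			return ch;
	}
	return NULL;
}

static void channel_put(struct channel *ch)
{
	atomic_fetch_sub(&ch->refcount, 1);
}
```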

Jira NVGPU-1460

Change-Id: I0a30968d17c1e0784d315a676bbe69c03a73481c
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1955400
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 7df3d58750
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2008515
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-05 09:04:20 -08:00
Philip Elcan
ed6e396090 gpu: nvgpu: channel: make chid u32
The chid member of the channel_gk20a struct was being used as an
unsigned value. Because it was declared as an int, it was causing
MISRA 10.3 violations for implicit assignment of different types.

JIRA NVGPU-647

Change-Id: I7477fad6f0c837cf7ede1dba803158b1dda717af
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1918470
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 1c7bb9b538 in
dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2008514
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-05 09:04:11 -08:00
Philip Elcan
bace52ac7a gpu: nvgpu: make tsgid a consistent type
Different units were declaring tsgid as int or u32. This makes everyone
use u32, which resolves MISRA 10.3 violations for implicit
assignment to different types.

JIRA NVGPU-647

Change-Id: I78660e737acb0dad76dd538e5dd37f4527cf5acd
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1918469
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit f5cac144a0 in
dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2008513
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-05 09:04:02 -08:00
Konsta Holtta
aa84e8a986 gpu: nvgpu: fix double handling in timeout
The context switch timeout works by triggering a hardware timeout at 10
Hz. When handling these, we check whether a channel has actually timed
out. Currently the timeout limit can be shorter than the 10 Hz interval,
which always causes us to recover a channel, but any progress made
during the interval would also be detected.

Handling both situations at the same time would reuse the function-local
channel pointer after a loop has finished and would cause memory
corruption. Fix this by making the two branches mutually exclusive,
and move the recover case first because that's how our tests assume
things work.
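A sketch of the control-flow shape only: one decision per 10 Hz tick, with the recover check first and the two outcomes mutually exclusive, so neither branch reuses an iterator left over from the other's loop. struct ch_view, enum timeout_action and check_ctxsw_timeout() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-channel view for one ctxsw timeout check. */
struct ch_view {
	bool made_progress;
	bool timed_out;
};

enum timeout_action { TIMEOUT_NONE, TIMEOUT_RECOVER, TIMEOUT_PROGRESS };

/* Decide exactly one action per tick. Recovery is checked first; the
 * chosen index is written only by the branch that returns it. */
static enum timeout_action check_ctxsw_timeout(const struct ch_view *chs,
					       size_t n, size_t *chosen)
{
	for (size_t i = 0; i < n; i++) {
		if (chs[i].timed_out) {
			*chosen = i;
			return TIMEOUT_RECOVER;
		}
	}
	for (size_t i = 0; i < n; i++) {
		if (chs[i].made_progress) {
			*chosen = i;
			return TIMEOUT_PROGRESS;
		}
	}
	return TIMEOUT_NONE;
}
```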

Jira NVGPU-967
Bug 2502074

Change-Id: I26aa0fa7fd80ab42a9a1a93a6cca2cd29c9d3f3f
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1932449
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 8ac9a53d816a3d012a6948a9a96ac6db699c662d
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/1997597
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Tested-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-05 01:53:04 -08:00
Seema Khowala
bcac2a22a4 gpu: nvgpu: gm20b: clear priv intr in log_pending_intrs
Clear the pending priv interrupt in log_pending_intrs. Priv
ring errors have not been cleaned up in gm20b, so it is ok
to just clear them.

Bug 200477291
Bug 200486293

Change-Id: I850a261828a9d49b6b4a82d75f5347acbc17b0fe
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2008818
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Nitin Kumbhar <nkumbhar@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Tested-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-02 03:51:02 -08:00
Peter Daifuku
d39781054f gpu: nvgpu: allocate ctxsw buffers once only
In *_set_ctxsw_preemption_mode, only allocate
buffers the first time through.
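A sketch of allocate-once behaviour, assuming a single lazily allocated buffer. struct gr_ctx and set_ctxsw_preemption_mode() are hypothetical simplifications of the *_set_ctxsw_preemption_mode functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical context holding one preemption buffer. */
struct gr_ctx {
	void *preempt_buf;
	size_t preempt_buf_size;
};

/* Allocate only on the first call; later calls, e.g. when the
 * preemption mode is set again, reuse the existing buffer. */
static int set_ctxsw_preemption_mode(struct gr_ctx *ctx, size_t size)
{
	if (ctx->preempt_buf != NULL)
		return 0;	/* already allocated: nothing to do */
	ctx->preempt_buf = calloc(1, size);
	if (ctx->preempt_buf == NULL)
		return -1;
	ctx->preempt_buf_size = size;
	return 0;
}
```

In this sketch a later call with a different size keeps the first buffer; a real driver would have to decide whether that is acceptable.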

Bug 200418468

Change-Id: I22d06463416615b9a9d671c32b6fe76b602a2623
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2004301
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Satish Arora <satisha@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-01-30 22:18:04 -08:00
Preetham Chandru Ramchandra
61bb9dc403 gpu: nvgpu: nvgpu locks to vanilla Linux locks
Replace nvgpu locks with vanilla Linux locks. When a custom kernel
driver includes nv-p2p.h, nvgpu/linux/lock.h is not available,
because nvgpu/linux/lock.h is not copied to
/usr/src/kernel_header_file.

Bug 200438879

Change-Id: I55b52c6f791970650388b7d51c4d30b5fe75bbb8
Signed-off-by: Preetham Chandru Ramchandra <pchandru@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1997950
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
(cherry picked from commit eb887094e4)
Reviewed-on: https://git-master.nvidia.com/r/2000831
2019-01-22 19:14:57 -08:00
Preetham Chandru Ramchandra
dbb014e34f gpu: nvgpu: move nv-p2p.h to include/linux
Move nv-p2p.h to include/linux so that external kernel
modules can use it.

Bug 200438879

Change-Id: I40707fe9f798b3ccf077dbdc942f8d6fc9019458
Signed-off-by: Preetham Chandru R <pchandru@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1986646
(cherry picked from commit cfe4a2e5e8)
Reviewed-on: https://git-master.nvidia.com/r/2000830
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-01-22 19:14:53 -08:00
Seema Khowala
89d5f40116 gpu: nvgpu: handle timestamp buffer full ctxsw_intr0
If enabled, fecs trace updating is done by the ucode even
when there is no fecs trace dumper application to consume
it. As a result, the trace buffer eventually becomes full
and the ucode triggers ctxsw_intr0. Reset the fecs_trace
buffer to handle this timestamp-buffer-full ctxsw_intr0.

Bug 2361571
Bug 200472922

Change-Id: Ia26a17635fc6bd6e8663b8af983acc91839ecfcd
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1965370
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
(cherry picked from commit 790ba09554)
Reviewed-on: https://git-master.nvidia.com/r/1979746
GVS: Gerrit_Virtual_Submit
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Tested-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Bitan Biswas <bbiswas@nvidia.com>
2018-12-28 23:51:39 -08:00
Seema Khowala
8e2d0c7b3d gpu: nvgpu: add handling for ctxsw_intr0
ctxsw_intr0 is triggered by the ucode even if it
is not enabled by the driver. Add handling to
process ctxsw_intr0. fecs mailbox(6) is used to
report fecs/gpccs misc error codes. Also dump
falcon stats for unhandled fecs interrupts.

Bug 2361571
Bug 200472922

Change-Id: Iefb3c0d46ad1d08db07fd3c08cff91a77835908c
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1966984
(cherry picked from commit 2c379cad0f
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/1979745
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Tested-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Bitan Biswas <bbiswas@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-12-28 23:51:36 -08:00
Seema Khowala
0d110b7522 gpu: nvgpu: clear all handled fifo interrupts
The issue is that the local variable clear_intr is reset when the
fifo intr handler also handles interrupts that are handled by
fifo_error_isr, so bits recorded earlier are lost. This fix ensures
that all handled fifo interrupts are cleared.
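
A minimal sketch of the failure mode and the fix, assuming hypothetical interrupt bit definitions and a hypothetical error_isr() helper; this is not the real fifo_intr_0 register layout or the actual nvgpu handler. Each handler ORs its handled bits into clear_intr instead of assigning to it, so bits handled earlier in the same pass are still cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interrupt bits; not the real fifo_intr_0 layout. */
#define INTR_PBDMA (1u << 0)
#define INTR_SCHED (1u << 1)
#define INTR_CHSW  (1u << 2)

/* Hypothetical error ISR: returns the subset of bits it handled. */
static uint32_t error_isr(uint32_t pending)
{
	return pending & (INTR_SCHED | INTR_CHSW);
}

/* Returns the mask that would be written to a (hypothetical) clear register. */
static uint32_t fifo_isr(uint32_t pending)
{
	uint32_t clear_intr = 0;

	if (pending & INTR_PBDMA) {
		/* ... handle the PBDMA interrupt ... */
		clear_intr |= INTR_PBDMA;
	}
	if (pending & (INTR_SCHED | INTR_CHSW)) {
		/* Assigning here (clear_intr = error_isr(...)) would drop INTR_PBDMA. */
		clear_intr |= error_isr(pending);
	}
	return clear_intr;
}
```

With both INTR_PBDMA and INTR_SCHED pending, the accumulated mask contains both bits; the assignment form would have returned only INTR_SCHED.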

Bug 2361571

Change-Id: Ic8fe2294cfb25c58925942750a81c104ec9747de
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1960330
(cherry picked from commit 1195239d1c
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/1979744
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bitan Biswas <bbiswas@nvidia.com>
Tested-by: Bitan Biswas <bbiswas@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-12-28 23:51:32 -08:00
Mahantesh Kumbar
bb9c8e4e29 gpu: nvgpu: remove unnecessary error print of falcon queue
- For queue full, a pmu_dbg message is already printed and the
  EAGAIN error is returned to the end caller for retry, so an
  intermediate error message is not the correct print for queue full.
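
A minimal sketch of the error-reporting split described above, using a hypothetical fixed-depth queue with hypothetical queue_push() and post_cmd() helpers rather than the real falcon queue API. Queue-full is an expected, retryable condition: the lowest layer only emits a debug message and returns -EAGAIN, and the intermediate layer propagates it without an error print, reserving error prints for other failures.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define QUEUE_DEPTH 2 /* hypothetical queue depth */

struct cmd_queue {
	int entries[QUEUE_DEPTH];
	int count;
};

/* Queue-full is expected: debug-level message only, then -EAGAIN. */
static int queue_push(struct cmd_queue *q, int cmd)
{
	if (q->count == QUEUE_DEPTH) {
#ifdef QUEUE_DEBUG
		fprintf(stderr, "dbg: queue full, caller should retry\n");
#endif
		return -EAGAIN;
	}
	q->entries[q->count++] = cmd;
	return 0;
}

/* Intermediate layer: pass -EAGAIN through silently; print other errors. */
static int post_cmd(struct cmd_queue *q, int cmd)
{
	int err = queue_push(q, cmd);

	if (err != 0 && err != -EAGAIN)
		fprintf(stderr, "error: queue push failed: %d\n", err);
	return err;
}
```

The end caller can loop on -EAGAIN (for example, after waiting for the queue to drain) without the log filling up with misleading error messages.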

Bug 200477085
Bug 200477931
Bug 200475876

Change-Id: I1109f15d0815f4ab2d8f8ca303db447d856f372c
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
(cherry picked from commit I263f66f7a8d8f1b98985f32f9daa49b09309c359)
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-misra-checker <svc-misra-checker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1979935
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Tested-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
2018-12-26 12:01:04 -08:00
Peter Daifuku
32672afbc0 nvgpu: pmu: cleanup init thread on destroy
In nvgpu_kill_task_pg_init(), call nvgpu_thread_join()
if the init thread is no longer running in order to
reclaim thread resources.

Bug 2452799
JIRA ESRM-437

Change-Id: Id9c67f689027f00039ac2df226ee9c28ad89dd1d
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1967983
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-misra-checker <svc-misra-checker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1970058
Reviewed-by: Shmuel Ungerfeld <sungerfeld@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Tested-by: Shmuel Ungerfeld <sungerfeld@nvidia.com>
Reviewed-by: Rahul Jain (SW-TEGRA) <rahuljain@nvidia.com>
2018-12-15 17:41:55 -08:00
Deepak
2d3e99067e gpu: nvgpu: vgpu: Get channel reference
- In the vGPU code path, vgpu_channel_abort_cleanup() does not
  obtain a channel reference before using the channel structure
  (channel_gk20a).
- vgpu_channel_abort_cleanup() is called by vgpu_intr_thread(), which
  runs commands obtained from the interrupt queue from the RM server.
- If gk20a_channel_release() runs before the guest receives the
  notification from the RM server for channel abort cleanup, the
  channel gets freed before vgpu_channel_abort_cleanup() runs.
- Because vgpu_channel_abort_cleanup() does not take an explicit
  reference to the channel, it then accesses structures
  (such as ch->g) that have been set to NULL, which causes a crash.
- This patch explicitly takes a reference to the channel before
  vgpu_channel_abort_cleanup() is called.
- If gk20a_channel_release() runs before vgpu_channel_abort_cleanup()
  and frees the channel, no reference to the freed channel is
  obtained, so vgpu_channel_abort_cleanup() returns instead of
  continuing with the freed channel as it did previously.
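
A minimal sketch of the "take a reference only if the channel is still alive" pattern, using C11 atomics and a hypothetical, simplified channel struct rather than channel_gk20a or the nvgpu channel get/put helpers. It assumes the channel structure's memory stays valid after release (for example, because it lives in a preallocated table) and only its contents are torn down; otherwise even reading the refcount would be a use-after-free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified channel; not channel_gk20a. */
struct channel {
	atomic_int refcount; /* 0 means the channel has been released */
	int aborted;
};

/* Increment the refcount only if it is still non-zero ("get unless zero"). */
static struct channel *channel_get(struct channel *ch)
{
	int old = atomic_load(&ch->refcount);

	while (old > 0) {
		/* On failure, old is reloaded with the current value and we retry. */
		if (atomic_compare_exchange_weak(&ch->refcount, &old, old + 1))
			return ch;
	}
	return NULL;
}

static void channel_put(struct channel *ch)
{
	atomic_fetch_sub(&ch->refcount, 1);
}

/* Hypothetical analogue of the abort-cleanup path: bail out without a reference. */
static int abort_cleanup(struct channel *ch)
{
	struct channel *ref = channel_get(ch);

	if (ref == NULL)
		return -1; /* already released; do not touch its contents */
	ref->aborted = 1;
	channel_put(ref);
	return 0;
}
```

The compare-and-swap loop guarantees the refcount never moves from 0 back to 1, so a channel that release has already dropped cannot be revived by a late abort notification.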

Bug 200453473
JIRA EVLR-3411

Change-Id: I311043b2231336616b28246531cf8a0dc151b0cd
Signed-off-by: Deepak Bhosale <dbhosale@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1932028
(cherry picked from commit b91228e506)
Reviewed-on: https://git-master.nvidia.com/r/1970807
Reviewed-by: Aparna Das <aparnad@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Tested-by: Karl Ding <kding@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Nirav Patel <nipatel@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-12-12 20:45:36 -08:00
Preetham Chandru R
d6278955f6 gpu: nvgpu: RDMA implementation
This change adds RDMA support for Tegra iGPU.
1. The CUDA process allocates the memory and passes
   the VA and size to the custom kernel driver.
2. The custom kernel driver maps the user-allocated
   buffer and does the DMA to/from it.
3. Only iGPU + cudaHostAlloc sysmem is supported.
4. Works only for a given process.
5. The address must be sysmem page aligned and the size must
   be a multiple of the sysmem page size.
6. The custom kernel driver must register a free_callback when the
   get_page() function is called.
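
Constraint 5 can be checked up front by the custom kernel driver. The following is a minimal sketch with a hypothetical rdma_range_ok() helper; it assumes a 4 KiB sysmem page size (real kernel code would use PAGE_SIZE), and rejecting a zero size is an added assumption, not something stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB sysmem pages; kernel code would use PAGE_SIZE instead. */
#define SYSMEM_PAGE_SIZE 4096ULL

/*
 * Hypothetical pre-check: VA must be page aligned and size must be a
 * non-zero multiple of the page size (page size is a power of two, so
 * masking with SYSMEM_PAGE_SIZE - 1 tests alignment).
 */
static bool rdma_range_ok(uint64_t va, uint64_t size)
{
	if (size == 0)
		return false;
	if (va & (SYSMEM_PAGE_SIZE - 1))
		return false;
	return (size & (SYSMEM_PAGE_SIZE - 1)) == 0;
}
```

For example, a buffer at VA 0x10000 with size 8192 passes, while VA 0x10010 or a 100-byte size is rejected before any pages are requested.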

Bug 200438879

Signed-off-by: Preetham Chandru R <pchandru@nvidia.com>
Change-Id: I43ec45734eb46d30341d0701550206c16e051106
Reviewed-on: https://git-master.nvidia.com/r/1953780
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-11-29 14:36:30 -08:00
Sagar Kamble
b3bda98fbd gpu: nvgpu: disable/clear PMU IRQs on power off
While tearing down PMU state during power off, nvgpu doesn't disable
the PMU interrupts. Disable them unconditionally.

Bug 200457485

Change-Id: Ia2462d879c1e7bbb4b5e8295ce211c38567c13e5
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1939025
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-misra-checker <svc-misra-checker@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1951361
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Tested-by: Bibek Basu <bbasu@nvidia.com>
2018-11-20 09:59:26 -08:00
Thomas Steinle
3686634b2a gpu: nvgpu: Add NVGPU_SUPPORT_GET_GPU_LOAD
Add a flag to indicate whether NVGPU_GPU_IOCTL_GET_GPU_LOAD is supported.

Bug 200421190

Change-Id: I59200b1a3dbbcc0d731d1e77597e163c61417a96
Signed-off-by: Thomas Steinle <tsteinle@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1919448
(cherry picked from commit 7cb9e3cc14d3cec7e4685bd56728dc0e61b1b700)
Reviewed-on: https://git-master.nvidia.com/r/1944689
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-11-20 09:58:52 -08:00
Alex Waterman
20ac0d74cf gpu: nvgpu: Fix comment in priv_cmd_buf allocation
Update the comment to fix obvious issues and describe the
new allocation logic.

Bug 2327792

Change-Id: Ica0dd4159467e3023cc487a2bf9f525db3ad76e6
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1831096
(cherry picked from commit c64f9432b1)
Reviewed-on: https://git-master.nvidia.com/r/1949221
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: James Norton <jnorton@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-11-16 15:12:34 -08:00
Alex Waterman
abcbb58fe4 gpu: nvgpu: Make priv_cmd_buf honor num_in_flight jobs
If num_in_flight jobs is set, use it to determine the proper
size of the priv_cmd_buf. If num_in_flight is not set, use the
original logic: the priv_cmd_buf is sized based on a worst-case
assumption for the GPFIFO.

Also clean up MISRA issues.
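
A minimal sketch of the two sizing branches. The per-job command sizes, the priv_cmd_buf_words() helper, and the "one job per GPFIFO entry" worst case are all hypothetical placeholders; the real sizes depend on the sync backend and the actual nvgpu formula.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-job priv cmd cost in words: one wait plus one incr cmd. */
#define WAIT_CMD_WORDS 4u
#define INCR_CMD_WORDS 6u

static uint32_t priv_cmd_buf_words(uint32_t num_in_flight, uint32_t gpfifo_entries)
{
	uint32_t jobs;

	if (num_in_flight != 0u)
		jobs = num_in_flight;  /* size for the jobs actually allowed in flight */
	else
		jobs = gpfifo_entries; /* hypothetical worst case: one job per GPFIFO entry */

	return jobs * (WAIT_CMD_WORDS + INCR_CMD_WORDS);
}
```

With 8 in-flight jobs and a 1024-entry GPFIFO, this sketch needs 80 words instead of the 10240-word worst case, which is the kind of saving that honoring num_in_flight is meant to provide.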

Bug 2327792

Change-Id: Ie192caeb6cc48fdcac57e5cbb71c534aeaf46011
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1831095
(cherry picked from commit b9ec592f1d)
Reviewed-on: https://git-master.nvidia.com/r/1949220
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: James Norton <jnorton@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-11-16 15:12:27 -08:00
Alex Waterman
96f2d4320a gpu: nvgpu: Use deterministic flag to decide pre-alloc
Instead of using num_inflight_jobs alone to determine whether to
pre-alloc resources for a channel, use the c->deterministic flag and
the number of inflight jobs field. Non-deterministic channels do not
require pre-alloced resources, and deterministic channels with 0
in-flight jobs (i.e. no kernel job tracking, AKA fast path submits)
also do not require pre-alloced resources.
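
The decision above reduces to a single predicate. This is a minimal sketch over a hypothetical, trimmed-down chan_cfg struct with a hypothetical needs_prealloc() helper, not the real channel_gk20a fields or function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down channel state. */
struct chan_cfg {
	bool deterministic;
	unsigned int num_inflight_jobs;
};

/*
 * Pre-allocate only for deterministic channels with kernel job tracking
 * (num_inflight_jobs > 0). Non-deterministic channels and deterministic
 * fast-path channels (0 in-flight jobs) skip pre-allocation.
 */
static bool needs_prealloc(const struct chan_cfg *c)
{
	return c->deterministic && c->num_inflight_jobs > 0u;
}
```

Of the four combinations, only "deterministic with a non-zero in-flight job count" returns true.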

Bug 2327792

Change-Id: I7e8eb0478c22e005ca2c46c555415afa0ded0be1
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1850123
(cherry picked from commit 05ec7b80eb)
Reviewed-on: https://git-master.nvidia.com/r/1949219
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: James Norton <jnorton@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-11-16 15:12:24 -08:00
Vince Hsu
9cdc1ccdaf gpu: nvgpu: re-initialize fw pointer when failed to load fw
When ACR and PMU BL fail to boot, the firmware images are released,
but the firmware pointers are not re-initialized. That causes later
invalid pointer usage. Fix that by setting them to NULL.
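
A minimal sketch of the release-and-reset pattern, using a hypothetical firmware_blob struct and release_fw() helper rather than the real nvgpu firmware API. Passing a pointer to the caller's pointer lets the helper set it to NULL, so later "is it loaded?" checks see NULL instead of a dangling pointer, and a second release is a harmless no-op.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical firmware handle; stands in for the real firmware object. */
struct firmware_blob {
	size_t size;
	void *data;
};

static void release_fw(struct firmware_blob **fw)
{
	if (*fw == NULL)
		return; /* already released: nothing to do */
	free((*fw)->data);
	free(*fw);
	*fw = NULL; /* re-initialize so later users do not see a stale pointer */
}
```

Calling release_fw() on both the error path and the normal teardown path is then safe, because the second call finds NULL and returns.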

Bug 200462464

Change-Id: Iacdf4b3c7f7144a77f595c77e6f5a29d35505672
Signed-off-by: Vince Hsu <vinceh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1941671
(cherry picked from commit 3a87a0c998)
Reviewed-on: https://git-master.nvidia.com/r/1942950
GVS: Gerrit_Virtual_Submit
Reviewed-by: Siddardha Naraharisetti <siddardhan@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-11-07 10:18:55 -08:00
Vince Hsu
6099b07090 gpu: nvgpu: fix deadlock when ACR boot fails
The tpc_pg_lock is not released properly when ACR fails to boot, so
the subsequent runtime PM resume operation blocks. This in turn also
blocks shutdown due to pending runtime PM operations.
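
A minimal sketch of the single-exit unlock pattern that avoids this class of bug. It uses a hypothetical mock_lock that only tracks whether it is held, plus hypothetical acr_boot() and finalize_poweron() helpers; none of these are the real nvgpu functions. Every path, including the ACR failure path, reaches the unlock at the end.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock stand-in that only tracks whether it is held. */
struct mock_lock {
	bool held;
};

static void lock_acquire(struct mock_lock *l) { assert(!l->held); l->held = true; }
static void lock_release(struct mock_lock *l) { assert(l->held); l->held = false; }

/* Hypothetical boot step; returns non-zero on failure. */
static int acr_boot(bool fail) { return fail ? -1 : 0; }

static int finalize_poweron(struct mock_lock *tpc_pg_lock, bool acr_fails)
{
	int err;

	lock_acquire(tpc_pg_lock);
	err = acr_boot(acr_fails);
	if (err != 0)
		goto done; /* returning directly here would leave the lock held */
	/* ... remaining power-on steps ... */
done:
	lock_release(tpc_pg_lock);
	return err;
}
```

Because lock_acquire() asserts that the lock is free, a leaked lock on the failure path would make the next power-on attempt trip immediately, mirroring how the leaked tpc_pg_lock blocked the next runtime PM resume.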

Bug 200462464

Change-Id: Ia28ac11e8a7bbd826cf5f90ba8f90b29d2a55baa
Signed-off-by: Vince Hsu <vinceh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1941670
(cherry picked from commit 0bda191d7b)
Reviewed-on: https://git-master.nvidia.com/r/1942949
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Siddardha Naraharisetti <siddardhan@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-11-07 10:18:51 -08:00