Commit Graph

134 Commits

Author SHA1 Message Date
Anton Vorontsov
1c40d09c4c gpu: nvgpu: Add support for FECS ctxsw tracing
bug 1648908

This commit adds support for FECS ctxsw tracing. Code is compiled
conditionnaly under CONFIG_GK20_CTXSW_TRACE.
This feature requires an updated FECS ucode that writes one record to a ring
buffer on each context switch. On RM/Kernel side, the GPU driver reads records
from the master ring buffer and generates trace entries into a user-facing
VM ring buffer. For each record in the master ring buffer, RM/Kernel has
to retrieve the vmid+pid of the user process that submitted related work.

Features currently implemented:
- master ring buffer allocation
- debugfs to dump master ring buffer
- FECS record per context switch (with both current and new contexts)
- dedicated device for ctxsw tracing (access to VM ring buffer)
- SOF generation (and access to PTIMER)
- VM ring buffer allocation, and reconfiguration
- enable/disable tracing at user level
- event-based trace filtering
- context_ptr to vmid+pid mapping
- read system call for ctxsw dev
- mmap system call for ctxsw dev (direct access to VM ring buffer)
- poll system call for ctxsw dev
- save/restore register on ELPG/CG6
- separate user ring from FECS ring handling

Features requiring ucode changes:
- enable/disable tracing at FECS level
- actual busy time on engine (bug 1642354)
- master ring buffer threshold interrupt (P1)
- API for GPU to CPU timestamp conversion (P1)
- vmid/pid/uid based filtering (P1)

Change-Id: I8e39c648221ee0fa09d5df8524b03dca83fe24f3
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1022737
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-03-23 07:48:47 -07:00
Richard Zhao
97108797a2 gpu: nvgpu: enable semaphore acquire timeout only when timeouts_enabled is set
Bug 1727687

Change-Id: I7a7a4a2011b029474122fdbfbeb02b6302a5902b
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/1011486
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-03-22 10:19:23 -07:00
Aingara Paramakuru
2a58d3c27b gpu: nvgpu: improve channel interleave support
Previously, only "high" priority bare channels were interleaved
between all other bare channels and TSGs. This patch decouples
priority from interleaving and introduces 3 levels for interleaving
a bare channel or TSG: high, medium, and low. The levels define
the number of times a channel or TSG will appear on a runlist (see
nvgpu.h for details).

By default, all bare channels and TSGs are set to interleave level
low. Userspace can then request the interleave level to be increased
via the CHANNEL_SET_RUNLIST_INTERLEAVE ioctl (TSG-specific ioctl will
be added later).

As timeslice settings will soon be coming from userspace, the default
timeslice for "high" priority channels has been restored.

JIRA VFND-1302
Bug 1729664

Change-Id: I178bc1cecda23f5002fec6d791e6dcaedfa05c0c
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/1014962
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-03-15 16:23:44 -07:00
Mahantesh Kumbar
6d585840ad gpu: nvgpu: Enable ELPG when disabled due to reset
Enable ELPG back whenever ELPG disable is done due to reset or recovery.
Otherwise elpg_refcnt mismatch doesn't engage ELPG correctly

Bug 200156347
Bug 1716764

Change-Id: I9284bb52b32fe911bb8eb260f138b616f4a564be
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: http://git-master/r/1020617
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-02-26 12:28:11 -08:00
Deepak Nibade
04f4f2334e gpu: nvgpu: separate API to issue preempt
Export separate API gk20a_fifo_issue_preempt() to issue
preempt request to a channel or TSG

Bug 200156699

Change-Id: Ib3b097ef66a6411d75c1fe213cdbe8b1d08d3418
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/935771
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-02-05 12:44:47 -08:00
Richard Zhao
42b0f49d42 gpu: nvgpu: let gk20a_fifo_preempt call gops callbacks
It fixed vgpu regression that vgpu tried to call native
channel preemption function.

Bug 1617046

Change-Id: Ia5a5486d8b95a34ca6ecc75f8d3b5fea76919405
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/935897
Tested-by: Damian Halas <dhalas@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
2016-01-22 11:01:08 -08:00
Richard Zhao
7095a72e56 gpu: nvgpu: fix tsg bugs
- correct runlist entry type for tsg
- consider tsg when preempt channel

Bug 1617046

Change-Id: Ie067df17fb53ae91c49403637a5f35fc3710e0b3
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/926571
GVS: Gerrit_Virtual_Submit
Reviewed-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-01-19 08:34:55 -08:00
Deepak Nibade
ca76b336b3 gpu: nvgpu: support preprocessing of SM exceptions
Support preprocessing of SM exceptions if API
pointer pre_process_sm_exception() is defined

Also, expose some common APIs

Bug 200156699

Change-Id: I1303642c1c4403c520b62efb6fd83e95eaeb519b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/925883
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-01-12 22:58:32 -08:00
Peter Pipkorn
2b064ce65e gpu: nvgpu: add high priority channel interleave
Interleave all high priority channels between all other channels.
This reduces the latency for high priority work when there
are a lot of lower priority work present, imposing an upper
bound on the latency. Change the default high priority timeslice
from 5.2ms to 3.0 in the process, to prevent long running high priority
apps from hogging the GPU too much.

Introduce a new debugfs node to enable/disable high priority
channel interleaving. It is currently enabled by default.

Adds new runlist length max register, used for allocating
suitable sized runlist.

Limit the number of interleaved channels to 32.

This change reduces the maximum time a lower priority job
is running (one timeslice) before we check that high priority
jobs are running.

Tested with gles2_context_priority (still passes)
Basic sanity testing is done with graphics_submit
(one app is high priority)

Also more functional testing using lots of parallel runs with:
NVRM_GPU_CHANNEL_PRIORITY=3 ./gles2_expensive_draw
 –drawsperframe 20000 –triangles 50 –runtime 30 –finish
plus multiple:
NVRM_GPU_CHANNEL_PRIORITY=2 ./gles2_expensive_draw
–drawsperframe 20000 –triangles 50 –runtime 30 -finish

Previous to this change, the relative performance between
high priority work and normal priority work comes down
to timeslice value. This means that when there are many
low priority channels, the high priority work will still
drop quite a lot. But with this change, the high priority
work will roughly get about half the entire GPU time, meaning
that after the initial lower performance, it is less likely
to get lower in performance due to more apps running on the system.

This change makes a large step towards real priority levels.
It is not perfect and there are no guarantees on anything,
but it is a step forwards without any additional CPU overhead
or other complications. It will also serve as a baseline to
judge other algorithms against.

Support for priorities with TSG is future work.
Support for interleave mid + high priority channels,
instead of just high, is also future work.

Bug 1419900

Change-Id: I0f7d0ce83b6598fe86000577d72e14d312fdad98
Signed-off-by: Peter Pipkorn <ppipkorn@nvidia.com>
Reviewed-on: http://git-master/r/805961
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-01-11 09:04:01 -08:00
Richard Zhao
a9c6f59539 gpu: nvgpu: enable semaphore acquire timeout
It'll detect dead semaphore acquire. The worst case is when
ACQUIRE_SWITCH is disabled, semaphore acquire will poll and
consume full gpu timeslicees.

The timeout value is set to half of channel WDT.

Bug 1636800

Change-Id: Ida6ccc534006a191513edf47e7b82d4b5b758684
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/928827
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
2016-01-10 20:07:53 -08:00
Seshendra Gadagottu
aeee97b059 Revert "gpu: nvgpu: Enable ELPG when disabled due to reset"
This reverts commit f6ab5bd17d16f3605b78c3c2ee80513d5823c594.

Fix for graphics_submit regresssion.

Bug 200164812

Change-Id: I5e37b8263758ee389cdba3ec6e3758afbdd9c910
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/929605
Tested-by: Hoang Pham <hopham@nvidia.com>
Reviewed-by: Hoang Pham <hopham@nvidia.com>
2016-01-07 10:58:03 -08:00
Mahantesh Kumbar
47bd35e153 gpu: nvgpu: Enable ELPG when disabled due to reset
Enable ELPG back whenever ELPG disable is done due to reset or recovery.
Otherwise elpg_refcnt mismatch doesn’t engage ELPG correctly

Bug 200156347

Change-Id: Ic01f85b9e1eff10cfb9cb180b50b045f67d4b33c
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: http://git-master/r/925763
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-01-04 10:04:29 -08:00
Deepak Nibade
c4ac1ed369 gpu: nvgpu: preempt before adjusting fences
Current sequence in gk20a_disable_channel() is
- disable channel in gk20a_channel_abort()
- adjust pending fence in gk20a_channel_abort()
- preempt channel

But this leads to scenarios where syncpoint has
min > max value

Hence to fix this, make sequence in gk20a_disable_channel()
- disable channel in gk20a_channel_abort()
- preempt channel in gk20a_channel_abort()
- adjust pending fence in gk20a_channel_abort()

If gk20a_channel_abort() is called from other API where
preemption is not needed, then use channel_preempt
flag and do not preempt channel in those cases

Bug 1683059

Change-Id: I4d46d4294cf8597ae5f05f79dfe1b95c4187f2e3
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/921290
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-12-10 08:39:42 -08:00
Deepak Nibade
503d3a0b10 gpu: nvgpu: add fake mmu fault print
If we are handlling a fake mmu fault, then mention
so in the error print

Bug 1696529

Change-Id: Ife0106c3b83d18b315fc08dafadeca0129187624
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/828460
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-11-18 09:10:41 -08:00
Seshendra Gadagottu
19b3bd28b3 gpu: nvgpu: use platform data for ptimer source rate
Instead of depending on clock frame-work, use platform data
for ptimer source rate. Removed ptimerscaling10x platform
data, and use ptimer source frequency to calculate
ptimerscaling factor.

Reviewed-on: http://git-master/r/819030
(cherry picked from commit dd291334d54dab80cab7eb1656dffc48a59610b4)

Change-Id: I7638ce9875a6e440bbfc2ba2da0d0b094b2700ff
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/827300
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-11-04 14:38:34 -08:00
Deepak Nibade
9592a4e6fc gpu: nvgpu: IOCTL to set TSG timeslice
Add new IOCTL NVGPU_IOCTL_TSG_SET_PRIORITY to allow
setting timeslice for entire TSG

Return error from channel specific IOCTL_CHANNEL_SET_PRIORITY
if the channel is part of TSG

Separate out API gk20a_channel_get_timescale_from_timeslice()
to get timeslice_timeout and scale from timeslice period

Use this API to get timeslice_timeout and scale for TSG and
store it in tsg_gk20a structure

Then trigger runlist update so that new timeslice values
will be re-written to runlist for TSG

Bug 200146615

Change-Id: I555467d034f81b372b31372f0835d72b1c159508
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/824206
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-11-03 14:20:08 -08:00
Terje Bergstrom
c3b595178e gpu: nvgpu: Fix fake MMU fault for TSGs
When we induce a fake MMU fault, we do not have pointer to a channel.
Use the tsg pointer instead. Also remove the error print in case we
do not have ch pointer.

Change-Id: I14fd75d2b743244915bf32fe39de76097ef5c42f
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/819034
2015-10-20 13:58:35 -07:00
Deepak Nibade
0269339256 gpu: nvgpu: restart timer instead of cancel
In gk20a_fifo_handle_sched_error(), we currently cancel
the timeout on all the channels
But this could cause us to miss one of stuck channel

hence, instead of cancelling, restart the timeout of channel
on which it is already active

Bug 200133289

Change-Id: I40e7e0e5394911fc110ab6fde39592b885dfaf7d
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/816133
Reviewed-by: Ishan Mittal <imittal@nvidia.com>
Tested-by: Ishan Mittal <imittal@nvidia.com>
2015-10-19 23:52:50 -07:00
Deepak Nibade
68099f8298 gpu: nvgpu: fix pbdma intr handling
To handle any of the pbdma interrupt, we currently write zero
to pbdma_method0 and then clear the interrupt

But this is insufficient since we cannot use same intr clear
method for all the interrupts

Hence, add intr specific routines to handle those interrupts

NV_PPBDMA_INTR_0_PBENTRY:
- fix the pb_header to have a null opcode
- fix the pbdma_method to have a valid nop

NV_PPBDMA_INTR_0_METHOD:
- fix the pbdma_method to have a valid nop

NV_PPBDMA_INTR_0_DEVICE:
- fix the pb_header to have a null opcode
- go through all pbdma_method0/1/2/3
-- if they contain host s/w methods, replace those
   methods with a valid NOP

Bug 200134238

Change-Id: I10c284a6cdc1441f9d437cea65aae00d3c33a8c8
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/814561
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-10-16 08:23:49 -07:00
Deepak Nibade
d01a0249c4 gpu: nvgpu: cancel all wdt timeouts while handling SCHED errors
A SCHED error might cause multiple channels' watchdogs to trigger
simultaneously

Hence, to avoid this conflict cancel watchdog timeout on all
channels before recovering from SCHED errors

Also, define API gk20a_channel_timeout_stop_all_channels()
to cancel wdt timeout on all channels

Bug 200133289

Change-Id: I8324c397891f0a711327b77d0677cd6718af6d01
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/810959
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-10-07 15:03:28 -07:00
Vijayakumar
d60a45b9fd gpu: nvgpu: scale ptimer based timeouts
bug 1603226

host based timeouts use ptimer for detecting
timeouts. on gk20a and gm20b ptimer runs 2.6x
slower. scale the fifo_eng_timeout to account
for this

Change-Id: Ie44718382953e36436ea47d6e89b9a225d5c2070
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/799983
(cherry picked from commit d1d837fd09ff0f035feff1757c67488404c23cc6)
Reviewed-on: http://git-master/r/808250
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-10-06 13:30:54 -07:00
Deepak Nibade
613990cb39 gpu: nvgpu: implement per-channel watchdog
Implement per-channel watchdog/timer as per below rules :
- start the timer while submitting first job on channel or if
  no timer is already running
- cancel the timer when job completes
- re-start the timer if there is any incomplete job left
  in the channel's queue
- trigger appropriate recovery method as part of timeout
  handling mechanism

Handle the timeout as per below :
- get timed out channel, and job data
- disable activity on all engines
- check if fence is really pending
- get information on failing engine
- if no engine is failing, just abort the channel
- if engine is failing, trigger the recovery

Also, add flag "ch_wdt_enabled" to enable/disable channel
watchdog mechanism. Watchdog can also be disabled using
global flag "timeouts_enabled"

Set the watchdog time to be 5s using macro
NVGPU_CHANNEL_WATCHDOG_DEFAULT_TIMEOUT_MS

Bug 200133289

Change-Id: I401cf14dd34a210bc429f31bd5216a361edf1237
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/797072
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-09-28 09:08:12 -07:00
Deepak Nibade
cb8c102131 gpu: nvgpu: APIs to disable/enable all engines' activity
Add below APIs to disable/re-enable activity on all
engines
gk20a_fifo_disable_all_engine_activity()
gk20a_fifo_enable_all_engine_activity()

Bug 200133289

Change-Id: Ie01a260d587807a3c1712ee32fe870fbcb08f9cd
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/798747
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-09-28 09:07:46 -07:00
Vijayakumar
b8faddfe2a gpu: nvgpu: fix runlist update timeout handling
bug 1625901
1) disable ELPG before doing GR reset when runlist update times out
2) add mutex for GR reset to avoid multiple threads resetting GR
3) protect GR reset with FECS mutex so that no one else submits methods

Change-Id: I02993fd1eabe6875ab1c58a40a06e6c79fcdeeae
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/793643
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-09-16 09:44:00 -07:00
Deepak Nibade
0398e0751f gpu: nvgpu: separate API to get failing engine data
In gk20a_fifo_handle_sched_error(), we currently have a sequence
to identify failing engine (stuck on context switch) and
corresponding failing channel with its type

Separate out this sequence in new API
gk20a_fifo_get_failing_engine_data() so that it can be
reused from else where too

Bug 200133289

Change-Id: I3cef395170cf8990c014c7505c798fd6f2e37921
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/797070
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-09-11 08:46:06 -07:00
Vijayakumar
55c85cfa7b gpu: nvgpu: improve sched err handling
bug 200114561

1) when handling sched error, if CTXSW status reads switch
check FECS mailbox register to know whether next or current
channel caused error
2) Update recovery function to use ch id passed to it
3) Recovery function now passes mmu_engine_id to mmu fault
handler instead of fifo_engine_id

Change-Id: I3576cc4a90408b2f76b2c42cce19c27344531b1c
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/763538
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
2015-07-17 01:16:48 -07:00
Rich Wiley
25b540e5c9 gpu: nvgpu: More verbose BAR1 failure message
Change-Id: Ie575aa3eeea8ebddf5778be0d03cf9744ec35540
Signed-off-by: Rich Wiley <rwiley@nvidia.com>
Reviewed-on: http://git-master/r/760860
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-06-26 11:07:54 -07:00
Amit Sharma (SW-TEGRA)
f877d0649c gpu: nvgpu: gk20a: Use NULL instead of integer '0'
Fixed the following sparse warning by using the proper 'NULL' instead of '0':
- fifo_gk20a.c: warning: Using plain integer as NULL pointer

Bug 200067946
Bug 200088648

Change-Id: I316b119e87b7203450ce85232398b43384ee16cc
Signed-off-by: Amit Sharma (SW-TEGRA) <amisharma@nvidia.com>
Reviewed-on: http://git-master/r/755348
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Reviewed-on: http://git-master/r/757050
2015-06-19 01:56:40 -07:00
Konsta Holtta
6085c90f49 gpu: nvgpu: add per-channel refcounting
Add reference counting for channels, and wait for reference count to
get to 0 in gk20a_channel_free() before actually freeing the channel.
Also, change free channel tracking a bit by employing a list of free
channels, which simplifies the procedure of finding available channels
with reference counting.

Each use of a channel must have a reference taken before use or held
by the caller. Taking a reference of a wild channel pointer may fail, if
the channel is either not opened or in a process of being closed. Also,
add safeguards for protecting accidental use of closed channels,
specifically, by setting ch->g = NULL in channel free. This will make it
obvious if freed channel is attempted to be used.

The last user of a channel might be the deferred interrupt handler,
so wait for deferred interrupts to be processed twice in the channel
free procedure: once for providing last notifications to the channel
and once to make sure there are no stale pointers left after referencing
to the channel has been denied.

Finally, fix some races in channel and TSG force reset IOCTL path,
by pausing the channel scheduler in gk20a_fifo_recover_ch() and
gk20a_fifo_recover_tsg(), while the affected engines have been identified,
the appropriate MMU faults triggered, and the MMU faults handled. In this
case, make sure that the MMU fault does not attempt to query the hardware
about the failing channel or TSG ids. This should make channel recovery
more safe also in the regular (i.e., not in the interrupt handler) context.

Bug 1530226
Bug 1597493
Bug 1625901
Bug 200076344
Bug 200071810

Change-Id: Ib274876908e18219c64ea41e50ca443df81d957b
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/448463
(cherry picked from commit 3f03aeae64ef2af4829e06f5f63062e8ebd21353)
Reviewed-on: http://git-master/r/755147
Reviewed-by: Automatic_Commit_Validation_User
2015-06-09 11:13:43 -07:00
Deepak Nibade
0a99cb3559 gpu: nvgpu: fix invalid "reset initiated" prints
Improve the error path and return values from function
gk20a_fifo_handle_sched_error()
- return true, when we trigger the recovery in this function
- otherwise, return false

No need to reset the scheduler error register in this function,
since we anyway clear the interrupt in fifo_error_isr()

Also, fix the typo
"reset initated" -> "reset initiated"

Bug 200089043

Change-Id: Ibfc2fd2133982a268699d08682bcd6f044a3196a
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/722711
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-04-04 19:02:55 -07:00
Deepak Nibade
38fc3a48a0 gpu: nvgpu: add platform specific get_iova_addr()
Add platform specific API pointer (*get_iova_addr)()
which can be used to get iova/physical address from
given scatterlist and flags

Use this API with g->ops.mm.get_iova_addr() instead
of calling API gk20a_mm_iova_addr() which makes it
platform specific

Bug 1605653

Change-Id: I798763db1501bd0b16e84daab68f6093a83caac2
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/713089
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-04-04 19:02:17 -07:00
Terje Bergstrom
1eded55286 gpu: nvgpu: Use gk20a_mem_phys instead of sg_phys
There were still a couple of places using sg_phys directly. Use new
gk20a_mem_phys() to make the code shorter.

Change-Id: I6eb9b14e0c14a27ec39bacd06ab24e31e99769ca
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/717502
2015-04-04 19:00:43 -07:00
Konsta Holtta
54defe2be6 gpu: nvgpu: print also fifo_intr in reset log
Print the interrupt code for diagnostics when logging the message about
channel reset in fifo_error_isr.

Change-Id: I44e9eb818af7264671f1c804d9a77b14c197457d
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/715804
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-04-04 18:59:44 -07:00
Terje Bergstrom
7290a6cbd5 gpu: nvgpu: Implement common allocator and mem_desc
Introduce mem_desc, which holds all information needed for a buffer.
Implement helper functions for allocation and freeing that use this
data type.

Change-Id: I82c88595d058d4fb8c5c5fbf19d13269e48e422f
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/712699
2015-04-04 18:59:26 -07:00
Sam Payne
ce3afaaaf6 gpu: nvgpu: disable ce2 interrupts when unhandled
ce2 interrupts enabled only on gk20a and gm20b when
interrupts are handled through hal

Change-Id: Ib570db8f5f41e71e768b95e781153ec8a5d20015
Signed-off-by: Sam Payne <spayne@nvidia.com>
Reviewed-on: http://git-master/r/677447
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-04-04 18:08:17 -07:00
Deepak Nibade
ccfea85b47 gpu: nvgpu: register dump in ctxsw timeout
Dump GR status registers in case of ctxsw timeout.

This is helpful in case where ctxsw timeout is encountered
during stress testing but we lose the bad state since we do
the recovery. So dump as much status as we can when timeouts
are seen

Bug 200062436

Change-Id: Ie7d320cefa7b272f2cc607cdb5c01ba1f43ba1f2
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/708465
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-04-04 18:07:37 -07:00
Terje Bergstrom
bdb34abd93 gpu: nvgpu: Per-chip PBDMA signature
PBDMA HW signature depends on the chip.

Change-Id: If57d721d9bb77a090f967930a1aa2037bf4a16fe
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/672922
2015-04-04 18:07:37 -07:00
Naveen Kumar S
51b299c7dd gpu: nvgpu: reduce message severity to info
Shader informs user about context switch wait time.
This doesn't affect any functionality. Hence changing print to info.

bug 200015967

Change-Id: I7fbb562e43ee6ec1bc8ac01a51d3c9f19d5cb4cf
Signed-off-by: Naveen Kumar S <nkumars@nvidia.com>
Reviewed-on: http://git-master/r/662657
(cherry picked from commit 3a4d2022369f4bfc1701d6543226e01d7f6f8e0d)
Reviewed-on: http://git-master/r/671534
Reviewed-by: Venkat Moganty <vmoganty@nvidia.com>
Tested-by: Venkat Moganty <vmoganty@nvidia.com>
2015-04-04 18:07:26 -07:00
Supriya
3d9a83eb5a gpu: nvgpu: gk20a: FECS HALT method
FECS halt method is used to do graceful FECS shutdown.

Bug 1551865

Change-Id: Iec8590e86cb09f9b54c36f85859208fc8650f6a6
Signed-off-by: Supriya <ssharatkumar@nvidia.com>
Reviewed-on: http://git-master/r/682459
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-04-04 18:06:39 -07:00
Terje Bergstrom
226c671f8e gpu: nvgpu: More robust recovery
Make recovery a more straightforward process. When we detect a fault,
trigger MMU fault, and wait for it to trigger, and complete recovery.
Also reset engines before aborting channel to ensure no stray sync
point increments can happen.

Change-Id: Iac685db6534cb64fe62d9fb452391f43100f2999
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/709060
(cherry picked from commit 95c62ffd9ac30a0d2eb88d033dcc6e6ff25efd6f)
Reviewed-on: http://git-master/r/707443
2015-04-04 18:06:37 -07:00
Terje Bergstrom
2d71d633cf gpu: nvgpu: Physical page bits to be per chip
Retrieve number of physical page bits based on chip.

Bug 1567274

Change-Id: I5a0f6a66be37f2cf720d66b5bdb2b704cd992234
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/601700
2015-03-18 12:12:19 -07:00
Terje Bergstrom
c0668f05ea gpu: nvgpu: Retrieve intr & reset id from HW
Query interrupt number and reset id from HW. Use the number
from HW when enabling and detecting interrupts.

Bug 200036089
Bug 1567274

Change-Id: If9cb4db79a19dcb193ba7ad9db7081f4fe1ab433
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/600988
2015-03-18 12:12:14 -07:00
Terje Bergstrom
afc470e867 gpu: nvgpu: Do not call ELPG if disabled
Do not call PMU ELPG calls if ELPG should be disabled. Also skips
initialization of PMU ucode if PMU is disabled.

Bug 1567274

Change-Id: Ia9cd3b553c358142ee05a1b0e0832f9412f7cf17
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/593335
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
2015-03-18 12:12:02 -07:00
Deepak Nibade
b3f575074b gpu: nvgpu: fix sparse warnings
Fix below sparse warnings :

warning: Using plain integer as NULL pointer
warning: symbol <variable/funcion> was not declared. Should it be static?
warning: Initializer entry defined twice

Also, remove dead functions

Bug 1573254

Change-Id: I29d71ecc01c841233cf6b26c9088ca8874773469
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/593363
Reviewed-by: Amit Sharma (SW-TEGRA) <amisharma@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
2015-03-18 12:12:01 -07:00
Vijayakumar
3c6a6376de gpu: nvgpu: disable cg in mmu error handler
With CG enabled sometimes fifo could not be idled
during firmware load.

Bug 200042729

Change-Id: I43d7551c0c7c19314c52ac5f678afed8c6df6415
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/559077
Reviewed-by: Automatic_Commit_Validation_User
2015-03-18 12:11:58 -07:00
Sam Payne
8c6a9fd115 Revert "gpu: nvgpu: GR and LTC HAL to use const structs"
This reverts commit 41b82e97164138f45fbdaef6ab6939d82ca9419e.

Change-Id: Iabd01fcb124e0d22cd9be62151a6552cbb27fc94
Signed-off-by: Sam Payne <spayne@nvidia.com>
Reviewed-on: http://git-master/r/592221
Tested-by: Hoang Pham <hopham@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Mitch Luban <mluban@nvidia.com>
2015-03-18 12:11:56 -07:00
Terje Bergstrom
2d5ff668cb gpu: nvgpu: GR and LTC HAL to use const structs
Convert GR and LTC HALs to use const structs, and initialize them
with macros.

Bug 1567274

Change-Id: Ia3f24a5eccb27578d9cba69755f636818d11275c
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/590371
2015-03-18 12:11:54 -07:00
Terje Bergstrom
8ee725777d gpu: nvgpu: Improve error handing in fifo
When initializing fifo, we ignore several error conditions. Add
checks for them.

Change-Id: Id67f3ea51e3d4444b61a3be19553a5541b1d1e3a
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/553269
2015-03-18 12:11:40 -07:00
Konsta Holtta
eab87c7afe gpu: nvgpu: dump falcon stats in mmu fault handler
If engine status is in context switch in the fifo mmu fault handler,
dump falcon stats and gr stats for each engine.

Bug 1544766

Change-Id: Idfa9772b7e67072941144ac3bdd73e791fdc2b23
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/553205
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-03-18 12:11:39 -07:00
Konsta Holtta
719923ad9f gpu: nvgpu: rename gpu ioctls and structs to nvgpu
To help remove the nvhost dependency from nvgpu, rename ioctl defines
and structures used by nvgpu such that nvhost is replaced by nvgpu.
Duplicate some structures as needed.

Update header guards and such accordingly.

Change-Id: Ifc3a867713072bae70256502735583ab38381877
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/542620
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-03-18 12:11:33 -07:00