Commit Graph

109 Commits

Author SHA1 Message Date
Vijayakumar
55c85cfa7b gpu: nvgpu: improve sched err handling
bug 200114561

1) when handling sched error, if CTXSW status reads switch
check FECS mailbox register to know whether next or current
channel caused error
2) Update recovery function to use ch id passed to it
3) Recovery function now passes mmu_engine_id to mmu fault
handler instead of fifo_engine_id

Change-Id: I3576cc4a90408b2f76b2c42cce19c27344531b1c
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/763538
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
2015-07-17 01:16:48 -07:00
Rich Wiley
25b540e5c9 gpu: nvgpu: More verbose BAR1 failure message
Change-Id: Ie575aa3eeea8ebddf5778be0d03cf9744ec35540
Signed-off-by: Rich Wiley <rwiley@nvidia.com>
Reviewed-on: http://git-master/r/760860
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-06-26 11:07:54 -07:00
Amit Sharma (SW-TEGRA)
f877d0649c gpu: nvgpu: gk20a: Use NULL instead of integer '0'
Fixed the following sparse warning by using the proper 'NULL' instead of '0':
- fifo_gk20a.c: warning: Using plain integer as NULL pointer

Bug 200067946
Bug 200088648

Change-Id: I316b119e87b7203450ce85232398b43384ee16cc
Signed-off-by: Amit Sharma (SW-TEGRA) <amisharma@nvidia.com>
Reviewed-on: http://git-master/r/755348
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Reviewed-on: http://git-master/r/757050
2015-06-19 01:56:40 -07:00
Konsta Holtta
6085c90f49 gpu: nvgpu: add per-channel refcounting
Add reference counting for channels, and wait for reference count to
get to 0 in gk20a_channel_free() before actually freeing the channel.
Also, change free channel tracking a bit by employing a list of free
channels, which simplifies the procedure of finding available channels
with reference counting.

Each use of a channel must have a reference taken before use or held
by the caller. Taking a reference of a wild channel pointer may fail, if
the channel is either not opened or in a process of being closed. Also,
add safeguards for protecting accidental use of closed channels,
specifically, by setting ch->g = NULL in channel free. This will make it
obvious if freed channel is attempted to be used.

The last user of a channel might be the deferred interrupt handler,
so wait for deferred interrupts to be processed twice in the channel
free procedure: once for providing last notifications to the channel
and once to make sure there are no stale pointers left after referencing
to the channel has been denied.

Finally, fix some races in channel and TSG force reset IOCTL path,
by pausing the channel scheduler in gk20a_fifo_recover_ch() and
gk20a_fifo_recover_tsg(), while the affected engines have been identified,
the appropriate MMU faults triggered, and the MMU faults handled. In this
case, make sure that the MMU fault does not attempt to query the hardware
about the failing channel or TSG ids. This should make channel recovery
more safe also in the regular (i.e., not in the interrupt handler) context.

Bug 1530226
Bug 1597493
Bug 1625901
Bug 200076344
Bug 200071810

Change-Id: Ib274876908e18219c64ea41e50ca443df81d957b
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/448463
(cherry picked from commit 3f03aeae64ef2af4829e06f5f63062e8ebd21353)
Reviewed-on: http://git-master/r/755147
Reviewed-by: Automatic_Commit_Validation_User
2015-06-09 11:13:43 -07:00
Deepak Nibade
0a99cb3559 gpu: nvgpu: fix invalid "reset initiated" prints
Improve the error path and return values from function
gk20a_fifo_handle_sched_error()
- return true, when we trigger the recovery in this function
- otherwise, return false

No need to reset the scheduler error register in this function,
since we anyway clear the interrupt in fifo_error_isr()

Also, fix the typo
"reset initated" -> "reset initiated"

Bug 200089043

Change-Id: Ibfc2fd2133982a268699d08682bcd6f044a3196a
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/722711
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-04-04 19:02:55 -07:00
Deepak Nibade
38fc3a48a0 gpu: nvgpu: add platform specific get_iova_addr()
Add platform specific API pointer (*get_iova_addr)()
which can be used to get iova/physical address from
given scatterlist and flags

Use this API with g->ops.mm.get_iova_addr() instead
of calling API gk20a_mm_iova_addr() which makes it
platform specific

Bug 1605653

Change-Id: I798763db1501bd0b16e84daab68f6093a83caac2
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/713089
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-04-04 19:02:17 -07:00
Terje Bergstrom
1eded55286 gpu: nvgpu: Use gk20a_mem_phys instead of sg_phys
There were still a couple of places using sg_phys directly. Use new
gk20a_mem_phys() to make the code shorter.

Change-Id: I6eb9b14e0c14a27ec39bacd06ab24e31e99769ca
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/717502
2015-04-04 19:00:43 -07:00
Konsta Holtta
54defe2be6 gpu: nvgpu: print also fifo_intr in reset log
Print the interrupt code for diagnostics when logging the message about
channel reset in fifo_error_isr.

Change-Id: I44e9eb818af7264671f1c804d9a77b14c197457d
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/715804
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-04-04 18:59:44 -07:00
Terje Bergstrom
7290a6cbd5 gpu: nvgpu: Implement common allocator and mem_desc
Introduce mem_desc, which holds all information needed for a buffer.
Implement helper functions for allocation and freeing that use this
data type.

Change-Id: I82c88595d058d4fb8c5c5fbf19d13269e48e422f
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/712699
2015-04-04 18:59:26 -07:00
Sam Payne
ce3afaaaf6 gpu: nvgpu: disable ce2 interrupts when unhandled
ce2 interrupts enabled only on gk20a and gm20b when
interrupts are handled through hal

Change-Id: Ib570db8f5f41e71e768b95e781153ec8a5d20015
Signed-off-by: Sam Payne <spayne@nvidia.com>
Reviewed-on: http://git-master/r/677447
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-04-04 18:08:17 -07:00
Deepak Nibade
ccfea85b47 gpu: nvgpu: register dump in ctxsw timeout
Dump GR status registers in case of ctxsw timeout.

This is helpful in case where ctxsw timeout is encountered
during stress testing but we lose the bad state since we do
the recovery. So dump as much status as we can when timeouts
are seen

Bug 200062436

Change-Id: Ie7d320cefa7b272f2cc607cdb5c01ba1f43ba1f2
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/708465
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-04-04 18:07:37 -07:00
Terje Bergstrom
bdb34abd93 gpu: nvgpu: Per-chip PBDMA signature
PBDMA HW signature depends on the chip.

Change-Id: If57d721d9bb77a090f967930a1aa2037bf4a16fe
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/672922
2015-04-04 18:07:37 -07:00
Naveen Kumar S
51b299c7dd gpu: nvgpu: reduce message severity to info
Shader informs user about context switch wait time.
This doesn't affect any functionality. Hence changing print to info.

bug 200015967

Change-Id: I7fbb562e43ee6ec1bc8ac01a51d3c9f19d5cb4cf
Signed-off-by: Naveen Kumar S <nkumars@nvidia.com>
Reviewed-on: http://git-master/r/662657
(cherry picked from commit 3a4d2022369f4bfc1701d6543226e01d7f6f8e0d)
Reviewed-on: http://git-master/r/671534
Reviewed-by: Venkat Moganty <vmoganty@nvidia.com>
Tested-by: Venkat Moganty <vmoganty@nvidia.com>
2015-04-04 18:07:26 -07:00
Supriya
3d9a83eb5a gpu: nvgpu: gk20a: FECS HALT method
FECS halt method is used to do graceful FECS shutdown.

Bug 1551865

Change-Id: Iec8590e86cb09f9b54c36f85859208fc8650f6a6
Signed-off-by: Supriya <ssharatkumar@nvidia.com>
Reviewed-on: http://git-master/r/682459
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-04-04 18:06:39 -07:00
Terje Bergstrom
226c671f8e gpu: nvgpu: More robust recovery
Make recovery a more straightforward process. When we detect a fault,
trigger MMU fault, and wait for it to trigger, and complete recovery.
Also reset engines before aborting channel to ensure no stray sync
point increments can happen.

Change-Id: Iac685db6534cb64fe62d9fb452391f43100f2999
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/709060
(cherry picked from commit 95c62ffd9ac30a0d2eb88d033dcc6e6ff25efd6f)
Reviewed-on: http://git-master/r/707443
2015-04-04 18:06:37 -07:00
Terje Bergstrom
2d71d633cf gpu: nvgpu: Physical page bits to be per chip
Retrieve number of physical page bits based on chip.

Bug 1567274

Change-Id: I5a0f6a66be37f2cf720d66b5bdb2b704cd992234
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/601700
2015-03-18 12:12:19 -07:00
Terje Bergstrom
c0668f05ea gpu: nvgpu: Retrieve intr & reset id from HW
Query interrupt number and reset id from HW. Use the number
from HW when enabling and detecting interrupts.

Bug 200036089
Bug 1567274

Change-Id: If9cb4db79a19dcb193ba7ad9db7081f4fe1ab433
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/600988
2015-03-18 12:12:14 -07:00
Terje Bergstrom
afc470e867 gpu: nvgpu: Do not call ELPG if disabled
Do not call PMU ELPG calls if ELPG should be disabled. Also skips
initialization of PMU ucode if PMU is disabled.

Bug 1567274

Change-Id: Ia9cd3b553c358142ee05a1b0e0832f9412f7cf17
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/593335
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
2015-03-18 12:12:02 -07:00
Deepak Nibade
b3f575074b gpu: nvgpu: fix sparse warnings
Fix below sparse warnings :

warning: Using plain integer as NULL pointer
warning: symbol <variable/funcion> was not declared. Should it be static?
warning: Initializer entry defined twice

Also, remove dead functions

Bug 1573254

Change-Id: I29d71ecc01c841233cf6b26c9088ca8874773469
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/593363
Reviewed-by: Amit Sharma (SW-TEGRA) <amisharma@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
2015-03-18 12:12:01 -07:00
Vijayakumar
3c6a6376de gpu: nvgpu: disable cg in mmu error handler
With CG enabled sometimes fifo could not be idled
during firmware load.

Bug 200042729

Change-Id: I43d7551c0c7c19314c52ac5f678afed8c6df6415
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/559077
Reviewed-by: Automatic_Commit_Validation_User
2015-03-18 12:11:58 -07:00
Sam Payne
8c6a9fd115 Revert "gpu: nvgpu: GR and LTC HAL to use const structs"
This reverts commit 41b82e97164138f45fbdaef6ab6939d82ca9419e.

Change-Id: Iabd01fcb124e0d22cd9be62151a6552cbb27fc94
Signed-off-by: Sam Payne <spayne@nvidia.com>
Reviewed-on: http://git-master/r/592221
Tested-by: Hoang Pham <hopham@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Mitch Luban <mluban@nvidia.com>
2015-03-18 12:11:56 -07:00
Terje Bergstrom
2d5ff668cb gpu: nvgpu: GR and LTC HAL to use const structs
Convert GR and LTC HALs to use const structs, and initialize them
with macros.

Bug 1567274

Change-Id: Ia3f24a5eccb27578d9cba69755f636818d11275c
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/590371
2015-03-18 12:11:54 -07:00
Terje Bergstrom
8ee725777d gpu: nvgpu: Improve error handing in fifo
When initializing fifo, we ignore several error conditions. Add
checks for them.

Change-Id: Id67f3ea51e3d4444b61a3be19553a5541b1d1e3a
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/553269
2015-03-18 12:11:40 -07:00
Konsta Holtta
eab87c7afe gpu: nvgpu: dump falcon stats in mmu fault handler
If engine status is in context switch in the fifo mmu fault handler,
dump falcon stats and gr stats for each engine.

Bug 1544766

Change-Id: Idfa9772b7e67072941144ac3bdd73e791fdc2b23
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/553205
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-03-18 12:11:39 -07:00
Konsta Holtta
719923ad9f gpu: nvgpu: rename gpu ioctls and structs to nvgpu
To help remove the nvhost dependency from nvgpu, rename ioctl defines
and structures used by nvgpu such that nvhost is replaced by nvgpu.
Duplicate some structures as needed.

Update header guards and such accordingly.

Change-Id: Ifc3a867713072bae70256502735583ab38381877
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/542620
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-03-18 12:11:33 -07:00
Terje Bergstrom
d65f23cb9a gpu: nvgpu: Support 512 channels in gm20b
Retrieve channel count from gm20b specific header instead of the
gk20a header. This increases channel count from 128 to 512.

Change-Id: I96d4887432852795f7f526e33f0d3d2458f3af0e
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/500623
2015-03-18 12:11:23 -07:00
Terje Bergstrom
fc7280ae91 gpu: nvgpu: Clear invalid method
Invalid method needs to be cleared in gm20b to prevent getting same
interrupt again.

Change-Id: I4d83d1a27e5c711b5d82b95552be84d5f16a13e0
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/500286
2015-03-18 12:11:20 -07:00
Terje Bergstrom
4a63e80a26 gpu: nvgpu: Use polling to detect runlist switch
Runlist event is not sent in gm20b for updated runlist. Polling is
the preferred way also for gk20a.

Bug 1555239

Change-Id: I60de084db69f848f63451f1f3078f183ca51ba50
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/500241
2015-03-18 12:11:19 -07:00
Deepak Nibade
1c7dcfdeef gpu: nvgpu: use TSG recover API
Use TSG specific API gk20a_fifo_recover_tsg() in following cases :
- IOCTL_CHANNEL_FORCE_RESET
  to force reset a channel in TSG, reset all the channels
- handle pbdma intr
  while resetting in case of pbdma intr, if channel is part of
  TSG, recover entire TSG
- TSG preempt failure
  when TSG preempt times out, use TSG recover API

Use preempt_tsg() API to preempt if channel is part of TSG

Add below two generic APIs which will take care of preempting/
recovering either of channel or TSG as required
gk20a_fifo_preempt()
gk20a_fifo_force_reset_ch()

Bug 1470692

Change-Id: I8d46e252af79136be85a9a2accf8b51bd924ca8c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/497875
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-03-18 12:11:17 -07:00
Deepak Nibade
2f232348e6 gpu: nvgpu: handle MMU fault for TSG
- add support to handle MMU faults on a channel in TSG
- first get the ID and type of channel that engine is running
- if TSG, abort each channel in it
- if regular channel, abort that channel

- also, add two versions of API set_ctx_mmu_error(), one for
  regular channel and another for TSG

Bug 1470692

Change-Id: Ia7b01b81739598459702ed172180adb00e345eba
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/497874
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-03-18 12:11:17 -07:00
Deepak Nibade
e4a7bc1602 gpu: nvgpu: add API to recover TSG
- add and export API "gk20a_fifo_recover_tsg()" to
  recover a TSG
- if TSG is running on any engine, then trigger MMU fault
  on those engines
- otherwise, abort each channel in TSG

- modify channel specific API engines_on_ch() to generic
  engines_on_id() which will take an ID and a flag to specify
  whether ID is for channel or TSG and return engines running
  on that ID

- modify channel specific API get_faulty_channel() to generic
  get_faulty_id_type() which will take pointers to ID and type
  of ID (either a regular channel or TSG)

- remove runlist update from recover_ch() since
  no need to touch runlist during recovery

- set error notifier first and then only abort the channels
  for TSG recovery path

- also, add necessary accessors to get engine
  status type as TSG

Bug 1470692

Change-Id: I7137f611f80916b3d256d4b0dc6e5cf1e93eef6f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/497873
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-03-18 12:11:17 -07:00
Deepak Nibade
f9cb1a93d1 gpu: nvgpu: do not bind already active channels to TSG
If a channel is already scheduled as regular channel, we should not
allow it to be marked as TSG since it will fail book keeping of
number of active channels in a TSG

This way we can force to bind the channels first and then only
make them active

Also, remove duplicate function declaration added during branch merge
and one unnecessary comparison with zero

Bug 1470692

Change-Id: I88f9678919e4b76de472c6dda21e4537520241c4
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/497903
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-03-18 12:11:14 -07:00
Deepak Nibade
48e19c6c28 gpu: nvgpu: add API to preempt TSG
Add API gk20a_fifo_preempt_tsg() which takes ID of tsg
and preempts it

Bug 1514064
Bug 1470692

Change-Id: I1d52c1dd7a9aecc1314b0f223fe4eedecc033629
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/495583
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-03-18 12:11:13 -07:00
Terje Bergstrom
8be2f2bf4c gpu: nvgpu: gm20b: Regenerate clock gating lists
Regenerate clock gating lists. Add new blocks, and takes them into
use. Also moves some clock gating settings to be applied at the
earliest possible moment right after reset.

Change-Id: I21888186c200f7a477c63bd3332e8ed578f63741
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/457698
2015-03-18 12:11:09 -07:00
Aingara Paramakuru
1fd722f592 gpu: nvgpu: support gk20a virtualization
The nvgpu driver now supports using the Tegra graphics virtualization
interfaces to support gk20a in a virtualized environment.

Bug 1509608

Change-Id: I6ede15ee7bf0b0ad8a13e8eb5f557c3516ead676
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/440122
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-03-18 12:11:01 -07:00
Terje Bergstrom
154360e1a2 gpu: nvgpu: Set PB timeout only in gk20a
PB timeout has been removed in gm20b, so write it only in gk20a.

Change-Id: I2aab92fe7d1d5de151dad768f8b3f6901ec0bbb0
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/486358
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Tested-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Kevin Huang (Eng-SW) <kevinh@nvidia.com>
2015-03-18 12:10:59 -07:00
Kevin Huang
87373abc95 gpu: nvgpu: gm20b: use gpc_mmu to check debug mode
Bug 1534793

Change-Id: I8a4c35914b58dd13a7c10c668de9d4662d947d8c
Signed-off-by: Kevin Huang <kevinh@nvidia.com>
Reviewed-on: http://git-master/r/441377
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-03-18 12:10:53 -07:00
Terje Bergstrom
92c9a2d06e gpu: nvgpu: Increase PBDMA timeout
PBDMA timeout can cause stale data in FIFO. Default value equals 1ms.
Increase it to max.

Bug 1537636

Change-Id: I1c6c6b10abaece3a64b77b9b3ef77ff726ff67cf
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/457047
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Allen Chang <allchang@nvidia.com>
Tested-by: Allen Chang <allchang@nvidia.com>
2015-03-18 12:10:52 -07:00
Terje Bergstrom
5dea7c4729 gpu: nvgpu: Set error notifier on PBDMA error
Change-Id: Idf1261fe6561477f5dceea54de63326ef8a4a1b3
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/455041
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
GVS: Gerrit_Virtual_Submit
2015-03-18 12:10:49 -07:00
Deepak Nibade
ae3ba04955 gpu: nvgpu: verify runnable channel count in TSG
In runlist we first write channel count in TSG entry and then
follow those many channel entries
If no. of channel entries does not match to count then it is
considered as error

To detect this, add a counter while adding channel entries and
give warning if channel count does not match with this counter

bug 1470692

Change-Id: I4bbfd9b696fbfafa25dffb27979373f057a7f35a
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/449228
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-03-18 12:10:46 -07:00
Deepak Nibade
478e659ae4 gpu: nvgpu: do not touch runlist during recovery
Currently we clear the runlist and re-create it in
scheduled work during fifo recovery process

But we can post-pone this runlist re-generation for
later time i.e. when channel is closed

Hence, remove runlist locks and re-generation from
handle_mmu_fault() methods. Instead of that, disable
gr fifo access at start of recovery and re-enable
it at end of recovery process.

Also, delete scheduled work to re-create runlist.
Re-enable EPLG and fifo access in
finish_mmu_fault_handling() itself.

bug 1470692

Change-Id: I705a6a5236734c7207a01d9a9fa9eca22bdbe7eb
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/449225
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-03-18 12:10:46 -07:00
Deepak Nibade
76993ba18c gpu: nvgpu: rework TSG's channel list
Modify TSG's channel list as "ch_list" for all channels
instead of "ch_runnable_list" for only runnable list
We can traverse this list and check runnable status of
channel in active_channels to get runnable channels

Remove below APIs as they are no longer required :
gk20a_bind_runnable_channel_to_tsg()
gk20a_unbind_channel_from_tsg()

While closing the channel, call gk20a_tsg_unbind_channel()
to unbind the channel from TSG

bug 1470692

Change-Id: I0178fa74b3e8bb4e5c0b3e3b2b2f031491761ba7
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/449227
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-03-18 12:10:45 -07:00
Deepak Nibade
a20bb7dde2 gpu: nvgpu: fix error handling for mutex_acquire()
Currently if pmu_mutex_acquire() fails, we disable ELPG
and move ahead. But it is not clear why it is required
to disable ELPG in case where we fail to acquire mutex.

Hence skip disabling ELPG if mutex_acquire() fails

Bug 1533644

Change-Id: I7e8e99a701d0ba071eb31ac17582b04072ee55eb
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/448131
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-03-18 12:10:41 -07:00
Arto Merilainen
d608aa53ee Revert "gpu: nvgpu: Dump offending push buffer fragment"
Channel and gpfifo allocations are entirely separated from each
other, however, the code here assumes that active channel means
that the channel also has a gpfifo.

This reverts commit a24602f094380539788696d1b1567a4f4d914b17 which
added gpfifo dump. Changing debug dumping to be safe requires
refactoring the channel release code to use proper locking.

Bug 1530226

Change-Id: I2fb02542a17dd56a0a9ce732b327e34b85ade8b9
Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Reviewed-on: http://git-master/r/434038
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Shridhar Rasal <srasal@nvidia.com>
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
2015-03-18 12:10:24 -07:00
Terje Bergstrom
e2638d73fd gpu: nvgpu: Wait for idle via FIFO registers
Wait for engine idle via FIFO's engine status instead of submitting
WFI to channel. Submitting WFI and waiting is not robust, and wait
might invoke debug dump which cannot be done while powering down.

Bug 1499214

Change-Id: I4d52e8558e1a862ad4292036594d81ebfbd5f36b
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/432151
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Tested-by: Sachin Nikam <snikam@nvidia.com>
2015-03-18 12:10:20 -07:00
Deepak Nibade
b6466fbe07 gpu: nvgpu: add TSG support to runlists
- when a TSG channel is made runnable, add it to TSG's
  runnable list
- when a TSG channel is removed from runlist, remove it
  from TSG's runnable list

When we rewrite the entire runlist :
- first add all the channels which are not part of any TSG
- then find all active TSGs, add an entry in runlist for the TSG
  (with TSG id and length of TSG)
- then write entries for each channel in that TSG

Bug 1470692

Change-Id: Ic55a4d5959abc72cd20b8224eb4c31d3ff411861
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/416612
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-03-18 12:10:16 -07:00
Deepak Nibade
e6eb4b59f6 gpu: nvgpu: add kernel APIs for TSG support
Add support to create/destroy TSGs using node "/dev/nvhost-tsg-gpu"

Provide below IOCTLs to bind/unbind channels to/from TSGs :

NVGPU_TSG_IOCTL_BIND_CHANNEL
NVGPU_TSG_IOCTL_UNBIND_CHANNEL

Bug 1470692

Change-Id: Iaf9f16a522379eb943906624548f8d28fc6d4486
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/416610
2015-03-18 12:10:16 -07:00
Terje Bergstrom
c32ac10b0b gpu: nvgpu: Dump offending push buffer fragment
When outputting debug dump, print the contents of current push buffer
segment.

Also changes the debug dump to use pr_cont when applicable, and dumps
state before recovering in case channel was not loaded to an engine.

Bug 1498688

Change-Id: I5ca12f64bae8f12333d82350278c700645d5007e
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/422198
2015-03-18 12:10:13 -07:00
Alex Waterman
bea4bb915a gpu: nvgpu: Implement L2 flush in fifo recovery
Implement a full L2 flush (clean and invalidate) for gm20b in
the fifo recovery path.

Bug 1512176

Change-Id: Ibf89ede9cca65a6868ebff89825869053302a007
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/416435
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-03-18 12:10:11 -07:00
Lauri Peltonen
f575bc6676 gpu: nvgpu: Support semaphore sync when aborting jobs
When aborting jobs on channel error situations, we manually set
the channel syncpoint's min == max in gk20a_disable_channel_no_update.
Nvhost will notice this manual syncpoint increment, and will call back
to gk20a_channel_update, which will clean up the job.

With semaphore synchronization, we don't have anybody calling back to
gk20a_channel_update, so we need to call it ourselves. Release job
semaphores (the equivalent of set_min_eq_max) on
gk20a_disable_channel_no_update, and if any semaphores were released,
call gk20a_channel_update afterwards.

Because we are actually calling gk20a_channel_update in some situations,
gk20a_disable_channel_no_update is no longer an appropriate name for the
function. Rename it to gk20a_channel_abort.

Bug 1450122

Change-Id: I1267b099a5778041cbc8e91b7184844812145b93
Signed-off-by: Lauri Peltonen <lpeltonen@nvidia.com>
Reviewed-on: http://git-master/r/422161
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-03-18 12:10:10 -07:00