In pmu_mutex_acquire(), we return zero (success) if
pmu->initialized is not set.
Since mutex_acquire() was successful, we then call
pmu_mutex_release().
If pmu->initialized has meanwhile been set by another thread,
we proceed to validate the mutex owner and end up triggering
the warning below:
pmu_mutex_release: requester 0x00000000 NOT match owner 0x00000008
Hence, to fix this, return an error from mutex_acquire() and
mutex_release() if pmu->initialized is not yet set; in that case
we proceed directly with elpg enable/disable, as in the sketch below.
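A minimal sketch of the fix, assuming simplified signatures (the real
helpers in pmu_gk20a.c also take a mutex id and a token):

    static int pmu_mutex_acquire(struct pmu_gk20a *pmu)
    {
            if (!pmu->initialized)
                    return -EINVAL; /* caller falls back to plain elpg calls */

            /* ... normal HW mutex acquire path ... */
            return 0;
    }

    static int pmu_mutex_release(struct pmu_gk20a *pmu)
    {
            if (!pmu->initialized)
                    return -EINVAL; /* never validate the owner before init */

            /* ... owner validation and HW mutex release ... */
            return 0;
    }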
Bug 1533644
Change-Id: Ifbb9e6a8e13f6478a13e3f9d98ced11792cc881f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/439333
GVS: Gerrit_Virtual_Submit
Reviewed-by: Naveen Kumar S <nkumars@nvidia.com>
Tested-by: Naveen Kumar S <nkumars@nvidia.com>
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
gm20b_ltc_cbc_ctrl sent the wrong register value to
clear the CBC.
Bug 1507804
Change-Id: Ib0d867a122466e50cb15fef3b320fb2ee8455ef2
Signed-off-by: Wei Sun <wsun@nvidia.com>
Reviewed-on: http://git-master/r/435297
Reviewed-by: Kevin Huang (Eng-SW) <kevinh@nvidia.com>
Tested-by: Kevin Huang (Eng-SW) <kevinh@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
Add a way to force idle and reset the GPU in the case where GPU
rail gating is not supported
(i.e. platform->can_railgate = false)
In this case, we follow the sequence below (sketched in code
after the list):
- once GPU is idle, get runtime reference which enables the clocks
- call prepare_poweroff() to save the state explicitly
- perform explicit reset assert/deassert
- call finalize_poweron() to restore the state
- drop the runtime reference taken earlier
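A hedged sketch of that sequence; gk20a_get_platform() and the Tegra
reset helpers are assumptions here, not taken verbatim from the patch:

    static int gk20a_do_idle_force_reset(struct platform_device *pdev)
    {
            struct gk20a_platform *platform = gk20a_get_platform(pdev);
            int err;

            pm_runtime_get_sync(&pdev->dev);        /* enables the clocks */

            err = gk20a_pm_prepare_poweroff(&pdev->dev);    /* save state */
            if (err)
                    goto out;

            tegra_periph_reset_assert(platform->clk[0]);    /* explicit reset */
            udelay(10);
            tegra_periph_reset_deassert(platform->clk[0]);

            err = gk20a_pm_finalize_poweron(&pdev->dev);    /* restore state */
    out:
            pm_runtime_put_sync(&pdev->dev);        /* drop the reference */
            return err;
    }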
Bug 1525284
Change-Id: Id5f3ec152093acd585631dfbf785d8e0561f9048
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/435620
GVS: Gerrit_Virtual_Submit
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
Tested-by: Arto Merilainen <amerilainen@nvidia.com>
Increase the wait delays in do_idle() to 2000 ms and use
msleep() instead of mdelay().
Also, to check whether the GPU is rail gated, add a do-while()
loop which keeps checking the status and bails out as soon as
the GPU is rail gated (see the sketch below).
This increase in delays is required to allow the GPU sufficient
time to complete its work and get rail gated.
These delays are especially needed during stress testing, where
it is possible that a large amount of GPU work is blocked
during do_idle() and might then take more time to complete
while the next do_idle() is waiting for it.
Also, remove waiting on the API gk20a_wait_channel_idle() for
each channel, since it is sufficient to wait for the refcount
to be 1.
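A rough sketch of the polling loop, assuming the platform's
is_railgated() hook and a 10 ms poll interval:

    int tries = 2000 / 10;  /* 2000 ms total, 10 ms per poll (assumed split) */

    do {
            msleep(10);
            if (platform->is_railgated(pdev))
                    break;          /* bail out as soon as the GPU is gated */
    } while (--tries > 0);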
bug 1529160
Change-Id: Ie541485fbdda76d79ae4a75dda928da240fc5d8f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/434192
(cherry picked from commit 5a621bf2aaf3355e1330a662dc98e943d68ef86d)
Reviewed-on: http://git-master/r/435133
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
The gk20a_busy() call in channel_syncpt_incr() and the
corresponding gk20a_idle() call in channel_update() are
redundant, since they are already encapsulated inside another
pair of busy/idle calls.
This busy/idle pair is invoked only from submit_gpfifo(), and
submit_gpfifo() already has its own busy/idle pair which it
holds for the whole path; hence the redundant pair can be
removed.
This also prevents a deadlock scenario while do_idle() is in
progress, as follows (illustrated in the sketch below):
- in submit_gpfifo() we first call gk20a_busy(), which acquires
the busy read semaphore
- in do_idle() we acquire the busy write semaphore and wait for
current jobs to finish
- submit_gpfifo() then encounters the second gk20a_busy() and
requests the busy read semaphore again
- this results in a deadlock where do_idle() is waiting for
submit_gpfifo() to complete while submit_gpfifo() is waiting for
the busy lock held by do_idle(), and hence neither can complete
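The deadlock, reduced to a minimal sketch (busy_lock stands in for the
driver's busy read/write semaphore):

    #include <linux/rwsem.h>

    static DECLARE_RWSEM(busy_lock);        /* stand-in for the busy lock */

    static void submit_gpfifo_path(void)
    {
            down_read(&busy_lock);  /* outer gk20a_busy() */
            /* do_idle() now calls down_write() and blocks on us ... */
            down_read(&busy_lock);  /* redundant inner gk20a_busy(): queued
                                     * behind the waiting writer, so neither
                                     * side can make progress */
            up_read(&busy_lock);
            up_read(&busy_lock);
    }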
bug 1529160
Change-Id: I96e4368352f693e93524f0f61689b4447e5331ea
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/434191
(cherry picked from commit c4315c6caa42bab72ba6017c7ded25f4e9363dec)
Reviewed-on: http://git-master/r/435132
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Tested-by: Sachin Nikam <snikam@nvidia.com>
While do_idle() is executing, it is possible that unrailgate()
gets invoked in the midst of idling the GPU, and this can make
do_idle() fail.
To prevent simultaneous execution of these methods, add a mutex
railgate_lock and acquire it in the do_idle() and unrailgate()
APIs.
Also, keep this lock held if do_idle() succeeds; in that case
the lock is released in do_unidle(), otherwise release it
before returning (see the sketch below).
Note that this lock should not be held in the railgate() API,
since we do not want railgating to be blocked during do_idle().
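A sketch of the locking scheme, with hypothetical helpers standing in
for the actual idling work:

    static int gk20a_do_idle(struct gk20a_platform *platform)
    {
            int err;

            mutex_lock(&platform->railgate_lock);   /* vs. unrailgate() */
            err = __idle_the_gpu(platform);         /* hypothetical helper */
            if (err)
                    mutex_unlock(&platform->railgate_lock); /* failure path */
            return err;     /* success: stay locked until do_unidle() */
    }

    static void gk20a_do_unidle(struct gk20a_platform *platform)
    {
            __unidle_the_gpu(platform);             /* hypothetical helper */
            mutex_unlock(&platform->railgate_lock); /* taken in do_idle() */
    }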
bug 1529160
Change-Id: I87114b5367eaa217376455a2699c0d21c451c889
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/434190
(cherry picked from commit 561dc8e0933ff2d72573292968b893a52f5f783a)
Reviewed-on: http://git-master/r/435131
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Channel and gpfifo allocations are entirely separated from each
other, however, the code here assumes that active channel means
that the channel also has a gpfifo.
This reverts commit a24602f094380539788696d1b1567a4f4d914b17 which
added gpfifo dump. Changing debug dumping to be safe requires
refactoring the channel release code to use proper locking.
Bug 1530226
Change-Id: I2fb02542a17dd56a0a9ce732b327e34b85ade8b9
Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Reviewed-on: http://git-master/r/434038
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Shridhar Rasal <srasal@nvidia.com>
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
This patch adds support for checking, via the gpu
characteristics ioctl, whether the host1x link exists and is
supported.
Bug 1459653
Change-Id: I832eea217ed7f007e341dfde5769887e0882d6bb
Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Reviewed-on: http://git-master/r/433058
This is to weed out accesses to the GPU while it is gated. Otherwise
the accesses are silently dropped or cause a HW hang (on older chips).
Bug 1514949
Change-Id: Ice4cdb9f1f736978ebb3db847f39c7439bf98134
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/416339
Reviewed-by: Mitch Luban <mluban@nvidia.com>
Wait for engine idle via FIFO's engine status instead of
submitting WFI to the channel. Submitting WFI and waiting is not
robust, and the wait might invoke a debug dump, which cannot be
done while powering down.
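A hedged sketch of polling the engine status; the accessor names mimic
nvgpu's generated register headers but are assumptions here:

    static int gk20a_fifo_wait_engine_idle(struct gk20a *g, u32 engine_id)
    {
            unsigned long end = jiffies + msecs_to_jiffies(2000); /* assumed */

            do {
                    u32 status = gk20a_readl(g,
                                    fifo_engine_status_r(engine_id));

                    if (fifo_engine_status_engine_v(status) ==
                        fifo_engine_status_engine_idle_v())
                            return 0;
                    usleep_range(100, 200);
            } while (time_before(jiffies, end));

            return -ETIMEDOUT;
    }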
Bug 1499214
Change-Id: I4d52e8558e1a862ad4292036594d81ebfbd5f36b
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/432151
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Tested-by: Sachin Nikam <snikam@nvidia.com>
In simulation and emulation 50 ms is not enough to ensure a job
is complete. Bump it to 5 s when not running on silicon.
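For illustration only (using the usual Tegra platform helper; its use
here is an assumption):

    /* 50 ms on silicon, 5 s on simulation/emulation */
    u32 timeout_ms = tegra_platform_is_silicon() ? 50 : 5000;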
Bug 1510751
Change-Id: I90883b70ce2a75a8f07344f713d647b3fa0d0c7d
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/432044
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Chris Dragan <kdragan@nvidia.com>
Tested-by: Chris Dragan <kdragan@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Shridhar Rasal <srasal@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
The channel class needs to be cleared when a channel is opened.
Otherwise the class set by the previous user remains, and we can
accidentally use KEPLER_C methods even if KEPLER_C is not
allocated.
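The fix amounts to resetting the remembered class when the channel is
opened; a sketch (obj_class is the assumed field name):

    /* In the channel-open path: forget the previous user's class. */
    ch->obj_class = 0;      /* a stale KEPLER_C must not survive re-open */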
Bug 1487928
Bug 200000669
Change-Id: I3e1ae8d5edbdd82fa569b38a89a89dedb69ee773
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/428866
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Shridhar Rasal <srasal@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Possible race description:
- while the PMU is booting, it sends messages to the kernel,
which we process in gk20a_pmu_isr()
- but while the messages are being processed, it is possible
that we are on the way to rail gate the GPU and have already
called pmu_destroy()
- this can lead to hangs if GR is already off while the
messages are being processed
To fix this, introduce another mutex isr_enable_lock and a flag
to turn ISRs on/off (see the sketch below):
- when we enable the PMU, take the lock and set the flag
- in pmu_destroy(), take the lock and clear the flag
- in pmu_isr(), take the lock and check whether the flag is
set; if it is not set, return, otherwise proceed with the
messages
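A sketch of the handshake (isr_enabled is an assumed name for the
flag, and the message handler is simplified):

    void gk20a_pmu_isr(struct gk20a *g)
    {
            struct pmu_gk20a *pmu = &g->pmu;

            mutex_lock(&pmu->isr_enable_lock);
            if (!pmu->isr_enabled) {        /* pmu_destroy() already ran */
                    mutex_unlock(&pmu->isr_enable_lock);
                    return;
            }
            pmu_process_message(pmu);       /* assumed message handler */
            mutex_unlock(&pmu->isr_enable_lock);
    }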
Bug 200014542
Bug 200014887
Change-Id: I0204d8a00e4563859eebc807d4ac7d26161316ea
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/428371
(cherry picked from commit 9a37528314f2a2504e4530719f817a93db9a5bf0)
Reviewed-on: http://git-master/r/428352
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
PMU ucode and ACR ucode need the 0th ctx DMA to be programmed
for physical access. To stay in sync with the ucodes, modify
the 0th transcfg to be physical access, and suitably modify all
other ctx DMAs sent.
Bug 1509680
Change-Id: Ib3a24ebb8478488af57bb465d782e4045ca7d0d0
Signed-off-by: Supriya <ssharatkumar@nvidia.com>
Reviewed-on: http://git-master/r/432084
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
ELPG has to be disabled before we write to clock gating
registers. If ELPG is engaged during a clock gating register
write, it will cause an error in the ELPG engine; the required
ordering is sketched below.
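The ordering, sketched with gk20a's ELPG helpers (the clock gating
init function is a hypothetical stand-in):

    gk20a_pmu_disable_elpg(g);      /* keep ELPG off ... */
    init_clock_gating_regs(g);      /* ... while CG registers are written */
    gk20a_pmu_enable_elpg(g);       /* safe to re-engage ELPG now */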
Bug 200013495
Bug 200014542
Change-Id: I57d1c59fc9311686829d898faddc90149df4cb46
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/432117
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Supriya Sharatkumar <ssharatkumar@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Mitch Luban <mluban@nvidia.com>
We broadcast the CBC operation to all LTCs, but wait for only
one to finish; the fix is to poll completion on every LTC, as
sketched below.
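A sketch of the fix (the per-LTC register stride and pending mask are
hypothetical):

    static void gm20b_ltc_cbc_wait_all(struct gk20a *g, u32 pending_mask)
    {
            u32 ltc;

            for (ltc = 0; ltc < g->ltc_count; ltc++) {
                    u32 ctrl1 = ltc_ltc0_ltss_cbc_ctrl1_r() +
                                ltc * LTC_STRIDE;   /* assumed stride */

                    while (gk20a_readl(g, ctrl1) & pending_mask)
                            cpu_relax();    /* busy-wait, as before */
            }
    }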
Bug 1507804
Change-Id: Ib10aa5fe3a34b31862b2d5162c77441f7444a7ba
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/428123
Add a PMU state for ELPG booting. Prevent ISR processing when
the PMU is in the OFF state.
Bug 200006956
Change-Id: Ibcf69a2d81965cc87f520bf864c4425681f04531
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/424769
Separate the code to load PMU firmware from the software init. This
allows folding ACR and non-ACR PMU software initialization sequences.
Bug 200006956
Change-Id: I74b289747852167e8ebf1be63036c790ae634da4
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/424768
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Do not abort GPU probe if secure page allocation fails.
We can simply note that this allocation failed (using a bool
secure_alloc_ready) and prevent further secure memory
allocation when this flag is not set (see the sketch below).
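A sketch of both halves (secure_page_alloc and secure_alloc_ready
follow the platform-hook naming; details assumed):

    /* During probe: note failure instead of aborting. */
    err = platform->secure_page_alloc(pdev);
    if (!err)
            platform->secure_alloc_ready = true;

    /* Later, in the secure-alloc path: */
    if (!platform->secure_alloc_ready)
            return -ENOSYS;         /* no secure memory on this boot */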
Bug 1525465
Change-Id: Ie4eb6393951690174013d2de3db507876d7b657f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/427730
GVS: Gerrit_Virtual_Submit
Reviewed-by: Shridhar Rasal <srasal@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add a timeout mechanism to the L2 flushing code for gm20b.
Previously the code could spin forever in a loop if some issue
occurred with the L2 and caused the flush to fail.
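A bounded-wait sketch; the completion check is a hypothetical helper
and the 1 s budget is an assumption:

    static int gm20b_flush_ltc_wait(struct gk20a *g)
    {
            unsigned long end = jiffies + msecs_to_jiffies(1000);

            do {
                    if (!l2_flush_outstanding(g))   /* hypothetical check */
                            return 0;
                    usleep_range(20, 40);
            } while (time_before(jiffies, end));

            return -EBUSY;  /* give up instead of spinning forever */
    }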
Change-Id: I742c7671bac92aeb8e9674c43d30c45b2de4a836
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/423842
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
pmu_gk20a holds a pointer to struct gk20a *. As pmu_gk20a is
embedded in gk20a, there is no need for the circular dependency.
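The standard replacement for such a back-pointer, assuming pmu is
embedded in struct gk20a as the pmu member:

    static inline struct gk20a *gk20a_from_pmu(struct pmu_gk20a *pmu)
    {
            return container_of(pmu, struct gk20a, pmu);
    }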
Bug 200006956
Change-Id: I6d5d10a93b2fba4a26a1e28b3c5206506dc6cc04
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/424767
All channels in a TSG need to share the same engine context,
i.e. the pointer in the RAMFC of every channel in a TSG must
point to the same NV_RAMIN_GR_WFI_TARGET.
To get this, add a pointer to gr_ctx inside the TSG struct so
that the TSG can maintain its own unique gr_ctx.
Also, change the gr_ctx member of a channel to a pointer so
that, if the channel is part of a TSG, it can point to the
TSG's gr_ctx; otherwise it points to its own gr_ctx.
In gk20a_alloc_obj_ctx(), allocate gr_ctx as below (see the
sketch after the list):
1) If the channel is not part of any TSG
- allocate its own gr_ctx buffer if it is not already allocated
2) If the channel is part of a TSG
- check if the TSG has already allocated a gr_ctx
- if yes, the channel's gr_ctx will point to that of the TSG
- if not, the channel is the first to be bound to this TSG
- in this case, allocate a new gr_ctx on the TSG first and then
make the channel's gr_ctx point to it
Also, gr_ctx will be released as below:
1) If the channel is not part of a TSG, it is released when the
channel is closed
2) Otherwise, it is released when the TSG itself is closed
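The allocation logic in gk20a_alloc_obj_ctx(), sketched; helper and
field names are assumptions based on the description above:

    if (!gk20a_is_channel_marked_as_tsg(ch)) {
            if (!ch->gr_ctx)        /* own ctx, allocated once */
                    err = gr_gk20a_alloc_channel_gr_ctx(g, ch);
    } else {
            struct tsg_gk20a *tsg = &g->fifo.tsg[ch->tsgid];

            if (!tsg->tsg_gr_ctx)   /* first channel bound to this TSG */
                    err = gr_gk20a_alloc_tsg_gr_ctx(g, tsg);
            if (!err)
                    ch->gr_ctx = tsg->tsg_gr_ctx;   /* share the TSG's ctx */
    }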
Bug 1470692
Change-Id: Id347217d5b462e0e972cd3d79d17795b37034a50
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/417065
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
- when a TSG channel is made runnable, add it to the TSG's
runnable list
- when a TSG channel is removed from the runlist, remove it
from the TSG's runnable list
When we rewrite the entire runlist (sketched below):
- first add all the channels which are not part of any TSG
- then find all active TSGs and add an entry in the runlist for
each TSG (with the TSG id and the length of the TSG)
- then write entries for each channel in that TSG
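Writing the TSG portion of the runlist, sketched; the ram_rl_*
accessors mimic nvgpu's generated headers but are assumptions here:

    list_for_each_entry(tsg, &active_tsgs, entry) {
            /* TSG header: id, type, and number of channels that follow */
            runlist_entry[0] = ram_rl_entry_id_f(tsg->tsgid) |
                               ram_rl_entry_type_tsg_f() |
                               ram_rl_entry_tsg_length_f(
                                       tsg->num_runnable_channels);
            runlist_entry[1] = 0;
            runlist_entry += 2;

            /* then one regular entry per runnable channel in the TSG */
            list_for_each_entry(ch, &tsg->ch_runnable_list, ch_entry) {
                    runlist_entry[0] = ram_rl_entry_chid_f(ch->hw_chid);
                    runlist_entry[1] = 0;
                    runlist_entry += 2;
            }
    }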
Bug 1470692
Change-Id: Ic55a4d5959abc72cd20b8224eb4c31d3ff411861
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/416612
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add accessors to modify the contents of a runlist RAM (RAMRL)
entry. Using these accessors we can mark a runlist entry as
either a regular channel entry or a TSG entry.
Bug 1470692
Change-Id: If39759941ecb07af11152dbddb6fb5a67c14b26e
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/416611
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Randy Spurlock <rspurlock@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Make GPU power management feature configurations platform
data. Keep the existing status for gk20a and disable all power
features for gm20b (see the sketch below).
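A sketch of the gm20b platform data with every power feature off
(field names follow gk20a_platform conventions; assumed):

    struct gk20a_platform gm20b_tegra_platform = {
            .enable_elpg    = false,
            .enable_elcg    = false,
            .enable_slcg    = false,
            .enable_blcg    = false,
            .enable_aelpg   = false,
    };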
Bug 1523728
Change-Id: Ife7786863f18e21b882ac77085c7abc7c84d4cfc
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/426369
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Supriya Sharatkumar <ssharatkumar@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add an explicit memory barrier wmb() before writing register
values. Also call writel_relaxed() instead of writel() to skip
its internal wmb() call, which is conditional on some configs
(see the sketch below).
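The resulting write pattern, as a sketch of gk20a_writel():

    static inline void gk20a_writel(struct gk20a *g, u32 r, u32 v)
    {
            wmb();                          /* order prior memory writes */
            writel_relaxed(v, g->regs + r); /* no second barrier inside */
    }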
Bug 200012037
Change-Id: I9c545138314b6e73fec2a4aff2b1956444fac806
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/421463
Reviewed-by: Krishna Reddy <vdumpa@nvidia.com>
Tested-by: Krishna Reddy <vdumpa@nvidia.com>
When outputting a debug dump, print the contents of the current
push buffer segment.
Also change the debug dump to use pr_cont where applicable, and
dump state before recovering in case the channel was not loaded
to an engine.
Bug 1498688
Change-Id: I5ca12f64bae8f12333d82350278c700645d5007e
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/422198
While suspending the device, do not submit WFI on timed-out
channels.
Submitting WFI on a timed-out channel will cause submit_wfi()
to return an error, and as a result rail gating of the device
will be prevented (see the sketch below).
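The suspend loop, sketched (has_timedout follows gk20a's channel
naming; the rest is assumed):

    for (chid = 0; chid < f->num_channels; chid++) {
            struct channel_gk20a *c = &f->channel[chid];

            if (c->in_use && !c->has_timedout)  /* skip timed-out channels */
                    gk20a_channel_submit_wfi(c);
    }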
Bug 200010416
Change-Id: Ic097bfdae59dbf9e1f2aea5d8d0431b5f1c3721b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/422743
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
During gk20a_pm_prepare_poweroff(), if the call to
gk20a_channel_suspend() fails, we proceed to disable other
components and then return the error.
But when genpd sees the error, it aborts the suspend sequence
and keeps the device state as active.
Since we have already disabled all the components, the GPU then
lands in an invalid state.
Hence, if channel_suspend() fails, do not proceed; return the
error immediately (see the sketch below).
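Sketched, in gk20a_pm_prepare_poweroff():

    ret = gk20a_channel_suspend(g);
    if (ret)
            return ret;     /* genpd keeps the device active; nothing else
                             * has been disabled, so the state stays valid */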
Bug 200010416
Change-Id: I553a2a25832a1be4941bb6b6ce490c950cdbe7fa
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/422248
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
VPR resize is done by forcing the GPU to idle and then updating
the VPR size from TLK.
There is no longer any need to call the vpr_resize functions
from the kernel, and hence they can be removed.
Bug 1487804
Change-Id: I758a6e0a99a58757866f1138b0a89594e2a33908
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/421703
(cherry picked from commit 391d9bacf053fe0dacffc76c36768f82912ad1f4)
Reviewed-on: http://git-master/r/419612
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Allocate a dummy secure buffer of size PAGE_SIZE during
gk20a_probe().
This also helps to initiate the first secure memory (VPR)
resize call while the GPU is rail gated and in reset.
This dummy buffer is released after we allocate some more
secure memory buffers in alloc_global_ctx_buffers().
Bug 1487804
Change-Id: I61604d9e5ffb585801ee893435c98a0d3e69d666
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/421701
(cherry picked from commit 4236ab3323ee3c02fac562740d8b80d763589dea)
Reviewed-on: http://git-master/r/419610
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Implement a full L2 flush (clean and invalidate) for gm20b in
the fifo recovery path.
Bug 1512176
Change-Id: Ibf89ede9cca65a6868ebff89825869053302a007
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/416435
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add the necessary cache management registers for doing a
full L2 flush in GM20b.
Bug 1512176
Change-Id: I7799e5e584238a0af02abbf4f49917d7590d97dc
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/417260
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
When aborting jobs on channel error situations, we manually set
the channel syncpoint's min == max in gk20a_disable_channel_no_update.
Nvhost will notice this manual syncpoint increment, and will call back
to gk20a_channel_update, which will clean up the job.
With semaphore synchronization, we don't have anybody calling back to
gk20a_channel_update, so we need to call it ourselves. Release job
semaphores (the equivalent of set_min_eq_max) on
gk20a_disable_channel_no_update, and if any semaphores were released,
call gk20a_channel_update afterwards.
Because we are actually calling gk20a_channel_update in some situations,
gk20a_disable_channel_no_update is no longer an appropriate name for the
function. Rename it to gk20a_channel_abort.
Bug 1450122
Change-Id: I1267b099a5778041cbc8e91b7184844812145b93
Signed-off-by: Lauri Peltonen <lpeltonen@nvidia.com>
Reviewed-on: http://git-master/r/422161
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>