Add support for CDE scatter buffers. When the bus addresses for
surfaces are not contiguous as seen by the GPU (e.g., when SMMU is
bypassed), CDE swizzling needs additional per-page information. This
information is populated in a scatter buffer when required.
Bug 1604102
Change-Id: I3384e2cfb5d5f628ed0f21375bdac8e36b77ae4f
Signed-off-by: Jussi Rasanen <jrasanen@nvidia.com>
Reviewed-on: http://git-master/r/789436
Reviewed-on: http://git-master/r/791243
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Implement per-channel watchdog/timer as per below rules :
- start the timer while submitting first job on channel or if
no timer is already running
- cancel the timer when job completes
- re-start the timer if there is any incomplete job left
in the channel's queue
- trigger appropriate recovery method as part of timeout
handling mechanism
Handle the timeout as per below :
- get timed out channel, and job data
- disable activity on all engines
- check if fence is really pending
- get information on failing engine
- if no engine is failing, just abort the channel
- if engine is failing, trigger the recovery
Also, add flag "ch_wdt_enabled" to enable/disable channel
watchdog mechanism. Watchdog can also be disabled using
global flag "timeouts_enabled"
Set the watchdog time to be 5s using macro
NVGPU_CHANNEL_WATCHDOG_DEFAULT_TIMEOUT_MS
Bug 200133289
Change-Id: I401cf14dd34a210bc429f31bd5216a361edf1237
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/797072
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
In case of CDE channel, T1 (Tex) unit needs to be promoted to 128B
aligned, otherwise causes a HW deadlock. Gpu driver makes changes in
FECS header which FECS uses to configure the T1 promotions to aligned
128B accesses.
Bug 200096226
Change-Id: Ic006b2c7035bbeabe1081aeed968a6c6d11f9995
Signed-off-by: sujeet baranwal <sbaranwal@nvidia.com>
Reviewed-on: http://git-master/r/802327
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Currently, while releasing the debug session we enable powergate
only if a channel is bound to session
If a session has no channel bound to it, and has powergate
disabled, then we do not enable powergate when that session
is closed
Fix this by calling dbg_set_powergate(POWERGATE_ENABLE) always
while releasing the session
Refcounting and sanity checks in dbg_set_powergate() will take
care of situation if powergate was not disabled by the session
in first place
Bug 1679372
Change-Id: I4e027393c611d3e8ab4f20e195f31871086da736
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/796999
Tested-by: Sandarbh Jain <sanjain@nvidia.com>
Reviewed-by: Sandarbh Jain <sanjain@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
bug 1625901
1) disable ELPG before doing GR reset when runlist update times out
2) add mutex for GR reset to avoid multiple threads resetting GR
3) protect GR reset with FECS mutex so that no one else submits methods
Change-Id: I02993fd1eabe6875ab1c58a40a06e6c79fcdeeae
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/793643
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
As per current sequence in gk20a_channel_abort(),
we first balance the syncpoint values associated with
failing channel, and then abort it
Reverse this sequence so that we first disable the channel
and then only balance the syncpoints
Bug 200133289
Change-Id: I5a748afce437e728a5ff6c8a030a75d0f627c622
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/797071
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
In gk20a_fifo_handle_sched_error(), we currently have a sequence
to identify failing engine (stuck on context switch) and
corresponding failing channel with its type
Separate out this sequence in new API
gk20a_fifo_get_failing_engine_data() so that it can be
reused from else where too
Bug 200133289
Change-Id: I3cef395170cf8990c014c7505c798fd6f2e37921
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/797070
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Separate the kernel and userspace regions in the GPU virtual address
space. Do this by reserving the last part of the GPU VA aperture for
the kernel, and extend GPU VA aperture accordingly for regular address
spaces. This prevents the kernel polluting the userspace-visible GPU
VA regions, and thus, makes the success of fixed-address mapping more
predictable.
Bug 200077571
Change-Id: I63f0e73d4c815a4a9fa4a9ce568709974690ef0f
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/747191
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
consider buffer size as well when calculating the required alignment
for a buffer else we would be mapping a VA range greater than requested
thus allowing access to entire large page even when not needed creating
a security hole.
Bug 1492689
Change-Id: Ic404708d238621ea64c26cafd05bc30ba8e02e12
Signed-off-by: Sri Krishna chowdary <schowdary@nvidia.com>
Reviewed-on: http://git-master/r/793229
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Cyclestats snapshot feature is expected for new devices.
The detection code was isolated in separate function and run-time
check added to validate/allow ioctl calls on the current GPU.
Bug 1674079
Change-Id: Icc2f1e5cc50d39b395d31d5292c314f99d67f3eb
Signed-off-by: Leonid Moiseichuk <lmoiseichuk@nvidia.com>
Reviewed-on: http://git-master/r/781697
(cherry picked from commit bdd23136b182c933841f91dd2829061e278a46d4)
Reviewed-on: http://git-master/r/793630
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
This patch inserts the function address of
gk20a_debug_dump_device into host data struct once
the nvgpu module loads and removes it during unload.
Bug 1476801
Change-Id: If49262208325b2aa0807705c26086e6d7c81632c
Signed-off-by: Yogesh Bhosale <ybhosale@nvidia.com>
Reviewed-on: http://git-master/r/779397
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Add gpu_ops for CDE, and add get_program_numbers function pointer for
determining horizontal and vertical CDE swizzler programs. This allows
different GPUs to have their own specific requirements for choosing
the CDE firmware programs.
Bug 1604102
Change-Id: Ib37c13abb017c8eb1c32adc8cbc6b5984488222e
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/784899
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
In gk20a_ce2_nonstall_isr(), we first invoke semaphore workqueue
on all channels and then clear the interrupt
This delay in clearing the interrupt can sometimes lead to
dropping of new interrupt
If that happens, we never invoke gk20a_channel_semaphore_wakeup()
for new semaphore interrupts and semaphore waiting
never completes.
Fix this by moving gk20a_channel_semaphore_wakeup() after
we clear the interrupt
Bug 200131938
Change-Id: I26d72f04a8b49f4a3ac326bf6037cd04c741a920
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/784771
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Currently, we first invoke semaphore workqueue on all channels
and then clear the interrupt
This delay in clearing the interrupt can sometimes lead to
dropping of new interrupt
If that happens, we never invoke gk20a_channel_semaphore_wakeup()
for new semaphore interrupts and semaphore waiting
never completes.
Fix this by moving gk20a_channel_semaphore_wakeup() after
we clear the interrupt
Bug 200083084
Bug 200117718
Change-Id: I7278cb378728e3799961411c4ed71d266d178a32
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Signed-off-by: sujeet baranwal <sbaranwal@nvidia.com>
Reviewed-on: http://git-master/r/783175
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Tested-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
1. Before destroying the allocator for PMU dmem check if it was already
initialized. It is only initialized through certain paths like PMU ISRs.
So while testing the nvgpu module using nvgpu_submit_twod test I found
that it was never initialized.
2. Inside gk20a_init_gr_setup_sw, cleanup part calls for de-allocating
the already allocated chunk of memory. Whereas, cleanup also gets called
when memory allocation inside the same function fails. In such cases,
we should have a non-null check else we attempt to free a non-allocated
memory and kernel panics.
Bug 1476801
Change-Id: Ia2f0599ac0c35d58709acd149033e114b898b426
Signed-off-by: Yogesh Bhosale <ybhosale@nvidia.com>
Reviewed-on: http://git-master/r/777118
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
- pmu version update to sync with CL-19816709
- GPCCS version update to sync with CL-19816709
Change-Id: Ia60bb538ddba35c973183ca2d4d3a7a0013b4b59
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: http://git-master/r/779628
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
gk20a_busy() is already called on all the paths to
__gk20a_channel_syncpt_incr() i.e. in gk20a_submit_channel_gpfifo()
hence remove the redundant gk20a_busy() call since it causes
deadlock scenario with VPR resize use case
Bug 200128257
Bug 1645760
Bug 200114947
Bug 200124519
Change-Id: I4cd47b7e7cdc92aaeda17256a99f2ba93833a3b3
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/778341
(cherry picked from commit 5a5dc5b5a9d38a5e8d5c1ca29dc6de425c00b605)
Reviewed-on: http://git-master/r/779070
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
gk20a_busy() was added to gk20a_channel_syncpt_update() for possible
case of channel deletion
But API to delete a channel (i.e. gk20a_free_channel()) is already
called in paths which ensure gk20a_busy() is called before
deleting the channel
Hence, remove redundant gk20a_busy()/idle() calls
This also fixes a deadlock scenario with VPR resize use case
Bug 200128257
Bug 1645760
Bug 200114947
Bug 200124519
Change-Id: I05dc739b3be88af2ba22b0a667e5004d8100bf6f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/778340
(cherry picked from commit 306282aa950201cf1ae91a5cc48d75719b179d19)
Reviewed-on: http://git-master/r/779069
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Add device-tree support for tegra power-domains and power-gating
for t186, then perform the related cleanup.
Also enable TEGRA_MC_DOMIANS, PM_GENERIC_DOMAINS_OF and
TEGRA_POWERGATE for t186.
Bug 200105664
Change-Id: I548c6b71a1577afa439a39a0eafc317a1c3cbc68
Signed-off-by: Sumit Singh <sumsingh@nvidia.com>
gk20a_pm_shutdown() is the last callback before
GPU railgate will be forced by platform code
Hence we need to call prepare_poweroff() before
returning from shutdown() to clean up below things
mainly,
1. disable interrupts to ensure that GPU is not
processing any interrupts while railgating
2. disable clocks (and related flags) to ensure
no h/w access from exported clock ops
Note that GPU railgate will be triggered by platform
code since config CONFIG_PM_GENERIC_DOMAINS_OF is
enabled by default
Bug 200123584
Change-Id: Ifaa0d1ba9b01d49bf5cc85d9c9a9feb3815866d8
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/770485
Reviewed-by: Prashant Gaikwad <pgaikwad@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
As CONFIG_PM_GENERIC_DOMAINS_OF is enabled, so cleaning-up
the code which remains unused when this config is enabled.
Bug 200070810
Change-Id: I884ca3d6fb8fa6acdff8c1b2fbe66a672758274a
Signed-off-by: Sumit Singh <sumsingh@nvidia.com>
Make modification to add DT support for gpu
power-domain for T186 chip.
Bug 200105664
Change-Id: Ief8d0a6c84918578c52d153db7eac02587b67ee7
Signed-off-by: Sumit Singh <sumsingh@nvidia.com>
Now that the buddy allocator is merged we can increase the VA space
without dramatically increasing memory usage by the allocator.
40 bits is the max VA space available on gk20a and gm20b.
Change-Id: I7bc8d86e35b28f041e9a435f2571c8288970c8ee
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/745076
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/771152
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
The address space limit was being computed with the assumption
that the va_limit field is inclusive. The va_limit field is
actually not inclusive. It points to the first invalid byte.
Thus when generating the adr_limit register the code incorrectly
calculated that the address limit should be 0. To fix this the
computation now just uses va_limit - 1.
Also, the bitwise OR of 0xfff into the lower limit word was
incorrect. The bottom 12 bits of the lower 32 bit word are
ignored by the GPU and as such should not be populated.
Change-Id: Ifcc13343aaf50776f3cf1a1e3726e73ffde5003f
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/756690
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/771151
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
Fix an issue where large ( > 4GB) allocations were not being computed
correctly. The two fields, pages and page_size, were both 32 bits so
when multiplied they easily overflowed. Simple fix is to cast them to
64 bits before multiplying them.
Change-Id: I63fa54679e485de5c3a99684cbeb72c6cdc65504
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/747429
Reviewed-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/771148
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
While evaluating the broadcast register, use the correct max_tpc_per_gpc for gm20b.
Bug 200118793
Change-Id: Icdc506c05895e5ecdd424dfa2729d0d53460ff15
Reviewed-on: http://git-master/r/765147
(cherry picked from commit be5add9a2f13f787ea408d2a28b0b82c776227d4)
Signed-off-by: Sandarbh Jain <sanjain@nvidia.com>
Reviewed-on: http://git-master/r/771254
Reviewed-by: Ken Adams <kadams@nvidia.com>
Tested-by: Ken Adams <kadams@nvidia.com>
bug 200114561
1) when handling sched error, if CTXSW status reads switch
check FECS mailbox register to know whether next or current
channel caused error
2) Update recovery function to use ch id passed to it
3) Recovery function now passes mmu_engine_id to mmu fault
handler instead of fifo_engine_id
Change-Id: I3576cc4a90408b2f76b2c42cce19c27344531b1c
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/763538
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
if GPU is not powered before L2 is flushed, then
L2 cache flush is a noop. Same behavior as
gk20a_mm_L2_Invalidate()
bug 1661228
Change-Id: I0f590628928a73b7277d1b16a5a79a86e0213648
Signed-off-by: Sam Payne <spayne@nvidia.com>
Reviewed-on: http://git-master/r/768068
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
(cherry picked from commit cb4d29d34d0736aa753afa323bfb216481cc8640)
Reviewed-on: http://git-master/r/771113
GVS: Gerrit_Virtual_Submit
In gk20a_cde_remove_ctx(), current sequence is as below
- gk20a_channel_close()
- gk20a_deinit_cde_img()
- gk20a_free_obj_ctx()
But gk20a_free_obj_ctx() needs reference to channel and hence
below crash is seen :
[ 3901.466223] Unable to handle kernel paging request at virtual address
00001624
...
[ 3901.535218] PC is at gk20a_free_obj_ctx+0x14/0xb0
[ 3901.539910] LR is at gk20a_deinit_cde_img+0xd8/0x12c
Fix this by closing the channel after gk20a_deinit_cde_img()
Bug 1625901
Change-Id: Ic2dc5af933b6d6ef8982c2b9f0caa28df204051f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/770322
GVS: Gerrit_Virtual_Submit
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Fixed the following sparse warning by including the "fb_gk20a.h" header file:
- fb_gk20a.c: warning: symbol 'fb_gk20a_reset' was not declared.
Should it be static?
Bug 200088648
Change-Id: I1ba6051455a22e81da6598eebdccfa8b45b78c3e
Signed-off-by: Amit Sharma (SW-TEGRA) <amisharma@nvidia.com>
Reviewed-on: http://git-master/r/768203
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Reviewed-on: http://git-master/r/770654
The cyclestats mode-e feature supported by userspace only
for t210 devices, so kernel should advertize it only for t210.
Also small check added to prevent BUG in dma-buf.c:826
if device has lack of memory.
Bug 1662506
Change-Id: I8417a8cdd9092e64126382f379d171932e4592a1
Signed-off-by: Leonid Moiseichuk <lmoiseichuk@nvidia.com>
Reviewed-on: http://git-master/r/767073
(cherry picked from commit 06f86b6e78bae5e26e32466716c18e7918efb1b1)
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
Reviewed-on: http://git-master/r/767148
Reviewed-by: Automatic_Commit_Validation_User