While mapping the buffer, if kind argument is -1,
we extract kind value from nvmap
but kind information from nvmap is going away
and hence remove respective call to nvmap
Bug 1616899
Change-Id: I2764655f60df691ac8a86484c6ec929d2b83b2e3
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1012239
GVS: Gerrit_Virtual_Submit
Reviewed-by: Amit Sharma (SW-TEGRA) <amisharma@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
In gk20a_gr_handle_fecs_error(), we always return error
value and that triggers recovery in each case
Return error only if we need to trigger recovery
(depending on case)
Otherwise, clear the interrupt and return success
Bug 200156699
Change-Id: I117f3702b751e8bbc1cd3834b1b72b6533e246f9
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1001694
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add IOCTL NVGPU_DBG_GPU_IOCTL_SET_NEXT_STOP_TRIGGER_TYPE
to set next stop_trigger type (either single SM or
broadcast to all SMs)
Also, expose below APIs to check and clear broadcast flag:
gk20a_dbg_gpu_broadcast_stop_trigger()
gk20a_dbg_gpu_clear_broadcast_stop_trigger()
Bug 200156699
Change-Id: I5e6cd4b84e601889fb172e0cdbb6bd5a0d366eab
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/925882
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
We currently clean up the jobs in gk20a_channel_update()
which is called from nvhost worker thread
Instead of doing this, schedule another delayed worker thread
clean_up_work to clean up the jobs (with delay of 1 jiffies)
Keep update_gp_get() in channel_update() and not in
delayed worker since this will help in better book
keeping of gp_get
Also, this scheduling will help delay job clean-up so
that more number of jobs are batched for clean up
and hence less time is consumed by worker
Bug 1718092
Change-Id: If3b94b6aab93c92da4cf0d1c74aaba756f4cd838
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/931701
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add TEX exception handling support. Also make SM exception handler into
a function pointer, which should allow different chips to implement
their own SM exception handling routine.
Bug 1635727
Bug 1637486
Change-Id: I429905726c1840c11e83780843d82729495dc6a5
Signed-off-by: Adeel Raza <araza@nvidia.com>
Reviewed-on: http://git-master/r/935329
When gpu rail-gating is enabled, it is possible that
both rail gating code and system shudown can start
executing gk20a_pm_prepare_poweroff() in parallel.
To synchronize this execution, protect gk20a_pm_prepare_poweroff()
with a mutex lock.
Bug 200168805
Change-Id: I19536a43ed20c3e82b32c316922dc3e19e3f59bb
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/999548
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
global_esr and warp_esr are edge-triggered
and are cleared in kernel isr so skip checking
them when wait_for_pause is called from UMD via
ioctl.
Bug 1619430
Change-Id: I2ae54f23ba5c8bfaab35a476f88ccca0bbb10202
Signed-off-by: Ashutosh Jain <ashutoshj@nvidia.com>
Reviewed-on: http://git-master/r/935808
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Cory Perry <cperry@nvidia.com>
Tested-by: Cory Perry <cperry@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Increase the semaphore count per channel. Some channels
were running out of semaphores. The original limit was
255 (256 fits in 1 page, but the 0th semaphore is used
to return error codes from the allocator).
Easy fix was to simply increase the number of semaphores
each channel is allocated to 1024.
Bug 1604892
Change-Id: I163e24b8d42a3dc1bb9b418dadc0c8532aff9adb
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/935911
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
GVS: Gerrit_Virtual_Submit
A race condition existed in gk20a_channel_semaphore_wait_fd().
In some instances the semaphore underlying the sync_fence being
waited on would have already signaled. This would cause the
subsequent sync_fence_wait_async() call to return 1 and do
nothing. Normally, the sync_fence_wait_async() call would
release the newly created semaphore but in the above case that
would not happen and hang any channel waiting on that semaphore.
To fix this problem if sync_fence_wait_async() returns 1
immediately release the newly created semaphore.
Bug 1604892
Change-Id: I1f5e811695bb099f71b7762835aba4a7e27362ec
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/935910
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
GVS: Gerrit_Virtual_Submit
Fix a reentrancy problem in gk20a_sync_pt_has_signaled()
where one thread could clear a pointer before another
thread tried to access it. A spinlock was added to the
gk20a_sync_pt struct which is used to ensure that the
underyling gk20a_sync_pt data is accessed in a sane
manner.
Bug 1604892
Change-Id: I270d89def7b986405a3167285d51ceda950c7b82
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/935909
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
GVS: Gerrit_Virtual_Submit
We currently call gk20a_pmu_load_update() before calling
update_devfreq()
But it is possible to disable governor and set a
constant/max frequency.
In that case we will unnecessarily keep executing
gk20a_pmu_load_update() for each submit
Hence. move gk20a_pmu_load_update() to gk20a_scale_get_dev_status()
so that we call gk20a_pmu_load_update() only when we really
have to scale the frequency
Bug 200161377
Change-Id: Ifac5a659a3a2d088b636f048213c2fbec801bdb9
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/929509
(cherry picked from commit f857a1b31400dfc0c35c58c6424aaac36bc09e7c)
Reviewed-on: http://git-master/r/933704
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
We currently have this sequence :
- acquire sync_lock
- sync_create
- resetup_ramfc()
- release sync_lock
but this can lead to deadlock in case resetup_ramfc()
triggers below stack :
- resetup_ramfc()
- channel_preempt() - preemption fails
- trigger recovery
- channel_abort()
- acquire sync_lock
Fix this by moving resetup_ramfc() out of sync_lock.
resetup_ramfc() is still protected by submit_lock and
hence we cannot free sync after allocation and
before resetup
Bug 200165811
Change-Id: Iebf74d950d6f6902b6d180c2cd8cd2d50493062c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/931726
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Few times cde is getting deadlocked because of pending cde
operation. So do the things cleanly, first suspend cde then
do channel suspend.
Bug 1709757
Change-Id: Iaf566b63d9efb13aa2691c19e2df676c70f26afc
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/926574
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Restore comptags to be bitmap-allocated, like they were before we had
the buddy allocator.
The new buddy allocator introduced by
e99aa2485f8992eabe3556f3ebcb57bdc8ad91ff (originally
6ab2e0c49cb79ca68d2f83f1d4610783d2eaa79b) is fine for the big VAs, but
unsuitable for the small compbit store.
This commit reverts partially the combination of the above commit and
also one after it, 86fc7ec9a05999bea8de320840b962db3ee11410, that fixed
a bug which is not present when using a bitmap. With a bitmap allocator,
pruning the extra allocation necessary for user-mapped mode is possible,
so that is also restored.
The original generic bitmap allocator is not restored; instead, a
comptag-only allocator is introduced.
Bug 200145635
Change-Id: I87f3a911826a801124cfd21e44857dfab1c3f378
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/837180
(cherry picked from commit 5a504aeb54f3e89e6561932971158a397157b3f2)
Reviewed-on: http://git-master/r/839742
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Do not read back L2 ZBC RAM. That can conflict with in-flight
transactions causing a live-lock.
Change-Id: I6122af48513b5a4b801202dc611eba58ce86aa4d
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/929580
GVS: Gerrit_Virtual_Submit
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Tested-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
If we run out of gpfifo space or private command
buffer space, we currently return EAGAIN as error code
Instead of EAGAIN, return ENOSPC as error code so
that caller (user space) can read the error code
and do some re-trials
As the jobs are processed, it is possible to free up
some space. And hence such re-trials could succeed
Bug 1715291
Change-Id: I9a2ed7134d2496b383899b3c02c0e70452b26115
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/929402
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Tested-by: Sachin Nikam <snikam@nvidia.com>
We currently enable per-channel wdt flag at channel
initialization time only
But if process disables channel's wdt via per-channel
IOCTL, and closes the channel without re-enabling it,
we leave the wdt disabled on that channel
And if same channel is assigned to some other process,
then that process might have wdt disabled already
Fix this by setting ch->wdt_enabled = true during
gk20a_open_new_channel()
Bug 200165797
Change-Id: I3ab482ce7cfbcbbd2178041f01f97457ff24f7bb
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/931128
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Tested-by: Sachin Nikam <snikam@nvidia.com>
Add below API pointer to support masking of hww_warp_esr
after hardware read of register and before using it
further
u32 (*mask_hww_warp_esr)(u32 hww_warp_esr)
If needed, this API will mask value of hww_warp_esr
appropriately and return it
Bug 200156699
Change-Id: I1afb1347e650fab607009c1ee55691484653a4c1
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/927133
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add new API gr_gk20a_get_ctx_id() to get/extract
context id from GR context
Bug 200156699
Change-Id: If0e8887a9a6b139cd795bf03f5def64fd664d12b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/927130
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add below APIs to suspend or resume single SM :
gk20a_suspend_single_sm()
gk20a_resume_single_sm()
Also, update gk20a_suspend_all_sms() to make it
more generic by passing global_esr_mask and
check_errors flag as parameter
Bug 200156699
Change-Id: If40f4bcae74a8132673b4dca10b7d9898f23c164
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/925884
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Support preprocessing of SM exceptions if API
pointer pre_process_sm_exception() is defined
Also, expose some common APIs
Bug 200156699
Change-Id: I1303642c1c4403c520b62efb6fd83e95eaeb519b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/925883
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
In gk20a_channel_timeout_handler(), below deadlock scenario
is possible :
thread 1:
- take global lock g->ch_wdt_lock
- identify timed out channel (as ch1)
- check engine status which is stuck
- identify failing channel on engine as ch2
- we need to trigger recovery with ch2
- as part of recovery, call channel_abort() for ch2
- in channel_abort(), we wait to cancel the timer wq
- but timer wq for ch2 never completes due to thread 2
thread 2:
- ch2 has already timed out
- to process, we wait for global lock g->ch_wdt_lock
- this lock needs to be released by thread 1
To fix this, cancel the timer (through flag) of ch2
(failing channel on engine) before triggering recovery
on that channel
Bug 200164753
Change-Id: Idb42d01c8440a53f43cb5e87e41f1c283f7e8fcf
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/929924
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
In gk20a_channel_timeout_handler(), we currently disable
all engine activity before checking for fence completion
and before we identify timed out channel
But disabling all engine activity could be overkill for
this process.
Also, as part of disabling engine activity we preempt
the channel on engine.
But it is possible that channel preemption times out
since channel has already timed out
And this can lead to races and deadlock
Hence, instead of disabling all engine activity, just
disable the context switch which should also do the
same trick
Bug 1716062
Change-Id: I596515ed670a2e134f7bcd9758488a4aa0bf16f7
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/929421
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Interleave all high priority channels between all other channels.
This reduces the latency for high priority work when there
are a lot of lower priority work present, imposing an upper
bound on the latency. Change the default high priority timeslice
from 5.2ms to 3.0 in the process, to prevent long running high priority
apps from hogging the GPU too much.
Introduce a new debugfs node to enable/disable high priority
channel interleaving. It is currently enabled by default.
Adds new runlist length max register, used for allocating
suitable sized runlist.
Limit the number of interleaved channels to 32.
This change reduces the maximum time a lower priority job
is running (one timeslice) before we check that high priority
jobs are running.
Tested with gles2_context_priority (still passes)
Basic sanity testing is done with graphics_submit
(one app is high priority)
Also more functional testing using lots of parallel runs with:
NVRM_GPU_CHANNEL_PRIORITY=3 ./gles2_expensive_draw
–drawsperframe 20000 –triangles 50 –runtime 30 –finish
plus multiple:
NVRM_GPU_CHANNEL_PRIORITY=2 ./gles2_expensive_draw
–drawsperframe 20000 –triangles 50 –runtime 30 -finish
Previous to this change, the relative performance between
high priority work and normal priority work comes down
to timeslice value. This means that when there are many
low priority channels, the high priority work will still
drop quite a lot. But with this change, the high priority
work will roughly get about half the entire GPU time, meaning
that after the initial lower performance, it is less likely
to get lower in performance due to more apps running on the system.
This change makes a large step towards real priority levels.
It is not perfect and there are no guarantees on anything,
but it is a step forwards without any additional CPU overhead
or other complications. It will also serve as a baseline to
judge other algorithms against.
Support for priorities with TSG is future work.
Support for interleave mid + high priority channels,
instead of just high, is also future work.
Bug 1419900
Change-Id: I0f7d0ce83b6598fe86000577d72e14d312fdad98
Signed-off-by: Peter Pipkorn <ppipkorn@nvidia.com>
Reviewed-on: http://git-master/r/805961
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
It'll detect dead semaphore acquire. The worst case is when
ACQUIRE_SWITCH is disabled, semaphore acquire will poll and
consume full gpu timeslicees.
The timeout value is set to half of channel WDT.
Bug 1636800
Change-Id: Ida6ccc534006a191513edf47e7b82d4b5b758684
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/928827
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>