Commit Graph

353 Commits

Author SHA1 Message Date
Alex Waterman
a191fd905a gpu: nvgpu: Schedule channel update when jobs actually finish
Do not schedule channel update callbacks unless a job has actually
finished. This avoids a lot of callbacks into the CDE code that do
nothing when semaphores are enabled.

Bug 1732449
JIRA DNVGPU-12

Change-Id: I2f9a78498b08ebca44ee6a5171931a07721767f1
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1133786
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-04-29 14:44:24 -07:00
Deepak Nibade
63319c0fb9 gpu: nvgpu: fix sparse warnings
Fix the following sparse warnings:
drivers/gpu/nvgpu/gk20a/gk20a.c:764:5: warning: symbol
'gk20a_pm_finalize_poweron' was not declared. Should it be static?
drivers/gpu/nvgpu/gk20a/channel_gk20a.c:2504:14:
warning: symbol 'gk20a_event_id_poll' was not declared. Should it be
static?
drivers/gpu/nvgpu/gk20a/channel_gk20a.c:2538:5:
warning: symbol 'gk20a_event_id_release' was not declared. Should it be
static?

Bug 200067946
Bug 200088648

Change-Id: I5c23e7ee09c1a18fe2eeff12f80a3c2bf73120ef
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1128060
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sri Krishna Chowdary <schowdary@nvidia.com>
2016-04-20 02:16:14 -07:00
Deepak Nibade
2ac8c9729a gpu: nvgpu: make jobs_lock more fine grained
While processing all the jobs in gk20a_channel_clean_up_jobs(),
we currently acquire jobs_lock, traverse the list,
clean up the jobs, and then release the lock.

But this means we might hold the lock for too long,
blocking the submit path.

Hence make jobs_lock more fine-grained by restricting
it to list accesses only.
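
A minimal sketch of the fine-grained pattern this describes, with
hypothetical names (the real driver code differs in detail):

    struct channel {
        struct mutex jobs_lock;     /* now protects 'jobs' only */
        struct list_head jobs;
    };

    static void clean_up_jobs(struct channel *c)
    {
        LIST_HEAD(completed);

        /* hold jobs_lock only for the list manipulation itself... */
        mutex_lock(&c->jobs_lock);
        list_splice_init(&c->jobs, &completed);
        mutex_unlock(&c->jobs_lock);

        /* ...then run the slow per-job cleanup without blocking the
         * submit path (simplified: real code moves only finished jobs) */
        free_jobs(&completed);      /* hypothetical helper */
    }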

Bug 200187553

Change-Id: If82af8ff386f7bc29061cfd57fdda7df62f11c17
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1120412
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-04-19 08:17:30 -07:00
Deepak Nibade
c8d17e9167 gpu: nvgpu: remove submit lock
Remove the submit lock since we have moved to
more fine-grained locks.

Remove the API check_gp_put() since we cannot call
it in the submit path due to latency, and we cannot
call it in gk20a_channel_clean_up_jobs() anymore
since it would fail there without the lock.

Bug 200187553

Change-Id: I05b9fa95c9009000e13232d8fa567336eeee11c6
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1120411
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-04-19 08:16:44 -07:00
Deepak Nibade
e0c9da1fe9 gpu: nvgpu: implement sync refcounting
We currently free the sync when we find the job list empty.
If aggressive_sync is set to true, we also try to free the
sync during the channel unbind() call.

But we rarely free the sync from channel_unbind(),
since freeing it when the job list is empty is
aggressive enough.

Hence remove the sync-free code from channel_unbind().

Implement refcounting for the sync:
- get a refcount while submitting a job (and
  allocate the sync if it is not allocated already)
- put a refcount while freeing the job
- if refcount == 0 and aggressive_sync_destroy is
  set, free the sync
- if aggressive_sync_destroy is not set, free the
  sync at channel close time
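
A sketch of the refcounting scheme listed above, with illustrative
names (alloc_sync()/free_sync() and the fields are hypothetical):

    /* submit path: take a reference, allocating the sync on first use */
    static int channel_sync_get(struct channel *c)
    {
        if (!c->sync) {
            c->sync = alloc_sync(c);
            if (!c->sync)
                return -ENOMEM;
        }
        atomic_inc(&c->sync->refcount);
        return 0;
    }

    /* job free path: drop the reference; destroy eagerly only if allowed */
    static void channel_sync_put(struct channel *c)
    {
        if (atomic_dec_and_test(&c->sync->refcount) &&
            c->aggressive_sync_destroy) {
            free_sync(c->sync);
            c->sync = NULL;
        }
        /* otherwise the sync is freed at channel close time */
    }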

Bug 200187553

Change-Id: I74e24adb15dc26a375ebca1fdd017b3ad6d57b61
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1120410
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-04-19 08:16:13 -07:00
Deepak Nibade
1c96bc6942 gpu: nvgpu: add lock for fences
All pre/post fence accesses in last_submit are
currently protected by the submit lock.
In order to remove the submit lock, move all fence
accesses under their own lock, i.e. fence_lock.
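
A sketch of the intended locking, assuming a fence_lock field and
illustrative fence helpers:

    mutex_lock(&c->fence_lock);
    fence_put(c->last_submit.pre_fence);
    fence_put(c->last_submit.post_fence);
    c->last_submit.pre_fence = new_pre;
    c->last_submit.post_fence = new_post;
    mutex_unlock(&c->fence_lock);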

Bug 200187553

Change-Id: I0132d1933dc92db8c5ed8c9311e49a030aa2d38c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1120409
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-04-19 08:15:37 -07:00
Deepak Nibade
dfac8ce704 gpu: nvgpu: support binding multiple channels to a debug session
We currently bind only one channel to a debug session,
but some use cases might need multiple channels bound
to the same debug session.

Add this support by adding a list of channels to the debug
session. The list structure is implemented as struct
dbg_session_channel_data.

The list node dbg_s_list_node is currently defined in struct
dbg_session_gk20a, but this is inefficient when we need to
add a debug session to multiple channels.

Hence add a new reference structure dbg_session_data to
store the dbg_session pointer and list entry.

For each NVGPU_DBG_GPU_IOCTL_BIND_CHANNEL call, create
two reference structures, dbg_session_channel_data for the
channel and dbg_session_data for the debug session, and bind
them together.

Define the API nvgpu_dbg_gpu_get_session_channel(), which
gets the first channel in the debug session's list.
Use this API wherever we refer to the channel bound to a
debug session.

Remove the dbg_sessions define in struct gk20a since it is
not used anywhere.

Add a new API NVGPU_DBG_GPU_IOCTL_UNBIND_CHANNEL to support
unbinding a channel from a debug session.
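
A plausible shape of the two reference structures and the lookup
helper described above (field names are illustrative):

    /* per-session record of one bound channel */
    struct dbg_session_channel_data {
        struct channel_gk20a *ch;
        struct list_head ch_entry;
    };

    /* per-channel record of one bound debug session */
    struct dbg_session_data {
        struct dbg_session_gk20a *dbg_s;
        struct list_head dbg_s_entry;
    };

    struct channel_gk20a *
    nvgpu_dbg_gpu_get_session_channel(struct dbg_session_gk20a *dbg_s)
    {
        struct dbg_session_channel_data *ch_data;

        if (list_empty(&dbg_s->ch_list))
            return NULL;
        ch_data = list_first_entry(&dbg_s->ch_list,
                    struct dbg_session_channel_data, ch_entry);
        return ch_data->ch;
    }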

Bug 200156699

Change-Id: I3bfa6f9cd5b90e7254a75c7e64ac893739776b7f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1120331
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-04-19 08:07:34 -07:00
Terje Bergstrom
7d8e219389 gpu: nvgpu: Use sysmem aperture for SoC memory
On a Tegra GPU, SoC memory has to be accessed as vidmem. On a discrete
GPU, it has to be accessed as sysmem.

Change-Id: I4efe71b54a9a32f0bf1f02ec4016ed74405a14c5
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1120468
2016-04-15 08:50:34 -07:00
Terje Bergstrom
9b5427da37 gpu: nvgpu: Support GPUs with no physical mode
Support GPUs which cannot choose between SMMU and physical
addressing.

Change-Id: If3256fa1bc795a84d039ad3aa63ebdccf5cc0afb
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1120469
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
2016-04-13 13:12:41 -07:00
Terje Bergstrom
e8bac374c0 gpu: nvgpu: Use device instead of platform_device
Use struct device instead of struct platform_device wherever
possible. This allows adding other bus types later.

Change-Id: I1657287a68d85a542cdbdd8a00d1902c3d6e00ed
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1120466
2016-04-08 09:42:41 -07:00
Thomas Fleury
b2dd107455 gpu: nvgpu: add trace event for channel reset
Change-Id: I319e877978b7f483108ef8f67c05702b71709f62
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1120501
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-04-07 13:56:38 -07:00
Deepak Nibade
16658fd39d gpu: nvgpu: post BPT_INT/PAUSE and BLOCKING_SYNC events
Post EVENT_ID_BPT_INT when bpt.int is pending.
Post EVENT_ID_BPT_PAUSE when bpt.pause is pending.
Post EVENT_ID_BLOCKING_SYNC whenever there is a
non-stalling semaphore interrupt indicating work
completion from the GR/CE2 engine.

Bug 200089620

Change-Id: I91b7bf48f8585f0d318298fc0c4a66d42055f0a7
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1112274
(cherry picked from commit d2b744b1f9acac56435cd7e7ab9a7a845579ef24)
Reviewed-on: http://git-master/r/1120321
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-04-07 08:45:47 -07:00
Deepak Nibade
ce04ae15bb gpu: nvgpu: APIs to post event id events
Add the channel and TSG APIs below to post events
on the event_id interface:

gk20a_channel_event_id_post_event()
gk20a_tsg_event_id_post_event()

Bug 200089620

Change-Id: I0cfadc9ffdb880b2410f97758fad47905c620db1
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1112267
(cherry picked from commit 9f50d7da4500af4dbf4dabe7916eda6fc220f4fb)
Reviewed-on: http://git-master/r/1120320
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
2016-04-07 08:44:58 -07:00
Deepak Nibade
5f10073540 gpu: nvgpu: add TSG support to channel event id
Add the NVGPU_IOCTL_TSG_EVENT_ID_CTRL API to extend channel
event id support to TSGs.

This API accepts an event_id (like BPT.INT or
BPT.PAUSE) and a command to enable
the event, and returns a file descriptor on which
we can raise the event (if cmd=enable).

Events generated for TSGs reuse the file
operations "gk20a_event_id_ops".

Bug 200089620

Change-Id: I2f563c6d3a0988eb670caac2d3c7c6795724792c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1030776
(cherry picked from commit 72b61fa266279038f013e582be80c21808e1038d)
Reviewed-on: http://git-master/r/1120319
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
2016-04-07 08:44:38 -07:00
Deepak Nibade
e87ba53235 gpu: nvgpu: add channel event id support
With NVGPU_IOCTL_CHANNEL_EVENTS_CTRL, nvgpu can
raise events to user space, but user space
cannot distinguish between the various types of events.
To overcome this, we need a finer-grained API to
deliver various events to user space.

Remove the old API NVGPU_IOCTL_CHANNEL_EVENTS_CTRL
and all the support for it (we can remove
it since user space has not started using this
API at all).

Add a new API NVGPU_IOCTL_CHANNEL_EVENT_ID_CTRL
which accepts an event_id (like BPT.INT or
BPT.PAUSE) and a command to enable the event,
and returns a file descriptor on which
we can raise the event (if cmd=enable).
The event is disabled when the file descriptor is closed.

Add file operations "gk20a_event_id_ops"
to support polling on the event fd.

Also add the API gk20a_channel_get_event_data_from_id()
to get the event_data of an event from its id.
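
A sketch of what polling on the returned event fd might look like
(struct event_id_data and its fields are assumptions):

    static unsigned int gk20a_event_id_poll(struct file *filep,
                    poll_table *wait)
    {
        struct event_id_data *event = filep->private_data;
        unsigned int mask = 0;

        poll_wait(filep, &event->event_id_wq, wait);

        mutex_lock(&event->lock);
        if (event->event_posted) {
            mask = POLLPRI | POLLIN;
            event->event_posted = false;
        }
        mutex_unlock(&event->lock);

        return mask;
    }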

Bug 200089620

Change-Id: I5288f19f38ff49448c46338c33b2a927c9e02254
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1030775
(cherry picked from commit 5721ce2735950440bedc2b86f851db08ed593275)
Reviewed-on: http://git-master/r/1120318
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
2016-04-07 08:43:49 -07:00
Seshendra Gadagottu
f5ce107f19 gpu: nvgpu: dump gpu status on channel wdt
To get more useful debug info, dump the GPU status
on channel watchdog timeout.

Bug 200163782

Change-Id: Ie160e3a65650b87f812e0c1d1d9b54814b45a1d7
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/1115439
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-03-29 11:14:21 -07:00
Anton Vorontsov
1c40d09c4c gpu: nvgpu: Add support for FECS ctxsw tracing
bug 1648908

This commit adds support for FECS ctxsw tracing. The code is compiled
conditionally under CONFIG_GK20_CTXSW_TRACE.
This feature requires an updated FECS ucode that writes one record to a ring
buffer on each context switch. On the RM/kernel side, the GPU driver reads
records from the master ring buffer and generates trace entries into a
user-facing VM ring buffer. For each record in the master ring buffer,
RM/kernel has to retrieve the vmid+pid of the user process that submitted
the related work.

Features currently implemented:
- master ring buffer allocation
- debugfs to dump master ring buffer
- FECS record per context switch (with both current and new contexts)
- dedicated device for ctxsw tracing (access to VM ring buffer)
- SOF generation (and access to PTIMER)
- VM ring buffer allocation, and reconfiguration
- enable/disable tracing at user level
- event-based trace filtering
- context_ptr to vmid+pid mapping
- read system call for ctxsw dev
- mmap system call for ctxsw dev (direct access to VM ring buffer)
- poll system call for ctxsw dev
- save/restore registers on ELPG/GC6
- separate user ring from FECS ring handling

Features requiring ucode changes:
- enable/disable tracing at FECS level
- actual busy time on engine (bug 1642354)
- master ring buffer threshold interrupt (P1)
- API for GPU to CPU timestamp conversion (P1)
- vmid/pid/uid based filtering (P1)
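
An illustrative drain loop for the master-to-VM ring copy described
above (all names and the record layout here are assumptions):

    static void drain_fecs_ring(struct trace_state *t)
    {
        while (t->get != fecs_ring_put(t)) {    /* hypothetical HW read */
            struct fecs_record *rec = &t->master[t->get];
            struct trace_entry *out = vm_ring_reserve(t);

            if (!out)
                break;                  /* user-facing ring is full */

            out->timestamp = rec->timestamp;
            out->tag = rec->tag;
            /* resolve which user process submitted this context's work */
            lookup_vmid_pid(rec->context_ptr, &out->vmid, &out->pid);
            vm_ring_commit(t, out);
            t->get = (t->get + 1) % t->num_records;
        }
    }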

Change-Id: I8e39c648221ee0fa09d5df8524b03dca83fe24f3
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1022737
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-03-23 07:48:47 -07:00
Aingara Paramakuru
82da6ed595 gpu: nvgpu: add support to set channel timeslice
As part of improving GPU scheduling, userspace can now set a
channel's timeslice, within reasonable limits imposed by the
kernel driver.
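
A minimal sketch of the clamping such an API implies; the limit
names and values are assumptions, not the driver's actual bounds:

    #define CHANNEL_MIN_TIMESLICE_US  1000
    #define CHANNEL_MAX_TIMESLICE_US 50000

    static int channel_set_timeslice(struct channel *c, u32 timeslice_us)
    {
        if (timeslice_us < CHANNEL_MIN_TIMESLICE_US ||
            timeslice_us > CHANNEL_MAX_TIMESLICE_US)
            return -EINVAL;

        c->timeslice_us = timeslice_us;
        return update_runlist(c);   /* hypothetical: re-write runlist */
    }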

JIRA VFND-1312
Bug 1729664

Change-Id: I4c3430c43437889b8685f12988d4b967bb7877bb
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/1020917
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-03-22 10:42:45 -07:00
Richard Zhao
97108797a2 gpu: nvgpu: enable semaphore acquire timeout only when timeouts_enabled is set
Bug 1727687

Change-Id: I7a7a4a2011b029474122fdbfbeb02b6302a5902b
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/1011486
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-03-22 10:19:23 -07:00
Aingara Paramakuru
2a58d3c27b gpu: nvgpu: improve channel interleave support
Previously, only "high" priority bare channels were interleaved
between all other bare channels and TSGs. This patch decouples
priority from interleaving and introduces 3 levels for interleaving
a bare channel or TSG: high, medium, and low. The levels define
the number of times a channel or TSG will appear on a runlist (see
nvgpu.h for details).

By default, all bare channels and TSGs are set to interleave level
low. Userspace can then request the interleave level to be increased
via the CHANNEL_SET_RUNLIST_INTERLEAVE ioctl (TSG-specific ioctl will
be added later).

As timeslice settings will soon be coming from userspace, the default
timeslice for "high" priority channels has been restored.
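
An illustrative mapping from interleave level to number of runlist
appearances; the counts below are placeholders (the real values are
defined by the runlist construction code and nvgpu.h):

    static u32 runlist_entries_for_level(u32 interleave_level)
    {
        switch (interleave_level) {
        case NVGPU_RUNLIST_INTERLEAVE_LEVEL_HIGH:
            return 8;   /* placeholder count */
        case NVGPU_RUNLIST_INTERLEAVE_LEVEL_MEDIUM:
            return 4;   /* placeholder count */
        case NVGPU_RUNLIST_INTERLEAVE_LEVEL_LOW:
        default:
            return 1;
        }
    }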

JIRA VFND-1302
Bug 1729664

Change-Id: I178bc1cecda23f5002fec6d791e6dcaedfa05c0c
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/1014962
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-03-15 16:23:44 -07:00
Konsta Holtta
f07a046a52 gpu: nvgpu: validate wait notification offset
Make sure that the notification object fits within the supplied buffer.
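
A sketch of the overflow-safe bounds check this kind of validation
needs (names are illustrative):

    /* true iff [offset, offset + obj_size) lies within buf_size bytes */
    static bool fits_in_buffer(u64 offset, u64 obj_size, u64 buf_size)
    {
        /* written this way to avoid overflow in offset + obj_size */
        return offset <= buf_size && obj_size <= buf_size - offset;
    }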

Bug 1739182

Change-Id: Ifb66f848e3758438f37645be6f534f5b60260214
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1026431
(cherry picked from commit 2484c47f123c717030aa00253446e8756e1a0807)
Reviewed-on: http://git-master/r/1030875
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-03-15 16:22:32 -07:00
Konsta Holtta
ec023c3ff7 gpu: nvgpu: validate error notifier offset
Make sure that the notifier object fits within the supplied buffer.

Bug 1739183
Bug 1739932

Change-Id: I713574ce797ffc23cec10b5114f469dbadc68f1e
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1026410
(cherry picked from commit f476b93eb19b962b8760457102448bd533efc54d)
Reviewed-on: http://git-master/r/1028737
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-03-15 16:21:44 -07:00
Deepak Nibade
8b665ac6b2 gpu: nvgpu: move clean up of jobs to separate worker
We currently clean up jobs in gk20a_channel_update(),
which is called from the nvhost worker thread.

Instead of doing this, schedule a separate delayed work item,
clean_up_work, to clean up the jobs (with a delay of 1 jiffy).

Keep update_gp_get() in channel_update() and not in the
delayed worker, since this helps with better bookkeeping
of gp_get.

Also, this scheduling delays job clean-up so
that more jobs are batched per clean-up pass
and hence less time is consumed by the worker.
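
A sketch of the deferral, assuming clean_up_work was set up with
INIT_DELAYED_WORK() and that the helpers shown exist:

    static void channel_update(struct channel *c)
    {
        /* keep gp_get bookkeeping in the immediate update path */
        update_gp_get(c);

        /* batch job cleanup in a separate worker, one jiffy later */
        schedule_delayed_work(&c->clean_up_work, 1);
    }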

Bug 1718092

Change-Id: If3b94b6aab93c92da4cf0d1c74aaba756f4cd838
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/931701
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-02-05 08:07:10 -08:00
Richard Zhao
8fb33d92b0 gpu: nvgpu: vgpu: add channel_set_priority support
- add gops.fifo.channel_set_priority and move the current code
  as the native callback
- implement the callback for vgpu

Bug 1701079

Change-Id: If1cd13ea4478d11d578da2f682598e0c4522bcaf
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/932829
Reviewed-by: Aingara Paramakuru <aparamakuru@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-01-25 15:22:22 -08:00
Deepak Nibade
cd09ac26c7 gpu: nvgpu: move resetup_ramfc() out of sync_lock
We currently have this sequence:
- acquire sync_lock
- sync_create
- resetup_ramfc()
- release sync_lock

But this can lead to a deadlock in case resetup_ramfc()
triggers the stack below:
- resetup_ramfc()
  - channel_preempt() - preemption fails
  - trigger recovery
  - channel_abort()
    - acquire sync_lock

Fix this by moving resetup_ramfc() out of sync_lock.

resetup_ramfc() is still protected by submit_lock, and
hence we cannot free the sync after allocation and
before resetup.

Bug 200165811

Change-Id: Iebf74d950d6f6902b6d180c2cd8cd2d50493062c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/931726
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-01-21 07:52:11 -08:00
Richard Zhao
7095a72e56 gpu: nvgpu: fix tsg bugs
- correct the runlist entry type for TSGs
- consider TSGs when preempting a channel

Bug 1617046

Change-Id: Ie067df17fb53ae91c49403637a5f35fc3710e0b3
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/926571
GVS: Gerrit_Virtual_Submit
Reviewed-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-01-19 08:34:55 -08:00
Deepak Nibade
e153458aad gpu: nvgpu: return ENOSPC if no private command buffer space
If we run out of gpfifo space or private command
buffer space, we currently return EAGAIN as the error code.

Instead of EAGAIN, return ENOSPC as the error code so
that the caller (user space) can check the error code
and retry.

As jobs are processed, it is possible that some space
is freed up, and hence such retries could succeed.
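
From user space, the new error code permits a retry loop along these
lines (a sketch only; the retry policy is up to the caller):

    int err;
    int retries = 100;

    do {
        err = ioctl(ch_fd, NVGPU_IOCTL_CHANNEL_SUBMIT_GPFIFO, &args);
    } while (err < 0 && errno == ENOSPC && retries-- > 0);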

Bug 1715291

Change-Id: I9a2ed7134d2496b383899b3c02c0e70452b26115
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/929402
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Tested-by: Sachin Nikam <snikam@nvidia.com>
2016-01-12 23:23:43 -08:00
Deepak Nibade
f7285577d7 gpu: nvgpu: enable wdt for each channel open
We currently enable the per-channel wdt flag at channel
initialization time only.

But if a process disables the channel's wdt via the per-channel
IOCTL and closes the channel without re-enabling it,
we leave the wdt disabled on that channel.

And if the same channel is later assigned to some other process,
that process ends up with the wdt unexpectedly disabled.

Fix this by setting ch->wdt_enabled = true during
gk20a_open_new_channel().

Bug 200165797

Change-Id: I3ab482ce7cfbcbbd2178041f01f97457ff24f7bb
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/931128
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Tested-by: Sachin Nikam <snikam@nvidia.com>
2016-01-12 23:22:36 -08:00
Deepak Nibade
43de9024fe gpu: nvgpu: API to post channel events
Add a new API gk20a_channel_post_event() which adds
a channel event and also calls wake_up() on the channel's
semaphore wait queue.

Bug 200156699

Change-Id: If56f1bf8edcce79c9248809f8476ed853b7d2d9d
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/927132
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-01-12 23:00:46 -08:00
Deepak Nibade
544873525d gpu: nvgpu: APIs to enable/disable TSG
Export the APIs below for TSGs:

gk20a_enable_tsg() - enable only the TSG
gk20a_disable_tsg() - disable only the TSG

gk20a_enable_channel_tsg() -
if the channel is part of a TSG, enable the TSG;
otherwise enable the channel

gk20a_disable_channel_tsg() -
if the channel is part of a TSG, disable the TSG;
otherwise disable the channel
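
A sketch of the dispatch logic implied above (the TSG-membership
check and enable helpers shown are assumptions):

    void gk20a_enable_channel_tsg(struct gk20a *g, struct channel_gk20a *ch)
    {
        if (channel_is_in_tsg(ch))          /* hypothetical check */
            gk20a_enable_tsg(&g->fifo.tsg[ch->tsgid]);
        else
            enable_channel_hw(ch);          /* hypothetical helper */
    }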

Bug 200156699

Change-Id: Icdaca35235c3f323687f839fe32c6c5fe964b230
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/927131
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-01-12 23:00:26 -08:00
Deepak Nibade
0ce201e8de gpu: nvgpu: stop timer on failing channel
In gk20a_channel_timeout_handler(), the deadlock scenario
below is possible:

thread 1:
- take the global lock g->ch_wdt_lock
- identify the timed-out channel (as ch1)
- check the engine status, which is stuck
- identify the failing channel on the engine as ch2
- we need to trigger recovery with ch2
- as part of recovery, call channel_abort() for ch2
- in channel_abort(), we wait to cancel the timer wq
- but the timer wq for ch2 never completes, due to thread 2

thread 2:
- ch2 has already timed out
- to process it, we wait for the global lock g->ch_wdt_lock
- this lock needs to be released by thread 1

To fix this, cancel the timer (through a flag) of ch2
(the failing channel on the engine) before triggering recovery
on that channel.

Bug 200164753

Change-Id: Idb42d01c8440a53f43cb5e87e41f1c283f7e8fcf
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/929924
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-01-11 09:06:31 -08:00
Deepak Nibade
9713e3572a gpu: nvgpu: disable ctxsw instead of all engines activity
In gk20a_channel_timeout_handler(), we currently disable
all engine activity before checking for fence completion
and before we identify the timed-out channel.

But disabling all engine activity could be overkill for
this process.

Also, as part of disabling engine activity we preempt
the channel on the engine.
But it is possible that the channel preemption times out
since the channel has already timed out,
and this can lead to races and deadlock.

Hence, instead of disabling all engine activity, just
disable the context switch, which should do the
same trick.

Bug 1716062

Change-Id: I596515ed670a2e134f7bcd9758488a4aa0bf16f7
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/929421
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-01-11 09:05:58 -08:00
Peter Pipkorn
2b064ce65e gpu: nvgpu: add high priority channel interleave
Interleave all high-priority channels between all other channels.
This reduces the latency for high-priority work when a lot of
lower-priority work is present, imposing an upper
bound on the latency. Change the default high-priority timeslice
from 5.2 ms to 3.0 ms in the process, to prevent long-running
high-priority apps from hogging the GPU too much.

Introduce a new debugfs node to enable/disable high priority
channel interleaving. It is currently enabled by default.

Add the new runlist max-length register, used for allocating
a suitably sized runlist.

Limit the number of interleaved channels to 32.

This change reduces the maximum time a lower priority job
is running (one timeslice) before we check that high priority
jobs are running.

Tested with gles2_context_priority (still passes).
Basic sanity testing was done with graphics_submit
(one app at high priority).

Also more functional testing using lots of parallel runs with:
NVRM_GPU_CHANNEL_PRIORITY=3 ./gles2_expensive_draw
-drawsperframe 20000 -triangles 50 -runtime 30 -finish
plus multiple:
NVRM_GPU_CHANNEL_PRIORITY=2 ./gles2_expensive_draw
-drawsperframe 20000 -triangles 50 -runtime 30 -finish

Prior to this change, the relative performance between
high-priority work and normal-priority work came down
to the timeslice value. This means that when there are many
low-priority channels, high-priority work will still
drop quite a lot. But with this change, high-priority
work will get roughly half of the entire GPU time, meaning
that after the initial drop it is less likely
to lose further performance as more apps run on the system.

This change makes a large step towards real priority levels.
It is not perfect and there are no guarantees on anything,
but it is a step forward without any additional CPU overhead
or other complications. It will also serve as a baseline to
judge other algorithms against.

Support for priorities with TSGs is future work.
Support for interleaving mid + high priority channels,
instead of just high, is also future work.

Bug 1419900

Change-Id: I0f7d0ce83b6598fe86000577d72e14d312fdad98
Signed-off-by: Peter Pipkorn <ppipkorn@nvidia.com>
Reviewed-on: http://git-master/r/805961
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-01-11 09:04:01 -08:00
Richard Zhao
a9c6f59539 gpu: nvgpu: enable semaphore acquire timeout
This detects dead semaphore acquires. The worst case is when
ACQUIRE_SWITCH is disabled: the semaphore acquire will poll and
consume full GPU timeslices.

The timeout value is set to half of the channel WDT.

Bug 1636800

Change-Id: Ida6ccc534006a191513edf47e7b82d4b5b758684
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/928827
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
2016-01-10 20:07:53 -08:00
Deepak Nibade
c4ac1ed369 gpu: nvgpu: preempt before adjusting fences
The current sequence in gk20a_disable_channel() is:
- disable the channel in gk20a_channel_abort()
- adjust the pending fence in gk20a_channel_abort()
- preempt the channel

But this leads to scenarios where the syncpoint has
min > max.

Hence, to fix this, make the sequence in gk20a_disable_channel():
- disable the channel in gk20a_channel_abort()
- preempt the channel in gk20a_channel_abort()
- adjust the pending fence in gk20a_channel_abort()

If gk20a_channel_abort() is called from another API where
preemption is not needed, use the channel_preempt
flag and do not preempt the channel in those cases.

Bug 1683059

Change-Id: I4d46d4294cf8597ae5f05f79dfe1b95c4187f2e3
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/921290
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-12-10 08:39:42 -08:00
Deepak Nibade
54f76d1ac6 gpu: nvgpu: call channel_update() always in channel_abort()
In gk20a_channel_abort(), we disable the channel and
adjust the fences, and then we call gk20a_channel_update()
only if we released the post-fence semaphore.

Instead of this, call gk20a_channel_update() always, to ensure
that all the clean-up after fence completion is always done.

Bug 1683059

Change-Id: Ife00ba43e808c0833264d79c98c74417ffadf802
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/842999
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-12-10 08:39:02 -08:00
Terje Bergstrom
e04e73c580 gpu: nvgpu: Immediate channel release
When closing channel, disable and preempt it immediately instead of
waiting for it to finish all work.

Bug 1683059

Change-Id: Ia5f5fc6a072dc3ddb1e9bf63534814ff0a60b5b4
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/836746
2015-12-10 08:38:20 -08:00
Deepak Nibade
52753b51f1 gpu: nvgpu: create sync_fence only if needed
Currently, we create a sync_fence (from nvhost_sync_create_fence())
for every submit,
but not all submits request a sync_fence.

Also, nvhost_sync_create_fence() takes about one third of the
total submit path.

Hence, to optimize, we can allocate a sync_fence
only when the user explicitly asks for it using
(NVGPU_SUBMIT_GPFIFO_FLAGS_FENCE_GET &&
NVGPU_SUBMIT_GPFIFO_FLAGS_SYNC_FENCE).

Also, in the CDE path from gk20a_prepare_compressible_read(),
we reuse the existing fence stored in "state", and that can
result in not returning a sync_fence_fd when the user asked
for it.
Hence, force allocation of a sync_fence when the job submission
comes from the CDE path.
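
A sketch of the flag check the message describes (variable names
are illustrative):

    bool need_sync_fence = force_need_sync_fence ||
        ((flags & NVGPU_SUBMIT_GPFIFO_FLAGS_FENCE_GET) &&
         (flags & NVGPU_SUBMIT_GPFIFO_FLAGS_SYNC_FENCE));

    /* allocate a sync_fence fd for the post-fence only if needed */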

Bug 200141116

Change-Id: Ia921701bf0e2432d6b8a5e8b7d91160e7f52db1e
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/812845
(cherry picked from commit 5fd47015eeed00352cc8473eff969a66c94fee98)
Reviewed-on: http://git-master/r/837662
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
2015-12-08 01:18:04 -08:00
Deepak Nibade
937de14907 gpu: nvgpu: move check_gp_put() and update_gp_get() to worker
We currently call check_gp_put() and update_gp_get()
in the submit path, and together they take about 5 us:
check_gp_put() - 3.5 us
update_gp_get() - 1.5 us

This bookkeeping can be moved to gk20a_channel_update()
to save some submit time.

Note that check_gp_put() needs to be done under the submit
lock.

Bug 200141116

Change-Id: I276400111be0421eb673695e2f2899ff52e344b4
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/839232
(cherry picked from commit 289617e8bf01bde9aab45dfa3a1c6a1241e6eb78)
Reviewed-on: http://git-master/r/839713
GVS: Gerrit_Virtual_Submit
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
2015-12-08 01:17:21 -08:00
Deepak Nibade
632a86f7cd gpu: nvgpu: IOCTL to disable watchdog per-channel
Add the IOCTL NVGPU_IOCTL_CHANNEL_WDT to disable/enable
the watchdog per channel.

Also, if the watchdog is disabled, we currently schedule
the worker with the MAX timeout.
Instead of this, do not schedule any worker if the
watchdog is disabled.

Bug 1683059
Bug 1700277

Change-Id: I7f6bec84adeedb74e014ed6d1471317b854df84c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/837962
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-11-30 08:48:59 -08:00
Deepak Nibade
c45f8cadee gpu: nvgpu: fix possible memory leak
Fix a possible memory leak in an error path.
Coverity id: 20022

Bug 1703084

Change-Id: Ie1085a6e9c206ba0d71fe6677986df2a357bb5d1
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/838776
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
2015-11-30 01:31:04 -08:00
Deepak Nibade
10f6da09eb gpu: nvgpu: fix Coverity issues
- operands not affecting result (id = 12845)
- logically dead code (id = 12890)
- dereference after null check (id = 12968)
- unsigned compared to 0 (id = 13176)
- resource leak (id = 13338, 18673)
- unused pointer value (id = 13916)

Bug 1703084

Change-Id: I2f401dd93126af27748c53fa1b3a59cb154af36b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/835143
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
2015-11-25 00:45:58 -08:00
Deepak Nibade
7f79d647d6 gpu: nvgpu: set aggressive_sync_destroy at runtime
We currently set the "aggressive_destroy" flag to destroy
sync objects statically and for each sync object.

Move this flag to the per-platform structure so that it
can be set per platform for all the sync objects.

Also, set the default value of this flag to "false",
and set it to "true" once we have more than 64
channels in use.
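
A sketch of the runtime switch (the field and helper names are
assumptions):

    /* evaluated as channels are opened */
    if (used_channel_count(g) > 64)     /* hypothetical counter */
        platform->aggressive_sync_destroy = true;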

Bug 200141116

Change-Id: I1bc271df4f468a4087a06a27c7289ee0ec3ef29c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/822041
(cherry picked from commit 98741e7e88066648f4f14490c76b61dbff745103)
Reviewed-on: http://git-master/r/835800
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-11-23 08:43:49 -08:00
Deepak Nibade
2d40ebb1ca gpu: nvgpu: rework private command buffer free path
We currently allocate private command buffers (wait_cmd
and incr_cmd) before submitting a job, but we never
free them explicitly.
When the private command queue of the channel is full, we
then try to recycle/remove free command buffers.
But this recycling happens in the submit path, and
hence that particular submit takes much longer.

Rework this as below:
- add references to the command buffers to the job structure
- when a job completes, free the command buffers
  explicitly
- remove the code to recycle buffers, since it should
  not be needed now

Note that command buffers need to be freed in the order of
their allocation. Ensure this with an error print before
freeing a command buffer entry.

Bug 200141116
Bug 1698667

Change-Id: Id4b69429d7ad966307e0d122a71ad55076684307
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/827638
(cherry picked from commit c6cefd69b71c9b70d6df5343b13dfcfb3fa99598)
Reviewed-on: http://git-master/r/835802
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-11-23 08:33:01 -08:00
Deepak Nibade
f50d0ffb15 gpu: nvgpu: support skipping buffer refcounting in submit
In the job submission path, we always take a refcount on all
the mapped buffers to safeguard against the case where user
space releases a buffer early.

But if user space itself is doing proper buffer
management, the kernel need not take refcounts on all the
buffers, which is also overhead in the submit path.

Hence, provide a new submit flag
NVGPU_SUBMIT_GPFIFO_FLAGS_SKIP_BUFFER_REFCOUNTING to
optionally skip taking refcounts on all the buffers.

Also, if we do not take the refcounts, then there is no need
to drop any refcounts in gk20a_channel_update() either.

Bug 1698667
Bug 200141116

Change-Id: I81bb7a03240300b691c70bcec04ea1badd5934f4
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/824718
(cherry picked from commit 8c8978fa303ec4e6db0233becdbdcbad4a248173)
Reviewed-on: http://git-master/r/835801
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-11-23 08:32:39 -08:00
Deepak Nibade
67fe5f6d73 gpu: nvgpu: remove temporary gpfifo allocation in submit path
In the GPU job submit path, gk20a_ioctl_channel_submit_gpfifo(),
we currently allocate a temporary gpfifo, copy the user space
gpfifo content into this temporary buffer, and then copy the
temp buffer content into the channel's gpfifo.

The allocation/copy/free of the temporary buffer adds
measurable overhead.

Rewrite this sequence so that gk20a_submit_channel_gpfifo()
can receive either a pre-filled gpfifo or a pointer to the
user-provided args,
and then we can directly copy the user-provided gpfifo
into the channel's gpfifo.

Also, if command buffer tracing is enabled, we still need
to copy the user-provided gpfifo into a temporary buffer for
reading, but that should not cause overhead in real-world
use cases.
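
A sketch of the direct copy, ignoring ring wrap-around for brevity
(the field names are illustrative):

    struct nvgpu_gpfifo *dst = c->gpfifo.cpu_va + c->gpfifo.put;

    if (copy_from_user(dst, user_gpfifo,
               num_entries * sizeof(struct nvgpu_gpfifo)))
        return -EFAULT;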

Bug 200141116

Change-Id: I7166c9271da2694059da9853ab8839e98457b941
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/823386
(cherry picked from commit 3e0702db006c262dd8737a567b8e06f7ff005e2c)
Reviewed-on: http://git-master/r/835799
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-11-23 08:30:13 -08:00
Richard Zhao
f25f21c231 gpu: nvgpu: remove shift initial value 3 when setting runlist timeslice
Bug 1695718

Change-Id: I595694e4a92ccaf8b8cd389e215d28709ed98c50
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/827829
(cherry picked from commit be5397109404b632c51a57758ee0dcdd2d2f5cc7)
Reviewed-on: http://git-master/r/831556
GVS: Gerrit_Virtual_Submit
Reviewed-by: Kirill Artamonov <kartamonov@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-11-12 07:57:00 -08:00
Seshendra Gadagottu
19b3bd28b3 gpu: nvgpu: use platform data for ptimer source rate
Instead of depending on the clock framework, use platform data
for the ptimer source rate. Remove the ptimerscaling10x platform
data, and use the ptimer source frequency to calculate the
ptimer scaling factor.

Reviewed-on: http://git-master/r/819030
(cherry picked from commit dd291334d54dab80cab7eb1656dffc48a59610b4)

Change-Id: I7638ce9875a6e440bbfc2ba2da0d0b094b2700ff
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/827300
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-11-04 14:38:34 -08:00
Deepak Nibade
9592a4e6fc gpu: nvgpu: IOCTL to set TSG timeslice
Add a new IOCTL NVGPU_IOCTL_TSG_SET_PRIORITY to allow
setting the timeslice for an entire TSG.

Return an error from the channel-specific IOCTL_CHANNEL_SET_PRIORITY
if the channel is part of a TSG.

Separate out the API gk20a_channel_get_timescale_from_timeslice()
to get timeslice_timeout and scale from the timeslice period.

Use this API to get timeslice_timeout and scale for a TSG and
store them in the tsg_gk20a structure.

Then trigger a runlist update so that the new timeslice values
are re-written to the runlist for the TSG.
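
A plausible reconstruction of the mantissa/exponent split such an
API performs (the 8-bit field width is an assumption):

    static void get_timescale_from_timeslice(u32 timeslice_us,
                        u32 *timeout, u32 *scale)
    {
        u32 value = timeslice_us;
        u32 shift = 0;

        /* fit the period into an 8-bit timeout with a power-of-two scale */
        while (value > 255) {
            value >>= 1;
            shift++;
        }
        *timeout = value;
        *scale = shift;
    }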

Bug 200146615

Change-Id: I555467d034f81b372b31372f0835d72b1c159508
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/824206
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-11-03 14:20:08 -08:00
Janne Hellsten
057c6334f7 gpu: nvgpu: Get rid of legacy gpfifo type
Get rid of the duplicate gpfifo struct to emphasize the fact that
nvgpu_gpfifo is the only memory layout for gpfifo entries that works.
This is the same layout that HW uses.

Also, add a local pointer to the gpfifo memory in
gk20a_submit_channel_gpfifo to get rid of repeated typecasts.

Bug 1592391
Bug 1550886

Change-Id: I5432859ef8e7c1aab5907e44098994d7bb807f50
Signed-off-by: Janne Hellsten <jhellsten@nvidia.com>
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/677341
(cherry picked from commit 724c8c6228af81dd440e825bddf545dd6b2b8bd7)
Reviewed-on: http://git-master/r/822548
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2015-10-27 08:40:31 -07:00