This change makes the invocation of the deferred job clean-up
mechanism conditional. For submissions that require job tracking,
deferred clean-up is only required if any of the following
conditions are met:
1) Channel's deterministic flag is not set
2) Rail-gating is enabled
3) Channel WDT is enabled
4) Buffer refcounting is enabled
5) Dependency on Sync Framework
When deferred clean-up is not needed, we clean up a single job
tracking resource directly in the submit path. For deterministic
channels, we do not allow deferred clean-up to occur and fail any
submit that would require it.
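A minimal sketch of the decision above, assuming illustrative flag
names (the real fields on the channel and platform structs may
differ):
  #include <stdbool.h>

  /* Illustrative flags only; not the driver's actual field names. */
  struct submit_ctx {
      bool deterministic;       /* channel's deterministic flag */
      bool railgate_enabled;    /* GPU rail-gating in use */
      bool wdt_enabled;         /* channel watchdog in use */
      bool buffer_refcounting;  /* per-buffer refcounts taken at submit */
      bool uses_sync_framework; /* depends on the Sync Framework */
  };

  static bool needs_deferred_cleanup(const struct submit_ctx *s)
  {
      return !s->deterministic ||
             s->railgate_enabled ||
             s->wdt_enabled ||
             s->buffer_refcounting ||
             s->uses_sync_framework;
  }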
Bug 1795076
Change-Id: I4021dffe8a71aa58f12db6b58518d3f4021f3313
Signed-off-by: Sachit Kadle <skadle@nvidia.com>
Reviewed-on: http://git-master/r/1220920
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
(cherry picked from commit b09f7589d5ad3c496e7350f1ed583a4fe2db574a)
Reviewed-on: http://git-master/r/1223941
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
This change fixes up the calculation of the number of gpfifo entries
to allocate, depending on the ioctl used:
1) For the legacy ALLOC_GPFIFO ioctl, we preserve the calculation
of gpfifo entries within the kernel.
2) For the new ALLOC_GPFIFO_EX ioctl, we assume that userspace has
pre-calculated a power-of-2 value. We process this value unmodified
and only verify that it is a valid power of 2.
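A minimal sketch of the power-of-two check (the kernel's
is_power_of_2() helper could equally be used):
  #include <stdbool.h>

  /* A value is a valid power of 2 iff it is nonzero and has exactly
   * one bit set. */
  static bool valid_power_of_2(unsigned int num_entries)
  {
      return num_entries != 0 &&
             (num_entries & (num_entries - 1)) == 0;
  }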
Bug 1795076
Change-Id: I8d2ddfdae40b02fe6b81e63dfd8857ad514a3dfd
Signed-off-by: Sachit Kadle <skadle@nvidia.com>
Reviewed-on: http://git-master/r/1220968
(cherry picked from commit c42396d9836e9b7ec73e0728f0c502b63aff70db)
Reviewed-on: http://git-master/r/1223937
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This change adds a new ioctl flag,
NVGPU_SUBMIT_GPFIFO_FLAGS_DETERMINISTIC, which indicates that a
gpfifo submission must exhibit deterministic behavior within the
kernel.
For submissions that require job tracking and also set this flag, we
require the channel to have pre-allocated its job tracking
resources.
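A minimal sketch of the submit-time check implied above; the channel
field and the flag's numeric value are illustrative:
  #include <errno.h>
  #include <stdbool.h>

  /* Real flag name from this change; the value here is illustrative only. */
  #define NVGPU_SUBMIT_GPFIFO_FLAGS_DETERMINISTIC  (1U << 3)

  struct chan {
      bool joblist_preallocated;  /* set by the extended ALLOC_GPFIFO ioctl */
  };

  /* Deterministic submits that need job tracking must not allocate
   * anything on the fly. */
  static int check_deterministic_submit(const struct chan *c,
                                        unsigned int flags,
                                        bool needs_job_tracking)
  {
      if ((flags & NVGPU_SUBMIT_GPFIFO_FLAGS_DETERMINISTIC) &&
          needs_job_tracking && !c->joblist_preallocated)
          return -EINVAL;
      return 0;
  }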
Bug 1795076
Change-Id: I0496a2513c6c683fcda161b32db9e7ee6712d45c
Signed-off-by: Sachit Kadle <skadle@nvidia.com>
Reviewed-on: http://git-master/r/1210527
(cherry picked from commit 0a36a0ce3a6cbe398931993e742fc928f7b2c0aa)
Reviewed-on: http://git-master/r/1223935
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add support for pre-allocation of job tracking resources with a new
(extended) ioctl. The goal is to avoid dynamic memory allocation in
the submit path. This patch does the following:
1) Introduces a new ioctl, NVGPU_IOCTL_CHANNEL_ALLOC_GPFIFO_EX,
which enables pre-allocation of tracking resources per job:
a) 2x priv_cmd_entry
b) 2x gk20a_fence
2) Implements a circular ring buffer for job tracking to avoid lock
contention between the producer (submitter) and the consumer
(clean-up), as sketched below
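A minimal sketch of such a single-producer/single-consumer ring;
field names are illustrative:
  /* The submitter advances 'put' and the clean-up worker advances 'get',
   * so the two sides do not contend on a shared lock. (Real code also
   * needs appropriate barriers/synchronization around the index updates.) */
  struct job_ring {
      unsigned int put;    /* written only by the producer (submit) */
      unsigned int get;    /* written only by the consumer (clean-up) */
      unsigned int size;   /* number of pre-allocated entries, power of 2 */
  };

  static int ring_is_full(const struct job_ring *r)
  {
      return r->put - r->get == r->size;
  }

  static int ring_is_empty(const struct job_ring *r)
  {
      return r->put == r->get;
  }

  static unsigned int ring_slot(const struct job_ring *r, unsigned int idx)
  {
      return idx & (r->size - 1);   /* wrap into the pre-allocated array */
  }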
Bug 1795076
Change-Id: I6b52e5c575871107ff380f9a5790f440a6969347
Signed-off-by: Sachit Kadle <skadle@nvidia.com>
Reviewed-on: http://git-master/r/1203300
(cherry picked from commit 9fd270c22b860935dffe244753dabd87454bef39)
Reviewed-on: http://git-master/r/1223934
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This change is the first of a series of changes to
support the usage of pre-allocated job tracking resources
in the submit path. With this change, we still maintain a
dynamically-allocated joblist, but make the necessary changes
in the channel_sync & fence framework to use in-place
allocations. Specifically, we:
1) Update channel sync framework routines to take in pre-allocated
priv_cmd_entry(s) & gk20a_fence(s) rather than allocating them
dynamically themselves
2) Move allocation of priv_cmd_entry(s) & gk20a_fence(s)
to gk20a_submit_prepare_syncs
3) Modify the fence framework to have separate allocation
and init APIs (see the sketch below). We expose allocation as a
separate API, so the client can allocate the object before passing
it into the channel sync framework.
4) Fix clean_up logic in channel sync framework
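A minimal sketch of the split alloc/init pattern from (3); names are
illustrative, not the driver's real fence API:
  #include <stdlib.h>

  struct fence {
      int valid;
      /* ... payload elided ... */
  };

  /* Allocation only: the caller owns an empty object. */
  static struct fence *fence_alloc(void)
  {
      return calloc(1, sizeof(struct fence));
  }

  /* Initialization happens later, in place, by the code that fills it in. */
  static void fence_init(struct fence *f)
  {
      f->valid = 1;
  }

  /* Caller side: allocate up front (or pre-allocate per job), then hand
   * the object to the sync framework for in-place initialization. */
  static struct fence *make_fence(void)
  {
      struct fence *f = fence_alloc();

      if (f)
          fence_init(f);
      return f;
  }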
Bug 1795076
Change-Id: I96db457683cd207fd029c31c45f548f98055e844
Signed-off-by: Sachit Kadle <skadle@nvidia.com>
Reviewed-on: http://git-master/r/1206725
(cherry picked from commit 9d196fd10db6c2f934c2a53b1fc0500eb4626624)
Reviewed-on: http://git-master/r/1223933
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
If the user does not close the event fd created on a channel/TSG, it
is possible that the event stays on the channel/TSG structure and
reappears when the channel/TSG is re-opened.
This causes a false failure when we try to enable some event of the
channel/TSG, since we do not allow enabling the same event twice on
the same channel/TSG.
Fix this by removing all enabled events from the channel/TSG while
closing it.
Bug 200243092
Bug 1818654
Change-Id: I9d5ffc89f87cf4c44124f8015c2c2f0587ad2ef4
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1237723
(cherry picked from commit 2737a5c86cf5fbfe8a04f6a87764e8ecb9b30555)
Reviewed-on: http://git-master/r/1238266
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
A use-after-free scenario is possible where one thread in
gk20a_free_error_notifiers() is trying to free the error notifier
while another thread in gk20a_set_error_notifier() is still using it.
Fix this by introducing a mutex, error_notifier_mutex, for error
notifier accesses.
Take the mutex in gk20a_free_error_notifiers() and in
gk20a_set_error_notifier() before accessing the notifier.
In gk20a_init_error_notifier(), set the pointer
ch->error_notifier_ref inside the mutex and only after the notifier
is completely initialized.
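A minimal sketch of the locking pattern, using a pthread mutex as a
stand-in for the kernel mutex; names are illustrative:
  #include <pthread.h>
  #include <stddef.h>

  struct err_notifier { int error; };

  struct chan_sketch {
      pthread_mutex_t error_notifier_mutex;    /* stand-in for struct mutex */
      struct err_notifier *error_notifier_ref;
  };

  static void set_error_notifier(struct chan_sketch *ch, int error)
  {
      pthread_mutex_lock(&ch->error_notifier_mutex);
      if (ch->error_notifier_ref)              /* may have been torn down */
          ch->error_notifier_ref->error = error;
      pthread_mutex_unlock(&ch->error_notifier_mutex);
  }

  static void free_error_notifier(struct chan_sketch *ch)
  {
      pthread_mutex_lock(&ch->error_notifier_mutex);
      ch->error_notifier_ref = NULL;           /* unpublish before the actual free */
      pthread_mutex_unlock(&ch->error_notifier_mutex);
  }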
Bug 1824788
Change-Id: I47e1ab57d54f391799f5a0999840b663fd34585f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1233988
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
We previously used to wait on the last_submit fence
before disabling a channel. Since this part of the
code is no longer exercised, we can remove this
tracking.
Bug 1795076
Change-Id: I54ba2ebaf48772aa775654c0fb4ab614a7167969
Signed-off-by: Sachit Kadle <skadle@nvidia.com>
Reviewed-on: http://git-master/r/1206585
Reviewed-by: Automatic_Commit_Validation_User
(cherry picked from commit e4e236f2b487b8cfa31f7afd29fad3c97de5f844)
Reviewed-on: http://git-master/r/1209166
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Free the hw_sema before releasing a channel's address space
binding when freeing a channel. Since the semaphore pool
can be freed after releasing the address space, we need
to do this earlier on.
Bug 1795076
Change-Id: Ic8ae7510af7be862feb6694130c6ce8fc0b8e411
Signed-off-by: Sachit Kadle <skadle@nvidia.com>
Reviewed-on: http://git-master/r/1208071
(cherry picked from commit 82a52fb6789b1c9361c1567f082ca36135287294)
Reviewed-on: http://git-master/r/1209165
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
This change improves the aggressive sync creation
& destruction logic to avoid lock contention in
the submit path. It does the following:
1) Removes the global sync destruction (channel)
threshold, and adds a per-platform parameter.
2) Avoids lock contention in the clean-up/submit
path when aggressive sync destruction is disabled.
3) Creates the sync object at gpfifo allocation time (as long as we
are not in aggressive sync destroy mode), to enable faster first
submits
Bug 1795076
Change-Id: Ifdb680100b08d00f37338063355bb2123ceb1b9f
Signed-off-by: Sachit Kadle <skadle@nvidia.com>
Reviewed-on: http://git-master/r/1202425
(cherry picked from commit ac0978711943a59c6f28c98c76b10759e0bff610)
Reviewed-on: http://git-master/r/1202427
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Submit job-tracking is necessary for any of the following
conditions:
- pre- or post-fence functionality
- channel wdt
- GPU rail-gating
- buffer refcounting
If none of the conditions are met, then job tracking is not required
and a fast submit can be done (i.e., we only need to write out the
userspace GPFIFO entries and update GP_PUT).
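A minimal sketch of that decision, with stand-in flag names (the real
submit flags and platform fields may be named differently):
  #include <stdbool.h>

  #define SUBMIT_FLAGS_FENCE_WAIT  (1U << 0)  /* stand-in: pre-fence requested */
  #define SUBMIT_FLAGS_FENCE_GET   (1U << 1)  /* stand-in: post-fence requested */

  static bool need_job_tracking(unsigned int flags, bool wdt_enabled,
                                bool can_railgate, bool buffer_refcounting)
  {
      return (flags & (SUBMIT_FLAGS_FENCE_WAIT | SUBMIT_FLAGS_FENCE_GET)) ||
             wdt_enabled || can_railgate || buffer_refcounting;
  }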
Bug 1795076
Change-Id: If94d195e3a18a6b623e167829d291ec98a7a43a1
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/1203511
(cherry picked from commit 13d7cfe94559dc52cb0bba7f9e48848e0858be81)
Reviewed-on: http://git-master/r/1223066
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Move the submit synchronization code into its own function. This
should help keep the submit code path a little more readable and
understandable.
Bug 1732449
Reviewed-on: http://git-master/r/1203833
(cherry picked from commit f931c65c166aeca3b8fe2996dba4ea5133febc5a)
Change-Id: I4111252d242a4dbffe7f9c31e397a27b66403efc
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1221043
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Use gk20a_gmmu_alloc() in gk20a_alloc_inst_block() so that we always
try to allocate all inst blocks in vidmem first.
Also use the common API gk20a_alloc_inst_block() in
channel_gk20a_alloc_inst().
Jira DNVGPU-22
Change-Id: I6c47c19aae1189d7e57f47a51d21a32e2df53c1f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1216140
(cherry picked from commit 6c84961a50eb8a8b080b2db08f87e58143f5a6e8)
Reviewed-on: http://git-master/r/1219704
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Instead of blocking for gpfifo space in the nvgpu driver,
return -EAGAIN and allow userspace to decide the blocking
policy.
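A minimal sketch of the new behavior (the space accounting is
illustrative; only the -EAGAIN return is the point):
  #include <errno.h>

  /* Report -EAGAIN instead of sleeping until gpfifo entries free up,
   * so userspace chooses whether to retry, poll, or block. */
  static int reserve_gpfifo_space(unsigned int free_entries,
                                  unsigned int needed_entries)
  {
      if (free_entries < needed_entries)
          return -EAGAIN;
      return 0;
  }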
Bug 1795076
Change-Id: Ie091caa92aad3f68bc01a3456ad948e76883bc50
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/1202591
(cherry picked from commit 8056f422c6a34a4239fc4993c40c2e517c932714)
Reviewed-on: http://git-master/r/1203800
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
The channel timeout lock guards a very small critical section. Use a
spinlock instead of a mutex for performance.
Bug 1795076
Change-Id: I94940f3fbe84ed539bcf1bc76ca6ae7a0ef2fe13
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/1200803
(cherry picked from commit 4fa9e973da141067be145d9eba2ea74e96869dcd)
Reviewed-on: http://git-master/r/1203799
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add support for cyclestats snapshots in the virtual case
Bug 1700143
JIRA EVLR-278
Change-Id: I376a8804d57324f43eb16452d857a3b7bb0ecc90
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: http://git-master/r/1211547
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
This is done to reduce GPU submit time, which is critical for
compute use cases.
Bug 200215465
Bug 1804898
Conflicts:
drivers/gpu/nvgpu/gk20a/channel_gk20a.c
Change-Id: Ic4884ee4eac910b92b84a47fdc1b2e9f26b2f1f0
Signed-off-by: Bharat Nihalani <bnihalani@nvidia.com>
Reviewed-on: http://git-master/r/1199860
Reviewed-on: http://git-master/r/1209834
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
While collecting failing engine data, the id type (is_tsg) was not
set for the ctxsw and save engine states. This could result in some
ctxsw timeout interrupts being ignored (id reported with the wrong
is_tsg).
For TSGs, check if we made some progress on any of the channels
before kicking fifo recovery.
Bug 200228310
Jira EVLR-597
Change-Id: I231549ae68317919532de0f87effb78ee9c119c6
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1204035
(cherry picked from commit 7221d256fd7e9b418f7789b3d81eede8faa16f0b)
Reviewed-on: http://git-master/r/1204037
Reviewed-by: Richard Zhao <rizhao@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
We currently store fault_id into fifo.deferred_fault_engines and use
that in gk20a_fifo_reset_engine(), which is incorrect.
Also, in the deferred engine reset path during channel close, we do
not check whether the channel is loaded on the engine or not.
Fix this as follows:
- store engine_id bits into fifo.deferred_fault_engines
- define new API gk20a_fifo_deferred_reset() to perform
deferred engine reset
- get all engines on which channel is loaded with
gk20a_fifo_engines_on_id()
- for each set bit/engine_id in fifo.deferred_fault_engines,
check if channel is loaded on that engine, and if yes,
reset the engine
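A minimal sketch of the per-engine walk described above; the helper
name and callback are hypothetical:
  /* Walk the set bits in deferred_fault_engines and reset only the
   * engines the closing channel is actually loaded on. */
  static void deferred_reset(unsigned long deferred_fault_engines,
                             unsigned long engines_with_channel,
                             void (*reset_engine)(unsigned int engine_id))
  {
      unsigned int engine_id;

      for (engine_id = 0; engine_id < 8 * sizeof(unsigned long); engine_id++) {
          unsigned long bit = 1UL << engine_id;

          if ((deferred_fault_engines & bit) && (engines_with_channel & bit))
              reset_engine(engine_id);
      }
  }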
Bug 1791696
Change-Id: I1b8b1a9e3aa538fe6903a352aa732b47c95ec7d5
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1195087
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Initialize the character array buf in gk20a_channel_ioctl() to zero.
Keeping it uninitialized can result in leaking kernel stack info to
user space, since we pass this buffer to the UMD.
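A minimal sketch of the fix; the buffer size and surrounding code are
illustrative:
  #define IOCTL_ARG_BUF_SIZE 256   /* illustrative; the real size comes from the ioctl definitions */

  void channel_ioctl_sketch(void)
  {
      /* Zeroing the whole buffer up front means padding and never-written
       * bytes cannot leak stale kernel stack contents back to userspace. */
      char buf[IOCTL_ARG_BUF_SIZE] = {0};

      (void)buf;   /* copy_from_user / handler / copy_to_user elided */
  }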
Bug 1793398
Change-Id: Iffd654dbaca3b4e3c8fd2ac270d0febd01c165b8
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1195862
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
Added an interface to allow the kernel to create privileged CE
channels for page migration and clearing support between sysmem and
vidmem.
JIRA DNVGPU-53
Change-Id: I3e18d18403809c9e64fa45d40b6c4e3844992506
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: http://git-master/r/1173085
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
When processing FECS traces, a hash table is used to retrieve the
'pid' of the process that created the channel/TSG. Report the
process identifier (aka tgid in the kernel) instead of the thread
identifier (aka pid) for FECS traces.
Bug 1736423
Change-Id: I54cb9d298b9fe3e1cccdd7145604cd01c5758c9d
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1166501
(cherry picked from commit f7fd1f6d7ad0753b787ec20604a08a1f4882fe6f)
Reviewed-on: http://git-master/r/1168728
(cherry picked from commit 97a62e5b89352fce576f1bca71b38bf2242ff047)
Reviewed-on: http://git-master/r/1177823
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Richard Zhao <rizhao@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Replace kfree with nvgpu_free in error handling path in
gk20a_alloc_channel_gpfifo where the gpfifo pipe buffer is being
allocated, because it's allocated with nvgpu_alloc.
Jira DNVGPU-21
Change-Id: I73100394b67da2ab064e4e9df6b430d818abce56
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1182401
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
For devices that have vidmem available, use the vidmem allocator in
gk20a_gmmu_alloc{,attr,_map,_map_attr}. For others, use sysmem.
Because not all of the buffers have been tested to work in vidmem
yet, rename calls to gk20a_gmmu_alloc{,attr,_map,_map_attr} to have
_sys at the end to declare explicitly that sysmem is used. Enabling
vidmem for each is now a matter of removing "_sys" from the function
call.
Jira DNVGPU-18
Change-Id: Ibe42f67eff2c2b68c36582e978ace419dc815dc5
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1176805
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add gk20a_aperture_mask() for memory target selection now that
buffers can actually be allocated from vidmem, and use it in all
cases that have a mem_desc available.
Jira DNVGPU-76
Change-Id: I4353cdc6e1e79488f0875581cfaf2a5cfb8c976a
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1169306
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
It is possible that when we abort the channel, we have the job
clean-up worker running, which could race with the abort and
sometimes result in the panic below:
[ 245.483566] Unable to handle kernel paging request at virtual address
800000000
...
[ 245.548991] PC is at gk20a_channel_abort_clean_up+0xb8/0x140
[ 245.554683] LR is at gk20a_channel_abort_clean_up+0xac/0x140
...
[ 247.301860] [<ffffffc000479390>]
gk20a_channel_abort_clean_up+0xb8/0x140
[ 247.312853] [<ffffffc0004794d4>] gk20a_channel_abort+0xbc/0xc8
[ 247.322970] [<ffffffc0004794f8>] gk20a_disable_channel+0x18/0x30
[ 247.333267] [<ffffffc000479628>] gk20a_free_channel+0x118/0x584
[ 247.343473] [<ffffffc000479aa0>] gk20a_channel_close+0xc/0x14
[ 247.353479] [<ffffffc000479b80>] gk20a_channel_release+0xd8/0x104
Fix this by cancelling the job clean-up worker before aborting
the channel.
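A minimal sketch of the ordering fix; the helpers stand in for the
real worker-cancel and clean-up routines:
  struct chan;

  /* Make sure the clean-up worker can no longer run before the abort
   * path starts tearing down the same job list. */
  void channel_abort_sketch(struct chan *ch,
                            void (*cancel_cleanup_worker)(struct chan *),
                            void (*abort_clean_up)(struct chan *))
  {
      cancel_cleanup_worker(ch);   /* synchronously cancel pending clean-up */
      abort_clean_up(ch);          /* now safe: no concurrent worker */
  }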
Bug 1777281
Change-Id: Ic24c7c03b27cfb5cd164a52efdb1e2813a41a10a
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1174416
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Revamp the support the nvgpu driver has for semaphores.
The original problem with nvgpu's semaphore support is that it
required a SW-based wait for every semaphore release. This was
because for every fence that gk20a_channel_semaphore_wait_fd() waited
on, a new semaphore was created. This semaphore would then get
released by SW when the fence signaled. This meant that for every
release there was necessarily a sync_fence_wait_async() call which
could block. The latency of this SW wait was enough to cause massive
degradation in performance.
To fix this, a fast path was implemented. When a fence passed to
gk20a_channel_semaphore_wait_fd() is backed by a GPU semaphore, a
semaphore acquire is directly used to block the GPU. No longer is
a sync_fence_wait_async() performed nor is there an extra semaphore
created.
To implement this fast path the semaphore memory had to be shared
between channels. Previously, since a new semaphore was created
every time through gk20a_channel_semaphore_wait_fd(), what address
space a semaphore was mapped into was irrelevant. However, when
using the fast path, a semaphore may be released in one address
space but acquired in another.
Sharing the semaphore memory was done by making a fixed GPU mapping
in all channels. This mapping points to the semaphore memory (the
so called semaphore sea). This global fixed mapping is read-only to
make sure no semaphores can be incremented (i.e released) by a
malicious channel. Each channel then gets a RW mapping of its own
semaphore. This way a channel may only acquire other channels'
semaphores but may both acquire and release its own semaphore.
The gk20a fence code was updated to allow introspection of the GPU
backed fences. This allows detection of when the fast path can be
taken. If the fast path cannot be used (for example when a fence is
sync-pt backed) the original slow path is still present. This gets
used when the GPU needs to wait on an event from something which
only understands how to use sync-pts.
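A minimal sketch of the wait-fd dispatch described above; the
structure and helpers are illustrative, not the driver's actual API:
  #include <stdbool.h>

  struct fence_sketch { bool semaphore_backed; };

  int semaphore_wait_fd_sketch(struct fence_sketch *f,
                               int (*hw_semaphore_acquire)(struct fence_sketch *),
                               int (*sw_wait_async)(struct fence_sketch *))
  {
      if (f->semaphore_backed)
          return hw_semaphore_acquire(f);  /* fast path: GPU blocks on the acquire */
      return sw_wait_async(f);             /* slow path, e.g. sync-pt backed fences */
  }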
Bug 1732449
JIRA DNVGPU-12
Change-Id: Ic0fea74994da5819a771deac726bb0d47a33c2de
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1133792
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
1) Keep the __user tag in the type of the user gpfifo when copying,
2) use NULL instead of 0 for initializing the user_gpfifo pointer.
Bug 200067946
Change-Id: I631b4bca44ded0900204134338fa1d62d0017df0
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1168441
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Use gk20a_mem_*() accessors for gpfifo memory in work submission
instead of direct CPU accesses in order to support apertures other
than sysmem. The gpfifo memory is still allocated from sysmem for
dGPUs too.
Split the copying of priv_cmds and the main gpfifo to be submitted in
gk20a_submit_channel_gpfifo() into separate functions.
JIRA DNVGPU-21
Change-Id: If271ca8e7e34235f00d31855dbccf77c0008e10b
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1145923
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Export gk20a_free_priv_cmdbuf() so that the channel_sync_gk20a.c code
can call this function. This is necessary for error paths in the
semaphore wait/incr functions.
Bug 1732449
JIRA DNVGPU-12
Change-Id: Id2ea13e5553d50475ee1bbf94781e18590321fdf
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1162686
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Extend the existing NVGPU_GPU_IOCTL_OPEN_CHANNEL interface to allow
opening channels for other than the primary (i.e., the graphics)
runlists. This is required to push work to dGPU engines that have
their own runlists, such as the asynchronous copy engines and the
multimedia engines.
Minor change - Added active_engines_list allocation
and assignment for fifo_vgpu back end.
JIRA DNVGPU-25
Change-Id: I3ed377e2c9a2b4dd72e8256463510a62c64e7a8f
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: http://git-master/r/1161541
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Rework the messages in the channel timeout handler to be a little
more verbose and clearer about what is happening.
Bug 1732449
JIRA DNVGPU-12
Change-Id: Ifc018d99c647b3036caa8ad453e5e3dfc4151396
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1153669
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Remove the gp_get and gp_put pointers from the priv_cmdbuf code.
These pointers appear to track the position of the priv_cmdbuf in
the gp_fifo. However, these pointers are not used for anything, nor
are they needed for anything in the future. This code appears to be
a relic left over from the past.
Change-Id: Ibed1a6d51fa0cac12c5e0429760e8e2f611fc899
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1161859
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
In gk20a_event_id_poll(), we always set the mask value and return
it. This causes poll() from the UMD to always succeed, irrespective
of whether an event was really generated or not.
Fix this by adding a flag, event_posted, for each event.
Set this flag while posting the event.
In gk20a_event_id_poll(), set the mask value only if this flag is
set; if the flag is set, set the mask and clear the flag.
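A minimal sketch of the flag-based poll; names are illustrative and
the real code also takes the event lock and calls poll_wait():
  #include <stdbool.h>

  struct event_id_data {
      bool event_posted;   /* set only when the event is actually generated */
  };

  /* Posting side: mark the event before waking any poller. */
  static void post_event(struct event_id_data *e)
  {
      e->event_posted = true;
  }

  /* Poll side: only report readiness if an event was really posted. */
  static unsigned int event_poll(struct event_id_data *e,
                                 unsigned int readable_mask)
  {
      unsigned int mask = 0;

      if (e->event_posted) {
          mask = readable_mask;
          e->event_posted = false;   /* consume the event */
      }
      return mask;
  }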
Bug 200089620
Change-Id: If14236547c611fe4bfa1410ff5b69c9fa02d43bb
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1160253
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
This CL covers the following modifications:
1) Added multiple engine_info support
2) Added multiple runlist_info support
3) Initial changes for ASYNC CE support
4) Added ASYNC CE interrupt handling support
for gm206 GPU family
5) Added generic mechanism to identify the
CE engine pri_base address for gm206
(CE0, CE1 and CE2)
6) Removed the hard-coded engine_id logic and made it generic
7) Code cleanup for readability
JIRA DNVGPU-26
Change-Id: I2c3846c40bcc8d10c2dfb225caa4105fc9123b65
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: http://git-master/r/1155963
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
In gk20a_fifo_abort_tsg(), we loop through the channels of the TSG
and call gk20a_channel_abort() for each channel.
This is incorrect since we disable and preempt each channel
separately, whereas we should disable all channels at once and use
the TSG-specific API to preempt the TSG.
Fix this with the sequence below:
- gk20a_disable_tsg() to disable all channels
- preempt tsg if required
- for each channel in TSG
- set has_timedout flag
- call gk20a_channel_abort_clean_up() to clean up channel state
Also, separate out a common gk20a_channel_abort_clean_up() API which
can be called from both the channel and TSG abort routines.
In gk20a_channel_abort(), call gk20a_fifo_abort_tsg() if the channel
is part of a TSG.
Add a new argument "preempt" to gk20a_fifo_abort_tsg() and preempt
the TSG if the flag is set; the overall sequence is sketched below.
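A minimal sketch of that ordering; the helpers stand in for the real
channel/TSG APIs:
  #include <stdbool.h>

  struct tsg;
  struct chan_node { bool has_timedout; struct chan_node *next; };

  void abort_tsg_sketch(struct tsg *tsg, bool preempt,
                        struct chan_node *channels,
                        void (*disable_all)(struct tsg *),
                        void (*preempt_tsg)(struct tsg *),
                        void (*abort_clean_up)(struct chan_node *))
  {
      struct chan_node *ch;

      disable_all(tsg);            /* disable every channel in one go */
      if (preempt)
          preempt_tsg(tsg);        /* TSG-level preempt, not per channel */

      for (ch = channels; ch; ch = ch->next) {
          ch->has_timedout = true;
          abort_clean_up(ch);      /* per-channel state clean-up only */
      }
  }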
Bug 200205041
Change-Id: I4eff5394d26fbb53996f2d30b35140b75450f338
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1157190
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
In gr_gk20a_ctx_zcull_setup(), gr_gk20a_update_smpc_ctxsw_mode(),
and gk20a_channel_suspend(), we call channel-specific APIs to
disable/preempt/enable the channel, but we do not consider TSGs in
this case.
Hence use the correct APIs (below) in the above functions, which
handle the channel or TSG internally:
gk20a_disable_channel_tsg()
gk20a_fifo_preempt()
gk20a_enable_channel_tsg()
Bug 200205041
Change-Id: Ieed378dac4ad2322b35f9102706176ec326d386c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1157189
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>