This is done to improve GPU submit time, which
is critical for compute use-cases.
Bug 200215465
Bug 1804898
Conflicts:
drivers/gpu/nvgpu/gk20a/channel_gk20a.c
Change-Id: Ic4884ee4eac910b92b84a47fdc1b2e9f26b2f1f0
Signed-off-by: Bharat Nihalani <bnihalani@nvidia.com>
Reviewed-on: http://git-master/r/1199860
Reviewed-on: http://git-master/r/1209834
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add an 'ifdef CONFIG_DEBUG_FS' check to fix the following compilation error
when CONFIG_DEBUG_FS=n (which is used for the Android 'production' build):
mm_gk20a.c: In function 'gk20a_mm_debugfs_init':
mm_gk20a.c:4824:2: error: implicit declaration of function
'debugfs_create_x64' [-Werror=implicit-function-declaration]
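A hedged sketch of the guard (the identifiers below are illustrative, not the
actual ones in mm_gk20a.c):

    #include <linux/debugfs.h>

    static u64 vidmem_stat;              /* illustrative value exposed via debugfs */
    static struct dentry *gpu_root;      /* illustrative parent dentry */

    static void mm_debugfs_init_sketch(void)
    {
    #ifdef CONFIG_DEBUG_FS
        /* debugfs_create_x64() has no declaration when CONFIG_DEBUG_FS=n,
         * so the reference must be compiled out in that configuration. */
        debugfs_create_x64("vidmem_stat", S_IRUGO, gpu_root, &vidmem_stat);
    #endif
    }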
Bug 1778001
Change-Id: I785288a37b96c391b84925d5971d2691cf80206e
Signed-off-by: David Pu <dpu@nvidia.com>
Reviewed-on: http://git-master/r/1210393
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Instead of having the debug prints from the allocators be
warnings, they should be regular prints.
Bug 1799159
Change-Id: Ic6e3c38fa286c4acd6fcba51dc59158dc2d655fc
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1201372
(cherry picked from commit 107caf4ce68a7c76023ee1e66a98c5570f401059)
Reviewed-on: http://git-master/r/1208478
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
One of the flags that is defined for allocators has not yet been
implemented. This change clarifies the comment and explains why the flag
has been defined even though it is not yet implemented.
Bug 1799159
Change-Id: I1e84439d63ca391941cee8e5362ffd9cc959744b
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1201371
(cherry picked from commit 8e6566b173f17d9c169a9fa0f6104f4bbf608dc1)
Reviewed-on: http://git-master/r/1208477
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add checks to make sure function pointers are valid before attempting
to call them.
Also, ensure that any allocator created defines at least the following 3
functions (a simplified sketch follows the list):
alloc()
free()
fini()
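A hedged sketch of the check (struct and helper names are illustrative, not
the driver's actual allocator types):

    #include <errno.h>

    struct allocator_ops_sketch {
        unsigned long (*alloc)(void *a, unsigned long len);
        void (*free)(void *a, unsigned long addr);
        void (*fini)(void *a);
        unsigned long (*base)(void *a);   /* optional op */
    };

    static int validate_ops_sketch(const struct allocator_ops_sketch *ops)
    {
        /* alloc(), free() and fini() are mandatory; reject the allocator
         * at creation time if any of them is missing. */
        if (!ops->alloc || !ops->free || !ops->fini)
            return -EINVAL;
        return 0;
    }

    /* Optional ops are only invoked when the pointer is non-NULL. */
    static unsigned long base_or_zero_sketch(struct allocator_ops_sketch *ops, void *a)
    {
        return ops->base ? ops->base(a) : 0;
    }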
Bug 1799159
Change-Id: I4cd3d5746ccb721c723a161c9487564846027572
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1200059
(cherry picked from commit e26557a49d7ca6629ada24f12a3be396b0ae22cd)
Reviewed-on: http://git-master/r/1208476
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add a new debug message type: gpu_dbg_map_v. This is used for mapping
messages that are not specifically memory map operations.
Also clean up the memory mapping debugging a bit, since there was one
duplicate print and the memory map print was difficult to parse
visually. As a result the message has been modified to put the most
important information first in an easily readable format.
Bug 1732449
JIRA DNVGPU-12
Change-Id: Ib19c9371ee958009ab5a2d89b9610e699d070ee2
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1198593
(cherry picked from commit 51dba53b06ca171cdb13d1707f2d026b0ce29f07)
Reviewed-on: http://git-master/r/1147670
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Implement an allocator suitable for managing the video memory on dGPUs.
It works by allocating chunks from an underlying buddy allocator and
collating the chunks together (similar to what an sgt does in the
wider Linux kernel). This makes it possible to get large buffers from
potentially fragmented memory. The GMMU can then map the
physical vidmem into contiguous GVA spaces.
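A hedged sketch of the idea (types are illustrative): an allocation is an
ordered chain of physically discontiguous chunks carved out of the buddy
allocator, which the GMMU walks to build one contiguous GVA mapping.

    struct vidmem_chunk_sketch {
        struct vidmem_chunk_sketch *next;
        unsigned long long base;     /* physical vidmem offset of this chunk */
        unsigned long long length;   /* chunk length in bytes */
    };

    struct vidmem_alloc_sketch {
        struct vidmem_chunk_sketch *chunks;  /* head of the chunk chain */
        unsigned long long size;             /* total requested size */
    };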
Jira DNVGPU-96
Change-Id: Ic1d7800b033a170b77790aa23fad6858443d0e89
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1197203
(cherry picked from commit fa44684a843956ae384fef6d7a79b9cbbd04f73e)
Reviewed-on: http://git-master/r/1185231
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
The flush-timestamp-record method can fail in case FECS is not
processing the main method queue. In particular, this occurs
in case of a ctxsw timeout, where we process fifo sched interrupts
from the host, but FECS is still waiting for idle (grWFI).
In such a scenario, this adds a huge delay to the fifo recovery
procedure (timeout on FECS method). Since flushing the last
(incomplete) record from FECS would only be useful in that case
(context switch ongoing), remove flush operation on engine
reset. Note that an explicit ENGINE_RESET event (with pid)
is inserted in user-facing ctxsw buffer on engine reset.
Bug 200228310
Change-Id: I885525f8f197f81266b50db161bb511867fc74f4
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1207305
(cherry picked from commit 44391b6204fd648949295f90481b0c424d9a5ddf)
Reviewed-on: http://git-master/r/1208414
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
If a channel is part of a TSG, report the TSG's
interleave in debugfs for the sched parameters.
Bug 200228310
Change-Id: I2eeee7aacfa92f9d5fc367225a23a663ca6ac593
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1207304
(cherry picked from commit 1950ae679f112dcf24a7f3c695d4ab098de10326)
Reviewed-on: http://git-master/r/1208413
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
While collecting failing engine data, the id type (is_tsg) was not
set for the ctxsw and save engine states. This could result in some
ctxsw timeout interrupts being ignored (id reported with the wrong
is_tsg).
For TSGs, check if we made some progress on any of the channels
before kicking fifo recovery.
Bug 200228310
Jira EVLR-597
Change-Id: I231549ae68317919532de0f87effb78ee9c119c6
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1204035
(cherry picked from commit 7221d256fd7e9b418f7789b3d81eede8faa16f0b)
Reviewed-on: http://git-master/r/1204037
Reviewed-by: Richard Zhao <rizhao@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Move getting the constant attributes into a single cmd that is
called only once.
This patch adds basic infrastructure and gpu arch info, max_freq
and num_channels support.
JIRA VFND-2103
Change-Id: I100599b49f29c99966f9e90ea381b1f3c09177a3
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/1189832
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
Move vgpu private data to a dedicated structure and allocate it
at probe time. Also add a virt_handle helper function, which is used
everywhere.
JIRA VFND-2103
Change-Id: I125911420be72ca9be948125d8357fa85d1d3afd
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/1185206
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
We currently store fault_id into fifo.deferred_fault_engines
and use that in gk20a_fifo_reset_engine(), which is incorrect.
Also, in the deferred engine reset path during channel close,
we do not check whether the channel is loaded on the engine or not.
Fix this as below (a simplified sketch follows the list):
- store engine_id bits into fifo.deferred_fault_engines
- define a new API gk20a_fifo_deferred_reset() to perform the
deferred engine reset
- get all engines on which the channel is loaded with
gk20a_fifo_engines_on_id()
- for each set bit/engine_id in fifo.deferred_fault_engines,
check if the channel is loaded on that engine, and if yes,
reset the engine
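Simplified sketch of the walk described above (types and helpers are
illustrative, not the actual driver code):

    static void deferred_reset_sketch(unsigned long deferred_fault_engines,
                                      unsigned long engines_with_channel,
                                      void (*reset_engine)(unsigned int))
    {
        unsigned int engine_id;

        for (engine_id = 0; engine_id < 32; engine_id++) {
            unsigned long bit = 1UL << engine_id;

            /* Only reset an engine that both took a deferred fault and
             * still has the closing channel loaded on it. */
            if ((deferred_fault_engines & bit) &&
                (engines_with_channel & bit))
                reset_engine(engine_id);
        }
    }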
Bug 1791696
Change-Id: I1b8b1a9e3aa538fe6903a352aa732b47c95ec7d5
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1195087
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
We initialized the vidmem allocator with base=4K and a size of 4GB. This
caused the allocator to allocate addresses between 4K and 4GB+4K, causing
a physical MMU fault.
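Hedged arithmetic illustration of the bug and of the range that presumably
has to be managed instead (the managed size must shrink so that base + size
stays within vidmem):

    static void vidmem_range_sketch(void)
    {
        unsigned long long vidmem = 4ULL << 30;   /* 4GB of physical vidmem     */
        unsigned long long base   = 4096ULL;      /* 4K hole kept at the bottom */
        /* broken: size = 4GB      -> range [4K, 4GB + 4K), runs past the end   */
        /* fixed:  size = 4GB - 4K -> range [4K, 4GB), stays within vidmem      */
        unsigned long long size   = vidmem - base;
    }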
Bug 1793810
Change-Id: I554f62aeee4080acd86ef2c8011089ec9b8120df
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1196300
(cherry picked from commit 41a860e21c6da3f8fda58ceb56e78316f6987f53)
Reviewed-on: http://git-master/r/1200712
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Device nodes for PCI devices need to be RW for everybody.
Bug 200225622
Change-Id: I14de9d17f76ca45ba525d0c4f5e8d448bbfda98b
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1198556
(cherry picked from commit 60bf9118715b61b8cd3f2379479caf280ae4e35c)
Reviewed-on: http://git-master/r/1200713
Reviewed-by: Automatic_Commit_Validation_User
We currently post bpt events (bpt.int and bpt.pause) even
before we process and clear the interrupts, and this
could cause races with the UMD.
Fix this by posting bpt events only after we are done
processing the interrupts.
Bug 200209410
Change-Id: Ic3ff7148189fccb796cb6175d6d22ac25a4097fb
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1184109
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Initialize the character array buf in gk20a_channel_ioctl() to zero.
Keeping it uninitialized can result in leaking kernel stack
info to user space, since we pass this buffer to the UMD.
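A hedged sketch of the fix (the buffer size is illustrative):

    #include <string.h>

    static void ioctl_buf_sketch(void)
    {
        unsigned char buf[128];   /* size illustrative */

        /* Zero the buffer before use so no stale kernel stack bytes can
         * later be copied back to userspace. */
        memset(buf, 0, sizeof(buf));
    }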
Bug 1793398
Change-Id: Iffd654dbaca3b4e3c8fd2ac270d0febd01c165b8
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1195862
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
Disabling / enabling of PFIFO must stay inside the isr. It cannot be held
disabled outside the isr -- this causes any kind of preemption mechanism to
fail in the presence of an MMU fault until the channel resets the engine.
Bug 1791696
Change-Id: I16600a8571f6555262a75deb305c1d67eb29581a
Signed-off-by: Cory Perry <cperry@nvidia.com>
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1191026
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Added preemption mode (WFI, GFXP, CTA and CILP) support for gp10x
family gr class (PASCAL_B and PASCAL_COMPUTE_B).
Bug 200221149
Change-Id: I859a4d2db518bca0ffeb0d85a6bb271f6b15db87
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: http://git-master/r/1193207
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Currently, the host1x power refcount may decrement
to 0 while the GPU is still powered on and we're still
servicing IRQs. To prevent this situation,
take a ref while the GPU is being powered on, and
decrement it during power off. Since we are then always
holding one reference while the GPU is powered on,
we can remove this handling from gk20a_busy/idle().
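A hedged sketch of the intended balance (illustrative only; the actual
power-up/power-down sequences are elided):

    static int host1x_refcount;

    static void gpu_power_on_sketch(void)
    {
        host1x_refcount++;   /* pin host1x for the whole powered-on window */
        /* ... actual power-up sequence elided ... */
    }

    static void gpu_power_off_sketch(void)
    {
        /* ... actual power-down sequence elided ... */
        host1x_refcount--;   /* balance the reference taken at power-on */
    }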
Bug 200187507
Change-Id: I249a4527178537c1dc53d769411f53c4451352c3
Signed-off-by: Sachit Kadle <skadle@nvidia.com>
Reviewed-on: http://git-master/r/1172320
(cherry picked from commit 3e27e6a5820f5c1ad05596553d75e8979b71f1bd)
Reviewed-on: http://git-master/r/1172607
(cherry picked from commit 1e01a49fdc)
Reviewed-on: http://git-master/r/1185175
GVS: Gerrit_Virtual_Submit
Reviewed-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
This CL covers the following simple modifications:
1) The Linux kernel list implementation doesn't handle a NULL pointer
dereference at the list_del() api.
2) Add NULL validation before accessing the command buffer related operations.
This is required to prevent illegal/NULL memory access during the
gk20a_ce_create_context_with_cb() failure case.
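A hedged sketch of the defensive checks (types and the buffer-release step
are illustrative): only touch state that was actually set up before the
failure.

    #include <linux/list.h>

    struct ce_ctx_sketch {
        struct list_head node;
        void *cmd_buf;
    };

    static void ce_ctx_teardown_sketch(struct ce_ctx_sketch *ctx)
    {
        if (!ctx)
            return;

        /* list_del() dereferences node.prev/node.next, so skip it if the
         * node was never linked (pointers still NULL from zero-init). */
        if (ctx->node.prev && ctx->node.next)
            list_del(&ctx->node);

        if (ctx->cmd_buf) {
            /* release the command buffer here (driver-specific) */
        }
    }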
JIRA DNVGPU-53
Change-Id: I3ad178970ecb1485098124378bfc5256a9455ebd
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: http://git-master/r/1184294
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Before calling the prod settings functions, check that
those functions are available.
A similar check is added for get_clk_freqs.
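A hedged sketch of the guards (struct and field names are illustrative, not
the driver's actual platform structure):

    struct platform_hooks_sketch {
        void (*prod_settings)(void *dev);
        int  (*get_clk_freqs)(void *dev, unsigned long **freqs, int *num);
    };

    static void apply_hooks_sketch(struct platform_hooks_sketch *p, void *dev)
    {
        unsigned long *freqs;
        int num;

        if (p->prod_settings)        /* only call if the platform provides it */
            p->prod_settings(dev);

        if (p->get_clk_freqs)        /* same guard for the clock query */
            p->get_clk_freqs(dev, &freqs, &num);
    }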
Bug 1735760
Change-Id: Ic4b38079043ab2049a479a2d8bb0cb6091e94f4a
Signed-off-by: seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/1181571
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Adeel Raza <araza@nvidia.com>
Include the buffer aperture flag (sysmem/vidmem/invalid) and the sizes of
the buffer and of the mapping in the logging strings in the gmmu map path.
Change-Id: Ie4c46bf9cb5db79b738571029d46ce8cbfc63f99
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1189492
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
When managing GVA spaces the buddy allocator requires PDE size
alignment. This is to ensure that PTE size in buddies always
remains consistent.
Consider the following hypothetical GVA space: it is 32 elements
long, order 0 block size is 1, and PDE size is 8. This leads to:
Base: 8
Size: 24
Managed space: [8, 32)
The start of the space will be 8 (base must be aligned to a PDE
and we need a hole at the bottom for handling errors). Size is
simply the max, 32, minus what we cut out for the low hole. The
two top level buddies are [8 -> 24), and [24 -> 32).
Now, suppose, instead the base were 4:
Base: 4
Size: 28
Managed space: [4, 32)
The top level buddies would be [4 -> 20), [20 -> 28), and [28 -> 32).
This presents several problems: none of the buddies are PDE aligned
and one top level buddy is smaller than the PDE size. The simplest
issue is how to determine the PTE size of the [28 -> 32) block. We
can just set it to small, but that's not ideal. The bigger issue is
the misalignment of the larger buddies. [20 -> 28) is halfway in
one PDE and halfway in another. That means the allocator would have
to manage the two sub-buddies [20 -> 24) and [24 -> 28) separately.
Instead of dealing with the above issues in the allocator, it is much
simpler to require that any GVA space be PDE aligned, since GVA spaces
are already massive and, in practice, already have this alignment.
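A small arithmetic sketch (assuming a power-of-two PDE size) showing how
PDE-aligning the problematic base and top from above reproduces the first
example:

    static void pde_align_sketch(void)
    {
        unsigned long long pde_size = 8;
        unsigned long long base = 4, top = 32;   /* the problematic case */

        unsigned long long aligned_base = (base + pde_size - 1) & ~(pde_size - 1); /* 8  */
        unsigned long long aligned_top  = top & ~(pde_size - 1);                   /* 32 */
        unsigned long long managed_size = aligned_top - aligned_base;              /* 24 */
    }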
Change-Id: I9eacd2db6485291db9f9f1d6c4c03c2a5c22de03
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1185137
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
In the buddy allocator the BALLOC_PTE_SIZE_* macros are inconsistent
with the gmmu_page_size_* enum. This patch makes the buddy allocator
use the gmmu_page_size_* fields and now has only BALLOC_PTE_SIZE_ANY
for when the allocator does not care about PTE size.
Change-Id: Idbe727b8208e1ace2b947d67f698c471782d5587
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1185136
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
Correct the formula used to determine the range for BE registers
Bug 1778245
Change-Id: I5443b3e68d920cecd031a9b154ed90f26e5251b2
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: http://git-master/r/1170602
(cherry picked from commit 813a08f1aa758d718987b4e6f2cf2ac8d15a1611)
Reviewed-on: http://git-master/r/1177828
(cherry picked from commit de8239a5c6241419b98276a5f549ed8cfd7f4cf9)
Reviewed-on: http://git-master/r/1181500
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
When mapping a userspace buffer, determine if it was vidmem allocated
from the aperture of the current gpu, and pass that information into
page tables.
Mapping a vidmem buffer to a gpu it wasn't allocated from is disallowed.
This includes mapping vidmem to igpus and to possibly other dgpus on the
system.
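A hedged sketch of the check (the enum and GPU ids are illustrative, not the
driver's actual types):

    #include <errno.h>

    enum aperture_sketch { APERTURE_INVALID, APERTURE_SYSMEM, APERTURE_VIDMEM };

    static int check_map_aperture_sketch(enum aperture_sketch ap,
                                         int buf_gpu_id, int mapping_gpu_id)
    {
        if (ap == APERTURE_VIDMEM && buf_gpu_id != mapping_gpu_id)
            return -EINVAL;   /* vidmem only maps on the dGPU it came from */
        return 0;             /* sysmem maps anywhere */
    }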
Jira DNVGPU-19
Change-Id: Ia9d2d0133e77659ab96b36ed61eeb4cd5a2b7dff
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1169309
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Add NVGPU_GPU_IOCTL_ALLOC_VIDMEM to the ctrl fd for letting userspace
allocate on-board GPU memory (aka vidmem). The allocations are returned
as dmabuf fds.
Also, report the amount of local video memory in the gpu
characteristics.
Jira DNVGPU-19
Jira DNVGPU-38
Change-Id: I28e361d31bb630b96d06bb1c86d022d91c7592bc
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1181152
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Use the nvgpu-internal buddy allocator for video memory allocations,
instead of nvmap. This allows better integration for copyengine, BAR1
mapping to userspace, etc.
Jira DNVGPU-38
Change-Id: I9fd67b76cd39721e4cd8e525ad0ed76f497e8b99
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1181151
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>