When the system is low on memory, kzalloc() can fail for requests
larger than PAGE_SIZE, since those require physically contiguous pages.
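A minimal sketch of the standard fallback for such allocations (the
helper name is hypothetical; this shows the usual pattern, not
necessarily the exact change here):

    #include <linux/slab.h>
    #include <linux/vmalloc.h>

    /* Hypothetical helper: requests larger than one page fall back to
     * vzalloc(), which only needs virtually contiguous memory. */
    static void *alloc_large_ok(size_t size)
    {
        if (size > PAGE_SIZE)
            return vzalloc(size);
        return kzalloc(size, GFP_KERNEL);
    }

The matching free side would use kvfree(), which handles both cases.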
Bug 200096099
Change-Id: I44e217ffa6aa6c453a4d4afba45a8ee3b5756cc1
Signed-off-by: Kerwin Wan <kerwinw@nvidia.com>
Reviewed-on: http://git-master/r/732197
(cherry picked from commit 62861976421415f93e98a0a9f977ac1f66046714)
Reviewed-on: http://git-master/r/737057
Reviewed-by: Krishna Reddy <vdumpa@nvidia.com>
Tested-by: Krishna Reddy <vdumpa@nvidia.com>
Implement NVGPU_AS_IOCTL_GET_BUFFER_COMPBITS_INFO for requesting info
on compbits-mappable buffers, and NVGPU_AS_IOCTL_MAP_BUFFER_COMPBITS,
which enables mapping compbits to the GPU address space of said
buffers. This, in turn, enables moving the comptag swizzling from GPU
to CDEH/CDEV formats into userspace.
Compbits mapping is conservative and may map more than what is
strictly needed, for two reasons: 1) mapping must be done with
small-page alignment (4kB), and 2) GPU comptags are swizzled all
around the aggregate cache line, which means that the whole cache line
must be visible even if only some comptag lines are required from
it. The cache line size is not necessarily a multiple of the small
page size.
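As a hedged illustration of that rounding (function and constant names
are assumptions, not from this patch), the mapped window grows to whole
cache lines first and then to whole small pages:

    #include <stdint.h>

    #define SMALL_PAGE_SIZE 4096ULL

    /* Expand [offset, offset + size) to whole aggregate cache lines,
     * then to whole 4kB pages; cacheline_size need not be a multiple
     * of the page size, so this can over-cover. */
    static void compbits_window(uint64_t offset, uint64_t size,
                                uint64_t cacheline_size,
                                uint64_t *win_start, uint64_t *win_len)
    {
        uint64_t start = offset - (offset % cacheline_size);
        uint64_t end = offset + size;

        end += cacheline_size - 1;
        end -= end % cacheline_size;              /* whole cache lines */

        start &= ~(SMALL_PAGE_SIZE - 1);          /* round down to page */
        end = (end + SMALL_PAGE_SIZE - 1) & ~(SMALL_PAGE_SIZE - 1);

        *win_start = start;
        *win_len = end - start;
    }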
Bug 200077571
Change-Id: I5ae88fe6b616e5ea37d3bff0dff46c07e9c9267e
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/719710
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Implement a new buddy allocation scheme for the GPU's VA space.
The bitmap allocator was using too much memory and is not a scalable
solution as the GPU's address space keeps getting bigger. The buddy
allocation scheme is much more memory efficient when the majority
of the address space is not allocated.
The buddy allocator is not constrained by the notion of a split
address space. The bitmap allocator could manage either small pages
or large pages, but not both at the same time; thus the bottom of the
address space was for small pages and the top for large pages.
Although that split is not removed quite yet, the new allocator
enables removing it.
The buddy allocator is also very scalable. It manages everything from
the relatively small comptag space to the enormous GPU VA space. This
is important since the GPU has many differently sized spaces that
need managing.
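For background, a minimal sketch of the buddy idea (types are
illustrative only; the in-tree allocator's structures differ): a free
block of order n splits into two buddies of order n-1 until the
requested size fits, and freed buddies coalesce back into their parent.

    #include <stdint.h>
    #include <stdlib.h>

    struct buddy {
        uint64_t start;              /* base offset of this block */
        int order;                   /* covers min_size << order  */
        struct buddy *left, *right;  /* children after a split    */
    };

    /* Split one free buddy into two halves one order smaller. */
    static int buddy_split(struct buddy *b, uint64_t min_size)
    {
        uint64_t half = (min_size << b->order) >> 1;

        b->left = calloc(1, sizeof(*b->left));
        b->right = calloc(1, sizeof(*b->right));
        if (!b->left || !b->right) {
            free(b->left);
            free(b->right);
            b->left = b->right = NULL;
            return -1;
        }

        b->left->start = b->start;
        b->right->start = b->start + half;
        b->left->order = b->right->order = b->order - 1;
        return 0;
    }

Only allocated ranges need nodes, which is why this stays
memory-efficient when most of the space is free.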
Currently there are certain limitations. For one, the allocator does
not handle fixed allocations from CUDA very well. It can do so, but
with certain caveats: the PTE page size is always set to small, which
means the buddy allocator may place other small-page allocations in
the buddies around the fixed allocation. It does this to avoid mixing
large- and small-page allocations in the same PDE.
Change-Id: I501cd15af03611536490137331d43761c402c7f9
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/740694
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
On linsim, when push buffers are allowed to be allocated with small
pages above 4GB, the simulator crashes. This patch ensures that for
linsim all small-page allocations are forced to be below 4GB in the
GPU VA space; with that in place the simulator no longer crashes.
This bug came up because the GPU buddy allocator work generates
allocations at the top of the address space first, so push buffers
were located between 12GB and 16GB in the GPU VA space.
Change-Id: Iaef0af3fda3f37ac09a66b5e1179527d6fe08ccc
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/740728
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
The number of entries in the next-level PDE data structure was half
of what was needed, since the bit shift was 1 bit too small.
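To make the off-by-one concrete (bit positions here are illustrative):
for an index field occupying VA bits lo..hi inclusive, the entry count
is 1 << (hi - lo + 1); shifting by only (hi - lo) yields half of that.

    /* e.g. hi = 37, lo = 28: 1 << 10 = 1024 entries are needed;
     * the old 1 << 9 = 512 allocated only half the table. */
    int entries = 1 << (hi - lo + 1);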
Change-Id: Id4981f230dd206ae94336cddab117312e143e6a1
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/740727
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Record the size of each page table level. The size of level 0 depends
on the size of the address space, and we generally do not support the
whole address space.
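A hedged sketch of that computation (variable names assumed): level 0
is sized from the VA width we actually support, not the architectural
maximum.

    /* If level 0 is indexed by VA bits [lo_bit, va_bits), then
     * supporting e.g. 37 of 40 VA bits shrinks the table 8-fold. */
    u64 num_entries = 1ULL << (va_bits - lo_bit);
    u64 level0_size = num_entries * entry_size;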
Change-Id: Iab47505af1a641e193d9e98a2246e522813f221a
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/729730
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-on: http://git-master/r/737531
Reviewed-by: Alexander Van Brunt <avanbrunt@nvidia.com>
Tested-by: Alexander Van Brunt <avanbrunt@nvidia.com>
Reduce the amount of duplicate code around memory allocation by using
common helpers, and a common data structure for storing the results
of allocations.
Bug 1605769
Change-Id: Idf51831e8be9cabe1ab9122b18317137fde6339f
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/721030
Reviewed-on: http://git-master/r/737530
Reviewed-by: Alexander Van Brunt <avanbrunt@nvidia.com>
Tested-by: Alexander Van Brunt <avanbrunt@nvidia.com>
When mapping a buffer at a fixed address, ensure that the alignment of
the buffer and the address are compatible. When freeing, retrieve the
page size from the VA instead of choosing it again.
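A minimal sketch of the alignment check (names assumed):

    /* A fixed-offset mapping must start on a boundary compatible with
     * the mapping page size, or the PTEs cannot cover it cleanly. */
    if (fixed_gpu_va & (page_size - 1))
        return -EINVAL;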
Bug 1605769
Change-Id: I4f73453996cd53a912b6a414caa41563cde28da7
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/725764
Define the smallest compressible page size per SoC, and use it to
determine whether a compressible kind should be downgraded to
uncompressed.
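Illustrative shape of the decision (names assumed):

    /* If this SoC cannot compress pages of the chosen size, map with
     * the equivalent uncompressed kind instead. */
    if (page_size < smallest_compressible_page_size)
        kind = uncompressed_fallback_kind;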
Bug 1605769
Change-Id: I7c9991ba0ae82fe533641f045e506c0b01a10d8b
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/724492
Sparse buffers were allowed only with big pages. That restriction is
not necessary, so remove it.
Bug 1605769
Change-Id: I92efc0efe80edccead47b47d33fd9a75c921ca9a
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/725763
Reduce the amount of duplicate code around memory allocation by using
common helpers, and a common data structure for storing the results
of allocations.
Bug 1605769
Change-Id: I10c226e2377aa867a5cf11be61d08a9d67206b1d
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/720507
Use sg_alloc_table_from_pages() to create the sg_table for buffers
allocated with NO_KERNEL_MAPPING. The old code always returned an
sg_table with one chunk; sg_alloc_table_from_pages() returns an
sg_table that describes the actual physical chunks of the buffer.
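A hedged usage sketch, assuming a struct page ** array from a
NO_KERNEL_MAPPING allocation (helper name hypothetical; error handling
trimmed to the essentials):

    #include <linux/scatterlist.h>
    #include <linux/slab.h>

    static struct sg_table *mk_sgt(struct page **pages,
                                   unsigned int npages, size_t size)
    {
        struct sg_table *sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);

        if (!sgt)
            return NULL;
        /* Coalesces physically contiguous pages into single entries,
         * so the table reflects the buffer's real chunks. */
        if (sg_alloc_table_from_pages(sgt, pages, npages, 0, size,
                                      GFP_KERNEL)) {
            kfree(sgt);
            return NULL;
        }
        return sgt;
    }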
Bug 1605769
Change-Id: I412c0151d830fa0f53dbbb08ba8cc9ebce6699e3
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/723696
Add a platform-specific API pointer, (*get_iova_addr)(), which can be
used to get the IOVA/physical address from a given scatterlist and
flags. Use this API via g->ops.mm.get_iova_addr() instead of calling
gk20a_mm_iova_addr() directly, which makes the lookup platform
specific.
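A sketch of the indirection (the exact signature is an assumption based
on this description):

    /* In the per-platform gpu_ops table: */
    u64 (*get_iova_addr)(struct gk20a *g, struct scatterlist *sgl,
                         u32 flags);

    /* Callers dispatch through the ops pointer, so each platform can
     * plug in its own translation: */
    addr = g->ops.mm.get_iova_addr(g, sgt->sgl, flags);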
Bug 1605653
Change-Id: I798763db1501bd0b16e84daab68f6093a83caac2
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/713089
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Instead of the comptag index, we were dumping an offset within the
buffer.
Change-Id: Iaa07919c8d87009227556eacbcb6dcbd83954c7d
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/718597
Reviewed-by: Automatic_Commit_Validation_User
Reduce the amount of duplicate code around memory allocation by using
common helpers, and a common data structure for storing the results
of allocations.
Bug 1605769
Change-Id: I7c1662b669ed8c86465254f6001e536141051ee5
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/720435
Reduce the amount of duplicate code around memory allocation by
introducing a variant of the allocation helper that does not map the
allocated buffer to the kernel address space.
NO_KERNEL_MAPPING allocations return a struct page **, so store the
result of the allocation in a new field of mem_desc.
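Illustratively (field names assumed), mem_desc grows a field for that
page array:

    struct mem_desc {
        void *cpu_va;          /* NULL for NO_KERNEL_MAPPING buffers */
        struct page **pages;   /* new: pages from the no-map variant */
        struct sg_table *sgt;
        size_t size;
    };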
Bug 1605769
Change-Id: Ib760b9e6d34b229b04d1fb4f3abf10648670fc69
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/721029
Introduce mem_desc, which holds all information needed for a buffer.
Implement helper functions for allocation and freeing that use this
data type.
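A hedged sketch of the shape this takes (helper names approximate the
description, not necessarily the exact code):

    /* One descriptor per buffer, filled by the common allocation
     * helper and consumed by the matching free. */
    int gk20a_gmmu_alloc_map(struct vm_gk20a *vm, size_t size,
                             struct mem_desc *mem);
    void gk20a_gmmu_unmap_free(struct vm_gk20a *vm,
                               struct mem_desc *mem);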
Change-Id: I82c88595d058d4fb8c5c5fbf19d13269e48e422f
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/712699
Add support for dumping VPR/WPR info for gm20b. This info is dumped
whenever gk20a_mm_fb_flush() times out.
Bug 200082817
Change-Id: I21b0372d0e3f976a189c9c428c015165b715bf88
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/711439
(cherry picked from commit b69897d71c8f6119b49ceb8d3273cdb354178cc5)
Reviewed-on: http://git-master/r/712675
GVS: Gerrit_Virtual_Submit
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
Fix the sparse warnings below by making the listed APIs static:
kernel/drivers/gpu/nvgpu/gk20a/mm_gk20a.c:1795:5:
warning: symbol 'update_gmmu_pde_locked' was not declared. Should it be
static?
kernel/drivers/gpu/nvgpu/gk20a/mm_gk20a.c:1841:5:
warning: symbol 'update_gmmu_pte_locked' was not declared. Should it be
static?
Bug 200067946
Change-Id: I8158aaf503378b176cfd5cc129db9557803003c1
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/713024
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Don't unmap compbits_buf from the system VM early; instead, store it
in the dmabuf's private data and unmap it later, when all user
mappings to that buffer have disappeared.
Bug 1546619
Change-Id: I333235a0ea74c48503608afac31f5e9f1eb4b99b
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/661949
(cherry picked from commit ed2177e25d9e5facfb38786b818330798a14b9bb)
Reviewed-on: http://git-master/r/661835
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Always pass the directory structure to mm functions instead of
pointers to its members. Also split update_gmmu_ptes_locked() into
smaller functions, and turn the hard-coded MMU levels (PDE, PTE) into
run-time parameters.
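The run-time parameterization can be pictured as a per-level
descriptor table (field names assumed):

    /* One entry per page-table level; the walking code indexes this
     * table instead of hard-coding PDE and PTE behavior. */
    struct gk20a_mmu_level {
        int hi_bit, lo_bit;    /* VA bits that index this level */
        size_t entry_size;     /* bytes per entry               */
        void (*update_entry)(struct vm_gk20a *vm, void *entry, u32 i);
    };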
Change-Id: I315ef7aebbea1e61156705361f2e2a63b5fb7bf1
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/672485
Reviewed-by: Automatic_Commit_Validation_User
Map/unmap buffers for HWPM and deal with its instance block, the minimum
work required to run the HWPM via regops from userspace.
Bug 1517458
Bug 1573150
Change-Id: If14086a88b54bf434843d7c2fee8a9113023a3b0
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/673689
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
We have infinite timeouts for loops in emulation, but some functions
with such loops still return an error if we exceed the original retry
count.
Change-Id: I1f9ddbfc0acd9f30f6bd49d9e748d8d8fbefa154
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/709491
Introduce a new struct gk20a_mm_entry. Allocate and store the PDE and
PTE arrays using the same structure, and always pass a pointer to this
struct between functions in the memory code when possible.
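Illustrative shape of the struct (fields inferred from the
description):

    /* Backing storage for one PDE or PTE array. */
    struct gk20a_mm_entry {
        struct page **pages;   /* backing pages               */
        struct sg_table *sgt;  /* for the GPU-visible address */
        void *cpu_va;          /* kernel mapping of the array */
        size_t size;
    };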
Change-Id: Ia4a2a6abdac9ab7ba522dafbf73fc3a3d5355c5f
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/696414
Use busy looping for L2 and TLB maintenance operations. This speeds
them up by an order of magnitude.
Also add trace points to measure the performance of memory ops and
interrupt processing.
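The win comes from polling instead of sleeping between checks; a
minimal sketch (register and bit names hypothetical):

    /* Maintenance ops complete in microseconds, so a busy-wait beats
     * a sleeping wait by an order of magnitude. */
    do {
        cpu_relax();
        status = readl(flush_status_reg);
    } while (status & FLUSH_PENDING_BIT);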
Change-Id: Ic4a8525d3d946b2b8f57b4b8ddcfc61605619399
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/681640
Pass the vm instead of the as share to the userspace buffer mapping
functions, since they also need to be called from places other than
the AS device ioctls, and the as share is specific to those.
Bug 1573150
Change-Id: I994872f23ea7b1582361f3f4fabbd64b4786419c
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/674020
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Compression page size varies depending on architecture. Make it
128kB on gk20a and gm20b.
Also export some common functions from gm20b.
Bug 1592495
Change-Id: Ifb1c5b15d25fa961dab097021080055fc385fecd
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/673790
Decrement the ref count on all mapped_buffers belonging to a va_node
when the va_node is freed. This prevents userspace from leaking some
mapped_buffers in some cases.
This does prevent userspace from keeping a buffer around after freeing
a space allocation if the buffer in question is not otherwise ref
counted; it is not clear whether that is a problem for userspace or
not.
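Sketch of the fix's shape (list and release-callback names assumed):

    /* When a va_node is freed, drop the reference held on every
     * buffer still mapped within its range. */
    list_for_each_entry_safe(mb, mb_tmp, &va_node->va_buffers_list,
                             va_buffers_list)
        kref_put(&mb->ref, gk20a_vm_unmap_buffer_kref);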
Bug 1600686
Change-Id: I659ccbda5935d44086fd367bd2110f7d0f066194
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/676629
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Implement several fixes to allow the GVA address space to grow beyond
32GB, and increase the address space to 128GB.
o Implement dynamic allocation of PDE backing pages. The memory to
store the PDE entries was hard-coded to 1 page; now the number of
pages necessary is computed dynamically based on the size of the
address space and the size of large pages (see the worked example
below).
o Fix an arithmetic problem in the gm20b sparse texture code that
caused large address spaces to be truncated when sparse PDEs/PTEs
were being filled in. This caused a kernel panic when freeing the
address space, since much of the backing PTE memory was never
allocated.
o Change the address space split for large and small pages. Small
pages now occupy the bottom 16GB of the address space, and large
pages are used for the rest. With a 128GB address space, this leaves
112GB of large-page GVA available.
This patch exists to allow large (16GB) sparse textures to be
allocated without running into out-of-memory issues and kernel panics.
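The arithmetic behind the first item, as a hedged worked example
(geometry assumed: one PDE covers 1024 large pages, entries are 8
bytes):

    /* 128kB large pages: one PDE maps 128kB * 1024 = 128MB.
     * 128GB / 128MB = 1024 PDEs; 1024 * 8B = 8kB of entries,
     * i.e. two 4kB backing pages where one was hard-coded. */
    uint64_t pde_coverage = 128ULL * 1024 * 1024;     /* 128MB */
    uint64_t n_pdes = (128ULL << 30) / pde_coverage;  /* 1024  */
    uint64_t entry_bytes = n_pdes * 8;                /* 8kB   */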
Bug 1574267
Change-Id: I7c59ee54bd573dfc53b58c346156df37a85dfc22
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/671204
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Fix a memory leak: add the newly created state to the dmabuf priv's
state list, instead of the other way around.
Bug 1594784
Bug 200064154
Change-Id: I939746a254bb8bf4d06de7fcecba06c191da665f
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/668758
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Lauri Peltonen <lpeltonen@nvidia.com>
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
Remove support for gk20a sparse textures. We're using the
implementation from user space, so the gk20a code is never invoked.
Also remove the ref_cnt for PTEs, so we never free PTEs when unmapping
pages, but only at VM delete time.
Change-Id: I04d7d43d9bff23ee46fd0570ad189faece35dd14
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/663294