Commit Graph

364 Commits

Author SHA1 Message Date
Konsta Holtta
8a15e02ca9 gpu: nvgpu: add NO_KERNEL_MAPPING for alloc_map_vid
Commit 8f3875393e ("abstract away dma
alloc attrs") added an implicit NVGPU_DMA_NO_KERNEL_MAPPING for the
function that allocates vidmem buffers, but the functions
gk20a_gmmu_alloc_map_vid and gk20a_gmmu_alloc_map got overlooked and use
no flags which now triggers a warning.  Make those cases alloc the
vidmem buffer with the no kernel mapping flag as it should have been,
because kernel mappings of vidmem are not supported.

Bug 1853519

Change-Id: I9f29c9d310f97c9bd5f279674decf61efb3e75ea
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1328953
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2017-03-27 13:56:38 -07:00
Terje Bergstrom
b45a67934f gpu: nvgpu: Use nvgpu_timeout for all loops
There were still a few remaining loops where we did not use
nvgpu_timeout and required Tegra specific functions for detecting if
timeout should be skipped. Replace all of them with nvgpu_timeout and
remove including chip-id.h where possible.

FE power mode timeout loop also used wrong delay value. It always
waited for the whole max timeout instead of looping with smaller
increments.

If SEC2 ACR boot fails to halt, we should not try to check ACR result
from mailbox. Add an early return for that case.

JIRA NVGPU-16

Change-Id: I9f0984250d7d01785755338e39822e6631dcaa5a
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1323227
2017-03-27 10:48:31 -07:00
David Nieto
e0f2afe5eb gpu: nvgpu: refactor teardown to support unbind
This change refactors the teardown in remove to ensure that it is
possible to unload the driver while leaving fds open. This is achieved
by making sure that the SW state is kept alive till all fds are closed
and by checking that subsequent calls to ioctls after the teardown fail.

Normally, this would be achieved ny calls into gk20a_busy(), but in
kickoff we dont call into that to reduce latency, so we need to check
the driver status directly, and also in some of the functions
as we need to make sure the ioctl does not dereference the device or
platform struct

bug 200277762
JIRA: EVLR-1023

Change-Id: I163e47a08c29d4d5b3ab79f0eb531ef234f40bde
Signed-off-by: David Nieto <dmartineznie@nvidia.com>
Reviewed-on: http://git-master/r/1320219
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
Reviewed-by: Shreshtha Sahu <ssahu@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
2017-03-25 02:06:55 -07:00
Terje Bergstrom
81660ab58c gpu: nvgpu: Enable CE always
All GPUs have a copy engine. So delete the flag has_ce, because
it's always true.

JIRA NVGPU-16

Change-Id: I89db74c7cf66b24db84301b79832862ef28100b9
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1325355
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2017-03-24 09:08:23 -07:00
Terje Bergstrom
4492c62ffe gpu: nvgpu: Add bus HAL
Add bus HAL and move all bus related hardware sequencing to that file:
BAR1 binding, timer access, and interrupt handling.

Change-Id: Ibc5f5797dc338de10749b446a7bdbcae600fecb4
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1323353
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2017-03-23 08:44:07 -07:00
Seema Khowala
971a3c8050 gpu: nvgpu: add mm ops for mmu_fault_pending
This change is needed for t19x mmu fault handling.

Change-Id: I7f9190ab305f699401f6b0033b6a93dd8b4fc3cd
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: http://git-master/r/1315201
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2017-03-22 10:11:21 -07:00
Konsta Holtta
fcd7fce9bc gpu: nvgpu: use dma-attr wrappers for K4.9 compatibility
On kernel 4.9, the DMA API has changed, so use the NVIDIA compatibility
wrappers from dma-attrs.h to allow the code to build for both 4.4 and
4.9.

Bug 1853519

Change-Id: I0196936e81c7f72b41b38a67f42af0dc0b5518df
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1321102
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2017-03-21 15:01:48 -07:00
Konsta Holtta
8f3875393e gpu: nvgpu: abstract away dma alloc attrs
Don't use enum dma_attr in the gk20a_gmmu_alloc_attr* functions, but
define nvgpu-internal flags for no kernel mapping, force contiguous, and
read only modes. Store the flags in the allocated struct mem_desc and
only use gk20a_gmmu_free, remove gk20a_gmmu_free_attr. This helps in OS
abstraction. Rename the notion of attr to flags.

Add implicit NVGPU_DMA_NO_KERNEL_MAPPING to all vidmem buffers
allocated via gk20a_gmmu_alloc_vid for consistency.

Fix a bug in gk20a_gmmu_alloc_map_attr that dropped the attr
parameter accidentally.

Bug 1853519

Change-Id: I1ff67dff9fc425457ae445ce4976a780eb4dcc9f
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1321101
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2017-03-21 15:01:47 -07:00
Terje Bergstrom
ca762e4220 gpu: nvgpu: Move all FB programming to FB HAL
Move all programming of FB to fb_*.c files, and remove the inclusion
of FB hardware headers from other files.

TLB invalidate function took previously a pointer to VM, but the new
API takes only a PDB mem_desc, because FB does not need to know about
higher level VM.

GPC MMU is programmed from the same function as FB MMU, so added
dependency to GR hardware header to FB.

GP106 ACR was also triggering a VPR fetch, but that's not applicable
to dGPU, so removed that call.

Change-Id: I4eb69377ac3745da205907626cf60948b7c5392a
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1321516
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2017-03-17 08:44:03 -07:00
Konsta Holtta
7fd72e3135 gpu: nvgpu: cancel vidmem worker only if supported
Cancel the vidmem.clear_mem_worker during suspend only if vidmem is
enabled via kernel config. Otherwise it's not initialized.

Bug 1853519

Change-Id: If88c756ae14f348eddda01218fa218480217388c
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1321118
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: David Martinez Nieto <dmartineznie@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2017-03-16 09:17:22 -07:00
Terje Bergstrom
85cb10c313 gpu: nvgpu: Remove unused function gk20a_get_phys_from_iova
Remove unused function gk20a_get_phys_from_iova. At the same time
remove the #include for iommu.h, which was only needed by this
function.

Change-Id: Ia858b0ad5fe7e423d650aa9f82e430f419f2a492
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1319070
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2017-03-14 11:47:17 -07:00
Terje Bergstrom
503a5e6826 gpu: nvgpu: Do not use pm_runtime calls directly
mm_gk20a.c had direct calls to pm_runtime_put_noidle(). Replace
them with calls to wrapper gk20a_idle_nosuspend() to prevent
unnecessary dependencies to Linux.

Change-Id: Iaf8b9255750be2f3e1aa39587c1a4a3cbeacc67f
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1319069
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2017-03-14 11:47:11 -07:00
Deepak Nibade
9efadcdfc0 gpu: nvgpu: check return value of mutex_init for semaphores
- check return value of nvgpu_mutex_init for semaphores
- add corresponding nvgpu_mutex_destroy calls

Jira NVGPU-13

Change-Id: I5404dbd29e3fce29f1a445eb2e6ce8e1d1b616c4
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1317138
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Navneet Kumar <navneetk@nvidia.com>
2017-03-14 11:46:53 -07:00
Alex Waterman
707ea45e0f gpu: nvgpu: kmem abstraction and tracking
Implement kmem abstraction and tracking in nvgpu. The abstraction
helps move nvgpu's core code away from being Linux dependent and
allows kmem allocation tracking to be done for Linux and any other
OS supported by nvgpu.

Bug 1799159
Bug 1823380

Change-Id: Ieaae4ca1bbd1d4db4a1546616ab8b9fc53a4079d
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1283828
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2017-03-03 10:34:48 -08:00
Alex Waterman
3966efc2e5 gpu: nvgpu: Give nvgpu_kalloc a less generic name
Change nvgpu_kalloc() to nvgpu_big_[mz]alloc(). This is necessary
since the natural free function name for this is nvgpu_kfree() but
that conflicts with nvgpu_k[mz]alloc() (implemented in a subsequent
patch).

This API exists becasue not all allocation sizes can be determined
at compile time and in some cases sizes may vary across the system
page size. Thus always using kmalloc() could lead to OOM errors due
to fragmentation. But always using vmalloc() is wastful of memory
for small allocations. This API tries to alleviate those problems.

Bug 1799159
Bug 1823380

Change-Id: I49ec5292ce13bcdecf112afbb4a0cfffeeb5ecfc
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1283827
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2017-03-03 10:34:43 -08:00
Deepak Nibade
64e1782aee gpu: nvgpu: optimize duplicate buffer lookup in case of fixed offsets
In gk20a_vm_map_duplicate_locked(), we always do a linear search
in rb-tree to find a duplicate entry of the buffer

In case NVGPU_AS_MAP_BUFFER_FLAGS_FIXED_OFFSET is set, we first
traverse whole rb-tree linearly and then compare offset_align
with the address searched from rb-tree

If size of rb-tree is very large this linear lookup takes upto
7mS and causes huge delays

Hence in case of NVGPU_AS_MAP_BUFFER_FLAGS_FIXED_OFFSET, we can
use offset_align to perform a binary search on rb-tree and then
verify that dmabuf and kind match with the node obtained from
the search
This saves a lot of time per-lookup

Bug 1874516

Change-Id: Ia4924b64d66e586c14341ae2e2283beac394bf6f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1309343
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2017-03-02 07:53:55 -08:00
Deepak Nibade
8cdb91c527 gpu: nvgpu: remove use of DEFINE_MUTEX()
API DEFINE_MUTEX() is defined in Linux and might
not be available in other OSs.
Hence remove its usage from nvgpu

Declare and explicitly initialize below mutexes
for both nvgpu and vgpu
g->mm.priv_lock
g->mm.tlb_lock

Jira NVGPU-13

Change-Id: If72885a6da0227a1552303206172f1f2b751471d
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1298042
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2017-02-22 04:15:08 -08:00
Deepak Nibade
8ee3aa4b31 gpu: nvgpu: use common nvgpu mutex/spinlock APIs
Instead of using Linux APIs for mutex and spinlocks
directly, use new APIs defined in <nvgpu/lock.h>

Replace Linux specific mutex/spinlock declaration,
init, lock, unlock APIs with new APIs
e.g
struct mutex is replaced by struct nvgpu_mutex and
mutex_lock() is replaced by nvgpu_mutex_acquire()

And also include <nvgpu/lock.h> instead of including
<linux/mutex.h> and <linux/spinlock.h>

Add explicit nvgpu/lock.h includes to below
files to fix complilation failures.
gk20a/platform_gk20a.h
include/nvgpu/allocator.h

Jira NVGPU-13

Change-Id: I81a05d21ecdbd90c2076a9f0aefd0e40b215bd33
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1293187
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2017-02-22 04:15:02 -08:00
Alex Waterman
e7a0c0ae8b gpu: nvgpu: Move from gk20a_ to nvgpu_ in semaphore code
Change the prefix in the semaphore code to 'nvgpu_' since this code
is global to all chips.

Bug 1799159

Change-Id: Ic1f3e13428882019e5d1f547acfe95271cc10da5
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1284628
Reviewed-by: Varun Colbert <vcolbert@nvidia.com>
Tested-by: Varun Colbert <vcolbert@nvidia.com>
2017-02-13 18:15:03 -08:00
Alex Waterman
aa36d3786a gpu: nvgpu: Organize semaphore_gk20a.[ch]
Move semaphore_gk20a.c drivers/gpu/nvgpu/common/ since the semaphore
code is common to all chips.

Move the semaphore_gk20a.h header file to drivers/gpu/nvgpu/include/nvgpu
and rename it to semaphore.h. Also update all places where the header
is inluced to use the new path.

This revealed an odd location for the enum gk20a_mem_rw_flag. This should
be in the mm headers. As a result many places that did not need anything
semaphore related had to include the semaphore header file. Fixing this
oddity allowed the semaphore include to be removed from many C files that
did not need it.

Bug 1799159

Change-Id: Ie017219acf34c4c481747323b9f3ac33e76e064c
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1284627
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2017-02-13 18:14:45 -08:00
Alex Waterman
7e403974d3 gpu: nvgpu: Simplify ref-counting on VMs
Simplify ref-counting on VMs: take a ref when a VM is bound to a
channel and drop a ref when a channel is freed.

Previously ref-counts were scattered over the driver. Also the CE
and CDE code would bind channels with custom rolled code. This was
because the gk20a_vm_bind_channel() function took an as_share as
the VM argument (the VM was then inferred from that as_share).
However, it is trivial to abtract that bit out and allow a central
bind channel function that just takes a VM and a channel.

Bug 1846718

Change-Id: I156aab259f6c7a2fa338408c6c4a3a464cd44a0c
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1261886
Reviewed-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2017-02-07 14:54:02 -08:00
Alex Waterman
95a3eb454c gpu: nvgpu: Conditional address space unification
Allow platforms to choose whether or not to have unified GPU
VA spaces. This is useful for the dGPU where having a unified
address space has no problems. On iGPUs testing issues is
getting in the way of enabling this feature.

Bug 1396644
Bug 1729947

Change-Id: I65985f1f9a818f4b06219715cc09619911e4824b
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1265303
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2017-01-31 16:23:07 -08:00
Alex Waterman
b9b94c073c gpu: nvgpu: Remove separate fixed address VMA
Remove the special VMA that could be used for allocating fixed
addresses. This feature was never used and is not worth maintaining.

Bug 1396644
Bug 1729947

Change-Id: I06f92caa01623535516935acc03ce38dbdb0e318
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1265302
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2017-01-31 16:23:07 -08:00
Alex Waterman
321537b8ed gpu: nvgpu: Cleanup gk20a_init_vm()
Cleanup and simplify the gk20a_init_vm() function to ease the
implementation of a platform dependent address space unification
decision.

Bug 1396644
Bug 1729947

Change-Id: Id8487d0e3d3c65e3357e3528063fb17c8a85f7da
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1265301
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2017-01-31 16:23:07 -08:00
Alex Waterman
d630f1d99f gpu: nvgpu: Unify the small and large page address spaces
The basic structure of this patch is to make the small page allocator
and the large page allocator into pointers (where they used to be just
structs). Then assign each of those pointers to the same actual
allocator since the buddy allocator has supported mixed page sizes
since its inception.

For the rest of the driver some changes had to be made in order to
actually support mixed pages in a single address space.

1. Unifying the allocation page size determination

   Since the allocation and map operations happen at distinct
   times both mapping and allocation of GVA space must agree
   on page size. This is because the allocation has to separate
   allocations into separate PDEs to avoid the necessity of
   supporting mixed PDEs.

   To this end a function __get_pte_size() was introduced which
   is used both by the balloc code and the core GPU MM code. It
   determines page size based only on the length of the mapping/
   allocation.

2. Fixed address allocation + page size

   Similar to regular mappings/GVA allocations fixed address
   mapping page size determination had to be modified. In the
   past the address of the mapping determined page size since
   the address space split was by address (low addresses were
   small pages, high addresses large pages). Since that is no
   longer the case the page size field in the reserve memory
   ioctl is now honored by the mapping code. When, for instance,
   CUDA makes a memory reservation it specifies small or large
   pages. When CUDA requests mappings to be made within that
   address range the page size is then looked up in the reserved
   memory struct.

   Fixed address reservations were also modified to now always
   allocate at a PDE granularity (64M or 128M depending on
   large page size. This prevents non-fixed allocations from
   ending up in the same PDE and causing kernel panics or GMMU
   faults.

3. The rest...

   The rest of the changes are just by products of the above.
   Lots of places required minor updates to use a pointer to
   the GVA allocator struct instead of the struct itself.

Lastly, this change is not truly complete. More work remains to be
done in order to fully remove the notion that there was such a thing
as separate address spaces for different page sizes. Basically after
this patch what remains is cleanup and proper documentation.

Bug 1396644
Bug 1729947

Change-Id: If51ab396a37ba16c69e434adb47edeef083dce57
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1265300
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2017-01-31 16:23:07 -08:00
Alex Waterman
793791ebb7 gpu: nvgpu: use map_offset for PTE size computation
Make sure that map_offset is set to the fixed map address or 0)
before determining PTE size. Then use map_offset instead of
offset_align for computing the PTE size since offset_align
could be either an alignment ora fixed mapping offset.

Also is the minimum of the buffer size and the buffer alignment
for computing page size. This is necessary is the GMMU is doing
page gathering (i.e the buffer does not appear as a continguous
IOMMU range to the GPU). Is such cases a large page sized buffer
may be made up of a bunch of discontiguous 4k pages.

Bug 1396644
Bug 1729947

Change-Id: I6464ee6a4ccab2495ccb31cd1ddf1db467d2b215
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1271359
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2017-01-31 16:23:07 -08:00
Shardar Shariff Md
a470647ad7 gpu: nvgpu: use soc/tegra/chip-id.h for soc header
The soc tegra headers are unified and moved all the content of
linux/tegra-soc.h to the soc/tegra/chip-id.h to have the
single soc header for Tegra.

Change-Id: I281e19dd3eb1538b8dfbea4eb0779fb64d1fcffa
Signed-off-by: Shardar Shariff Md <smohammed@nvidia.com>
Reviewed-on: http://git-master/r/1288365
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
2017-01-20 08:24:01 -08:00
Alex Waterman
6e2237ef62 gpu: nvgpu: Use timer API in gk20a code
Use the timers API in the gk20a code instead of Linux specific
API calls.

This also changes the behavior of several functions to wait for
the full timeout for each operation that can timeout. Previously
the timeout was shared across each operation.

Bug 1799159

Change-Id: I2bbed54630667b2b879b56a63a853266afc1e5d8
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1273826
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2017-01-18 16:46:33 -08:00
Alex Waterman
b928f10d37 gpu: nvgpu: Start re-organizing the HW headers
Reorganize the HW headers of gk20a. The headers are moved to a
new directory:

  include/nvgpu/hw/gk20a

And from the code are included like so:

  #include <nvgpu/hw/gk20a/hw_pwr_gk20a.h>

This is the first step in reorganizing all of the HW headers for
gm20b, gm206, etc. This is part of a larger effort to re-structure
and make the driver more readable and scalable.

Bug 1799159

Change-Id: Ic151155cbc2e6f75009f2d9d597b364a1bed2c4c
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1244790
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2017-01-11 12:44:14 -08:00
Alex Waterman
6df3992b60 gpu: nvgpu: Move allocators to common/mm/
Move the GPU allocators to common/mm/ since the allocators are common
code across all GPUs. Also rename the allocator code to move away from
gk20a_ prefixed structs and functions.

This caused one issue with the nvgpu_alloc() and nvgpu_free() functions.
There was a function for allocating either with kmalloc() or vmalloc()
depending on the size of the allocation. Those have now been renamed to
nvgpu_kalloc() and nvgpu_kfree().

Bug 1799159

Change-Id: Iddda92c013612bcb209847084ec85b8953002fa5
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1274400
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2017-01-09 12:33:16 -08:00
Alex Waterman
91d977ced4 gpu: nvgpu: Misc fixes for crashes on shutdown
Fix miscellaneous issues seen during driver shutdown.

 o  Make sure pointers are valid before accessing them.
 o  Busy the GPU during channel timeout.
 o  Cancel delayed work on channels.
 o  Avoid access to channels that may have been freed.

Bug 1816516
Bug 1807277

Change-Id: I62df40373fdfb1c4a011364e8c435176a08a7a96
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1250026
(cherry picked from commit 64a95fc96c8ef7c5af9c53c4bb3402626e0d2f60)
Reviewed-on: http://git-master/r/1274474
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2017-01-04 15:53:55 -08:00
Konsta Holtta
92fe000749 gpu: nvgpu: rename timeout_check to timeout_expired
Change "check" to "expired" in nvgpu_timeout_check* and append _expired
to nvgpu_timeout_peek to clarify what the boolean-like return value
means and thus avoid bugs.

Bug 200260715

Change-Id: I47e097ee922e856005a79fa9e27eddb1c8d77f8b
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1269366
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2016-12-19 15:40:36 -08:00
Terje Bergstrom
a4731b3282 gpu: nvgpu: Use end of vidmem as bootstrap region
Instead of hard coding bootstrap region, it should always be set to
the last 256MB of vidmem.

Bug 200244445

Change-Id: I91779d1bf861f4f23a0b646f70b1febbbc4581b5
Signed-off-by: David Nieto <dmartineznie@nvidia.com>
Reviewed-on: http://git-master/r/1242409
Reviewed-by: David Martinez Nieto <dmartineznie@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-on: http://git-master/r/1267124
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2016-12-09 15:05:12 -08:00
Konsta Holtta
54810444d2 gpu: nvgpu: fix timeout retry usage in mm_gk20a.c
Loop conditions of timeout checking introduced in commit
2109478311 ("gpu: nvgpu: Use timeout retry
API in mm_gk20a.c") were flipped by accident, so each usage in a loop
actually did not wait enough but ran only one iteration. Fix the
conditions to loop as long as the timeout is NOT expired.

Also restore l2 flush timeout to 10 ms from 1, which was done in commit
030ef82bdd ("gpu: nvgpu: increase l2 flush
timeout") but overwritten by the above "use timeout" commit.

Bug 200260715

Change-Id: I0db16be79a1a27caa3d97fac9d4361582cc232e8
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1268482
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-12-09 13:44:53 -08:00
Alex Waterman
2109478311 gpu: nvgpu: Use timeout retry API in mm_gk20a.c
Use the retry API that is part of the nvgpu timeout API to keep
track of retry attempts in the vrious flushing and invalidating
operations in the MM code.

Bug 1799159

Change-Id: I36e98d37183a13d7c3183262629f8569f64fe4d7
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1255866
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2016-12-05 16:16:24 -08:00
Krishna Reddy
89e707d2b4 gpu: nvgpu: fix hardcoded page size reference
Bug 1843356
Bug 1769772

Change-Id: I6c2a3a72f7082074bbf1165a74d5070195e1e653
Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>
Reviewed-on: http://git-master/r/1258352
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-12-05 13:49:17 -08:00
seshendra Gadagottu
030ef82bdd gpu: nvgpu: increase l2 flush timeout
Under heavy throttling case gpu runs 8 times slower.
This is making l2 flush to timeout, to avoid this increase
timeout to 10msec from 1msec.

Bug 1787261

Change-Id: Ia112ce968c136135ccb9df4a7364073103684403
Signed-off-by: seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/1216559
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-12-02 05:37:03 -08:00
Deepak Nibade
af5d2d208a gpu: nvgpu: API to access fb memory
Add IOCTL API NVGPU_DBG_GPU_IOCTL_ACCESS_FB_MEMORY
to read/write fb/vidmem memory

Interface will accept dmabuf_fd of the buffer in vidmem,
offset into the buffer to access, temporary buffer
to copy data across API, size of read/write and
command indicating either read or write operation

API will first parse all the inputs, and then call
gk20a_vidbuf_access_memory() to complete fb access

gk20a_vidbuf_access_memory() will then just use
gk20a_mem_rd_n() or gk20a_mem_wr_n() depending
on the command issued

Bug 1804714
Jira DNVGPU-192

Change-Id: Iba3c42410abe12c2884d3b603fa33d27782e4c56
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1255556
(cherry picked from commit 2c49a8a79d93fc526adbf6f808484fa9a3fa2498)
Reviewed-on: http://git-master/r/1260471
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
2016-11-30 02:18:17 -08:00
seshendra Gadagottu
ef95e43d97 gpu: nvgpu: chip specific init_inst_block
Add function pointer to add chip specific init_inst_block.
Update this function pointer for gk20a and gm20b.

JIRA GV11B-21

Change-Id: I74ca6a8b4d5d1ed36f7b25b7f62361c2789b9540
Signed-off-by: seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/1254875
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2016-11-21 08:50:45 -08:00
Terje Bergstrom
d29afd2c9e gpu: nvgpu: Fix signed comparison bugs
Fix small problems related to signed versus unsigned comparisons
throughout the driver. Bump up the warning level to prevent such
problems from occuring in future.

Change-Id: I8ff5efb419f664e8a2aedadd6515ae4d18502ae0
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1252068
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2016-11-16 21:35:36 -08:00
Alex Waterman
2fa54c94a6 gpu: nvgpu: Remove global debugfs variable
Remove a global debugfs variable and instead save the allocator
debugfs root node in the gk20a struct.

Bug 1799159

Change-Id: If4eed34fa24775e962001e34840b334658f2321c
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1225611
(cherry picked from commit 1908fde10bb1fb60ce898ea329f5a441a3e4297a)
Reviewed-on: http://git-master/r/1242390
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-10-26 11:10:01 -07:00
Alex Waterman
93eea1d729 gpu: nvgpu: Move CE cleanup
Move the CE cleanup to before the FIFO cleanup. Since the CE closes
a channel during its cleanup the FIFO needs to be initialized since
the FIFO code maintains the vmalloc()'ed channels.

Bug 1816516

Change-Id: Ia7a97059a12a0c2b52368ffe411e597f803e8e6e
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1225613
(cherry picked from commit 707bd2a6d4672c6a7b7a8b2e581ea3a606ed971d)
Reviewed-on: http://git-master/r/1240106
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-10-26 11:09:57 -07:00
Alex Waterman
f0fe1c2f02 gpu: nvgpu: Only cleanup existing semaphore pools
Not all VMs have semaphore pools made for them even when semaphores
are going to be used. Thus only VMs with existing semaphore pools
should have their pools cleaned up.

Bug 1816516

Change-Id: I07828708faef451f1711f58c0d5b3f8e4d296dd0
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1225612
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
(cherry picked from commit 6cdb7b6650765465dca68dc3c23b3d795ccdafb5)
Reviewed-on: http://git-master/r/1240105
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
2016-10-26 11:09:56 -07:00
Alex Waterman
fc4f0ddddb gpu: nvgpu: SLAB allocation for page allocator
Add the ability to do "SLAB" allocation in the page allocator. This
is generally useful since the allocator manages 64K pages but often
we only need 4k chunks (for example when allocating memory for page
table entries).

Bug 1799159
JIRA DNVGPU-100

Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1225322
(cherry picked from commit 299a5639243e44be504391d9155b4ae17d914aa2)
Change-Id: Ib3a8558d40ba16bd3a413f4fd38b146beaa3c66b
Reviewed-on: http://git-master/r/1227924
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2016-10-18 12:24:33 -07:00
Konsta Holtta
70e0462861 gpu: nvgpu: skip pramin barriers for page tables
Page table updates have an explicit write barrier at the end of a pte
update operation in update_gmmu_ptes_locked(), so the per-wr32 wmb()s
are not necessary.

Jira DNVGPU-23

Change-Id: I2e2596f0900d840fadb369ee1261c5e2305f2070
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1225150
(cherry picked from commit 6664a667ea326e9663a6b502765f858d8669f4d9)
Reviewed-on: http://git-master/r/1227475
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2016-10-17 14:39:10 -07:00
Konsta Holtta
a8e260bc8d gpu: nvgpu: allow skipping pramin barriers
A wmb() next to each gk20a_mem_wr32() via PRAMIN may be overly careful,
so support not inserting these barriers for performance, in cases where
they are not necessary, where the caller would do an explicit barrier
after a bunch of reads.

Also, move those optional wmb()s to be done at the end of the whole
internally batched write for gk20a_mem_{wr_n,memset} from the per-batch
subloops that may run multiple times.

Jira DNVGPU-23

Change-Id: I61ee65418335863110bca6f036b2e883b048c5c2
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1225149
(cherry picked from commit d2c40327d1995f76e8ab9cb4cd8c76407dabc6de)
Reviewed-on: http://git-master/r/1227474
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2016-10-17 14:39:00 -07:00
Alex Waterman
718af968f0 gpu: nvgpu: Fix wmb() order in pramin access
wmb() should come after the writes to ensure that the writes have
completed before progressing.

Bug 1811382

Change-Id: I98fba317b1760240c0b5de531accf398fe69c9b3
Signed-off-by: Alex Waterman <alexw@nvidia.com>
(cherry picked from commit 1b1201b9c109061590e6e25260d7230ae2c89888)
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1225251
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2016-10-17 14:38:50 -07:00
Konsta Holtta
fa6ab1943e gpu: nvgpu: add ioctl for querying memory state
Add NVGPU_GPU_IOCTL_GET_MEMORY_STATE to read the amount of free
device-local video memory, if applicable.

Some reserved fields are added to support different types of queries in
the future (e.g. context-local free amount).

Bug 1787771
Bug 200233138

Change-Id: Id5ffd02ad4d6ed3a6dc196541938573c27b340ac
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1223762
(cherry picked from commit 96221d96c7972c6387944603e974f7639d6dbe70)
Reviewed-on: http://git-master/r/1235980
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2016-10-14 08:12:34 -07:00
Konsta Holtta
a4e4466a0c gpu: nvgpu: track pending bytes for vidmem clears
Change clears_pending to bytes_pending and track accordingly the number
of bytes to be freed instead of the number of buffers. This, atomically
combined with the amount of space in the allocator, is the total amount
of free memory available.

Bug 200233138

Change-Id: Ibbb4e80a32728781ba19a74307d8a8ac1a4d7431
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1231422
(cherry picked from commit 025e765f312c253b201ecf2dbbe0f4972fe1d4bc)
Reviewed-on: http://git-master/r/1235957
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2016-10-14 08:12:17 -07:00
Konsta Holtta
8728da1c6e gpu: nvgpu: compact pte buffers
The lowest page table level may hold very few entries for mappings of
large pages, but a new page is allocated for each list of entries at the
lowest level, wasting memory and performance. Compact these so that the
new "allocation" of ptes is appended at the end of the previous
allocation, if there is space.

4 KB page is still the smallest size requested from the allocator; any
possible overhead in the allocator (e.g., internally allocating big
pages only) is not taken into account.

Bug 1736604

Change-Id: I03fb795cbc06c869fcf5f1b92def89a04583ee83
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1221841
(cherry picked from commit fa92017ed48e1d5f48c1a12c512641c6ce9924af)
Reviewed-on: http://git-master/r/1234996
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2016-10-13 08:09:16 -07:00