Remove the last sg_phys() call from the GMMU code and replace it
with a generic nvgpu_mem API. This new API, nvgpu_mem_get_phys_addr(),
returns the physical address of an nvgpu_mem struct.
Also, implement this new API in the Linux specific nvgpu_mem code
since it requires access to the underlying SGT/SGL.
JIRA NVGPU-68
Change-Id: Idf88701a2a8515464c658c26e0de493c82ff850d
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1542964
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
When the L2 flush IOCTL gets no l2 flush and no fb flush we now
return -EINVAL. This can sometimes happen if the user tries to just
invalidate. Currently we do not support L2 invalidates only.
Bug 1661242
Change-Id: I87f3259bfbd736b5f4222cfe7b3cfa4a6475389e
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1227125
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This change solves crashes during bind that were introduced in the driver
during the OS unification refactoring due to lack of coverage of the remove()
function.
The fixes during remove are:
(1) Prevent NULL dereference on GPUs with secure boot
(2) Prevent NULL dereferences when fecs_trace is not enabled
(3) Added PRAMIN blocker during driver removal if HW is no longer accesible
(4) Prevent double free of debugfs nodes as they are handled on the
debugfs_remove_recursive() call
(5) quiesce() can now be called without checking is HW accesible flag is set
(6) added function to free irq so no IRQ association is left on the driver after
it is removed
(7) prevent NULL dereference on nvgpu_thread_stop() if the thread is already
stopped
JIRA: EVLR-1739
Change-Id: I787d38f202d5267a6b34815f23e1bc88110e8455
Signed-off-by: David Nieto <dmartineznie@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1563005
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The basic nvgpu_mem_sgl implementation provides support
for OS specific scatter-gather list implementations by
simply copying them node by node. This is inefficient,
taking extra time and memory.
This patch implements an nvgpu_mem_sgt struct to act as
a header which is inserted at the front of any scatter-
gather list implementation. This labels every struct
with a set of ops which can be used to interact with
the attached scatter gather list.
Since nvgpu common code only has to interact with these
function pointers, any sgl implementation can be used.
Initialization only requires the allocation of a single
struct, removing the need to copy or iterate through the
sgl being converted.
Jira NVGPU-186
Change-Id: I2994f804a4a4cc141b702e987e9081d8560ba2e8
Signed-off-by: Sunny He <suhe@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1541426
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The last major item preventing the core MM code in the nvgpu
driver from being platform agnostic is the usage of Linux
scattergather tables and scattergather lists. These data
structures are used throughout the mapping code to handle
discontiguous DMA allocations and also overloaded to represent
VIDMEM allocs.
The notion of a scatter gather table is crucial to a HW device
that can handle discontiguous DMA. The GPU has a MMU which
allows the GPU to do page gathering and present a virtually
contiguous buffer to the GPU HW. As a result it makes sense
for the GPU driver to use some sort of scatter gather concept
so maximize memory usage efficiency.
To that end this patch keeps the notion of a scatter gather
list but implements it in the nvgpu common code. It is based
heavily on the Linux SGL concept. It is a singly linked list
of blocks - each representing a chunk of memory. To map or
use a DMA allocation SW must iterate over each block in the
SGL.
This patch implements the most basic level of support for this
data structure. There are certainly easy optimizations that
could be done to speed up the current implementation. However,
this patches' goal is to simply divest the core MM code from
any last Linux'isms. Speed and efficiency come next.
Change-Id: Icf44641db22d87fa1d003debbd9f71b605258e42
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1530867
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Make the kmem debugging prints much more easily usable. Previously the
prints would only take effect if the full tracking was enabled:
CONFIG_NVGPU_TRACK_MEM_USAGE
However, there are many times when just the debug prints would be nice
to have by simply setting the log mask bit in the log mask.
echo 0x80000 > /sys/kernel/debug/<gpu>/log_mask
Also this change now uses the real nvgpu_log() function instead of the
legacy nvgpu_dbg() function. This makes the logging appear with proper
GPU printing as well.
Change-Id: If545da3d357d38fe8252e7d548c6765b995cd3d7
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1560248
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add a function to check if an nvgpu_mem is allocated (valid) or not.
Also fix possibly leaked state in nvgpu_mems when they fail to
allocate.
Also ensure that in the case of a failure to allocate no state is
accidentally leaked to the caller. This should hopefully make it
less likely that a caller thinks a buffer that failed to allocate
is actually allocated.
Change-Id: I43224ece7da84e63b2f43f36f04941126fabf3c7
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1559419
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
NVGPU_SUBMIT_GPFIFO_FLAGS_RESCHEDULE_RUNLIST is only used by realtime
priority EGL context, which checks for CAP_SYS_NICE during context
creation in userspace, so it wasn't secure against unprivileged program
spoofing submit ioctl with this flag to stall GPU progress of others.
This flag does increase duration of submit by approx 16us,
mostly due to register accesses and PMU FIFO mutex.
Bug 1989493
Bug 1854791
Bug 1968813
Change-Id: I086b1d14f286abf8bd2d2dfae5945974b7fe6d1f
Reviewed-on: https://git-master.nvidia.com/r/#/c/1558644
Signed-off-by: David Li <davli@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1558644
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Allow userspace to control directly the PTE kind for the mappings by
supplying NVGPU_AS_MAP_BUFFER_FLAGS_DIRECT_KIND_CTRL for MAP_BUFFER_EX.
In particular, in this mode, the userspace will tell the kernel
whether the kind is compressible, and if so, what is the
incompressible fallback kind. By supplying only the compressible kind,
the userspace can require that the map kind will not be demoted to the
incompressible fallback kind in case of comptag allocation failure.
Add also a GPU characteristics flag
NVGPU_GPU_FLAGS_SUPPORT_MAP_DIRECT_KIND_CTRL to signal whether direct
kind control is supported.
Fix indentation of nvgpu_as_map_buffer_ex_args header comment.
Bug 1705731
Change-Id: I317ab474ae53b78eb8fdd31bd6bca0541fcba9a4
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1543462
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add platform specific operations to enable/disable a TSG and use them instead
of directly calling enable/disable APIs
For gm20b/gp106/gp10b we continue to use gk20a_enable_tsg() and
gk20a_disable_tsg() as platform specific operations
Bug 1739362
Change-Id: I2dd0f38c8303757e8c7a47d8da0e30a790e514f0
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1560635
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
The following changes are part of the porting of the bind/unbind
functionality.
These changes reuse the shutdown codepaths in iGPU and dGPU and fix a locking
issue with in gk20a_busy() where the usage count can lead to a deadlock during
the driver shutdown. It fixes a racing condition with the gr/mm code by
invalidating the sw ready flag while holding the busy lock
JIRA: EVLR-1739
Change-Id: I62ce47378436b21f447f4cd93388759ed3f9bad1
Signed-off-by: David Nieto <dmartineznie@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1554959
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Explicitly page align DMA memory allocations. Non-page aligned
DMA allocs can lead to nvgpu_mem structs that have a size that's
not page aligned. Those allocs, in some cases, can cause vGPU
maps to return an error.
More generally DMA allocs in Linux are never going to not be
page aligned both in size and address. So might as well page align
the alloc size to make code else where in the driver more simple.
To imlpement this an aligned_size field has been added to struct
nvgpu_mem. This field has the real page aligned size of the
allocation. The original size is still saved in size.
Change-Id: Ie08cfc4f39d5f97db84a54b8e314ad1fa53b72be
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1547902
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Fix a race condition in gk20a_get_channel_from_file() that returns a
channel pointer from an fd: take a reference to the channel before
putting the file ref back. Now the caller is responsible of releasing
the channel reference eventually.
Also document why dbg_session_channel_data has to hold a ref to the
channel file instead of just the channel: that might deadlock if the fds
were closed in "wrong" order.
Change-Id: I8e91b809f5f7b1cb0c1487bd955ad6d643727a53
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1549290
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In the Linux VM code the kind attribute is assigned to a u8 from an
int. The code is safe due to the checks in place, however, coverity
does not like this so explicitly cast to u8 before assigning kind
to kind_v.
Coverity ID: 2567919
Bug 200291879
Change-Id: I3860faa99da8ac1a24bd0c3d77c7fbc72207614a
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1548712
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Remove the kernel feature to map compbit backing store areas to GPU VA.
The feature is not used, and the relevant user space code is getting
removed, too.
Change-Id: I94f8bb9145da872694fdc5d3eb3c1365b2f47945
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1547898
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Remove debugging features that did not really get used and make
the debugging code use the nvgpu_log() functionality. This ties
the allocator debugging into the larger nvgpu debug framework.
Also modify many of the places CONFIG_DEBUG_FS was used to
conditionally compile allocator debug code to use __KERNEL__
instead. This is because that debug code can still be called even
when debugfs is not present in Linux.
Change-Id: I112ebe1cae22d6f8db96d023993498093e18d74a
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1544439
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Prints addresses of device-specific HAL functions to debugfs file
hal/gops.
The list of functions is produced by dumping the contents of the
gpu_ops substruct of the gk20a struct. This interface makes the
assumption that there are only function pointers in gpu_ops.
Companion Python script nvgpu_debug_hal.py analyzes gk20a.h to
determine operation counts and prettyify debugfs interface's
output.
Jira NVGPU-107
Change-Id: I0910e86638d144979e8630bbc5b330bccfd3ad94
Signed-off-by: Sunny He <suhe@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1542990
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reorganize HAL initialization to remove inheritance and construct
the gpu_ops struct at compile time. This patch only covers the
mm sub-module of the gpu_ops struct.
Perform HAL function assignments in hal_gxxxx.c through the
population of a chip-specific copy of gpu_ops.
Jira NVGPU-74
Change-Id: Ieb87a62f047510e51c52e6563d8e3fd5a65b5f28
Signed-off-by: Sunny He <suhe@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1537753
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add a pointer to struct gk20a to the FUSE APIs. This helps
QNX builds avoid any static data definitions.
Also this change plumbs struct gk20a in some of the Linux clk
code and fixes a few minor style nits.
Change-Id: I27dfb2c4e9a352f784d6cead150460d8e9e808d3
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1537611
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Sourab Gupta <sourabg@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reorganize HAL initialization to remove inheritance and construct
the gpu_ops struct at compile time. This patch only covers the
mm sub-module of the gpu_ops struct.
Perform HAL function assignments in hal_gxxxx.c through the
population of a chip-specific copy of gpu_ops.
Jira NVGPU-74
Change-Id: I289284e6e528fc7951c959c8765ccf9349eec33b
Signed-off-by: Sunny He <suhe@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1533351
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In gm20b_tegra_postscale(), we use platform->railgate_lock to check if GPU is
railgated or not
But platform->railgate_lock was introduced only to prevent unrailgating in midst
of gk20a_do_idle() sequence
This lock is not the right way to check railgate status since it is still
possible to railgate GPU with this lock being held
Hence remove acquire/release of platform->railgate_lock from
gm20b_tegra_postscale()
Bug 1962265
Change-Id: I6208063de3fa77ed71e8fb0c011367fb66151193
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1536573
(cherry picked from commit 68bce66be338e48f4921f645b10b3fa5994fe1d4)
Reviewed-on: https://git-master.nvidia.com/r/1537297
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
After setting 'Y' in disable_bigpage, in native SMMU case,
we could still see 64K GMMU pages beeing used.
Fixed the following:
- enforce disable_bigpage in nvgpu_vm_map
- update GPU characteristics so that new clients know
whether or not big pages are enabled. For instance
this may affect how CUDA requests memory mapping.
JIRA EVLR-1694
Change-Id: I62841096add3bd798c5c11090054f82c8a2be832
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1532429
Reviewed-by: Richard Zhao <rizhao@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Remove the mm.get_iova_addr() HAL and replace it with a new HAL
called mm.gpu_phys_addr(). This new HAL provides the real phys
address that should be passed to the GPU from a physical address
obtained from a scatter list. It also provides a mechanism by
which the HAL code can add extra bits to a GPU physical address
based on the attributes passed in. This is necessary during GMMU
page table programming.
Also remove the flags argument from the various address functions.
This flag was used for adding an IO coherence bit to the GPU
physical address which is not supported.
JIRA NVGPU-30
Change-Id: I69af5b1c6bd905c4077c26c098fac101c6b41a33
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1530864
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Pass struct gk20a pointer instead of struct device to
gk20a_wait_for_idle(). The code is not Linux specific and does not
need pointer to struct device.
Change-Id: I2cafd6c7db019c9de76b6e68a1ae73f0b4cea37d
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1533173
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Refactor the sync_debugfs LTC HAL op so that the logic to enable
or disable LTC goes to common code nvgpu_ltc_sync_enabled() and
the LTC HAL set_enabled only performs the hardware register access.
Create a new common function nvgpu_init_ltc_support() to initialize
the LTC software variable, and move hardware initialization of LTC to
be called from it.
JIRA NVGPU-62
Change-Id: Ib1cf4f5b83ca3dac08407464ed56a732e0a33923
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1528262
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Fix kernel warnings for GPUs with real vidmem:
- dma.c: in nvgpu_dma_alloc_flags, ignore incoming flags when using vidmem,
since anything but NVGPU_DMA_NO_KERNEL_MAPPING will end up generating
kernel warnings, and the vidmem mapping functions ignore the other flags
anyway.
- gmmu.c: in __nvgpu_gmmu_update_page_table, use appropriate function for
memory type to retrieve physical address
Bug 1967748
Change-Id: I6fc01fd5f2c5cd7b81cba70ab59cc3c8fe4cda19
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1530877
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Move fields in struct gk20a related to interrupt handling into
Linux specific nvgpu_os_linux. At the same time move the counter
logic from function in HAL into Linux specific code, and two Linux
specific power management functions from generic gk20a.c to Linux
specific module.c.
JIRA NVGPU-123
Change-Id: I0a08fd2e81297c8dff7a85c263ded928496c4de0
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1528177
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sourab Gupta <sourabg@nvidia.com>
GVS: Gerrit_Virtual_Submit
gk20a_channel_release can still get called even if the open_channel call failed
(e.g., if we ran out of hw chids), in which case priv is null. Check for this
case and return if null.
Bug 1964531
Change-Id: I48bc88e4dbd88a1c30fc399de629d8f8b344cfd9
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1526544
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
If an error occurs during an attempt to perform a runtime_resume,
the runtime power management framework sets an error flag that
prevents further attempts to resume until the error is cleared.
nvgpu currently does not clear the flag, which causes nvgpu to
lock up if an error occurs during runtime_resume.
This change explicitly sets the device pm status to suspended
on error, which clears the error flag so that subsequent attempts
to resume will not be blocked.
Bug 200324790
Change-Id: I3c875453670d3691ab01cff90ce31e797296662a
Signed-off-by: Sunny He <suhe@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1526478
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>