Fix a race condition in gk20a_get_channel_from_file() that returns a
channel pointer from an fd: take a reference to the channel before
putting the file ref back. The caller is now responsible for
releasing the channel reference eventually.
Also document why dbg_session_channel_data has to hold a ref to the
channel file instead of just the channel: holding only the channel
could deadlock if the fds were closed in the "wrong" order.
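A minimal sketch of the fixed lookup (names approximate;
gk20a_fd_to_channel() is a placeholder for however the channel is
recovered from the file, and non-channel fd handling is omitted):

    struct channel_gk20a *gk20a_get_channel_from_file(int fd)
    {
            struct channel_gk20a *ch;
            struct file *f = fget(fd);

            if (!f)
                    return NULL;

            /* Take the channel ref while the file ref is still
             * held so the channel cannot die in between. The
             * caller must eventually balance this with a
             * gk20a_channel_put(). */
            ch = gk20a_channel_get(gk20a_fd_to_channel(f));

            fput(f);
            return ch;
    }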
Change-Id: I8e91b809f5f7b1cb0c1487bd955ad6d643727a53
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1549290
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In the Linux VM code the kind attribute is assigned to a u8 from an
int. The code is safe due to the checks in place, but Coverity does
not like the implicit truncation, so explicitly cast to u8 before
assigning kind to kind_v.
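The fix amounts to a one-line explicit narrowing; a sketch, assuming
the surrounding code shape:

    /* 'kind' is an int but has already been validated to fit in
     * a u8; the explicit cast only silences Coverity. */
    attrs.kind_v = (u8)kind;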
Coverity ID: 2567919
Bug 200291879
Change-Id: I3860faa99da8ac1a24bd0c3d77c7fbc72207614a
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1548712
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Remove the kernel feature to map compbit backing store areas to GPU VA.
The feature is not used, and the relevant user space code is getting
removed, too.
Change-Id: I94f8bb9145da872694fdc5d3eb3c1365b2f47945
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1547898
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Remove debugging features that did not really get used and make
the debugging code use the nvgpu_log() functionality. This ties
the allocator debugging into the larger nvgpu debug framework.
Also, in many of the places where CONFIG_DEBUG_FS was used to
conditionally compile allocator debug code, use __KERNEL__ instead,
since that debug code can still be called even when debugfs is not
present in Linux.
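Schematically, the guard changes as below (the function body is
illustrative):

    #ifdef __KERNEL__       /* was: #ifdef CONFIG_DEBUG_FS */
    static void nvgpu_alloc_dbg(struct nvgpu_allocator *a)
    {
            /* Reachable through nvgpu_log() even when debugfs
             * is not compiled into the kernel. */
            nvgpu_log(a->g, gpu_dbg_alloc, "%s: dump", a->name);
    }
    #endif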
Change-Id: I112ebe1cae22d6f8db96d023993498093e18d74a
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1544439
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Print the addresses of device-specific HAL functions to the debugfs
file hal/gops.
The list of functions is produced by dumping the contents of the
gpu_ops substruct of the gk20a struct. This interface assumes that
gpu_ops contains only function pointers.
Companion Python script nvgpu_debug_hal.py analyzes gk20a.h to
determine operation counts and prettify the debugfs interface's
output.
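Under the only-function-pointers assumption, the dump can walk
gpu_ops as a flat array of pointers; a simplified sketch of the
seq_file show routine:

    static int hal_gops_show(struct seq_file *s, void *unused)
    {
            struct gk20a *g = s->private;
            void **ops = (void **)&g->ops;
            size_t i, n = sizeof(g->ops) / sizeof(void *);

            /* Valid only while gpu_ops holds nothing but
             * function pointers; nvgpu_debug_hal.py maps each
             * index back to a field name parsed from gk20a.h. */
            for (i = 0; i < n; i++)
                    seq_printf(s, "%pF\n", ops[i]);

            return 0;
    }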
Jira NVGPU-107
Change-Id: I0910e86638d144979e8630bbc5b330bccfd3ad94
Signed-off-by: Sunny He <suhe@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1542990
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Move non-function pointer members out of the pmu and pmu_ver
substructs of gpu_ops. Ideally gpu_ops will have only function
pointers, better matching its intended purpose and improving
readability.
- g.ops.pmu_ver.cmd_id_zbc_table_update has been changed to
g.pmu_ver_cmd_id_zbc_table_update
- g.ops.pmu.lspmuwprinitdone has been changed to
g.pmu_lsf_pmu_wpr_init_done
- g.ops.pmu.lsfloadedfalconid has been changed to
g.pmu_lsf_loaded_falcon_id
Boolean flags have been implemented using the enabled.h API
- g.ops.pmu_ver.is_pmu_zbc_save_supported moved to
common flag NVGPU_PMU_ZBC_SAVE
- g.ops.pmu.fecsbootstrapdone moved to
common flag NVGPU_PMU_FECS_BOOTSTRAP_DONE
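A sketch of a converted call site using the enabled.h query helper
(callee name approximate):

    /* Before: if (g->ops.pmu_ver.is_pmu_zbc_save_supported) */
    if (nvgpu_is_enabled(g, NVGPU_PMU_ZBC_SAVE))
            nvgpu_pmu_save_zbc(g, entries);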
Jira NVGPU-74
Change-Id: I08fb20f8f382277f2c579f06d561914c000ea6e0
Signed-off-by: Sunny He <suhe@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1530981
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
In pmu_handle_pg_elpg_msg, when processing ELPG_MSG_DISALLOW_ACK, if
the current PMU state is ELPG_BOOTING and GR_POWER_GATING is not
supported, make sure nvgpu_pmu_state_change updates the state_change
flag, since the PMU is now fully initialized. In particular, this
fixes PMU boot for dGPU, which does not support GR_POWER_GATING.
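Paraphrased (not the literal diff), the branch now looks roughly
like:

    case PMU_PG_ELPG_MSG_DISALLOW_ACK:
            if (g->pmu.pmu_state == PMU_STATE_ELPG_BOOTING)
                    /* PMU init is complete; post the change
                     * event so waiters wake up even on chips
                     * without GR_POWER_GATING (e.g. dGPU). */
                    nvgpu_pmu_state_change(g,
                                    PMU_STATE_ELPG_BOOTED, true);
            break;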
JIRA EVLR-1776
Change-Id: I2feb97b0fb8248e9cb7945ac3189877c21815a4a
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1539102
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Nirav Patel <nipatel@nvidia.com>
Fix some simple pointer arithmetic errors in the deterministic submit
path. The lockless allocator was doing a subtraction to determine the
correct offset of the element to free. However, this subtraction
used the base address and a numeric offset, not a pointer offset, so
the computed difference was in bytes, not in elements of the block
size. The fix is simple: just divide by the block size.
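In outline (field names approximate the lockless allocator's):

    /* Broken: 'addr' and 'a->base' are plain u64s, so this
     * difference is in bytes, yet it was used as an element
     * index. */
    idx = addr - a->base;

    /* Fixed: scale the byte offset down to an element index. */
    idx = (addr - a->base) / a->blk_size;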
Also modify the debugging statement so that more information is
printed at more useful times.
Lastly, fix a pointer-to-numeric cast in the fence code.
Change-Id: I514724205f1b73805b21e979481a13ac689f3482
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1538905
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
Reviewed-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reorganize HAL initialization to remove inheritance and construct
the gpu_ops struct at compile time. This patch only covers the
mm sub-module of the gpu_ops struct.
Perform HAL function assignments in hal_gxxxx.c through the
population of a chip-specific copy of gpu_ops.
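The pattern, sketched for a hypothetical hal_gxxxx.c (names
illustrative):

    static const struct gpu_ops gxxxx_ops = {
            .mm = {
                    .gmmu_map = gxxxx_locked_gmmu_map,
                    .gmmu_unmap = gxxxx_locked_gmmu_unmap,
            },
    };

    void gxxxx_init_hal(struct gk20a *g)
    {
            /* Plain struct copy at init time; no runtime
             * inheritance chain to trace through. */
            g->ops.mm = gxxxx_ops.mm;
    }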
Jira NVGPU-74
Change-Id: Ieb87a62f047510e51c52e6563d8e3fd5a65b5f28
Signed-off-by: Sunny He <suhe@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1537753
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add a pointer to struct gk20a to the FUSE APIs. This helps
QNX builds avoid any static data definitions.
Also, this change plumbs struct gk20a through some of the Linux clk
code and fixes a few minor style nits.
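Illustratively, a FUSE helper's signature before and after (exact
names approximate):

    /* Before: no context argument, which pushes QNX toward
     * static data. */
    int nvgpu_tegra_fuse_read_gcplex_config_fuse(u32 *val);

    /* After: any needed state hangs off the per-GPU struct. */
    int nvgpu_tegra_fuse_read_gcplex_config_fuse(struct gk20a *g,
                                                 u32 *val);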
Change-Id: I27dfb2c4e9a352f784d6cead150460d8e9e808d3
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1537611
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Sourab Gupta <sourabg@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
The offset units in the nvgpu_pramin_access_batched() code change
midway through the function. In the first section the offset is
treated as bytes, but in the while-loop iterating over the PRAMIN
window and page_alloc_chunks it becomes an offset in words.
This patch leaves the offset field in bytes and converts to words
where needed.
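Schematically (the window-size helper is hypothetical), the loop
stays in bytes and divides only at the access callback, which
indexes 32-bit PRAMIN words:

    while (size) {
            u32 byte_off = offset + done;            /* bytes */
            u32 n = min(size, pramin_bytes_left(byte_off));

            /* The batch callback indexes 32-bit registers, so
             * the bytes-to-words conversion happens here. */
            loop(g, byte_off / sizeof(u32), n / sizeof(u32), arg);

            done += n;
            size -= n;
    }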
Change-Id: Iba964171679dfc27645238b297ed467a450b5cbc
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1537079
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
The call to __set_pd_level() for vidmem allocs passed in the wrong
length. This was a silent error, since the subsequent
__set_pd_level() calls overwrote the bad mappings. However, it
caused significantly more PDE/PTE writes than necessary, since each
chunk could be mapped N times, where N is the number of chunks in an
SGL.
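Approximately, the per-chunk loop should pass each chunk's own
length (shapes follow the nvgpu list API loosely):

    nvgpu_list_for_each_entry(chunk, &alloc->alloc_chunks,
                              page_alloc_chunk, list_entry) {
            /* Pass this chunk's length, not the whole
             * allocation's, or every chunk gets remapped N
             * times. */
            __set_pd_level(vm, &vm->pdb, 0,
                           chunk->base, virt_addr,
                           chunk->length, attrs);
            virt_addr += chunk->length;
    }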
Change-Id: Ied7247b70825dc91b9eea1c3350f4ef370ab1a52
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1537078
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reorganize HAL initialization to remove inheritance and construct
the gpu_ops struct at compile time. This patch only covers the
mm sub-module of the gpu_ops struct.
Perform HAL function assignments in hal_gxxxx.c through the
population of a chip-specific copy of gpu_ops.
Jira NVGPU-74
Change-Id: I289284e6e528fc7951c959c8765ccf9349eec33b
Signed-off-by: Sunny He <suhe@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1533351
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In gm20b_tegra_postscale(), we use platform->railgate_lock to check
whether the GPU is railgated. But platform->railgate_lock was
introduced only to prevent unrailgating in the midst of the
gk20a_do_idle() sequence. This lock is not the right way to check
railgate status, since it is still possible to railgate the GPU
while the lock is held.
Hence, remove the acquire/release of platform->railgate_lock from
gm20b_tegra_postscale().
Bug 1962265
Change-Id: I6208063de3fa77ed71e8fb0c011367fb66151193
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1536573
(cherry picked from commit 68bce66be338e48f4921f645b10b3fa5994fe1d4)
Reviewed-on: https://git-master.nvidia.com/r/1537297
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
After setting 'Y' in disable_bigpage, in the native SMMU case, we
could still see 64K GMMU pages being used.
Fixed the following:
- enforce disable_bigpage in nvgpu_vm_map
- update GPU characteristics so that new clients know
whether or not big pages are enabled. For instance, this may
affect how CUDA requests memory mapping.
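The enforcement in nvgpu_vm_map is a small guard when choosing the
GMMU page size; a sketch:

    /* Honor the disable_bigpage knob even for buffers that
     * would otherwise qualify (e.g. native SMMU allocations). */
    if (g->mm.disable_bigpage)
            pgsz_idx = gmmu_page_size_small;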
JIRA EVLR-1694
Change-Id: I62841096add3bd798c5c11090054f82c8a2be832
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1532429
Reviewed-by: Richard Zhao <rizhao@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Remove the mm.get_iova_addr() HAL and replace it with a new HAL
called mm.gpu_phys_addr(). This new HAL provides the real phys
address that should be passed to the GPU from a physical address
obtained from a scatter list. It also provides a mechanism by
which the HAL code can add extra bits to a GPU physical address
based on the attributes passed in. This is necessary during GMMU
page table programming.
Also remove the flags argument from the various address functions.
This flag was used to add an IO coherence bit to the GPU physical
address, which is not supported.
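The new op's shape, matching the description (the exact prototype
may differ):

    /* New mm HAL member: translate a phys address taken from a
     * scatter list into a GPU physical address, folding in any
     * attribute-dependent bits during GMMU programming. */
    u64 (*gpu_phys_addr)(struct gk20a *g,
                         struct nvgpu_gmmu_attrs *attrs,
                         u64 phys);

    /* Typical call site: */
    phys_addr = g->ops.mm.gpu_phys_addr(g, attrs, sg_phys(sgl));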
JIRA NVGPU-30
Change-Id: I69af5b1c6bd905c4077c26c098fac101c6b41a33
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1530864
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Pass struct gk20a pointer instead of struct device to
gk20a_wait_for_idle(). The code is not Linux specific and does not
need a pointer to struct device.
Change-Id: I2cafd6c7db019c9de76b6e68a1ae73f0b4cea37d
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1533173
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
ACCESS_ONCE() is used to make sure that a given place in the code
accesses a variable exactly once. It prevents the compiler from
rearranging the read to happen earlier.
Remove its use from cases where rearranging the read does not create
problems.
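For contrast (variables illustrative):

    /* Kept: 'state' is written by another thread and read
     * outside any lock, so the compiler must not hoist or
     * repeat the load. */
    state = ACCESS_ONCE(ch->state);

    /* Dropped: this field is only touched under ch->lock here,
     * so a plain read cannot be usefully rearranged. */
    refs = ch->refs;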
Change-Id: I340f375e8fecc31f3a3fab543256069cb4c682dc
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1531649
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Refactor the sync_debugfs LTC HAL op so that the logic to enable or
disable LTC goes into common code, nvgpu_ltc_sync_enabled(), and the
LTC HAL set_enabled op only performs the hardware register access.
Create a new common function nvgpu_init_ltc_support() to initialize
the LTC software state, and move the LTC hardware initialization to
be called from it.
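Approximately, the split looks like this; the chip op no longer
decides anything:

    void nvgpu_ltc_sync_enabled(struct gk20a *g)
    {
            /* Common code owns the enable/disable policy... */
            if (g->mm.ltc_enabled_current !=
                g->mm.ltc_enabled_target) {
                    /* ...the HAL op only writes registers. */
                    g->ops.ltc.set_enabled(g,
                                    g->mm.ltc_enabled_target);
                    g->mm.ltc_enabled_current =
                                    g->mm.ltc_enabled_target;
            }
    }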
JIRA NVGPU-62
Change-Id: Ib1cf4f5b83ca3dac08407464ed56a732e0a33923
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1528262
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Fix kernel warnings for GPUs with real vidmem:
- dma.c: in nvgpu_dma_alloc_flags, ignore incoming flags when using vidmem,
since anything but NVGPU_DMA_NO_KERNEL_MAPPING will end up generating
kernel warnings, and the vidmem mapping functions ignore the other flags
anyway.
- gmmu.c: in __nvgpu_gmmu_update_page_table, use the function
  appropriate to the memory type to retrieve the physical address
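A sketch of the dma.c branch described above; NVGPU_MM_UNIFIED_MEMORY
is the flag that distinguishes real-vidmem systems:

    if (!nvgpu_is_enabled(g, NVGPU_MM_UNIFIED_MEMORY))
            /* Vidmem never gets a kernel mapping; force the
             * only meaningful flag and drop the rest, which
             * would just trip warnings in the vidmem path. */
            return nvgpu_dma_alloc_flags_vid(g,
                            NVGPU_DMA_NO_KERNEL_MAPPING,
                            size, mem);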
Bug 1967748
Change-Id: I6fc01fd5f2c5cd7b81cba70ab59cc3c8fe4cda19
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1530877
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Move fields in struct gk20a related to interrupt handling into
Linux specific nvgpu_os_linux. At the same time, move the counter
logic from a HAL function into Linux specific code, and move two
Linux specific power management functions from the generic gk20a.c
to the Linux specific module.c.
JIRA NVGPU-123
Change-Id: I0a08fd2e81297c8dff7a85c263ded928496c4de0
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1528177
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sourab Gupta <sourabg@nvidia.com>
GVS: Gerrit_Virtual_Submit
In the PD caching code, use a non-contiguous DMA alloc for
PAGE_SIZE and below allocations. There's no need to use the special
contiguous pool of memory for these page-sized allocs, and wasting
that memory can lead to OOM problems pretty quickly (think large
sparse textures, for example).
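Schematically (the allocation entry name 'pentry' is hypothetical):

    if (bytes > PAGE_SIZE)
            /* Multi-page PDs still need physically contiguous
             * backing. */
            err = nvgpu_dma_alloc_flags(g,
                            NVGPU_DMA_FORCE_CONTIGUOUS,
                            bytes, &pentry->mem);
    else
            /* Page-sized and smaller PDs do not, so stay out
             * of the scarce contiguous pool. */
            err = nvgpu_dma_alloc(g, bytes, &pentry->mem);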
Also turn several pd_dbg() statements that print OOM errors into
nvgpu_err()s, since knowing exactly where an alloc fails is very
convenient.
Bug 200326705
Change-Id: Ib7c45020894d4bdd73cc92179ef707e472714d61
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1527294
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
gk20a_channel_release can still get called even if the channel open
call failed (e.g., if we ran out of HW chids), in which case priv is
NULL. Check for this case and return early if so.
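The added guard at the top of gk20a_channel_release(),
approximately:

    struct channel_priv *priv = filp->private_data;

    /* The open path failed (e.g. out of HW chids) before
     * private_data was set; there is nothing to release. */
    if (!priv)
            return 0;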
Bug 1964531
Change-Id: I48bc88e4dbd88a1c30fc399de629d8f8b344cfd9
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1526544
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
If an error occurs during an attempt to perform a runtime_resume,
the runtime power management framework sets an error flag that
prevents further attempts to resume until the error is cleared.
nvgpu currently does not clear the flag, which causes nvgpu to
lock up if an error occurs during runtime_resume.
This change explicitly sets the device pm status to suspended
on error, which clears the error flag so that subsequent attempts
to resume will not be blocked.
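The error-path idiom, sketched; pm_runtime_set_suspended() is the
standard way to clear the framework's runtime_error flag:

    static int gk20a_pm_runtime_resume(struct device *dev)
    {
            int err = gk20a_pm_unrailgate(dev);

            if (err)
                    /* Clear dev->power.runtime_error so later
                     * resume attempts are not rejected. */
                    pm_runtime_set_suspended(dev);

            return err;
    }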
Bug 200324790
Change-Id: I3c875453670d3691ab01cff90ce31e797296662a
Signed-off-by: Sunny He <suhe@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1526478
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reorganize HAL initialization to remove inheritance and construct
the gpu_ops struct at compile time. This patch covers the debug
and dbg_session_ops sub-modules of the gpu_ops struct.
Perform HAL function assignments in hal_gxxxx.c through the
population of a chip-specific copy of gpu_ops.
Jira NVGPU-74
Change-Id: Id51feeccbea91f884a6057efc680566a7d5d0b6d
Signed-off-by: Sunny He <suhe@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1514822
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Add new routines for accessing and modifying PTEs in situ. They are:
__nvgpu_pte_words()
__nvgpu_get_pte()
__nvgpu_set_pte()
All the details of modifying a page table entry are handled within.
Note, however, that these routines will not build page tables: if a
PTE does not exist, it will not be created; instead, -EINVAL is
returned. Keep in mind, though, that a PTE marked invalid still
exists, so this API can be used to mark an invalid PTE valid.
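A usage sketch (pte_is_valid_bit() is a hypothetical chip-specific
helper):

    u32 pte[2];     /* size this with __nvgpu_pte_words(g) */
    int err;

    err = __nvgpu_get_pte(g, vm, vaddr, pte);
    if (err)
            return err;     /* -EINVAL: no PTE exists here */

    /* The PTE exists but is marked invalid; flip it to valid
     * and write it back in place. */
    pte[0] |= pte_is_valid_bit();

    err = __nvgpu_set_pte(g, vm, vaddr, pte);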
JIRA NVGPU-30
Change-Id: Ic8615f209a0c4eb6fa64af9abadcfb3b2c11ee73
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1510447
Reviewed-by: Automatic_Commit_Validation_User
Tested-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
A negative value in the timeout duration has no special use, so
change the duration type from int to u32 and delete some unnecessary
casts to int.
Also change MAX_SCHEDULE_TIMEOUT to ULONG_MAX in the default gr idle
timeout, because the value is in milliseconds rather than scheduling
units, and to drop an unnecessary Linux dependency.
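The resulting signature, approximately:

    /* 'duration' is in milliseconds; as a u32 there is no
     * negative case to reason about. */
    int nvgpu_timeout_init(struct gk20a *g,
                           struct nvgpu_timeout *timeout,
                           u32 duration, unsigned long flags);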
Change-Id: I5cf6febd4f1cb00c46fe159603436a9ac3b003ac
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master/r/1512565
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Ensure that all debug prints are consistent from chip to chip
and function to function. The following maps letters in the
debug print to their meaning:
C Mapping is cacheable
v Mapping is volatile
S Mapping is sparse
P Mapping is private (VPR/WPR)
c Mapping is coherent
V Mapping is valid
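One way such a flag string gets built (the attrs field names are
illustrative):

    char flags_str[8];

    snprintf(flags_str, sizeof(flags_str), "%c%c%c%c%c%c",
             attrs->cacheable ? 'C' : '-',
             attrs->is_volatile ? 'v' : '-',
             attrs->sparse ? 'S' : '-',
             attrs->priv ? 'P' : '-',
             attrs->coherent ? 'c' : '-',
             attrs->valid ? 'V' : '-');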
JIRA NVGPU-30
Change-Id: Ia890af88677c3e6d3fdd8c4fe266158c35b8afcd
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master/r/1514903
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Tested-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>