The driver is not properly tearing down the arbiter on the PCI driver
unload. This change makes sure that the workqueues are drained before
tearing down the driver
bug 200277762
JIRA: EVLR-1023
Change-Id: If98fd00e27949ba1569dd26e2af02b75897231a7
Signed-off-by: David Nieto <dmartineznie@nvidia.com>
Reviewed-on: http://git-master/r/1320147
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
In gk20a_channel_clean_up_jobs, move removal of job from channel's job list
to before fences are cleaned up; this will prevent gk20a_channel_abort from
asynchronously trying to dereference an already freed job.
Bug 1844305
JIRA EVLR-849
Change-Id: I1ba05237aa74be1350007630bfa5eba9988f859a
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
(cherry picked from commit 2a9ce58b1b318b95ecfcdf78462f918d090eab99)
Reviewed-on: http://git-master/r/1319026
(cherry picked from commit 990f070b0a363159ce1b21f936b7512f469018ca)
Reviewed-on: http://git-master/r/1321624
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Move all programming of FB to fb_*.c files, and remove the inclusion
of FB hardware headers from other files.
TLB invalidate function took previously a pointer to VM, but the new
API takes only a PDB mem_desc, because FB does not need to know about
higher level VM.
GPC MMU is programmed from the same function as FB MMU, so added
dependency to GR hardware header to FB.
GP106 ACR was also triggering a VPR fetch, but that's not applicable
to dGPU, so removed that call.
Change-Id: I4eb69377ac3745da205907626cf60948b7c5392a
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1321516
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Move clock APIs from gk20a_platform to gpu_ops. At the same time
allow use of internal get_rate/set_rate for querying both GPCCLK
and PWRCLK on iGPU.
At the same time we can replace calls to clk framework with the
new HAL and drop direct dependency to clk framework.
gp10b ops were replaced as a whole at HAL initialization. That
replaces anything set in platform probe stage, so reduce that to
touch only clock gating regs.
Change-Id: Iaf219b1f000d362dbf397d45832f52d25463b31c
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1300113
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Platform files are used for adding code to probe for Tegra Linux
platform. Move the files to Tegra Linux directory to make this
clear.
Change-Id: Ida66af835688325f095260c618dad90395851267
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1300112
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
get_rate is already used for call-back that queries the last set
clock rate. This instance of get_rate actually measures the frequency
so renaming it to measure_freq.
At the same time modify to use hertz instead of megahertz. We use
fractional megahertz already in GPU.
Change-Id: I387473d6a6cbf3bb9b9e5a909677a1a725403c32
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1300111
Reviewed-by: Alex Waterman <alexw@nvidia.com>
In the timers code a macro was using __builtin_return_address(0)
when it should have been using _THIS_IP_.
__builtin_return_address(0) will cause the timers code to print
the return address of the function that calls the timers code. This
isn't actually useful, of course. A user actually cares about where
the timers code call comes from which is easily obtained with
_THIS_IP_.
Bug 1799159
Change-Id: Iac16bc79e89e4cd18133db3d20f5a50d4d5e8f31
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1320839
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
The call site (gk20a_submit_prepare_syncs) owns the incr_cmd buffer
passed to __gk20a_channel_semaphore_incr. Delete the free in the error
path of the latter case to avoid freeing the same buffer twice.
Bug 1853519
Change-Id: I9b90ce7ebb17ac63992938c7f9fe90bbd139f85f
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1321117
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: David Martinez Nieto <dmartineznie@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Read fuse values from GPU's own fuse registers instead of Tegra fuse
registers whenever possible. This reduces the number of dependencies
to Linux fuse code.
Some fuses do not have a corresponding register in GPU, so they're
left as is.
Change-Id: Id9f2f4da897f3e20b20c300a67f705e3fa5ba35a
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1318278
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Query the RM server to retrieve gpu preemption policy
of guest based on pct configuration. If guest is not
allowed to request wfi preemption mode then set context
with either gfxp or cta preemption mode only.
Jira VFND-3079
Jira VFND-3081
Change-Id: I60cbf121d6f0e2373568cf40b3dfdb4df76fe02d
Signed-off-by: Aparna Das <aparnad@nvidia.com>
Reviewed-on: http://git-master/r/1280903
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
Reviewed-by: Sachit Kadle <skadle@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
dGPU's SEC2 is passed frequency, which is queried with
clk_get_rate(). dGPU clocks are not represented in CCF, so the query
always returned an error. The value is ignored, so this went
unnoticed.
Replace the call to clk_get_rate() by just hard coding 0 as the clock
rate.
Change-Id: I86fec3726d2b6683cdadd86cab1672f3b199378f
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1319068
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Navneet Kumar <navneetk@nvidia.com>
In gk20a_suspend_all_sms(), we currently loop
over all GPCs and then loop over all TPCs in inner
loop
But this is incorrect and leads to SM with
invalid GPC,TPC ids
Fix this by looping over number of TPCs in each
GPC in inner loop
Also, fix gk20a_gr_wait_for_sm_lock_down() as
per below
- we right now wait infinitely for SM to lock down
- restrict this wait with a timeout on silicon
platforms
- return ETIMEDOUT instead of EAGAIN
- add more debug prints with additional data
for SM lock down failures
Bug 200258704
Change-Id: Id6fe32e579647fd8ac287a4b2ec80cbf98791e0d
Signed-off-by: Cory Perry <cperry@nvidia.com>
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1316471
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
JIRA: EVLR-1004
(*) Refactor the non-stalling interrupt path to execute clear on the
top half, so on dGPU case processing of stalling interrupts does not
block non-stalling one.
(*) Use a worker thread to do semaphore wakeups and allow batching of
the non-stalling operations.
(*) Fix a bug where some gpus will not properly track the completion
of interrupts, preventing safe driver unloads
Change-Id: Icc90a3acba544c97ec6a9285ab235d337ab9eefa
Signed-off-by: David Nieto <dmartineznie@nvidia.com>
Reviewed-on: http://git-master/r/1312796
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Navneet Kumar <navneetk@nvidia.com>
Add support for clearing single SM error state for CUDA debugger.
In addition to clearing local copy of SM error state,
vgpu_gr_clear_sm_error_state now sends a command to RM server
(TEGRA_VGPU_CMD_CLEAR_SM_ERROR_STATE), to clear global ESR and
warp ESR.
Bug 1791111
Change-Id: I3a1f0644787fd900ec59a0e7974037d46a603487
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1296311
(cherry picked from commit fd07e03c3d086f396e4d65575c576a4dd68c920a)
Reviewed-on: http://git-master/r/1299060
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Cory Perry <cperry@nvidia.com>
Tested-by: Cory Perry <cperry@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add ability to suspend/resume contexts for a debug session
(NVGPU_DBG_GPU_IOCTL_SUSPEND_RESUME_CONTEXTS), in virtualized
case:
- added hal function to resume contexts.
- added vgpu support for suspend contexts, i.e. build a list
of channel ids, and send TEGRA_VGPU_CMD_SUSPEND_CONTEXTS
- added vgpu support for resume contexts, i.e. build a list
of channel ids, and send TEGRA_VGPU_CMD_RESUME_CONTEXTS
Bug 1791111
Change-Id: Icc1c00d94a94dab6384ac263fb811c00fa4b07bf
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1294761
(cherry picked from commit d17a38eda312ffa92ce92e5bafc30727a8b76c4e)
Reviewed-on: http://git-master/r/1299059
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Cory Perry <cperry@nvidia.com>
Tested-by: Cory Perry <cperry@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add a debugfs interface to profile the kickoff ioctl
it provides the probability distribution and separates the information
between time spent in: the full ioctl, the kickoff function, the amount
of time spent in job tracking and the amount of time doing pushbuffer
copies
JIRA: EVLR-1003
Change-Id: I9888b114c3fbced61b1cf134c79f7a8afce15f56
Signed-off-by: David Nieto <dmartineznie@nvidia.com>
Reviewed-on: http://git-master/r/1308997
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
The Linux timer code prints an error message containing the caller when
a timeout happens. The caller is formatted as a function name instead of
a pointer, so don't print 0x in front of it.
Bug 1799159
Change-Id: I4f86d885aba78ca20ba025a91c8c940d3c927d58
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1315749
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
1) Trace buffer allocation now calls generic gmmu alloc/map.
so for dGPUs they are allocated in vidmem and iGPUs they
are allocated in sysmem
2) Use pmu surface mechanism to setup trace buffer params as
dGPU binaries follow falcon memory structure to get mem surface
params
3) Fix minor coverity issue by removing unnecessary overwrite
of count variable in trace print function
JIRA DNVGPU-217
Coverity ID 2431386
Change-Id: I2ae49a4e0450481cde2a778447c270a796681dad
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/1312404
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
-Handle pbus and priv stall interrupts first.
In general critical interrupts should be
handled before any other non critical ones.
-Dump info enabled with gpu_dbg_intr if priv_ring
interrupt is flagged by fmodel.
JIRA NVGPU-25
Change-Id: Iee767d8c9c933ceb57532c1b5a7fd7812daf1b6d
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: http://git-master/r/1311273
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
trace_printk() does an extra stringify operation before calling
do_trace_printk(). The string ends up unused. This has an impact
to kernel even if we never end up using trace_printk().
Disable use of trace_printk() and introduce a Kconfig option for
re-enabling it.
Change-Id: I2e1014f5231f2089d7dc3cb2539e3eb5f4d58361
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1295298
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Implement kmem abstraction and tracking in nvgpu. The abstraction
helps move nvgpu's core code away from being Linux dependent and
allows kmem allocation tracking to be done for Linux and any other
OS supported by nvgpu.
Bug 1799159
Bug 1823380
Change-Id: Ieaae4ca1bbd1d4db4a1546616ab8b9fc53a4079d
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1283828
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Change nvgpu_kalloc() to nvgpu_big_[mz]alloc(). This is necessary
since the natural free function name for this is nvgpu_kfree() but
that conflicts with nvgpu_k[mz]alloc() (implemented in a subsequent
patch).
This API exists becasue not all allocation sizes can be determined
at compile time and in some cases sizes may vary across the system
page size. Thus always using kmalloc() could lead to OOM errors due
to fragmentation. But always using vmalloc() is wastful of memory
for small allocations. This API tries to alleviate those problems.
Bug 1799159
Bug 1823380
Change-Id: I49ec5292ce13bcdecf112afbb4a0cfffeeb5ecfc
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1283827
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Remove nvgpu_gpuid_t18x.h since this file is now visible. Migrate
the relevant definitions and defines into their expected places and
make the code use the real defines. No longer is hiding t18x specific
stuff necessary.
Bug 1799159
Change-Id: I47fa2392e46fdb7aacc70aeb0cc8c3f5ca0dc22f
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1300976
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Added fecs trace characteristic flag to vgpu to determine for
platform support
Also moved fecs flag from ctxsw init to fecs init for native linux.
This makes flag init uniform across both platforms.
JIRA EVLR-993
Change-Id: Id62f31954cb5d7028ef01a199548f5f908b51eb0
Signed-off-by: Vishnu Reddy Mandalapu <vmandalapu@nvidia.com>
Reviewed-on: http://git-master/r/1305045
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Implement a worker thread to replace the delayed works in channel
watchdog and job cleanups. Watchdog runs by polling the channel states
periodically, and job cleanup is performed on channels that are appended
on a work queue consumed by the worker thread. Handling both of these
two in the same thread makes it impossible for them to cause a deadlock,
as has previously happened.
The watchdog takes references to channels during checking and possibly
recovering channels. Jobs in the cleanup queue have an additional
reference taken which is released after the channel is processed. The
worker is woken up from periodic sleep when channels are added to the
queue.
Currently, the queue is only used for job cleanups, but it is extendable
for other per-channel works too. The worker can also process other
periodic actions dependent on channels.
Neither the semantics of timeout handling or of job cleanups are yet
significantly changed - this patch only serializes them into one
background thread.
Each job that needs cleanup is tracked and holds a reference to its
channel and a power reference, and timeouts can only be processed on
channels that are tracked, so the thread will always be idle if the
system is going to be suspended, so there is currently no need to
explicitly suspend or stop it.
Bug 1848834
Bug 1851689
Bug 1814773
Bug 200270332
Jira NVGPU-21
Change-Id: I355101802f50841ea9bd8042a017f91c931d2dc7
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1297183
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
fifo_pbdma_status__size_1_v() and fifo_engine_status__size_1_v()
are not same for all gpus. Use litter value to calculate chip
specific fifo*status__size_1(v)
JIRA GV11B-45
Change-Id: I3d3d45bf79d15e14739fcc18cb1ca987669d5c11
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: http://git-master/r/1312688
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Remove restriction of setting compute preemption mode for channels
created with PASCAL_COMPUTE_A class only and allow it to be set
for PASCAL_A class too.
Also print compute preemption mode during channel closing.
Bug 200284575
Change-Id: I2de3b3acda128e91caa2ab0fd341915ce6e6520b
Signed-off-by: Sandeep Shinde <sashinde@nvidia.com>
Reviewed-on: http://git-master/r/1313286
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Donghan Ryu <dryu@nvidia.com>