gr_*__set_alpha_circular_buffer_size() left the max_batches field of
gr_pd_ab_dist_cfg1_r as 0, which results in too many alpha beta
transitions and poor performance when tessellation or geometry
shaders are used.
Change-Id: If18feb1119e9672005455155dc56337cd444a1f1
Signed-off-by: David Li <davli@nvidia.com>
Reviewed-on: http://git-master/r/735476
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
The message "per-write ctx patch begin?" is a legacy warning about
probably inefficient code, but it is logged at error loglevel.
Quiet it down by using gk20a_dbg_info(). The inefficient paths
can be fixed later.
Bug 200075565
Change-Id: Idae821aef3001ea5016de22a1a87fec747c42d31
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/734248
The channel sync object can get deleted before all channel updates have
finished if the channel is freed before them, so work around a null
dereference by testing if the sync exists. Channel and/or c->sync
refcounting would be necessary for a proper fix.
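
A minimal sketch of the workaround described above, written as
stand-alone C rather than driver code. The struct layouts and the
channel_update() name are hypothetical stand-ins; only the idea of
testing the sync pointer before use comes from this message.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver's channel and sync types. */
struct sync_obj {
	int pending;
};

struct channel {
	struct sync_obj *sync; /* NULL once the channel has been freed */
};

/* Returns 1 if the sync was updated, 0 if there was no sync to touch. */
static int channel_update(struct channel *c)
{
	if (!c->sync)
		return 0; /* sync already destroyed: skip, do not dereference */

	c->sync->pending = 0;
	return 1;
}
```

As the message notes, a check like this only narrows the window;
refcounting would be needed to close it.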
Bug 200076344
Change-Id: Ica8ef2df9cd95cfa593cd4f41768dbb6641357b2
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/734266
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
4b6f83704f054f5b21e05873fa5862c667a9992e tried to fix an ACR-related
leak. It fell short because the related data structures were local,
so the leak was not really fixed.
This patch stores the ACR ucode blob in a global variable, which
survives across rail gating.
Change-Id: Iec3ac9d41156baa26048e079732568c0a95264f4
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/733732
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Move the fifo engine activity disabling and wait-for-idle from the
lowest-level functions higher, into the ioctl path of zbc operations, so
that the sw initialization path wouldn't call them. During the init
path, the disable isn't necessary, and the code path could result in a
deadlock in the fifo runlist mutex.
Change-Id: Icf5c270ba29bc1c7f88874fba2d176d68e11278a
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/733668
Added delay definitions to the GPCPLL parameters structure:
- locking timeout delay (applied to locking in fixed frequency mode and
to PLL dynamic ramp in any mode)
- lock delay for GPCPLL NA mode
- IDDQ exit delay in any mode
Specified delay parameters for GM20B PLL, and used this data instead of
hard-coded numbers.
Change-Id: I63ce0abc9ee900c36ec34b8641513db3cbb6f7d5
Signed-off-by: Alex Frid <afrid@nvidia.com>
Reviewed-on: http://git-master/r/732094
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
- Added GPU voltage debug print to the initial locking of GPCPLL under
bypass (available only when GPCPLL is in NA mode).
- Added /sys/kernel/debug/gpu.0/voltage debugfs node to read voltage
through GPCPLL (available only when GPCPLL is in NA mode).
Change-Id: I6643ad4d1b228ec4cbc4ff5e8716cce3ef9dccfc
Signed-off-by: Alex Frid <afrid@nvidia.com>
Reviewed-on: http://git-master/r/731572
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Tested-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
This removes all direct access to the MC registers. This requires
that the MC be loaded before the GPU.
Bug 1540908
Change-Id: I90bcde62f65a0c0d73a2bbe92cbf4a980c671c7d
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/453653
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Supriya Sharatkumar <ssharatkumar@nvidia.com>
Reviewed-by: Krishna Reddy <vdumpa@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
This reverts commit 259842f9d222dd2ca2e66bddaceef4a2fd626bc7.
The commit clears some init values that are never restored.
Change-Id: I4efee115863cbfb08b2e280a58b525cb49adc0b6
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/732428
The GPU does not need to be powered up if user space calls the kernel
and there is no new work to be done.
Bug 1623918
Change-Id: I531aa7033530ae652d13684d8f8568a0e05fc2e1
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/732748
While writing to sysfs "tpc_fs_mask", we need to have the
GPU initialized (gk20a_busy() must have been called
at least once before).
If this has not happened yet, return an error.
Bug 1456969
Change-Id: I09db6bcaa44b8939246cb5ed1205f3fbc0ee0552
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/731327
(cherry picked from commit 0dbbcf60bbad6b9a31392d2290a3e26c5daa1e5d)
Reviewed-on: http://git-master/r/731671
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
We call prepare_ucode_blob() once each time we un-railgate. We
allocate and prepare the header for ACR ucode there, but the header
never gets freed.
Allocate and prepare the ACR header only once.
Change-Id: I948da8b47d6bb2fa021868d7038d2cc35eccb460
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/729745
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Return zero for missing sgt instead of attempting to dereference NULL.
Those NULL conditions should be almost nonexistent, and zero is not
normally used.
When reading gk20a_mem_phys() in gk20a_gr_get_chid_from_ctx() from an
ISR, the mem desc may race with channel deletion and get suddenly
zeroed, even if the channel's in_use flag is set. Plain zero
results in the expected behaviour.
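
The guard can be sketched as follows. This is an illustration under
assumptions, not the driver's gk20a_mem_phys(): the two stub structs
and the mem_phys() name are hypothetical, and the real helper reads
the address out of a scatter-gather table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stubs for the scatter-gather table and mem descriptor. */
struct sg_table_stub {
	uint64_t phys;
};

struct mem_desc_stub {
	struct sg_table_stub *sgt; /* may be zeroed by a racing teardown */
};

/* Returns the physical address, or 0 when no sgt is attached. */
static uint64_t mem_phys(const struct mem_desc_stub *mem)
{
	/* Read the pointer once so the check and the use see the same value. */
	const struct sg_table_stub *sgt = mem->sgt;

	if (!sgt)
		return 0;
	return sgt->phys;
}
```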
Change-Id: Id8ce37798d6fd3ceeb96a3f521c82569fccf30aa
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/729006
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Fix the return code for both gk20a_ and gm20b_ltc_cbc_ctrl()
functions. Before, a positive value would always be returned. Now,
if there's a timeout, -EBUSY is returned.
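
A sketch of the intended return convention, assuming a simple bounded
poll loop. wait_for_op() and the two stub callbacks are hypothetical;
the actual functions poll a hardware register instead.

```c
#include <errno.h>
#include <stdbool.h>

/* Demo stubs standing in for "has the hardware operation finished?". */
static bool always_done(void) { return true; }
static bool never_done(void) { return false; }

/* Poll up to max_retries times: 0 on success, -EBUSY on timeout. */
static int wait_for_op(bool (*op_done)(void), int max_retries)
{
	int retries = max_retries;

	while (retries > 0) {
		if (op_done())
			return 0;
		retries--;
	}
	return -EBUSY; /* ran out of retries */
}
```

With this shape a caller can test the result with a plain
"if (err)" and never sees a positive value.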
Change-Id: Id76dc44af1376fceebf5043afb057c153cb0752e
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/729165
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
The flush timeout check should have been comparing against the current
time (jiffies), not the snapshot taken when the L2 flush started.
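
The difference can be sketched in user-space C. The helper names are
hypothetical; after() mirrors the idea behind the kernel's
time_after() macro. Passing the start snapshot as "now" can never
report a timeout, which is the bug described above.

```c
#include <stdbool.h>

/* Wrap-safe "a is later than b" for tick counters. */
static bool after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * now:     current tick count, re-read on every poll iteration
 * start:   tick count captured when the flush began
 * timeout: allowed duration in ticks
 */
static bool flush_timed_out(unsigned long now, unsigned long start,
			    unsigned long timeout)
{
	return after(now, start + timeout);
}
```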
Change-Id: Idba0ccbfeeab9e3fadd0b5bed7073acefbd403e3
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/729090
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reduce amount of duplicate code around memory allocation by using
common helpers, and common data structure for storing results of
allocations.
Bug 1605769
Change-Id: Ib70db4dff782176ed7f92b6809c8415b8c35abe1
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/721120
We have the following race condition in __gk20a_do_idle() in the
force_reset case:
- before execution of __gk20a_do_idle(), a process drops the last
usage count of the GPU, which triggers the GPU railgate process
- but before the GPU is really railgated (there is a 500 ms delay),
some process calls __gk20a_do_idle()
- in __gk20a_do_idle(), we first take railgate_lock
- then we check if the GPU is already railgated or not
- since it is not railgated yet (due to the 500 ms delay), this
returns false
- then we call pm_runtime_get_noresume() which just increases the
usage counter
- in this particular case, this call just increases the usage count
from 0 to 1, whereas the GPU is already on its way to railgate
- while we check if the GPU usage count drops to one, the GPU gets
railgated
- now in the force_reset=true case, we will end up calling
pm_runtime_get_sync() which will take railgate_lock _again_
and try to unrailgate the GPU
- this causes a deadlock on railgate_lock
To fix this, use the following sequence:
- take railgate_lock
- check if GPU is already railgated
- release railgate_lock
- call pm_runtime_get_sync() which will keep GPU active even if
railgating is already triggered
- take railgate_lock again to prevent unrailgate in the further process
Also, add more descriptive comments to explain the flow.
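
The ordering above can be modelled with a toy, single-threaded sketch.
This is not driver code: every name is hypothetical, the lock is a
flag that reports a self-deadlock instead of hanging, and get_sync()
only imitates the part of pm_runtime_get_sync() that matters here
(taking the lock to unrailgate). The early return for an already
railgated GPU is an assumption made to keep the model small.

```c
#include <errno.h>
#include <stdbool.h>

struct gpu {
	bool lock_held;  /* models railgate_lock (non-recursive) */
	bool railgated;
	int usage_count;
};

static int lock(struct gpu *g)
{
	if (g->lock_held)
		return -EDEADLK; /* a real mutex would hang here */
	g->lock_held = true;
	return 0;
}

static void unlock(struct gpu *g)
{
	g->lock_held = false;
}

/* Takes a reference; unrailgating needs the lock. */
static int get_sync(struct gpu *g)
{
	int err;

	g->usage_count++;
	if (g->railgated) {
		err = lock(g);
		if (err)
			return err;
		g->railgated = false;
		unlock(g);
	}
	return 0;
}

/* Fixed ordering: never call get_sync() with the lock held. */
static int do_idle_fixed(struct gpu *g)
{
	bool was_railgated;
	int err;

	err = lock(g);
	if (err)
		return err;
	was_railgated = g->railgated;
	unlock(g);

	if (was_railgated)
		return 0; /* assumption: nothing left to idle in this model */

	err = get_sync(g);
	if (err)
		return err;

	return lock(g); /* re-take the lock to block later unrailgates */
}
```

Calling get_sync() on a railgated GPU while lock_held is already true
returns -EDEADLK in this model, which corresponds to the deadlock in
the old ordering.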
Bug 1624537
Change-Id: I0febc65d7bfac03ee738be200cf321322ffbe5a6
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/719625
(cherry picked from commit 480284eda16e2b50ee6368bad3d15574e098b231)
Reviewed-on: http://git-master/r/719620
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
If the clock is null, calling the reset function will crash the
kernel. So, don't call the reset function.
Change-Id: I37ef25c8dca67bec8bf6654eb6e275b866bdae53
Signed-off-by: Alex Van Brunt <avanbrunt@nvidia.com>
Reviewed-on: http://git-master/r/742361
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Implement a new buddy allocation scheme for the GPU's VA space.
The bitmap allocator was using too much memory and is not a scalable
solution as the GPU's address space keeps getting bigger. The buddy
allocation scheme is much more memory efficient when the majority
of the address space is not allocated.
The buddy allocator is not constrained by the notion of a split
address space. The bitmap allocator could only manage either small
pages or large pages but not both at the same time. Thus the bottom
of the address space was for small pages, the top for large pages.
Although that split is not removed quite yet, the new allocator
enables that to happen.
The buddy allocator is also very scalable. It manages everything from
the relatively small comptag space to the enormous GPU VA space. This
is important since the GPU has lots of different sized spaces that
need managing.
Currently there are certain limitations. For one, the allocator does
not handle the fixed allocations from CUDA very well. It can do so
but with certain caveats. The PTE page size is always set to small.
This means the BA may place other small page allocations in the
buddies around the fixed allocation. It does this to avoid having
large and small page allocations in the same PDE.
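
For readers unfamiliar with buddy allocation, the two core
calculations of a generic buddy scheme are sketched below. This is
textbook arithmetic, not the allocator added by this patch; the
function names and the 4096-byte block size used in the comments are
illustrative assumptions.

```c
#include <stdint.h>

/*
 * Smallest order such that (blk_size << order) >= len.
 * Order 0 is one base block; each order doubles the size.
 * Assumes len is small enough that the shift does not overflow.
 */
static unsigned int order_for(uint64_t len, uint64_t blk_size)
{
	unsigned int order = 0;

	while ((blk_size << order) < len)
		order++;
	return order;
}

/*
 * Offset of the buddy of the block at `offset` (relative to the start
 * of the managed space). Buddies differ only in the bit that selects
 * which half of the parent block they occupy.
 */
static uint64_t buddy_of(uint64_t offset, unsigned int order,
			 uint64_t blk_size)
{
	return offset ^ (blk_size << order);
}
```

With 4096-byte base blocks, a 128 KiB request maps to order 5, and
the order-1 block at offset 8192 has its buddy at offset 0. Free
space is tracked per block rather than per page, which is why memory
use stays low when most of the space is unallocated.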
Change-Id: I501cd15af03611536490137331d43761c402c7f9
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/740694
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
On linsim, when the push buffers are allowed to be allocated with small
pages above 4GB the simulator crashes. This patch ensures that for
linsim all small page allocations are forced to be below 4GB in the
GPU VA space. By doing so the simulator no longer crashes.
This bug has come up because the GPU buddy allocator work generates
allocations at the top of the address space first. Thus push buffers
were located between 12GB and 16GB in the GPU VA space.
Change-Id: Iaef0af3fda3f37ac09a66b5e1179527d6fe08ccc
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/740728
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
The number of entries in the next level PDE data structure was one
half of what was needed since the bit shift was 1 bit too small.
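
As a worked illustration of how a shift that is one bit short halves
a table, assume a level that indexes an inclusive bit range
[lo_bit, hi_bit] of the address. The function name and the bit
numbers in the usage note are hypothetical and are not taken from the
driver.

```c
/* Number of entries needed to index bits lo_bit..hi_bit inclusive. */
static unsigned int entries_for_bits(unsigned int hi_bit,
				     unsigned int lo_bit)
{
	/* The +1 matters: the range is inclusive at both ends. */
	return 1u << (hi_bit - lo_bit + 1);
}
```

For bits 17..25 this gives 512 entries, while 1u << (25 - 17) gives
256, exactly half.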
Change-Id: Id4981f230dd206ae94336cddab117312e143e6a1
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/740727
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reduce the BAR1 size in the kernel to match the reserved size in the
DTB. The mismatch caused problems for the buddy allocator, since the
allocator can sometimes allocate from higher memory before lower memory
in the managed space, which would cause the kernel to access unmapped
memory.
Change-Id: I70b72ef5bb4db01253e5087757051ef852e99bc6
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/740726
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Record size of each page table level. The size of level 0 depends
on size of the address space, and we generally do not support the
whole address space.
Change-Id: Iab47505af1a641e193d9e98a2246e522813f221a
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/729730
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-on: http://git-master/r/737531
Reviewed-by: Alexander Van Brunt <avanbrunt@nvidia.com>
Tested-by: Alexander Van Brunt <avanbrunt@nvidia.com>
Reduce amount of duplicate code around memory allocation by using
common helpers, and common data structure for storing results of
allocations.
Bug 1605769
Change-Id: Idf51831e8be9cabe1ab9122b18317137fde6339f
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/721030
Reviewed-on: http://git-master/r/737530
Reviewed-by: Alexander Van Brunt <avanbrunt@nvidia.com>
Tested-by: Alexander Van Brunt <avanbrunt@nvidia.com>
The channel teardown process sends a WFI method to ensure that all
work has been completed. But we also preempt the channel a while
later, which also ensures that all work is completed.
Remove the code for submitting WFI, and rely on preemption to handle
idling the pipe.
Change-Id: I2af029184440ee73e70d377f15690ddaf9b8599f
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/735067
Reviewed-on: http://git-master/r/737527
Reviewed-by: Alexander Van Brunt <avanbrunt@nvidia.com>
Tested-by: Alexander Van Brunt <avanbrunt@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Add posting a channel event whenever we do a wakeup due to semaphore.
Change-Id: Id1765123de93bcbc0822af7926d7f4e9919ffe10
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/726420
When mapping a buffer at a fixed address, ensure that the alignment of
the buffer and the address are compatible. When freeing, retrieve the
page size from the VA instead of choosing it again.
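
A minimal sketch of such a compatibility check, assuming the buffer's
required alignment is a power of two. check_fixed_map() and its
parameters are hypothetical names, not the driver's API.

```c
#include <errno.h>
#include <stdint.h>

/* Returns 0 if fixed_addr satisfies buf_align, -EINVAL otherwise. */
static int check_fixed_map(uint64_t fixed_addr, uint64_t buf_align)
{
	/* Reject zero and non-power-of-two alignments. */
	if (buf_align == 0 || (buf_align & (buf_align - 1)))
		return -EINVAL;

	/* Any low bits set below the alignment mean a misaligned address. */
	if (fixed_addr & (buf_align - 1))
		return -EINVAL;

	return 0;
}
```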
Bug 1605769
Change-Id: I4f73453996cd53a912b6a414caa41563cde28da7
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/725764
Define smallest compressible page size per SoC, and use that for
determining if a compressible kind should be downgraded to
uncompressed.
Bug 1605769
Change-Id: I7c9991ba0ae82fe533641f045e506c0b01a10d8b
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/724492
Sparse buffers were allowed only with big pages. That restriction is
not necessary, so remove it.
Bug 1605769
Change-Id: I92efc0efe80edccead47b47d33fd9a75c921ca9a
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/725763
Reduce amount of duplicate code around memory allocation by using
common helpers, and common data structure for storing results of
allocations.
Bug 1605769
Change-Id: I10c226e2377aa867a5cf11be61d08a9d67206b1d
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/720507
Move gk20a_idle() under error check in NVGPU_GPU_IOCTL_ZBC_SET_TABLE so
that if gk20a_busy fails, the idle is skipped properly.
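
The pairing rule can be sketched as follows. struct dev, dev_busy()
and dev_idle() are hypothetical stand-ins for the device,
gk20a_busy() and gk20a_idle(); the point is only that idle runs
exactly when busy succeeded.

```c
#include <errno.h>

/* Hypothetical device with a usage counter. */
struct dev {
	int refs;
	int fail_busy; /* set to make dev_busy() fail, for illustration */
};

static int dev_busy(struct dev *d)
{
	if (d->fail_busy)
		return -ENODEV;
	d->refs++;
	return 0;
}

static void dev_idle(struct dev *d)
{
	d->refs--;
}

static int zbc_set_table(struct dev *d)
{
	int err = dev_busy(d);

	if (err)
		return err; /* no reference was taken, so do not idle */

	/* ... the table update would go here ... */

	dev_idle(d);
	return 0;
}
```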
Change-Id: Iffde3734f7fb121e1bc7838a67bfee3dacfd0a46
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/726104
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Spew debug lines in case we get a priv ring error.
Change-Id: Iba46813a355b5d2d192614a9e146397688e130a7
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/660850