Fixed the following sparse warning by making the local function 'static':
- warning: symbol 'gm20b_load_falcon_ucode' was not declared.
Should it be static?
Bug 200067946
Change-Id: I11beaa301dc45dfec6f2295a6a96c1571e0264c9
Signed-off-by: Amit Sharma (SW-TEGRA) <amisharma@nvidia.com>
Reviewed-on: http://git-master/r/766361
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Reviewed-on: http://git-master/r/767991
Reviewed-by: Automatic_Commit_Validation_User
FB reset was added for gk20a. It should be invoked also on gm20b.
Change-Id: I0b074bc50a889108edae93d62b3194e54bfda881
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/765366
Implement support for privileged pages. Use them for kernel allocated buffers.
Change-Id: I720fc441008077b8e2ed218a7a685b8aab2258f0
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/761919
bug 200080684
use new cmd defined in ucode for loading
GR falcons. flip PRIV load flag in lsb
header to indicate using dma. use pmu msg
as cmd completion for new cmd instead of
polling fecs mailbox. also move
check for using dma in non secure boot path
to hal.
Change-Id: I22582a705bd1ae0603f858e1fe200d72e6794a81
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/761625
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
When flushing L2 do not check status of L2s not present in system.
Change-Id: I95703689314c146f591fea0d85b1a484fdf82cf7
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/759267
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
For both adding and querying zbc entry, added callbacks in gr ops.
Native gpu driver (gk20a) and vgpu will both hook there. For vgpu, it
will add or query zbc entry from RM server.
Bug 1558561
Change-Id: If8a4850ecfbff41d8592664f5f93ad8c25f6fbce
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/732775
(cherry picked from commit a3787cf971128904c2712338087685b02673065d)
Reviewed-on: http://git-master/r/737880
(cherry picked from commit fca2a0457c968656dc29455608f35acab094d816)
Reviewed-on: http://git-master/r/753278
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
That is a kernel supporting code for cyclestats mode E.
Cyclestats mode E implemented following Windows-design in user-space
and required the following operations to be implemented:
- attach a client for shared hardware buffer of device
- detach client from shared hardware buffer
- flush means copy of available data from hardware buffer to private
client buffers according to perfmon IDs assigned for clients
- perfmon IDs management for user-space clients
- a NVGPU_GPU_FLAGS_SUPPORT_CYCLE_STATS_SNAPSHOT capability added
Bug 1573150
Change-Id: I9e09f0fbb2be5a95c47e6d80a2e23fa839b46f9a
Signed-off-by: Leonid Moiseichuk <lmoiseichuk@nvidia.com>
Reviewed-on: http://git-master/r/740653
(cherry picked from commit 79fe89fd4cea39d8ab9dbef0558cd806ddfda87f)
Reviewed-on: http://git-master/r/753274
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
This reverts commit ce1cf06b9a8eb6314ba0ca294e8cb430e1e141c0 since
it causes GPU pbdma interrupt to be generated.
Bug 200106514
Change-Id: If3ed9a914c4e3e7f3f98c6609c6dbf57e1eb9aad
Signed-off-by: Bharat Nihalani <bnihalani@nvidia.com>
Reviewed-on: http://git-master/r/749291
Create a HAL for waiting for GR to become quiet. Use it forall cases
where we require GR to be quiet, but where it does not need to be
idle.
Bug 1640378
Change-Id: Ic0222d595a2d049e0fa8864b069ab94a97fac143
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/745640
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
bug 200080684
keeping it disabled by default
also trimming the code by removing redundant
variable to check recovery. pmu quick wait
now checks only for irqs which are serviced
by kernel. requests pmu to bit bang gpccs
ucode.
Change-Id: I12ef23d6d59b507e86a129b69eab65b21d0438c6
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/729622
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Allow querying and setting default betacb size via debugfs. For global buffers
the value takes effect upon first boot of GPU, and has no effect after that.
Bug 1628352
Change-Id: Ib63f4299249c41eab1b36cc501b525cc54211195
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/733328
gr_*__set_alpha_circular_buffer_size() left max_batches field of
gr_pd_ab_dist_cfg1_r as 0 which results in too many alpha beta
transitions and poor performance when tessellation or geometry
shaders are used
Change-Id: If18feb1119e9672005455155dc56337cd444a1f1
Signed-off-by: David Li <davli@nvidia.com>
Reviewed-on: http://git-master/r/735476
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
4b6f83704f054f5b21e05873fa5862c667a9992e tried to fix ACR related
leak. It fell short, because the data structures related were local
and thus the leak was not really fixed.
This patch stores the ACR ucode blob in a global variable, which
survives across rail gating.
Change-Id: Iec3ac9d41156baa26048e079732568c0a95264f4
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/733732
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Added delays definitions to GPCPLL parameters structure:
- locking timeout delay (applied to locking in fixed frequency mode and
to PLL dynamic ramp in any mode)
- lock delay for GPCPLL NA mode
- IDDQ exit delay in any mode
Specified delay parameters for GM20B PLL, and used this data instead of
hard-coded numbers.
Change-Id: I63ce0abc9ee900c36ec34b8641513db3cbb6f7d5
Signed-off-by: Alex Frid <afrid@nvidia.com>
Reviewed-on: http://git-master/r/732094
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
- Added GPU voltage debug print to the initial locking of GPCPLL under
bypass (available only when GPCPLL is in NA mode).
- Added /sys/kernel/debug/gpu.0/voltage debugfs node to read voltage
through GPCPLL (available only when GPCPLL is in NA mode).
Change-Id: I6643ad4d1b228ec4cbc4ff5e8716cce3ef9dccfc
Signed-off-by: Alex Frid <afrid@nvidia.com>
Reviewed-on: http://git-master/r/731572
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Tested-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
This removes all direct access to the MC registers. This requires
that the MC be loaded before the GPU.
Bug 1540908
Change-Id: I90bcde62f65a0c0d73a2bbe92cbf4a980c671c7d
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/453653
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Supriya Sharatkumar <ssharatkumar@nvidia.com>
Reviewed-by: Krishna Reddy <vdumpa@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
We call prepare_ucode_blob() once each time we un-railgate. We
allocate prepare the header for ACR ucode there, but the header
never gets freed.
Allocate and prepare the ACR header only once.
Change-Id: I948da8b47d6bb2fa021868d7038d2cc35eccb460
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/729745
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Fix the return code for both gk20a_ and gm20b_ltc_cbc_ctrl()
functions. Before a positive return woudl always happen. Now,
if there's a timeout -EBUSY is returned.
Change-Id: Id76dc44af1376fceebf5043afb057c153cb0752e
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/729165
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
The flush timeout should have been comparing between the current
time (jiffies) not the snapshot in time when the L2 flush started.
Change-Id: Idba0ccbfeeab9e3fadd0b5bed7073acefbd403e3
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/729090
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reduce amount of duplicate code around memory allocation by using
common helpers, and common data structure for storing results of
allocations.
Bug 1605769
Change-Id: Ib70db4dff782176ed7f92b6809c8415b8c35abe1
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/721120
Implement a new buddy allocation scheme for the GPU's VA space.
The bitmap allocator was using too much memory and is not a scaleable
solution as the GPU's address space keeps getting bigger. The buddy
allocation scheme is much more memory efficient when the majority
of the address space is not allocated.
The buddy allocator is not constrained by the notion of a split
address space. The bitmap allocator could only manage either small
pages or large pages but not both at the same time. Thus the bottom
of the address space was for small pages, the top for large pages.
Although, that split is not removed quite yet, the new allocator
enables that to happen.
The buddy allocator is also very scalable. It manages the relatively
small comptag space to the enormous GPU VA space and everything in
between. This is important since the GPU has lots of different sized
spaces that need managing.
Currently there are certain limitations. For one the allocator does
not handle the fixed allocations from CUDA very well. It can do so
but with certain caveats. The PTE page size is always set to small.
This means the BA may place other small page allocations in the
buddies around the fixed allocation. It does this to avoid having
large and small page allocations in the same PDE.
Change-Id: I501cd15af03611536490137331d43761c402c7f9
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/740694
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Define smallest compressible page size per SoC, and use that for
determining if a compressible kind should be downgraded to
uncompressed.
Bug 1605769
Change-Id: I7c9991ba0ae82fe533641f045e506c0b01a10d8b
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/724492
Reduce amount of duplicate code around memory allocation by using
common helpers, and common data structure for storing results of
allocations.
Bug 1605769
Change-Id: I10c226e2377aa867a5cf11be61d08a9d67206b1d
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/720507
Fix the following sparse warning:
-drivers/gpu/nvgpu/gk20a/gr_gk20a.c:6028:6: warning: symbol
'gr_gk20a_init_sm_dsm_reg_info' was not declared. Should it be static?
-gr_gk20a.c:6174:6: warning: symbol 'gr_gk20a_get_sm_dsm_perf_regs' was not
declared. Should it be static?
-gr_gk20a.c:6184:6: warning: symbol 'gr_gk20a_get_sm_dsm_perf_ctrl_regs' was
not declared. Should it be static?
-gr_gm20b.c:465:6: warning: symbol 'gr_gm20b_init_sm_dsm_reg_info' was not
declared. Should it be static?
-gr_gm20b.c:476:6: warning: symbol 'gr_gm20b_get_sm_dsm_perf_regs' was not
declared. Should it be static?
-gr_gm20b.c:486:6: warning: symbol 'gr_gm20b_get_sm_dsm_perf_ctrl_regs' was not
declared. Should it be static?
Bug 200067946
Change-Id: Ic14080e94e343386f4fef379152a623c402984fd
Signed-off-by: Sandarbh Jain <sanjain@nvidia.com>
Reviewed-on: http://git-master/r/723518
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Tested-by: Sachin Nikam <snikam@nvidia.com>
Add platform specific API pointer (*get_iova_addr)()
which can be used to get iova/physical address from
given scatterlist and flags
Use this API with g->ops.mm.get_iova_addr() instead
of calling API gk20a_mm_iova_addr() which makes it
platform specific
Bug 1605653
Change-Id: I798763db1501bd0b16e84daab68f6093a83caac2
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/713089
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reduce amount of duplicate code around memory allocation by using
common helpers, and common data structure for storing results of
allocations.
Bug 1605769
Change-Id: I7c1662b669ed8c86465254f6001e536141051ee5
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/720435
Update extended buffer definition for Maxwell. On GM20B only PERF_CONTROL0 and
PERF_CONTROL5 registers are restored in extended buffer. They are needed for
stopping the counters as late as possible during ctx save and start them as
early as possible during context restore. On Maxwell, these registers contain
the enable/disable bit.
Bug 200086767
Change-Id: I59125a2f04bd0975be8a1ccecf993c9370f20337
Signed-off-by: Sandarbh Jain <sanjain@nvidia.com>
Reviewed-on: http://git-master/r/717421
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
There were still a couple of places using sg_phys directly. Use new
gk20a_mem_phys() to make the code shorter.
Change-Id: I6eb9b14e0c14a27ec39bacd06ab24e31e99769ca
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/717502
Fixed the following sparse warnings by making below APIs static:
- gk20a.c: warning: symbol 'gk20a_pm_restore_debug_setting' was not declared.
Should it be static?
- gr_gk20a.c: warning: symbol 'gr_gk20a_rop_l2_en_mask' was not declared.
Should it be static?
- gr_gm20b.c: warning: symbol 'gr_gm20b_rop_l2_en_mask' was not declared.
Should it be static?
Bug 200067946
Change-Id: I334893bb6614171bff835d270716a7dd262c9ba7
Signed-off-by: Amit Sharma (SW-TEGRA) <amisharma@nvidia.com>
Reviewed-on: http://git-master/r/718756
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Introduce mem_desc, which holds all information needed for a buffer.
Implement helper functions for allocation and freeing that use this
data type.
Change-Id: I82c88595d058d4fb8c5c5fbf19d13269e48e422f
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/712699
New members are added in nvgpu_gpu_characterstics to export more
information required specially from CUDA tools.
Change-Id: I907f3bcbd272405a13f47ef6236bc2cff01c6c80
Signed-off-by: Sujeet Baranwal <sbaranwal@nvidia.com>
Reviewed-on: http://git-master/r/679202
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
The current CUDA drivers have been using the regops to
directly accessing the GPU registers from user space through
the dbg node. This is a security hole and needs to be avoided.
The patch alternatively implements the similar functionality
in the kernel and provide an ioctl for it.
Bug 200083334
Change-Id: Ic5ff5a215cbabe7a46837bc4e15efcceb0df0367
Signed-off-by: sujeet baranwal <sbaranwal@nvidia.com>
Reviewed-on: http://git-master/r/711758
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
We currently have below exceptions enabled but we do
not have any handler for them. So if any of these
exception is raised, we do not clear it.
NV_PGRAPH_EXCEPTION_PD
NV_PGRAPH_EXCEPTION_SCC
NV_PGRAPH_EXCEPTION_DS
NV_PGRAPH_EXCEPTION_MME
NV_PGRAPH_EXCEPTION_SKED
Hence do not enable above exceptions.
Bug 200078514
Change-Id: I0dd3a2299f80f3fe06994818f64151e7cc83a84e
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/714166
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
In gk20a_gr_isr(), handle memfmt exception as below :
- read NV_PGRAPH_PRI_MEMFMT_HWW_ESR
- debug print for contents of above register
- write same value back to NV_PGRAPH_PRI_MEMFMT_HWW_ESR and
clear the exception
Bug 200078514
Change-Id: I5b9afacd7f99b5a37de953041582b3a53b863642
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/713713
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Added support for dumping vpr/wpr info for gm20b.
This dump info called when ever gk20a_mm_fb_flush
is timed-out.
Bug 200082817
Change-Id: I21b0372d0e3f976a189c9c428c015165b715bf88
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/711439
(cherry picked from commit b69897d71c8f6119b49ceb8d3273cdb354178cc5)
Reviewed-on: http://git-master/r/712675
GVS: Gerrit_Virtual_Submit
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
Use busy looping for l2 tag flush and elpg flush
operations. This is making total flash time more
accurate and reduced overall time compared with
usleep. Also added trace points to measure
performance for these operations.
Also corrected timeout error check for non-silicon
platforms.
Bug 200081799
Change-Id: I63410bb7528db9258501633996fbdee5fdec1c74
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/710472
(cherry picked from commit 18684cf9d5d6870a1a1fd5711c4fc2d733caad20)
Reviewed-on: http://git-master/r/710986
GVS: Gerrit_Virtual_Submit
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
Allow enabling of PC sampling hardware workaround. It is only
applicable to gm20b.
Bug 1517458
Bug 1573150
Change-Id: Iad6a3ae556489fb7ab9628637d291849d2cd98ea
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/710421