Add new API gr_gk20a_get_ctx_id() to get/extract
context id from GR context
Bug 200156699
Change-Id: If0e8887a9a6b139cd795bf03f5def64fd664d12b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/927130
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Support preprocessing of SM exceptions if API
pointer pre_process_sm_exception() is defined
Also, expose some common APIs
Bug 200156699
Change-Id: I1303642c1c4403c520b62efb6fd83e95eaeb519b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/925883
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Interleave all high priority channels between all other channels.
This reduces the latency for high priority work when there
are a lot of lower priority work present, imposing an upper
bound on the latency. Change the default high priority timeslice
from 5.2ms to 3.0 in the process, to prevent long running high priority
apps from hogging the GPU too much.
Introduce a new debugfs node to enable/disable high priority
channel interleaving. It is currently enabled by default.
Adds new runlist length max register, used for allocating
suitable sized runlist.
Limit the number of interleaved channels to 32.
This change reduces the maximum time a lower priority job
is running (one timeslice) before we check that high priority
jobs are running.
Tested with gles2_context_priority (still passes)
Basic sanity testing is done with graphics_submit
(one app is high priority)
Also more functional testing using lots of parallel runs with:
NVRM_GPU_CHANNEL_PRIORITY=3 ./gles2_expensive_draw
–drawsperframe 20000 –triangles 50 –runtime 30 –finish
plus multiple:
NVRM_GPU_CHANNEL_PRIORITY=2 ./gles2_expensive_draw
–drawsperframe 20000 –triangles 50 –runtime 30 -finish
Previous to this change, the relative performance between
high priority work and normal priority work comes down
to timeslice value. This means that when there are many
low priority channels, the high priority work will still
drop quite a lot. But with this change, the high priority
work will roughly get about half the entire GPU time, meaning
that after the initial lower performance, it is less likely
to get lower in performance due to more apps running on the system.
This change makes a large step towards real priority levels.
It is not perfect and there are no guarantees on anything,
but it is a step forwards without any additional CPU overhead
or other complications. It will also serve as a baseline to
judge other algorithms against.
Support for priorities with TSG is future work.
Support for interleave mid + high priority channels,
instead of just high, is also future work.
Bug 1419900
Change-Id: I0f7d0ce83b6598fe86000577d72e14d312fdad98
Signed-off-by: Peter Pipkorn <ppipkorn@nvidia.com>
Reviewed-on: http://git-master/r/805961
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
It'll detect dead semaphore acquire. The worst case is when
ACQUIRE_SWITCH is disabled, semaphore acquire will poll and
consume full gpu timeslicees.
The timeout value is set to half of channel WDT.
Bug 1636800
Change-Id: Ida6ccc534006a191513edf47e7b82d4b5b758684
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/928827
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
On Maxwell comptaglines are assigned per 128k, but preferred big page
size for graphics is 64k. Bit 16 of GPU VA is used for determining
which half of comptagline is used.
This creates problems if user space wants to map a page multiple times
and to arbitrary GPU VA. In one mapping the page might be mapped to
lower half of 128k comptagline, and in another mapping the page might
be mapped to upper half.
Turn on mode where MSB of comptagline in PTE is used instead of bit 16
for determining the comptagline lower/upper half selection.
Bug 1704834
Change-Id: If87e8f6ac0fc9c5624e80fa1ba2ceeb02781355b
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/924322
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Fixed the following sparse warning by restricting the scope of
API's within file.
- gr_gk20a.c:7273: warning: symbol 'gr_gk20a_bpt_reg_info' was not declared.
Should it be static?
- gr_gm20b.c:1053: warning: symbol 'gr_gm20b_bpt_reg_info' was not declared.
Should it be static?
Bug 200088648
Change-Id: I63bba55b1432e4284c9074d2729a176f1767a83a
Signed-off-by: Amit Sharma <amisharma@nvidia.com>
Reviewed-on: http://git-master/r/842260
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
SM locking & register reads Order has been changed.
Also, functions have been implemented based on gk20a
and gm20b.
Change-Id: Iaf720d088130f84c4b2ca318d9860194c07966e1
Signed-off-by: sujeet baranwal <sbaranwal@nvidia.com>
Signed-off-by: ashutosh jain <ashutoshj@nvidia.com>
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/837236
Add new operaton g->ops.mm.set_debug_mode and let other places
that set debug mode call this callback.
It's preparing for adding vgpu set mmu debug mode hook.
JIRA VFND-1005
Bug 1594604
Change-Id: I1d227a0c0f96adb0035ae16ae1f4fbfa739bf0a7
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/833497
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
At zcull bind we disable whole GR engine. This is unnecessary, so
instead disable only the channel and make sure it's unloaded.
Introduces also an API in fifo_gk20a.c to do the channel disable.
gr_gk20a_ctx_zcull_setup() was always passed true as last parameter,
so remove parameter.
Change-Id: I7ae6e101ec7d1ab3f6ee4e9bcc442d23dbd21247
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/787570
Program clock slowdown to happen using gradual slowdown. It is
significantly faster than the default slowdown.
Change-Id: I9e5259889637fce2c0b083a424b54af12bb45c25
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/819698
Always use the PROD value for FE_GO_IDLE_TIMEOUT.
Change-Id: I455c03ae07b35a8999cd0995e458c421a10e7ca2
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/813958
To handle any of the pbdma interrupt, we currently write zero
to pbdma_method0 and then clear the interrupt
But this is insufficient since we cannot use same intr clear
method for all the interrupts
Hence, add intr specific routines to handle those interrupts
NV_PPBDMA_INTR_0_PBENTRY:
- fix the pb_header to have a null opcode
- fix the pbdma_method to have a valid nop
NV_PPBDMA_INTR_0_METHOD:
- fix the pbdma_method to have a valid nop
NV_PPBDMA_INTR_0_DEVICE:
- fix the pb_header to have a null opcode
- go through all pbdma_method0/1/2/3
-- if they contain host s/w methods, replace those
methods with a valid NOP
Bug 200134238
Change-Id: I10c284a6cdc1441f9d437cea65aae00d3c33a8c8
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/814561
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
bug 1603226
host based timeouts use ptimer for detecting
timeouts. on gk20a and gm20b ptimer runs 2.6x
slower. scale the fifo_eng_timeout to account
for this
Change-Id: Ie44718382953e36436ea47d6e89b9a225d5c2070
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/799983
(cherry picked from commit d1d837fd09ff0f035feff1757c67488404c23cc6)
Reviewed-on: http://git-master/r/808250
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
In case of CDE channel, T1 (Tex) unit needs to be promoted to 128B
aligned, otherwise causes a HW deadlock. Gpu driver makes changes in
FECS header which FECS uses to configure the T1 promotions to aligned
128B accesses.
Bug 200096226
Change-Id: I8a8deaf6fb91f4bbceacd491db7eb6f7bca5001b
Signed-off-by: sujeet baranwal <sbaranwal@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: sujeet baranwal <sbaranwal@nvidia.com>
Reviewed-on: http://git-master/r/804625
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
In case of CDE channel, T1 (Tex) unit needs to be promoted to 128B
aligned, otherwise causes a HW deadlock. Gpu driver makes changes in
FECS header which FECS uses to configure the T1 promotions to aligned
128B accesses.
Bug 200096226
Change-Id: Ic006b2c7035bbeabe1081aeed968a6c6d11f9995
Signed-off-by: sujeet baranwal <sbaranwal@nvidia.com>
Reviewed-on: http://git-master/r/802327
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Cyclestats snapshot feature is expected for new devices.
The detection code was isolated in separate function and run-time
check added to validate/allow ioctl calls on the current GPU.
Bug 1674079
Change-Id: Icc2f1e5cc50d39b395d31d5292c314f99d67f3eb
Signed-off-by: Leonid Moiseichuk <lmoiseichuk@nvidia.com>
Reviewed-on: http://git-master/r/781697
(cherry picked from commit bdd23136b182c933841f91dd2829061e278a46d4)
Reviewed-on: http://git-master/r/793630
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add gpu_ops for CDE, and add get_program_numbers function pointer for
determining horizontal and vertical CDE swizzler programs. This allows
different GPUs to have their own specific requirements for choosing
the CDE firmware programs.
Bug 1604102
Change-Id: Ib37c13abb017c8eb1c32adc8cbc6b5984488222e
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/784899
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Fixed the following sparse warning by making the local function 'static':
- warning: symbol 'gm20b_load_falcon_ucode' was not declared.
Should it be static?
Bug 200067946
Change-Id: I11beaa301dc45dfec6f2295a6a96c1571e0264c9
Signed-off-by: Amit Sharma (SW-TEGRA) <amisharma@nvidia.com>
Reviewed-on: http://git-master/r/766361
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Reviewed-on: http://git-master/r/767991
Reviewed-by: Automatic_Commit_Validation_User
FB reset was added for gk20a. It should be invoked also on gm20b.
Change-Id: I0b074bc50a889108edae93d62b3194e54bfda881
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/765366
Implement support for privileged pages. Use them for kernel allocated buffers.
Change-Id: I720fc441008077b8e2ed218a7a685b8aab2258f0
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/761919
bug 200080684
use new cmd defined in ucode for loading
GR falcons. flip PRIV load flag in lsb
header to indicate using dma. use pmu msg
as cmd completion for new cmd instead of
polling fecs mailbox. also move
check for using dma in non secure boot path
to hal.
Change-Id: I22582a705bd1ae0603f858e1fe200d72e6794a81
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/761625
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
When flushing L2 do not check status of L2s not present in system.
Change-Id: I95703689314c146f591fea0d85b1a484fdf82cf7
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/759267
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
For both adding and querying zbc entry, added callbacks in gr ops.
Native gpu driver (gk20a) and vgpu will both hook there. For vgpu, it
will add or query zbc entry from RM server.
Bug 1558561
Change-Id: If8a4850ecfbff41d8592664f5f93ad8c25f6fbce
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/732775
(cherry picked from commit a3787cf971128904c2712338087685b02673065d)
Reviewed-on: http://git-master/r/737880
(cherry picked from commit fca2a0457c968656dc29455608f35acab094d816)
Reviewed-on: http://git-master/r/753278
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
That is a kernel supporting code for cyclestats mode E.
Cyclestats mode E implemented following Windows-design in user-space
and required the following operations to be implemented:
- attach a client for shared hardware buffer of device
- detach client from shared hardware buffer
- flush means copy of available data from hardware buffer to private
client buffers according to perfmon IDs assigned for clients
- perfmon IDs management for user-space clients
- a NVGPU_GPU_FLAGS_SUPPORT_CYCLE_STATS_SNAPSHOT capability added
Bug 1573150
Change-Id: I9e09f0fbb2be5a95c47e6d80a2e23fa839b46f9a
Signed-off-by: Leonid Moiseichuk <lmoiseichuk@nvidia.com>
Reviewed-on: http://git-master/r/740653
(cherry picked from commit 79fe89fd4cea39d8ab9dbef0558cd806ddfda87f)
Reviewed-on: http://git-master/r/753274
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
This reverts commit ce1cf06b9a8eb6314ba0ca294e8cb430e1e141c0 since
it causes GPU pbdma interrupt to be generated.
Bug 200106514
Change-Id: If3ed9a914c4e3e7f3f98c6609c6dbf57e1eb9aad
Signed-off-by: Bharat Nihalani <bnihalani@nvidia.com>
Reviewed-on: http://git-master/r/749291
Create a HAL for waiting for GR to become quiet. Use it forall cases
where we require GR to be quiet, but where it does not need to be
idle.
Bug 1640378
Change-Id: Ic0222d595a2d049e0fa8864b069ab94a97fac143
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/745640
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
bug 200080684
keeping it disabled by default
also trimming the code by removing redundant
variable to check recovery. pmu quick wait
now checks only for irqs which are serviced
by kernel. requests pmu to bit bang gpccs
ucode.
Change-Id: I12ef23d6d59b507e86a129b69eab65b21d0438c6
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/729622
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>