Bug 200066741
ACR ucode has mechanism to skip WPR blob copy for second time,
in case WPR size is sent as 0 to acr ucode.
With above there is a saving of around 0.5 ms, but, in
conjunction with acr change to disable LS sig verification,
and scrubbing empty spaces in WPR sections to 0. This change
can reduce railgate exit latency by 4ms
ACR ucodes to be checked in main, as a different CL, and after
getting prod signs for ACR
Change-Id: I9d662027abf0b2615176d17433ff3ec3ae53d78a
Signed-off-by: Supriya <ssharatkumar@nvidia.com>
Reviewed-on: http://git-master/r/681892
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Make recovery a more straightforward process. When we detect a fault,
trigger MMU fault, and wait for it to trigger, and complete recovery.
Also reset engines before aborting channel to ensure no stray sync
point increments can happen.
Change-Id: Iac685db6534cb64fe62d9fb452391f43100f2999
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/709060
(cherry picked from commit 95c62ffd9ac30a0d2eb88d033dcc6e6ff25efd6f)
Reviewed-on: http://git-master/r/707443
Add the ioctl to open a new gpu channel to also the control node for
improved process startup performance, in addition to the current open
ioctl in the channel node. The new channel fd creation is refactored to
a separate function which is called from both ctrl and channel ioctls.
Bug 1604952
Change-Id: I3357ceec694c0e6d7a85807183884324cb725d3a
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/679516
Reviewed-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Use busy looping on L2 and TLB maintenance operations. This speeds
them up by an order of magnitude.
Add also trace points to measure performance for memory ops and
interrupt processing.
Change-Id: Ic4a8525d3d946b2b8f57b4b8ddcfc61605619399
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/681640
Pass vm instead of as share to the userspace buffer mapping functions,
since they need to be called also from other places than just the AS
device ioctls, and as share is specific to them.
Bug 1573150
Change-Id: I994872f23ea7b1582361f3f4fabbd64b4786419c
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/674020
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Compression page size varies depending on architecture. Make it
129kB on gk20a and gm20b.
Also export some common functions from gm20b.
Bug 1592495
Change-Id: Ifb1c5b15d25fa961dab097021080055fc385fecd
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/673790
Decrement the ref count on all mapped_buffers belonging to a va_node
when a va_node is freed. This prevents userspace from leaking some
mapped_buffers in some cases.
This does prevent userspace from keeping a buffer around after freeing
a space allocation if the buffer in question is not otherwise ref
counted. Not sure if this is a bad thing for userspace or not.
Bug 1600686
Change-Id: I659ccbda5935d44086fd367bd2110f7d0f066194
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/676629
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Use memcpy for copying gpfifo inputs into the gpfifo ring buffer.
This speeds up one command buffer heavy benchmark from 82 FPS to 86
FPS. Speed up is due to a) faster memory move and b) zero tracing
overhead when PB tracing is disabled.
Bug 1550886
Change-Id: If95ebff53745bbf59edeac32ad4f32f10f1ea7ee
Signed-off-by: Janne Hellsten <jhellsten@nvidia.com>
Reviewed-on: http://git-master/r/676967
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Handle the gr and fifo exceptions delivered from the server
and update the channel state as needed.
Bug 1551865
Change-Id: Ie19626c6e8a72f92ffd134983fe6d84e5c6c8736
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/670329
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add a couple of trace points for tracking when we need to wait for
space in the gpfifo ring buffer. This wait can introduce significant
latencies to rendering with large gpfifo entry inputs so it's good to
be able to measure how often this path is taken.
Bug 1592391
Bug 1550886
Change-Id: I7f362e9c307eeffeeecaaba268ef2e3613e54597
Signed-off-by: Janne Hellsten <jhellsten@nvidia.com>
Reviewed-on: http://git-master/r/674021
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Fix panics when using regops when PMU is disabled, or when whitelists
have not been defined.
Bug 1592505
Change-Id: I316c98147c54be7b1114ad23049ce3a634d4805e
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/671841
Make gr_gm20b_alloc_gr_ctx static. It is used only in the same file.
Bug 200067946
Change-Id: I484ff84ebe9a356f251db5a14ca0e60db64578bf
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/673267
Implement several fixes for allowing the GVA address space to grow
to larger than 32GB and increase the address space to 128GB.
o Implement dynamic allocation of PDE backing pages. The memory
to store the PDE entries was hard coded to 1 page. Now the
number of pages necessary is computed dynamically based on the
size of the address space and the size of large pages.
o Fix an arithmetic problem in the gm20b sparse texture code
that caused large address spaces to be truncated when sparse
PDEs/PTEs were being filled in. This caused a kernel panic
when freeing the address space since a lot of the backing
PTE memory was not allocated.
o Change the address space split for large and small pages. Small
pages now occupy the bottom 16GB of the address space. Large
pages are used for the rest of the address space. Now, with a
128GB address space, there are 112GB of large page GVA available.
This patch exists to allow large (16GB) sparse textures to be allocated
without running into lack of memory issues and kernel panics.
Bug 1574267
Change-Id: I7c59ee54bd573dfc53b58c346156df37a85dfc22
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/671204
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
enables non-blocking interrupts in ce2 all other
ce2 interrupts are cleared and not handled.
bug 200036089
Change-Id: I9f47b06c677c72ac523019e6a3f70fedd07830a2
Signed-off-by: Sam Payne <spayne@nvidia.com>
Reviewed-on: http://git-master/r/671783
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Free also newly allocated struct file in error conditions with fput, and
pair it by not trying to release the resulting null as_share on release.
Bug 1597056
Change-Id: Ifad5c3a829b2c459ed6a738ecdc1ac2ac7e1678a
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/671527
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
CTA preemption needs to be enabled by setting a value in context. Set
it for gm20b.
Bug 200063473
Bug 1517461
Change-Id: I080cd71b348d08f834fd23ebbe7443dba79224db
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/661299
The default zbc entries were never populated in zbc HW table
because the conditional flag "gr->sw_ready" was always set thus
avoided the zbc default loading function call. Now zbc default
loading would happen only during boot time in sw structure.Hw
zbc regs would be loaded from that structure every time a
railgate exit happens.
Bug 1580210
Change-Id: Ie3e40738cbc84cf724c3f3871f15b17a5c84025a
Signed-off-by: Sujeet Baranwal <sbaranwal@nvidia.com>
Reviewed-on: http://git-master/r/662306
Reviewed-by: Sami Kiminki <skiminki@nvidia.com>
Tested-by: Lauri Peltonen <lpeltonen@nvidia.com>
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
if a request is submitted larger than the
allocated fifo, an error is returned immediately
rather than waiting for timeout while enough space
becomes available in the fifo (timeout
will not trigger in this case)
bug 1563401
Change-Id: I264dee2673dc8722034881f9e7db7bb137a8c0c8
Signed-off-by: Sam Payne <spayne@nvidia.com>
Reviewed-on: http://git-master/r/665113
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Move the debug dump to HAL and add a stub for vgpu.
Bug 1595164
Change-Id: Ifdcdd8a8caca7a41919dad075fee1c87032f53b0
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/662722
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
setup_buffer_kind_and_compression() expects vm->big_page_size
to be set, which was not done for the vgpu case.
Bug 200064162
Change-Id: I15af3600fda0161aad2185ec7a12b560044cc171
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/662721
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Switch to a larger integer type for priv_cmd_queue get/put/size
fields. The previous 16-bit int type overflowed on >= 2048 gpfifo
buffer sizes. This triggered a div-by-zero kernel panic.
Bug 1592391
Signed-off-by: Janne Hellsten <jhellsten@nvidia.com>
Change-Id: Ibffcbbd145f39fdb4a63d05b1dcb42bb4b101795
Reviewed-on: http://git-master/r/667103
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Fix a memory leak: add the newly created state to the dmabuf priv's
state list, instead of the other way around.
Bug 1594784
Bug 200064154
Change-Id: I939746a254bb8bf4d06de7fcecba06c191da665f
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/668758
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Lauri Peltonen <lpeltonen@nvidia.com>
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
Gpu channels may get spurious updates from at least nonstalling
semaphore wait interrupts. Protect data structure sanity by ignoring
releases on already released (= not in use) cde contexts.
Bug 200062826
Change-Id: I5940a7557e902bcfcff1a7e8e4593472d9ac306c
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/666235
(cherry picked from commit 47dc2f41eb8054b099b6eb9a4a7d82c97295d415)
Reviewed-on: http://git-master/r/666657
GVS: Gerrit_Virtual_Submit
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
This must have occurred while rebasing dev-kernel-3.10
over kernel 3.18.
This change corrects the mistake.
Change-Id: I11fbc11105a032198828e8bc31da5ab92af0ffdb
Signed-off-by: Ishan Mittal <imittal@nvidia.com>
Reviewed-on: http://git-master/r/720240
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
Add error prints in gk20a_do_idle() to narrow down
the failure point
Bug 200064302
Change-Id: Iffe1151bdc200a79b88e273b3b01523f8e46d130
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/664446
(cherry picked from commit bf1cd9b5551d27cb5cc468795cd147376f48e482)
Reviewed-on: http://git-master/r/666218
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Remove an undesired register from the regops whitelist on both
gk20a and gm20b.
Bug 1589732
Change-Id: I7747fafd3c2c32a9c5ce6388be73c7f61e509f0a
Signed-off-by: Matt Craighead <mcraighead@nvidia.com>
Reviewed-on: http://git-master/r/663373
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Bug 1587090
Bug 200050711
PMU dmem start address is unaligned.
Allocator allocates aligned length amount of memory
But address alloced is nto checked to be aligned, but
free checks for alignment of addresses before free.
For dmem case, frees never actually happened. This fix
ensures addresses are aligned.
Change-Id: I8b95f89940aa4d23355c3788dc95afb5c8867373
Signed-off-by: Supriya <ssharatkumar@nvidia.com>
Reviewed-on: http://git-master/r/663140
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Remove support for gk20a sparse textures. We're using implementation
from user space, so gk20a code is never invoked.
Also removes ref_cnt for PTEs, so we never free PTEs when unmapping
pages, but only at VM delete time.
Change-Id: I04d7d43d9bff23ee46fd0570ad189faece35dd14
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/663294