Fix a check that was backwards for signaled sync_fences. This would cause
the code to not wait on some sync_fences that had not already signaled and
wait on other fences that had signaled.
Bug 1787348
Reviewed-on: http://git-master/r/1204710
(cherry picked from commit 75b94bb30f79c3a7a9992773dc8a93b507121006)
Change-Id: I00b0f8a373a9954a5ad9ab31aff6423e91574153
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1221044
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Move the submit synchornization code into it's own function. This should
help keep the submit code path a little more readable and understandable.
Bug 1732449
Reviewed-on: http://git-master/r/1203833
(cherry picked from commit f931c65c166aeca3b8fe2996dba4ea5133febc5a)
Change-Id: I4111252d242a4dbffe7f9c31e397a27b66403efc
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1221043
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Greatly simplify and make more robust the gpu semaphore detection
in sync_fences. Instead of using a magic number use the parent
timeline of sync_pts.
This will also work with multi-GPU setups using nvgpu since the
timeline ops pointer will be the same across all instances of
nvgpu.
Bug 1732449
Reviewed-on: http://git-master/r/1203834
(cherry picked from commit 66eeb577eae5d10741fd15f3659e843c70792cd6)
Change-Id: I4c6619d70b5531e2676e18d1330724e8f8b9bcb3
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1221042
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Only create sync-fences in the semaphore synchronization path
when they are actually needed (i.e requested by userspace).
Bug 1795076
Reviewed-on: http://git-master/r/1201564
(cherry picked from commit dc52d424a839e6c064c02b7f02905dd6a59a50af)
Change-Id: Ieac6aef415678d4ea982683a955897c64959436e
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1221041
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add new API set_vidmem_page_alloc() which sets BIT(0)
in sg_dma_address() only for vidmem allocation
Add and use new API get_vidmem_page_alloc() which
receives scatterlist and returns pointer to vidmem
allocation i.e. struct gk20a_page_alloc *alloc
In this API, check if BIT(0) is set or not in
sg_dma_address() before converting it to allocation
address
In gk20a_mm_smmu_vaddr_translate(), ensure that
the address is pure IOVA address by verifying that
BIT(0) is not set in that address
Jira DNVGPU-22
Change-Id: Ib53ff4b63ac59a8d870bc01d0af59839c6143334
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1216142
(cherry picked from commit 03c9fbdaa40746dc43335cd8fbe9f97ef2ef50c9)
Reviewed-on: http://git-master/r/1219705
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Use gk20a_gmmu_alloc() in gk20a_alloc_inst_block() so that
we always try to allocate all inst blocks in vidmem first
Also use common API gk20a_alloc_inst_block() in
channel_gk20a_alloc_inst() as well
Jira DNVGPU-22
Change-Id: I6c47c19aae1189d7e57f47a51d21a32e2df53c1f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1216140
(cherry picked from commit 6c84961a50eb8a8b080b2db08f87e58143f5a6e8)
Reviewed-on: http://git-master/r/1219704
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
While programming ucode's inst block in API
gr_gk20a_load_falcon_bind_instblk(), use gk20a_aperture_mask()
to select target address (i.e. if address is in sysmem or
vidmem) based on aperture
Also add target accessors for gr_fecs_new_ctx and
gr_fecs_arb_ctx_ptr
Jira DNVGPU-22
Change-Id: I88198080f188b349a4448a229dff8416a6a18073
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1216139
(cherry picked from commit 42bc14110df17400dd655bc994dc9e61c73048b1)
Reviewed-on: http://git-master/r/1219703
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In gk20a_gmmu_alloc_map_attr(), which is used for in-kernel allocations
combined with immediate gmmu map, fall back to attempting to allocate
sysmem when vidmem allocation fails.
Bug 1809939
Change-Id: I4ec4fbf93d41fd9681166b47b3ecad24b51ea274
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1216814
(cherry picked from commit a9929682f1f356f7e8a652a2cec8ed73cc492448)
Reviewed-on: http://git-master/r/1217688
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Instead of blocking for gpfifo space in the nvgpu driver,
return -EAGAIN and allow userspace to decide the blocking
policy.
Bug 1795076
Change-Id: Ie091caa92aad3f68bc01a3456ad948e76883bc50
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/1202591
(cherry picked from commit 8056f422c6a34a4239fc4993c40c2e517c932714)
Reviewed-on: http://git-master/r/1203800
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
When allocating discontiguous memory composed of several chunks,
update also the number of pages used by the current chunk, if a large
chunk was not available and a retry is performed with a smaller one.
Failing to do this would result in too few chunks reserved for a large
enough allocation in certain conditions.
Bug 1805067
Change-Id: I9d14864724d228b42c47eb4669fbe0f789334397
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1214914
(cherry picked from commit 9bece931b13e4dad808622462d4d98d421cfb383)
Reviewed-on: http://git-master/r/1220546
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
An empty list of soon-to-be-freed userspace vidmem buffers is not enough
to safely assume that an allocation may succeed or not if tried again,
because removal from the list and actually marking the memory freed is
not atomic. Fix this by using an atomic counter for the number of
pending frees (so that it's still safe to first remove from the job list
and then perform the free), and making allocation attempts combined with
a test of pending frees atomic.
This still does not guarantee that there is memory available (as the
actual amount of pending memory in bytes plus the current free amount
isn't computed), but removes the race that produces false negatives in
case a single program expects repeated frees and allocs to succeed.
Bug 1809939
Change-Id: I6a92da2e21cbf3f886b727000c924d56f35ce55b
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1217078
(cherry picked from commit 83c1f1e70dccd92fdd4481132cf5b6717760d432)
Reviewed-on: http://git-master/r/1220545
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Fix cyclestats snapshots HAL entries in the vgpu case, need
to null out the ones that don't apply.
Bug 1700143
JIRA EVLR-278
Change-Id: I1b5f4652d1bf3283d96fdb3c2f66c4f69a9f6acc
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: http://git-master/r/1217507
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
The channel timeout lock guards a very small critical section. Use a
spinlock instead of a mutex for performance.
Bug 1795076
Change-Id: I94940f3fbe84ed539bcf1bc76ca6ae7a0ef2fe13
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/1200803
(cherry picked from commit 4fa9e973da141067be145d9eba2ea74e96869dcd)
Reviewed-on: http://git-master/r/1203799
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
split pmu include files to add lot more APIs
pmu_api.h - all the current APIs used in igpu
pmu_common.h - common defines for all APIs
pmu_gk20a.h - SW defines specific needed for nvgpu
like PMU version, PMU SW structure definition etc.
Splitting APIs to separate files allows us to use auto
generated PMU task headers from RM
We have script which generates pmu interface herader files
in linux format. It replaces RM with NV. Adding typedef in existing pmu
code make auto generated files easy to compile/add
JIRA DNVGPU-85
Change-Id: I851b88769fe8d60561a44754ddb7dde45b45959e
Signed-off-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/1192702
Reviewed-on: http://git-master/r/1203124
(cherry picked from commit 0fe5f020c3f934cf2cc5336f1b6c3bafaf9e0c2a)
Reviewed-on: http://git-master/r/1217301
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Allocate vidmem fds from 1024 onwards. This prevents us from
using up the 0-1023 range which is tracked per process, and
fits within FD_SETSIZE.
Bug 200222681
Change-Id: I104b81f2831f1816ff66fc245fa63013d78001ec
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1199269
(cherry picked from commit 5d5cbaf6a63dd31538fa35081b70e103d8a658f4)
Reviewed-on: http://git-master/r/1217294
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Unknown engine is expected, as we do not support all dGPU engines.
Remove the error spew.
JIRA DNVGPU-26
Change-Id: I6f7897c6ead168f1d8100421d16d0540a7f7b542
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1206449
(cherry picked from commit 4cc610755df94065afd28a90c63aca8fff9685b1)
Reviewed-on: http://git-master/r/1217292
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Add support for cyclestats snapshots in the virtual case
Bug 1700143
JIRA EVLR-278
Change-Id: I376a8804d57324f43eb16452d857a3b7bb0ecc90
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: http://git-master/r/1211547
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
We have completely different versions of probe for
nvgpu and pci device
Extract out common steps into nvgpu_probe() function
and separate it out in new file nvgpu_common.c
Divide task of nvgpu_probe() into further smaller
functions
Do platform specific things (like irq handling,
memresource management, power management) only in
individual probes and then call nvgpu_probe() to
complete the common initialization
Move all debugfs initialization to common gk20a_debug_init()
This also helps to bringup all debug nodes to pci device
Pass debugfs_symlink name as a parameter to gk20a_debug_init()
This allows us to set separate debugfs symlink for nvgpu
and pci device
In case of railgating, cde and ce debugfs, check if
platform supports them or not
Copy vidmem_is_vidmem from platform to mm structure
and set it to true for pci device
Return from gk20a_scale_init() if we don't have either of
governor or qos_notifier
Fix gk20a_alloc_debugfs_init() and gk20a_secure_page_alloc()
to receive device pointer instead of platform_device
Export gk20a_railgating_debugfs_init() so that we can call
it from gk20a_debug_init()
Jira DNVGPU-56
Jira DNVGPU-58
Change-Id: I3cc048082b0a1e57415a9fb8bfb9eec0f0a280cd
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1204207
(cherry picked from commit add6bb0a3d5bd98131bbe6f62d4358d4d722b0fe)
Reviewed-on: http://git-master/r/1204462
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
We have blocking 1sec wait for vidmem allocation
Remove this blocking wait and just return proper error
code to the caller
In case we have some buffers to be cleaned up in the
list (clear_list_head), return EAGAIN so that caller
can retry
Otherwise return ENOMEM indicating that no memory is
available right now
Jira DNVGPU-84
Change-Id: Ife2b17c989fc80e568f03bb18ad75b93a25be962
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1204969
(cherry picked from commit 2bacdf0bc6d5b1cdcb8be37e574ca5f4f0663cae)
Reviewed-on: http://git-master/r/1213451
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
In pramin_access_batched(), in each iteration of the
loop we first decide size of data that we should write
in that iteration.
In case this size is equal to length of the chunk, we
need to move to use next chunk for subsequent iteration
But since we change offset variable before we check
above, we end up using same chunk in next iteration
Fix this by correcting the sequnce to first check if
we should move to next chunk and then only adjust
the offset variable
Jira DNVGPU-24
Change-Id: I58c2e24678f4c6dfbe33bf111edd06788629eca8
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1210892
(cherry picked from commit 83cc179199692d28a93b3b884c9bc094ff513298)
Reviewed-on: http://git-master/r/1213450
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Converting return value of sg_dma_address() (which is u64)
into a pointer results in compilation failure on 32 bit
machines
Hence convert address first into uintptr_t and then into
pointer
Change-Id: I8e036af8f4c936b88883cf8af1491f03025ed356
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1211243
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
While mapping the buffer, first check if buffer is in
vidmem, and if yes convert allocation into base address
And then walk through each chunk to decide the alignment
Add new API gk20a_mm_get_align() which returns the
alignment based on scatterlist and aperture, and use
this API to get alignment during mapping
Enable big page support for pci by unsetting disable_bigpage
Jira DNVGPU-97
Change-Id: I358dc98fac8103fdf9d2bde758e61b363fea9ae9
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1207673
(cherry picked from commit d14d42290eed4aa7a2dd2be25e8e996917a58e82)
Reviewed-on: http://git-master/r/1210959
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add new API gk20a_mem_get_base_addr() which will return vidmem
base address in case of vidmem and IOVA address in case of
sysmem
Even though vidmem allocations are non-contiguous, this API
is useful (and should only be used) for allocations with one
chunk (e.g. page tables)
Also, since page tables could either reside in sysmem or vidmem,
use this API to get address of page tables
Jira DNVGPU-20
Change-Id: Ie04af9ca7bfccfec1a8a8e4be2c507cef5cef8e1
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1206403
(cherry picked from commit a8c74dc188878f2948fa1e0e47bf1837fba6c5e0)
Reviewed-on: http://git-master/r/1210957
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Allocting blob space for pmu might need fixed address
allocation in vidmem and during boot up
But if some page tables are allocated before blob space,
blob space allocation could fail
Fix this by allocating blob space early during boot up
Jira DNVGPU-20
Change-Id: I30eca1023c8f8f8be101bb7e160ba57a7040911a
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1206402
(cherry picked from commit fad4309ce345ed3879f497bda27f2eceb1084dbb)
Reviewed-on: http://git-master/r/1210956
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
gk20a_fence_wait() api may be interrupted by a signal before actual
its timeout elapsed. This CL does retry (-ERESTARTSYS) mechanism
if gk20a_fence_wait() return before its timeout elapsed.
Bug 200230544
Change-Id: I347ed2004935a8b9413f95dcb6fca2b74bf49f2a
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: http://git-master/r/1206265
(cherry picked from commit d3ef533942487785d84d109f985ae648eb3c2434)
Reviewed-on: http://git-master/r/1210955
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
API pramin_access_batched() currenly only supports contiguous
allocations.
Modify this API to support non-contiguous allocations from
page allocator as well
Update gk20a_mem_wr32() and gk20a_mem_rd32()to reuse
pramin_access_batched()
Use gk20a_memset() in gk20a_gmmu_free_attr_vid() to clear
vidmem pages for kernel buffers
Jira DNVGPU-30
Change-Id: I43630912f4837d8ebc6b9c58f4f427218ef9725b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1204303
(cherry picked from commit 2f84f141d02fd2f641cb18a48896fb3ae5f7e51f)
Reviewed-on: http://git-master/r/1210954
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Store allocator pointer for each mem_desc
This pointer should be used while freeing the mem
instead of assuming a common allocator
Add flag user_mem to mem_desc which will be set
only in case of User vidmem allocations
We will delay free of mem in worker only if this
flag is set on mem. Otherwise, we will free it
immediately
This is needed so that all kernel allocations
can work with both sysmem and vidmem
Jira DNVGPU-84
Change-Id: Ib9a9209b164bc56b7880448f86bd6d42b324cc86
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1203099
(cherry picked from commit 8f0b0122f36a0b6f1932fa9a98d7eb03b1f623d1)
Reviewed-on: http://git-master/r/1210953
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In __gk20a_alloc_pages(), if we fail to allocate a chunk
we free previously allocated chunks in error path
But we do not free up the memory reserved in those chunks
which could lead to OOM situations
Fix this by calling gk20a_free() for each chunk in error
path
Jira DNVGPU-96
Change-Id: I68aa18d68a5282405016e688c790ccbc0c2a0d69
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1203098
(cherry picked from commit f096bd1675600f4e2fc2d686f2911bb945fbbf0b)
Reviewed-on: http://git-master/r/1210952
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
We clear buffers allocated in vidmem in buffer free path.
But to clear buffers, we need to submit CE jobs and this
could cause issues/races if free called from critical
path
Hence solve this by moving buffer clear/free to a worker
gk20a_gmmu_free_attr_vid() will now just put mem_desc into
a list and schedule a worker
And worker thread will traverse the list and clear/free
the allocations
In struct gk20a_vidmem_buf, mem variable is statically
allocated. But since we delay free of mem, convert this
variable into a pointer and allocate it dynamically
Since we delay free of vidmem memory, it is now possible
to face OOM conditions during allocations. Hence while
allocating block until we have sufficient memory
available with an upper limit of 1S
Jira DNVGPU-84
Change-Id: I7925590644afae50b6fc04c6e1e43bbaa1c220fd
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1201346
(cherry picked from commit b4dec4a30de2431369d677acca00e420f8e581a5)
Reviewed-on: http://git-master/r/1210950
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
We currently clear vidmem pages in gk20a_gmmu_alloc_attr_vid_at()
i.e. allocation path for each buffer
But since buffer allocation path could be latency critical,
clear whole vidmem first and before first User allcation
in gk20a_vidmem_buf_alloc()
And then clear buffer pages while releasing the buffer
In this way, we can ensure that vidmem pages are already cleared
during buffer allocation path
At a later stage, clearing of pages can be removed from free path
and moved to a separate worker as well
At this point, first allocation has overhead of clearing whole
vidmem which takes about 380mS and this should improve once
clocks are raised.
Also, this is one time larency, and subsequent allocations
should not have any overhead for clearing at all
Add API gk20a_vidmem_clear_all() to clear whole vidmem
We have WPR buffers allocated during boot up and
at fixed address in vidmem.
To prevent overwriting to these buffers in gk20a_vidmem_clear_all(),
clear whole vidmem except for the bootstrap allocator carveout
Add new API gk20a_gmmu_clear_vidmem_mem() to clear one mem_desc
Jira DNVGPU-84
Change-Id: I5661700585c6241a6a1ddeb5b7c068d3d2aed4b3
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1194301
(cherry picked from commit 950ab61a04290ea405968d8b0d03e3bd044ce83d)
Reviewed-on: http://git-master/r/1193158
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>