A wmb() next to each gk20a_mem_wr32() via PRAMIN may be overly careful,
so support skipping these barriers for performance in cases where
they are not necessary, such as when the caller does an explicit
barrier after a batch of writes.
Also, move those optional wmb()s to be done at the end of the whole
internally batched write for gk20a_mem_{wr_n,memset} from the per-batch
subloops that may run multiple times.
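As a rough userspace illustration of the batching described above (not the
driver code): the sketch below copies words in fixed-size batches and issues
at most one barrier after the whole write. sketch_wr_n(), sketch_wmb(),
SKETCH_BATCH_WORDS and the skip_wmb parameter are hypothetical names, and a
C11 release fence stands in for the kernel's wmb().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BATCH_WORDS 4u	/* hypothetical batch size */

/* Counts barriers so the effect of skip_wmb can be observed. */
static unsigned int sketch_barrier_count;

/* Userspace stand-in for the kernel's wmb(); an assumption of this sketch. */
static void sketch_wmb(void)
{
	atomic_thread_fence(memory_order_release);
	sketch_barrier_count++;
}

/* Copy `words` 32-bit values in batches, with one optional barrier at the end. */
static void sketch_wr_n(volatile uint32_t *dst, const uint32_t *src,
			size_t words, bool skip_wmb)
{
	while (words > 0) {
		size_t n = words < SKETCH_BATCH_WORDS ? words : SKETCH_BATCH_WORDS;

		/* No barrier inside the per-batch loop. */
		for (size_t i = 0; i < n; i++)
			dst[i] = src[i];

		dst += n;
		src += n;
		words -= n;
	}

	/* A single barrier after the whole write, unless the caller opts out. */
	if (!skip_wmb)
		sketch_wmb();
}
```

A caller that issues several such writes back to back could pass
skip_wmb = true for each and do one explicit barrier afterwards.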
Jira DNVGPU-23
Change-Id: I61ee65418335863110bca6f036b2e883b048c5c2
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1225149
(cherry picked from commit d2c40327d1995f76e8ab9cb4cd8c76407dabc6de)
Reviewed-on: http://git-master/r/1227474
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
wmb() should come after the writes to ensure that the writes have
completed before progressing.
Bug 1811382
Change-Id: I98fba317b1760240c0b5de531accf398fe69c9b3
Signed-off-by: Alex Waterman <alexw@nvidia.com>
(cherry picked from commit 1b1201b9c109061590e6e25260d7230ae2c89888)
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1225251
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add support for GPU railgating using common clock framework and Tegra
DVFS on k4.4
Bug: 200233943
Change-Id: Ief9afd7a5bf3f447e9b91ab181f26dcefff0a8c8
Signed-off-by: Peter Boonstoppel <pboonstoppel@nvidia.com>
Reviewed-on: http://git-master/r/1232290
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
Add NVGPU_GPU_IOCTL_GET_MEMORY_STATE to read the amount of free
device-local video memory, if applicable.
Some reserved fields are added to support different types of queries in
the future (e.g. context-local free amount).
Bug 1787771
Bug 200233138
Change-Id: Id5ffd02ad4d6ed3a6dc196541938573c27b340ac
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1223762
(cherry picked from commit 96221d96c7972c6387944603e974f7639d6dbe70)
Reviewed-on: http://git-master/r/1235980
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Amount of free space in the buddy allocator is computed from the
complete capacity minus currently used bytes.
The page allocator just queries its underlying allocator.
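A minimal sketch of the two queries, assuming simplified structures: the
struct and field names below (sketch_buddy, capacity, bytes_alloced,
sketch_page_alloc, source) are hypothetical and do not mirror the nvgpu
allocator layout.

```c
#include <stdint.h>

/* Hypothetical, simplified buddy allocator state. */
struct sketch_buddy {
	uint64_t capacity;	/* total bytes managed */
	uint64_t bytes_alloced;	/* bytes currently in use */
};

/* Hypothetical page allocator that sits on top of a buddy allocator. */
struct sketch_page_alloc {
	struct sketch_buddy *source;
};

/* Free space is the complete capacity minus the currently used bytes. */
static uint64_t sketch_buddy_space(const struct sketch_buddy *b)
{
	return b->capacity - b->bytes_alloced;
}

/* The page allocator just asks its underlying allocator. */
static uint64_t sketch_page_space(const struct sketch_page_alloc *p)
{
	return sketch_buddy_space(p->source);
}
```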
Bug 1787771
Bug 200233138
Change-Id: I9b6f5ef90119236a13de14e14cd0a3ee72144a11
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1223761
(cherry picked from commit 0b324a60ebdf67e793ade869c252a8ddd56c04f8)
Reviewed-on: http://git-master/r/1235979
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Change clears_pending to bytes_pending and track accordingly the number
of bytes to be freed instead of the number of buffers. This, atomically
combined with the amount of space in the allocator, is the total amount
of free memory available.
Bug 200233138
Change-Id: Ibbb4e80a32728781ba19a74307d8a8ac1a4d7431
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1231422
(cherry picked from commit 025e765f312c253b201ecf2dbbe0f4972fe1d4bc)
Reviewed-on: http://git-master/r/1235957
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The boolean flag mm_gk20a.vidmem.cleared is shared across threads, so
mark it volatile to prevent the compiler from wrongly optimizing accesses
to it.
Jira DNVGPU-84
Change-Id: I1fe66b26966685d3f74ed95ba53b198f810231b9
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1233016
(cherry picked from commit dc6c9db56ea8a5f55f28f97fdfc3c1ac60d8b195)
Reviewed-on: http://git-master/r/1235317
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
There are eight tiles per map tile register and
depending on how many tpcs are present, there is
a chance that s/w will be accessing un-allocated
memory for reading tile values from temp buffers.
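One possible way to avoid such out-of-bounds reads is to size the temporary
buffer in whole registers' worth of tiles. This is only a sketch of that
idea and is not necessarily how this change does it;
sketch_tile_buf_entries() and SKETCH_TILES_PER_REG are hypothetical names.

```c
#include <stddef.h>

#define SKETCH_TILES_PER_REG 8u	/* eight tiles per map tile register */

/*
 * Round the number of buffer entries up to a multiple of eight, so that
 * reading a full register's worth of tiles never runs past the buffer.
 */
static size_t sketch_tile_buf_entries(size_t tpc_count)
{
	return (tpc_count + SKETCH_TILES_PER_REG - 1) /
		SKETCH_TILES_PER_REG * SKETCH_TILES_PER_REG;
}
```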
Bug 1735760
Change-Id: I5c0e09ec75099aaf6ad03dde964b9e93c2dc2408
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: http://git-master/r/1221580
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Tested-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
The lowest page table level may hold very few entries for mappings of
large pages, but a new page is allocated for each list of entries at the
lowest level, wasting memory and performance. Compact these so that the
new "allocation" of ptes is appended at the end of the previous
allocation, if there is space.
4 KB page is still the smallest size requested from the allocator; any
possible overhead in the allocator (e.g., internally allocating big
pages only) is not taken into account.
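The compaction idea can be sketched as a bump allocator inside the most
recent 4 KB page. The names below (sketch_pte_pool, sketch_pte_slot,
sketch_pte_alloc) are hypothetical, the real page table bookkeeping is more
involved, and each request is assumed to fit in one page.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* smallest size requested from the allocator */

/* Hypothetical pool state: pages obtained so far and bytes used in the last one. */
struct sketch_pte_pool {
	unsigned int pages;
	size_t used;
};

/* Where a list of entries ended up. */
struct sketch_pte_slot {
	unsigned int page;
	size_t offset;
};

/*
 * Append the new entries after the previous allocation if there is space,
 * otherwise start a new page. Assumes bytes <= SKETCH_PAGE_SIZE.
 */
static struct sketch_pte_slot sketch_pte_alloc(struct sketch_pte_pool *pool,
					       size_t bytes)
{
	struct sketch_pte_slot slot;

	if (pool->pages == 0 || pool->used + bytes > SKETCH_PAGE_SIZE) {
		pool->pages++;
		pool->used = 0;
	}

	slot.page = pool->pages - 1;
	slot.offset = pool->used;
	pool->used += bytes;
	return slot;
}
```

With this scheme, two small entry lists of 256 bytes each share one page
instead of consuming a page apiece.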
Bug 1736604
Change-Id: I03fb795cbc06c869fcf5f1b92def89a04583ee83
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1221841
(cherry picked from commit fa92017ed48e1d5f48c1a12c512641c6ce9924af)
Reviewed-on: http://git-master/r/1234996
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Protect the initial vidmem zeroing performed during the first userspace
alloc with a mutex, so that it blocks next concurrent users and is run
only once. Otherwise, multiple clears could end up running in parallel,
so that the next ones corrupt memory allocated by the thread that has
finished earlier and advanced to allocate and use memory.
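A userspace sketch of the pattern, assuming a pthread mutex in place of the
kernel mutex. sketch_vidmem, first_clear_mutex, clear_runs and the helper
functions are hypothetical names used only for illustration.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical vidmem state for this sketch. */
struct sketch_vidmem {
	pthread_mutex_t first_clear_mutex;
	bool cleared;
	unsigned int clear_runs;	/* how many times the clear actually ran */
};

/* Stand-in for the real zeroing of video memory. */
static void sketch_clear_all(struct sketch_vidmem *vm)
{
	vm->clear_runs++;
}

/*
 * Called at the start of every allocation. Concurrent callers block on the
 * mutex until the first clear has finished, and the clear runs only once.
 */
static void sketch_clear_once(struct sketch_vidmem *vm)
{
	pthread_mutex_lock(&vm->first_clear_mutex);
	if (!vm->cleared) {
		sketch_clear_all(vm);
		vm->cleared = true;
	}
	pthread_mutex_unlock(&vm->first_clear_mutex);
}
```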
Jira DNVGPU-84
Change-Id: If497749abf481b230835250191d011c4a9d1483b
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1232461
(cherry picked from commit 79435a68e6d2713b78acdb0ec6f77cfd78651d7f)
Reviewed-on: http://git-master/r/1234990
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
A use-after-free scenario is possible where one thread in
gk20a_free_error_notifiers() is trying to free the error
notifier while another thread in gk20a_set_error_notifier()
is still using the error notifier.
Fix this by introducing a mutex, error_notifier_mutex, for
error notifier accesses.
Take the mutex in gk20a_free_error_notifiers() and in
gk20a_set_error_notifier() before accessing the notifier.
In gk20a_init_error_notifier(), set the pointer
ch->error_notifier_ref inside the mutex and only
after the notifier is completely initialized.
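The locking scheme can be sketched in userspace as follows. This is an
illustration under assumptions, not the driver code: sketch_channel and the
sketch_* functions are hypothetical, the notifier is reduced to a single
int, and a pthread mutex replaces the kernel mutex.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical channel with a mutex guarding the notifier pointer. */
struct sketch_channel {
	pthread_mutex_t error_notifier_mutex;
	int *error_notifier;
};

/* Publish the pointer inside the mutex, only after it is fully initialized. */
static bool sketch_init_notifier(struct sketch_channel *ch)
{
	int *n = calloc(1, sizeof(*n));

	if (!n)
		return false;

	pthread_mutex_lock(&ch->error_notifier_mutex);
	ch->error_notifier = n;
	pthread_mutex_unlock(&ch->error_notifier_mutex);
	return true;
}

/* Write the error only if the notifier still exists; returns true if written. */
static bool sketch_set_error(struct sketch_channel *ch, int error)
{
	bool written = false;

	pthread_mutex_lock(&ch->error_notifier_mutex);
	if (ch->error_notifier) {
		*ch->error_notifier = error;
		written = true;
	}
	pthread_mutex_unlock(&ch->error_notifier_mutex);
	return written;
}

/* Free under the same mutex, so no writer can hold a stale pointer. */
static void sketch_free_notifier(struct sketch_channel *ch)
{
	pthread_mutex_lock(&ch->error_notifier_mutex);
	free(ch->error_notifier);
	ch->error_notifier = NULL;
	pthread_mutex_unlock(&ch->error_notifier_mutex);
}
```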
Bug 1824788
Change-Id: I47e1ab57d54f391799f5a0999840b663fd34585f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1233988
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Fix the sparse warning below by including nvgpu_common.h
from nvgpu_common.c:
nvgpu/drivers/gpu/nvgpu/nvgpu_common.c:105:5: warning: symbol
'nvgpu_probe' was not declared. Should it be static?
Bug 200088648
Change-Id: I81f20a5be1c16ba33d6c17a6c72836107878d1df
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1233960
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Suppress the error message when nvgpu tries to load a VBIOS overlay but
one is not found. This situation is normal. This is done by moving
gk20a_request_firmware() to be the nvgpu generic function
nvgpu_request_firmware(), and adding a NO_WARN flag to it.
Also introduce a NO_SOC flag to suppress the attempt to load firmware
from the SoC specific directory in addition to the chip specific
directory. Use it for dGPU firmware files.
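A sketch of how two such flags might steer the lookup. Everything here is
an assumption made for illustration: the SKETCH_FW_* names, the function
signatures, the directory names, and the order in which the directories are
tried are not taken from nvgpu_request_firmware().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_FW_NO_WARN	(1u << 0)	/* stay quiet if the file is missing */
#define SKETCH_FW_NO_SOC	(1u << 1)	/* skip the SoC specific directory */
#define SKETCH_FW_PATH_MAX	128

/* Fill `paths` with the locations to try; returns how many were written. */
static size_t sketch_fw_candidates(const char *chip_dir, const char *soc_dir,
				   const char *fw_name, unsigned int flags,
				   char paths[2][SKETCH_FW_PATH_MAX])
{
	size_t n = 0;

	snprintf(paths[n++], SKETCH_FW_PATH_MAX, "%s/%s", chip_dir, fw_name);
	if (!(flags & SKETCH_FW_NO_SOC))
		snprintf(paths[n++], SKETCH_FW_PATH_MAX, "%s/%s", soc_dir, fw_name);

	return n;
}

/* A missing file is reported only when the caller did not ask for silence. */
static bool sketch_fw_should_warn(bool found, unsigned int flags)
{
	return !found && !(flags & SKETCH_FW_NO_WARN);
}
```

An optional file such as a VBIOS overlay would be requested with the
NO_WARN flag, and dGPU firmware files with the NO_SOC flag.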
Bug 200236777
Change-Id: I0294d3308f029a6a6d3c2effa579d5f69a91e418
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1223840
(cherry picked from commit cca44c3f010f15918cdd2259c15170ba1917828a)
Reviewed-on: http://git-master/r/1233353
GVS: Gerrit_Virtual_Submit
JIRA DNVGPU-118
Move the vidmem allocation for pmuboardobj to cmd-specific
functions and copy the data from the PMU in case of
getstatus. Fix the getstatus boardobjgrp implementation
and add a #define for the rail id to make getstatus of the VF table
more meaningful.
Change-Id: I366a022c13e51e823116ce2354794babc48981a2
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/1209841
(cherry picked from commit 8c12599f801decc77bbc1acfd1937dfefb21f35e)
Reviewed-on: http://git-master/r/1231839
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add the known dGPU SKUs to the PCIe device id table, and remove the
ANY_GPU_ID wildcard. This stops nvgpu from trying to probe unknown
GPUs.
JIRA DNVGPU-72
Change-Id: Ie32c3137e9fa89a9e6dcf1e578c0b9d7339d7e75
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1219129
(cherry picked from commit 5c56088fbf8cb815d8be3355ecbb597fb7bfc795)
Reviewed-on: http://git-master/r/1231042
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
bug 1809509
The latest PMU now returns information about 3 queues
only. The nvgpu PMU driver still supports 5 queues to
be compatible with older firmware. Handle this
properly.
Change-Id: I4bc166712465f4b52537c97e6d254760c59e0d16
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/1215533
(cherry picked from commit c7428c031a095b2d42512b7a8a0a9d818290e376)
Reviewed-on: http://git-master/r/1231040
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Move wmb() before the loop in pramin-accessed batch writes and use
writel_relaxed() directly, instead of calling gk20a_writel() that would
do wmb() on each iteration separately.
Jira DNVGPU-24
Change-Id: I4c1375a819266727f97e2f109d3132b5b0974ac6
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1213600
(cherry picked from commit 79e3e38e0c5384ababfd55b8e6cd9723eb8f7b66)
Reviewed-on: http://git-master/r/1184343
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Both gk20a_request_firmware() and its callers wrote an error when a
file could not be found. Remove the error in
gk20a_request_firmware().
JIRA DNVGPU-143
Change-Id: I74cb6a6774762732d7702f1eadbeef19dcb9a85e
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1211612
(cherry picked from commit 818364189036c6732b19682debb63a033c6a6c2a)
Reviewed-on: http://git-master/r/1229491
GVS: Gerrit_Virtual_Submit
As the size of the golden_ctx_image is large,
the allocation may intermittently fail when using
kzalloc. Since we don't need physically contiguous
memory, use vzalloc instead.
Bug 200231436
Change-Id: Ic2fb31dea94c8721832dc257334608e1fc283943
Signed-off-by: Sachit Kadle <skadle@nvidia.com>
Reviewed-on: http://git-master/r/1207172
(cherry picked from commit 994a7b162ec74518ae0f50dfb5ac197e44019992)
Reviewed-on: http://git-master/r/1229472
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
GVS: Gerrit_Virtual_Submit
Attach the necessary namemap structures to the clock struct so we can
enumerate the clock domains in the debugfs code in nvgpu-t18x.
Also add an accessor for the fields.
JIRA DNVGPU-98
Change-Id: I6e5c6e763b2b88daa1995f4136a9a7b33ea25b17
Signed-off-by: David Nieto <dmartineznie@nvidia.com>
Reviewed-on: http://git-master/r/1199083
Reviewed-on: http://git-master/r/1204016
(cherry picked from commit b9d95a45791b93ddc010d1aeddbe798d2a9705d4)
Reviewed-on: http://git-master/r/1227910
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Do not call load prod callbacks that are set to NULL.
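In outline, the guard is a NULL check before the indirect call. The types
and names in this sketch (sketch_gpu, sketch_cg_ops, load_prod) are
hypothetical stand-ins for the driver's ops table.

```c
#include <stddef.h>

/* Hypothetical device state; counts how often a callback ran. */
struct sketch_gpu {
	int prod_loads;
};

/* Hypothetical ops table; a chip may leave the callback unset. */
struct sketch_cg_ops {
	void (*load_prod)(struct sketch_gpu *g);
};

/* Example callback for a chip that does provide one. */
static void sketch_example_load_prod(struct sketch_gpu *g)
{
	g->prod_loads++;
}

/* Call the callback only when it is set. */
static void sketch_load_prod(const struct sketch_cg_ops *ops,
			     struct sketch_gpu *g)
{
	if (ops->load_prod)
		ops->load_prod(g);
}
```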
Bug 1799537
Change-Id: Ie951fb71fa8eacd10623abcd058f32db59004c2e
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1208467
(cherry picked from commit c020e16adfa2b2bc2e3e8d0c63527a6089c59906)
Reviewed-on: http://git-master/r/1227268
GVS: Gerrit_Virtual_Submit
It is possible to allocate a larger size than the user requested.
For example, if we allocate at 64k granularity and the user asks for a
32k buffer, we end up allocating a 64k chunk.
The user still asks to map the buffer with size 32k, and
hence we reserve mapping addresses only for 32k.
But due to a bug in mapping in update_gmmu_ptes_locked(),
we end up creating mappings for a size of 64k
and corrupt some mappings.
Fix this by considering min(chunk->length, map_size) while
mapping the address range for a chunk.
Also, map_size will be zero once we map all of the requested
address range, so bail out from the loop if map_size
is zero.
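The clamping can be sketched as a loop over chunks. sketch_chunk and
sketch_map_chunks() are hypothetical, and the sketch only totals the bytes
that would be mapped instead of writing page table entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocation chunk; only the length matters here. */
struct sketch_chunk {
	uint64_t length;
};

/* Returns the number of bytes mapped, never more than map_size. */
static uint64_t sketch_map_chunks(const struct sketch_chunk *chunks, size_t n,
				  uint64_t map_size)
{
	uint64_t mapped = 0;

	for (size_t i = 0; i < n; i++) {
		/* Map min(chunk length, remaining requested size). */
		uint64_t len = chunks[i].length < map_size ?
			       chunks[i].length : map_size;

		mapped += len;
		map_size -= len;

		/* Everything requested is mapped; do not touch further chunks. */
		if (map_size == 0)
			break;
	}

	return mapped;
}
```

With a single 64k chunk and a 32k request, only 32k is mapped, which
matches the address range that was reserved.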
Bug 1805064
Change-Id: I125d3ce261684dce7e679f9cb39198664f8937c4
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1217755
(cherry picked from commit 3ee1c6bc0718fb8dd9a28a37eff43a2872bdd5c0)
Reviewed-on: http://git-master/r/1221775
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
Fix the location of the rmb() in the buddy and bitmap allocators.
The previous fix was not quite right. The rmb() needs to be after the
init value is read so that any subsequent reads occur after the init
value is read. If this is not done then subsequent reads could be loaded
before the value of init is checked and possibly be invalid.
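The ordering can be illustrated in userspace with C11 fences standing in
for the kernel barriers: a release fence for wmb() on the writer side and
an acquire fence for rmb() on the reader side. sketch_alloc and its
functions are hypothetical and do not reproduce the allocator code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical allocator with an init flag guarding the other fields. */
struct sketch_alloc {
	uint64_t base;
	uint64_t length;
	atomic_int inited;
};

/* Writer: set up all fields, then a barrier, then publish the init flag. */
static void sketch_alloc_init(struct sketch_alloc *a, uint64_t base,
			      uint64_t length)
{
	a->base = base;
	a->length = length;
	atomic_thread_fence(memory_order_release);	/* stand-in for wmb() */
	atomic_store_explicit(&a->inited, 1, memory_order_relaxed);
}

/* Reader: check the init flag first, then a barrier, then the other reads. */
static bool sketch_alloc_end(struct sketch_alloc *a, uint64_t *end)
{
	if (!atomic_load_explicit(&a->inited, memory_order_relaxed))
		return false;

	/* The barrier comes after the init read and before the dependent reads. */
	atomic_thread_fence(memory_order_acquire);	/* stand-in for rmb() */
	*end = a->base + a->length;
	return true;
}
```

If the fence were placed before the load of the flag, nothing would stop
the reads of base and length from being satisfied before the flag check.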
Bug 1811382
Change-Id: I6d1fa25cc16c5e19fd2769d489878afa2f8e3e35
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1221061
(cherry picked from commit f2ddb6c56e554c39733c8fc9ae870dfc12e47b44)
Reviewed-on: http://git-master/r/1223458
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Putting the wmb() before the write only ensures that any previous
writes are done. But this doesn't really do anything for the
writel_relaxed(). The point of the wmb() here is to ensure that
the write performed by the writel_relaxed() is actually done
before proceeding.
Bug 1811382
Change-Id: I7250ea074b8548c899acfd34d816de466cf53b6f
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1216434
(cherry picked from commit c9aa02dc61138615d971902fe58dc6a113cdf00a)
Reviewed-on: http://git-master/r/1223457
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Make sure that all writes have been committed before allowing
the variable storing the init status to be seen as non-zero.
Pair this with a read memory barrier where the check for the
status is done.
Bug 1799159
Change-Id: I938dffdfc2f39187b0dad11b7e283381560961b4
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1211523
(cherry picked from commit 6dd673d24a93c05834c9d96d2022b359ced5b73b)
Reviewed-on: http://git-master/r/1223456
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Use a carveout for the WPR region in the VIDMEM.
Jira DNVGPU-84
Change-Id: I191ecc3bb317ae3af6b56f5970194e646c513964
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1208527
(cherry picked from commit 7edf74d7468dcff1f01cbd901d83aa0e32602f0e)
Reviewed-on: http://git-master/r/1223455
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Implement carveout support by just calling through to the buddy
allocator's carveout support.
Jira DNVGPU-84
Change-Id: I1940873394a4cbff0152f1b6c9c4fd659e0076e1
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1203392
(cherry picked from commit 499ee0407bf525e161a14cfb8bbbc101ac934329)
Reviewed-on: http://git-master/r/1223454
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Implement carveout support in the buddy allocator so that the WPR space in
the VIDMEM can be carved out. This is needed since the buddy allocator is
used internally by the page allocator which is what manages the VIDMEM space.
Jira DNVGPU-84
Change-Id: I864faa7e20fca5547cc3a8f85f1bc4c36af53ee0
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1203391
(cherry picked from commit a8a5fd265a8ae33093d144cd6ec5222e93280a0f)
Reviewed-on: http://git-master/r/1223453
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>