linux-nvgpu

mirror of git://nv-tegra.nvidia.com/linux-nvgpu.git synced 2025-12-22 09:12:24 +03:00

Author	SHA1	Message	Date
Alex Waterman	1fc57375eb	gpu: nvgpu: Validate buffer_offset argument Validate the mapping_size argument in the VM mapping IOCTL before attempting to use the argument for anything. Bug 1954931 Bug 1965443 Change-Id: I81b22dc566c6c6f89e5e62604ce996376b33a343 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/1547046 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> (cherry picked from commit `e68391690c` in dev-kernel) Reviewed-on: https://git-master.nvidia.com/r/1601531 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Bibek Basu <bbasu@nvidia.com>	2017-11-22 01:31:14 -08:00
David Nieto	abee92ab92	gpu: nvgpu: refactor teardown to support unbind This change refactors the teardown in remove to ensure that it is possible to unload the driver while leaving fds open. This is achieved by making sure that the SW state is kept alive till all fds are closed and by checking that subsequent calls to ioctls after the teardown fail. Normally, this would be achieved ny calls into gk20a_busy(), but in kickoff we dont call into that to reduce latency, so we need to check the driver status directly, and also in some of the functions as we need to make sure the ioctl does not dereference the device or platform struct bug 200277762 JIRA: EVLR-1023 Change-Id: I163e47a08c29d4d5b3ab79f0eb531ef234f40bde Signed-off-by: David Nieto <dmartineznie@nvidia.com> Reviewed-on: http://git-master/r/1320219 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com> Reviewed-by: Shreshtha Sahu <ssahu@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> (cherry picked from commit `e0f2afe5eb`) Reviewed-on: http://git-master/r/1327755 GVS: Gerrit_Virtual_Submit Reviewed-by: Sumeet Gupta <sumeetg@nvidia.com>	2017-03-27 12:06:06 -07:00
Terje Bergstrom	ccccde66e7	gpu: nvgpu: Enable CE always All GPUs have a copy engine. So delete the flag has_ce, because it's always true. JIRA NVGPU-16 Change-Id: I89db74c7cf66b24db84301b79832862ef28100b9 Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1325355 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> (cherry picked from commit `81660ab58c`) Reviewed-on: http://git-master/r/1328222 Reviewed-by: David Martinez Nieto <dmartineznie@nvidia.com> Tested-by: David Martinez Nieto <dmartineznie@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Sumeet Gupta <sumeetg@nvidia.com>	2017-03-27 12:06:06 -07:00
Alex Waterman	707ea45e0f	gpu: nvgpu: kmem abstraction and tracking Implement kmem abstraction and tracking in nvgpu. The abstraction helps move nvgpu's core code away from being Linux dependent and allows kmem allocation tracking to be done for Linux and any other OS supported by nvgpu. Bug 1799159 Bug 1823380 Change-Id: Ieaae4ca1bbd1d4db4a1546616ab8b9fc53a4079d Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1283828 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2017-03-03 10:34:48 -08:00
Alex Waterman	3966efc2e5	gpu: nvgpu: Give nvgpu_kalloc a less generic name Change nvgpu_kalloc() to nvgpu_big_[mz]alloc(). This is necessary since the natural free function name for this is nvgpu_kfree() but that conflicts with nvgpu_k[mz]alloc() (implemented in a subsequent patch). This API exists becasue not all allocation sizes can be determined at compile time and in some cases sizes may vary across the system page size. Thus always using kmalloc() could lead to OOM errors due to fragmentation. But always using vmalloc() is wastful of memory for small allocations. This API tries to alleviate those problems. Bug 1799159 Bug 1823380 Change-Id: I49ec5292ce13bcdecf112afbb4a0cfffeeb5ecfc Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1283827 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2017-03-03 10:34:43 -08:00
Deepak Nibade	64e1782aee	gpu: nvgpu: optimize duplicate buffer lookup in case of fixed offsets In gk20a_vm_map_duplicate_locked(), we always do a linear search in rb-tree to find a duplicate entry of the buffer In case NVGPU_AS_MAP_BUFFER_FLAGS_FIXED_OFFSET is set, we first traverse whole rb-tree linearly and then compare offset_align with the address searched from rb-tree If size of rb-tree is very large this linear lookup takes upto 7mS and causes huge delays Hence in case of NVGPU_AS_MAP_BUFFER_FLAGS_FIXED_OFFSET, we can use offset_align to perform a binary search on rb-tree and then verify that dmabuf and kind match with the node obtained from the search This saves a lot of time per-lookup Bug 1874516 Change-Id: Ia4924b64d66e586c14341ae2e2283beac394bf6f Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1309343 Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>	2017-03-02 07:53:55 -08:00
Deepak Nibade	8cdb91c527	gpu: nvgpu: remove use of DEFINE_MUTEX() API DEFINE_MUTEX() is defined in Linux and might not be available in other OSs. Hence remove its usage from nvgpu Declare and explicitly initialize below mutexes for both nvgpu and vgpu g->mm.priv_lock g->mm.tlb_lock Jira NVGPU-13 Change-Id: If72885a6da0227a1552303206172f1f2b751471d Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1298042 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2017-02-22 04:15:08 -08:00
Deepak Nibade	8ee3aa4b31	gpu: nvgpu: use common nvgpu mutex/spinlock APIs Instead of using Linux APIs for mutex and spinlocks directly, use new APIs defined in <nvgpu/lock.h> Replace Linux specific mutex/spinlock declaration, init, lock, unlock APIs with new APIs e.g struct mutex is replaced by struct nvgpu_mutex and mutex_lock() is replaced by nvgpu_mutex_acquire() And also include <nvgpu/lock.h> instead of including <linux/mutex.h> and <linux/spinlock.h> Add explicit nvgpu/lock.h includes to below files to fix complilation failures. gk20a/platform_gk20a.h include/nvgpu/allocator.h Jira NVGPU-13 Change-Id: I81a05d21ecdbd90c2076a9f0aefd0e40b215bd33 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1293187 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2017-02-22 04:15:02 -08:00
Alex Waterman	e7a0c0ae8b	gpu: nvgpu: Move from gk20a_ to nvgpu_ in semaphore code Change the prefix in the semaphore code to 'nvgpu_' since this code is global to all chips. Bug 1799159 Change-Id: Ic1f3e13428882019e5d1f547acfe95271cc10da5 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1284628 Reviewed-by: Varun Colbert <vcolbert@nvidia.com> Tested-by: Varun Colbert <vcolbert@nvidia.com>	2017-02-13 18:15:03 -08:00
Alex Waterman	aa36d3786a	gpu: nvgpu: Organize semaphore_gk20a.[ch] Move semaphore_gk20a.c drivers/gpu/nvgpu/common/ since the semaphore code is common to all chips. Move the semaphore_gk20a.h header file to drivers/gpu/nvgpu/include/nvgpu and rename it to semaphore.h. Also update all places where the header is inluced to use the new path. This revealed an odd location for the enum gk20a_mem_rw_flag. This should be in the mm headers. As a result many places that did not need anything semaphore related had to include the semaphore header file. Fixing this oddity allowed the semaphore include to be removed from many C files that did not need it. Bug 1799159 Change-Id: Ie017219acf34c4c481747323b9f3ac33e76e064c Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1284627 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>	2017-02-13 18:14:45 -08:00
Alex Waterman	7e403974d3	gpu: nvgpu: Simplify ref-counting on VMs Simplify ref-counting on VMs: take a ref when a VM is bound to a channel and drop a ref when a channel is freed. Previously ref-counts were scattered over the driver. Also the CE and CDE code would bind channels with custom rolled code. This was because the gk20a_vm_bind_channel() function took an as_share as the VM argument (the VM was then inferred from that as_share). However, it is trivial to abtract that bit out and allow a central bind channel function that just takes a VM and a channel. Bug 1846718 Change-Id: I156aab259f6c7a2fa338408c6c4a3a464cd44a0c Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1261886 Reviewed-by: Richard Zhao <rizhao@nvidia.com> Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>	2017-02-07 14:54:02 -08:00
Alex Waterman	95a3eb454c	gpu: nvgpu: Conditional address space unification Allow platforms to choose whether or not to have unified GPU VA spaces. This is useful for the dGPU where having a unified address space has no problems. On iGPUs testing issues is getting in the way of enabling this feature. Bug 1396644 Bug 1729947 Change-Id: I65985f1f9a818f4b06219715cc09619911e4824b Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1265303 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>	2017-01-31 16:23:07 -08:00
Alex Waterman	b9b94c073c	gpu: nvgpu: Remove separate fixed address VMA Remove the special VMA that could be used for allocating fixed addresses. This feature was never used and is not worth maintaining. Bug 1396644 Bug 1729947 Change-Id: I06f92caa01623535516935acc03ce38dbdb0e318 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1265302 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>	2017-01-31 16:23:07 -08:00
Alex Waterman	321537b8ed	gpu: nvgpu: Cleanup gk20a_init_vm() Cleanup and simplify the gk20a_init_vm() function to ease the implementation of a platform dependent address space unification decision. Bug 1396644 Bug 1729947 Change-Id: Id8487d0e3d3c65e3357e3528063fb17c8a85f7da Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1265301 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>	2017-01-31 16:23:07 -08:00
Alex Waterman	d630f1d99f	gpu: nvgpu: Unify the small and large page address spaces The basic structure of this patch is to make the small page allocator and the large page allocator into pointers (where they used to be just structs). Then assign each of those pointers to the same actual allocator since the buddy allocator has supported mixed page sizes since its inception. For the rest of the driver some changes had to be made in order to actually support mixed pages in a single address space. 1. Unifying the allocation page size determination Since the allocation and map operations happen at distinct times both mapping and allocation of GVA space must agree on page size. This is because the allocation has to separate allocations into separate PDEs to avoid the necessity of supporting mixed PDEs. To this end a function __get_pte_size() was introduced which is used both by the balloc code and the core GPU MM code. It determines page size based only on the length of the mapping/ allocation. 2. Fixed address allocation + page size Similar to regular mappings/GVA allocations fixed address mapping page size determination had to be modified. In the past the address of the mapping determined page size since the address space split was by address (low addresses were small pages, high addresses large pages). Since that is no longer the case the page size field in the reserve memory ioctl is now honored by the mapping code. When, for instance, CUDA makes a memory reservation it specifies small or large pages. When CUDA requests mappings to be made within that address range the page size is then looked up in the reserved memory struct. Fixed address reservations were also modified to now always allocate at a PDE granularity (64M or 128M depending on large page size. This prevents non-fixed allocations from ending up in the same PDE and causing kernel panics or GMMU faults. 3. The rest... The rest of the changes are just by products of the above. Lots of places required minor updates to use a pointer to the GVA allocator struct instead of the struct itself. Lastly, this change is not truly complete. More work remains to be done in order to fully remove the notion that there was such a thing as separate address spaces for different page sizes. Basically after this patch what remains is cleanup and proper documentation. Bug 1396644 Bug 1729947 Change-Id: If51ab396a37ba16c69e434adb47edeef083dce57 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1265300 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>	2017-01-31 16:23:07 -08:00
Alex Waterman	793791ebb7	gpu: nvgpu: use map_offset for PTE size computation Make sure that map_offset is set to the fixed map address or 0) before determining PTE size. Then use map_offset instead of offset_align for computing the PTE size since offset_align could be either an alignment ora fixed mapping offset. Also is the minimum of the buffer size and the buffer alignment for computing page size. This is necessary is the GMMU is doing page gathering (i.e the buffer does not appear as a continguous IOMMU range to the GPU). Is such cases a large page sized buffer may be made up of a bunch of discontiguous 4k pages. Bug 1396644 Bug 1729947 Change-Id: I6464ee6a4ccab2495ccb31cd1ddf1db467d2b215 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1271359 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>	2017-01-31 16:23:07 -08:00
Shardar Shariff Md	a470647ad7	gpu: nvgpu: use soc/tegra/chip-id.h for soc header The soc tegra headers are unified and moved all the content of linux/tegra-soc.h to the soc/tegra/chip-id.h to have the single soc header for Tegra. Change-Id: I281e19dd3eb1538b8dfbea4eb0779fb64d1fcffa Signed-off-by: Shardar Shariff Md <smohammed@nvidia.com> Reviewed-on: http://git-master/r/1288365 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>	2017-01-20 08:24:01 -08:00
Alex Waterman	6e2237ef62	gpu: nvgpu: Use timer API in gk20a code Use the timers API in the gk20a code instead of Linux specific API calls. This also changes the behavior of several functions to wait for the full timeout for each operation that can timeout. Previously the timeout was shared across each operation. Bug 1799159 Change-Id: I2bbed54630667b2b879b56a63a853266afc1e5d8 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1273826 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2017-01-18 16:46:33 -08:00
Alex Waterman	b928f10d37	gpu: nvgpu: Start re-organizing the HW headers Reorganize the HW headers of gk20a. The headers are moved to a new directory: include/nvgpu/hw/gk20a And from the code are included like so: #include <nvgpu/hw/gk20a/hw_pwr_gk20a.h> This is the first step in reorganizing all of the HW headers for gm20b, gm206, etc. This is part of a larger effort to re-structure and make the driver more readable and scalable. Bug 1799159 Change-Id: Ic151155cbc2e6f75009f2d9d597b364a1bed2c4c Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1244790 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2017-01-11 12:44:14 -08:00
Alex Waterman	6df3992b60	gpu: nvgpu: Move allocators to common/mm/ Move the GPU allocators to common/mm/ since the allocators are common code across all GPUs. Also rename the allocator code to move away from gk20a_ prefixed structs and functions. This caused one issue with the nvgpu_alloc() and nvgpu_free() functions. There was a function for allocating either with kmalloc() or vmalloc() depending on the size of the allocation. Those have now been renamed to nvgpu_kalloc() and nvgpu_kfree(). Bug 1799159 Change-Id: Iddda92c013612bcb209847084ec85b8953002fa5 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1274400 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2017-01-09 12:33:16 -08:00
Alex Waterman	91d977ced4	gpu: nvgpu: Misc fixes for crashes on shutdown Fix miscellaneous issues seen during driver shutdown. o Make sure pointers are valid before accessing them. o Busy the GPU during channel timeout. o Cancel delayed work on channels. o Avoid access to channels that may have been freed. Bug 1816516 Bug 1807277 Change-Id: I62df40373fdfb1c4a011364e8c435176a08a7a96 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1250026 (cherry picked from commit 64a95fc96c8ef7c5af9c53c4bb3402626e0d2f60) Reviewed-on: http://git-master/r/1274474 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>	2017-01-04 15:53:55 -08:00
Konsta Holtta	92fe000749	gpu: nvgpu: rename timeout_check to timeout_expired Change "check" to "expired" in nvgpu_timeout_check* and append _expired to nvgpu_timeout_peek to clarify what the boolean-like return value means and thus avoid bugs. Bug 200260715 Change-Id: I47e097ee922e856005a79fa9e27eddb1c8d77f8b Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1269366 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-12-19 15:40:36 -08:00
Terje Bergstrom	a4731b3282	gpu: nvgpu: Use end of vidmem as bootstrap region Instead of hard coding bootstrap region, it should always be set to the last 256MB of vidmem. Bug 200244445 Change-Id: I91779d1bf861f4f23a0b646f70b1febbbc4581b5 Signed-off-by: David Nieto <dmartineznie@nvidia.com> Reviewed-on: http://git-master/r/1242409 Reviewed-by: David Martinez Nieto <dmartineznie@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-on: http://git-master/r/1267124 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-12-09 15:05:12 -08:00
Konsta Holtta	54810444d2	gpu: nvgpu: fix timeout retry usage in mm_gk20a.c Loop conditions of timeout checking introduced in commit `2109478311` ("gpu: nvgpu: Use timeout retry API in mm_gk20a.c") were flipped by accident, so each usage in a loop actually did not wait enough but ran only one iteration. Fix the conditions to loop as long as the timeout is NOT expired. Also restore l2 flush timeout to 10 ms from 1, which was done in commit `030ef82bdd` ("gpu: nvgpu: increase l2 flush timeout") but overwritten by the above "use timeout" commit. Bug 200260715 Change-Id: I0db16be79a1a27caa3d97fac9d4361582cc232e8 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1268482 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>	2016-12-09 13:44:53 -08:00
Alex Waterman	2109478311	gpu: nvgpu: Use timeout retry API in mm_gk20a.c Use the retry API that is part of the nvgpu timeout API to keep track of retry attempts in the vrious flushing and invalidating operations in the MM code. Bug 1799159 Change-Id: I36e98d37183a13d7c3183262629f8569f64fe4d7 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1255866 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-12-05 16:16:24 -08:00
Krishna Reddy	89e707d2b4	gpu: nvgpu: fix hardcoded page size reference Bug 1843356 Bug 1769772 Change-Id: I6c2a3a72f7082074bbf1165a74d5070195e1e653 Signed-off-by: Krishna Reddy <vdumpa@nvidia.com> Reviewed-on: http://git-master/r/1258352 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>	2016-12-05 13:49:17 -08:00
seshendra Gadagottu	030ef82bdd	gpu: nvgpu: increase l2 flush timeout Under heavy throttling case gpu runs 8 times slower. This is making l2 flush to timeout, to avoid this increase timeout to 10msec from 1msec. Bug 1787261 Change-Id: Ia112ce968c136135ccb9df4a7364073103684403 Signed-off-by: seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-on: http://git-master/r/1216559 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>	2016-12-02 05:37:03 -08:00
Deepak Nibade	af5d2d208a	gpu: nvgpu: API to access fb memory Add IOCTL API NVGPU_DBG_GPU_IOCTL_ACCESS_FB_MEMORY to read/write fb/vidmem memory Interface will accept dmabuf_fd of the buffer in vidmem, offset into the buffer to access, temporary buffer to copy data across API, size of read/write and command indicating either read or write operation API will first parse all the inputs, and then call gk20a_vidbuf_access_memory() to complete fb access gk20a_vidbuf_access_memory() will then just use gk20a_mem_rd_n() or gk20a_mem_wr_n() depending on the command issued Bug 1804714 Jira DNVGPU-192 Change-Id: Iba3c42410abe12c2884d3b603fa33d27782e4c56 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1255556 (cherry picked from commit 2c49a8a79d93fc526adbf6f808484fa9a3fa2498) Reviewed-on: http://git-master/r/1260471 GVS: Gerrit_Virtual_Submit Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>	2016-11-30 02:18:17 -08:00
seshendra Gadagottu	ef95e43d97	gpu: nvgpu: chip specific init_inst_block Add function pointer to add chip specific init_inst_block. Update this function pointer for gk20a and gm20b. JIRA GV11B-21 Change-Id: I74ca6a8b4d5d1ed36f7b25b7f62361c2789b9540 Signed-off-by: seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-on: http://git-master/r/1254875 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-11-21 08:50:45 -08:00
Terje Bergstrom	d29afd2c9e	gpu: nvgpu: Fix signed comparison bugs Fix small problems related to signed versus unsigned comparisons throughout the driver. Bump up the warning level to prevent such problems from occuring in future. Change-Id: I8ff5efb419f664e8a2aedadd6515ae4d18502ae0 Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1252068 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-11-16 21:35:36 -08:00
Alex Waterman	2fa54c94a6	gpu: nvgpu: Remove global debugfs variable Remove a global debugfs variable and instead save the allocator debugfs root node in the gk20a struct. Bug 1799159 Change-Id: If4eed34fa24775e962001e34840b334658f2321c Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1225611 (cherry picked from commit 1908fde10bb1fb60ce898ea329f5a441a3e4297a) Reviewed-on: http://git-master/r/1242390 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>	2016-10-26 11:10:01 -07:00
Alex Waterman	93eea1d729	gpu: nvgpu: Move CE cleanup Move the CE cleanup to before the FIFO cleanup. Since the CE closes a channel during its cleanup the FIFO needs to be initialized since the FIFO code maintains the vmalloc()'ed channels. Bug 1816516 Change-Id: Ia7a97059a12a0c2b52368ffe411e597f803e8e6e Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1225613 (cherry picked from commit 707bd2a6d4672c6a7b7a8b2e581ea3a606ed971d) Reviewed-on: http://git-master/r/1240106 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>	2016-10-26 11:09:57 -07:00
Alex Waterman	f0fe1c2f02	gpu: nvgpu: Only cleanup existing semaphore pools Not all VMs have semaphore pools made for them even when semaphores are going to be used. Thus only VMs with existing semaphore pools should have their pools cleaned up. Bug 1816516 Change-Id: I07828708faef451f1711f58c0d5b3f8e4d296dd0 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1225612 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> (cherry picked from commit 6cdb7b6650765465dca68dc3c23b3d795ccdafb5) Reviewed-on: http://git-master/r/1240105 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>	2016-10-26 11:09:56 -07:00
Alex Waterman	fc4f0ddddb	gpu: nvgpu: SLAB allocation for page allocator Add the ability to do "SLAB" allocation in the page allocator. This is generally useful since the allocator manages 64K pages but often we only need 4k chunks (for example when allocating memory for page table entries). Bug 1799159 JIRA DNVGPU-100 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1225322 (cherry picked from commit 299a5639243e44be504391d9155b4ae17d914aa2) Change-Id: Ib3a8558d40ba16bd3a413f4fd38b146beaa3c66b Reviewed-on: http://git-master/r/1227924 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-10-18 12:24:33 -07:00
Konsta Holtta	70e0462861	gpu: nvgpu: skip pramin barriers for page tables Page table updates have an explicit write barrier at the end of a pte update operation in update_gmmu_ptes_locked(), so the per-wr32 wmb()s are not necessary. Jira DNVGPU-23 Change-Id: I2e2596f0900d840fadb369ee1261c5e2305f2070 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1225150 (cherry picked from commit 6664a667ea326e9663a6b502765f858d8669f4d9) Reviewed-on: http://git-master/r/1227475 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-10-17 14:39:10 -07:00
Konsta Holtta	a8e260bc8d	gpu: nvgpu: allow skipping pramin barriers A wmb() next to each gk20a_mem_wr32() via PRAMIN may be overly careful, so support not inserting these barriers for performance, in cases where they are not necessary, where the caller would do an explicit barrier after a bunch of reads. Also, move those optional wmb()s to be done at the end of the whole internally batched write for gk20a_mem_{wr_n,memset} from the per-batch subloops that may run multiple times. Jira DNVGPU-23 Change-Id: I61ee65418335863110bca6f036b2e883b048c5c2 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1225149 (cherry picked from commit d2c40327d1995f76e8ab9cb4cd8c76407dabc6de) Reviewed-on: http://git-master/r/1227474 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-10-17 14:39:00 -07:00
Alex Waterman	718af968f0	gpu: nvgpu: Fix wmb() order in pramin access wmb() should come after the writes to ensure that the writes have completed before progressing. Bug 1811382 Change-Id: I98fba317b1760240c0b5de531accf398fe69c9b3 Signed-off-by: Alex Waterman <alexw@nvidia.com> (cherry picked from commit 1b1201b9c109061590e6e25260d7230ae2c89888) Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1225251 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-10-17 14:38:50 -07:00
Konsta Holtta	fa6ab1943e	gpu: nvgpu: add ioctl for querying memory state Add NVGPU_GPU_IOCTL_GET_MEMORY_STATE to read the amount of free device-local video memory, if applicable. Some reserved fields are added to support different types of queries in the future (e.g. context-local free amount). Bug 1787771 Bug 200233138 Change-Id: Id5ffd02ad4d6ed3a6dc196541938573c27b340ac Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1223762 (cherry picked from commit 96221d96c7972c6387944603e974f7639d6dbe70) Reviewed-on: http://git-master/r/1235980 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-10-14 08:12:34 -07:00
Konsta Holtta	a4e4466a0c	gpu: nvgpu: track pending bytes for vidmem clears Change clears_pending to bytes_pending and track accordingly the number of bytes to be freed instead of the number of buffers. This, atomically combined with the amount of space in the allocator, is the total amount of free memory available. Bug 200233138 Change-Id: Ibbb4e80a32728781ba19a74307d8a8ac1a4d7431 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1231422 (cherry picked from commit 025e765f312c253b201ecf2dbbe0f4972fe1d4bc) Reviewed-on: http://git-master/r/1235957 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-10-14 08:12:17 -07:00
Konsta Holtta	8728da1c6e	gpu: nvgpu: compact pte buffers The lowest page table level may hold very few entries for mappings of large pages, but a new page is allocated for each list of entries at the lowest level, wasting memory and performance. Compact these so that the new "allocation" of ptes is appended at the end of the previous allocation, if there is space. 4 KB page is still the smallest size requested from the allocator; any possible overhead in the allocator (e.g., internally allocating big pages only) is not taken into account. Bug 1736604 Change-Id: I03fb795cbc06c869fcf5f1b92def89a04583ee83 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1221841 (cherry picked from commit fa92017ed48e1d5f48c1a12c512641c6ce9924af) Reviewed-on: http://git-master/r/1234996 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-10-13 08:09:16 -07:00
Konsta Holtta	f5069622bb	gpu: nvgpu: make sure vidmem is cleared only once Protect the initial vidmem zeroing performed during the first userspace alloc with a mutex, so that it blocks next concurrent users and is run only once. Otherwise, multiple clears could end up running in parallel, so that the next ones corrupt memory allocated by the thread that has finished earlier and advanced to allocate and use memory. Jira DNVGPU-84 Change-Id: If497749abf481b230835250191d011c4a9d1483b Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1232461 (cherry picked from commit 79435a68e6d2713b78acdb0ec6f77cfd78651d7f) Reviewed-on: http://git-master/r/1234990 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>	2016-10-12 13:02:13 -07:00
seshendra Gadagottu	fda4ddfa79	gpu: nvgpu: userd allocation from sysmem When bar1 memory is not supported then userd will be allocated from sysmem. Functions gp_get and gp_put are updated accordingly. JIRA GV11B-1 Change-Id: Ia895712a110f6cca26474228141488f5f8ace756 Signed-off-by: seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-on: http://git-master/r/1225384 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-10-11 09:16:03 -07:00
Konsta Holtta	0bb1f45958	gpu: nvgpu: optimize barrier in batch pramin writes Move wmb() before the loop in pramin-accessed batch writes and use writel_relaxed() directly, instead of calling gk20a_writel() that would do wmb() on each iteration separately. Jira DNVGPU-24 Change-Id: I4c1375a819266727f97e2f109d3132b5b0974ac6 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1213600 (cherry picked from commit 79e3e38e0c5384ababfd55b8e6cd9723eb8f7b66) Reviewed-on: http://git-master/r/1184343 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-10-06 23:34:57 -07:00
Deepak Nibade	5c049b5c79	gpu: nvgpu: fix allocation and map size mismatch while mapping It is possible to allocate larger size than user requested e.g. If we allocate at 64k granularity, and user asks for 32k buffer, we end up allocating 64k chunk. User still asks to map the buffer with size 32k and hence we reserve mapping addresses only for 32k But due to bug in mapping in update_gmmu_ptes_locked() we end up creating mappings considering size of 64k and corrupt some mappings Fix this by considering min(chunk->length, map_size) while mapping address range for a chunk Also, map_size will be zero once we map all requested address range. So bail out from the loop if map_size is zero Bug 1805064 Change-Id: I125d3ce261684dce7e679f9cb39198664f8937c4 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1217755 (cherry picked from commit 3ee1c6bc0718fb8dd9a28a37eff43a2872bdd5c0) Reviewed-on: http://git-master/r/1221775 GVS: Gerrit_Virtual_Submit Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>	2016-09-26 03:33:49 -07:00
Alex Waterman	f361f89b12	gpu: nvgpu: Use actual carveouts for WPR region Use a carveout for the WPR region in the VIDMEM. Jira DNVGPU-84 Change-Id: I191ecc3bb317ae3af6b56f5970194e646c513964 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1208527 (cherry picked from commit 7edf74d7468dcff1f01cbd901d83aa0e32602f0e) Reviewed-on: http://git-master/r/1223455 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>	2016-09-20 14:56:43 -07:00
Konsta Holtta	919b642214	gpu: nvgpu: log page table addr only for sysmem Don't attempt to use get_iova_addr() on vidmem which does not make sense. Jira DNVGPU-20 Change-Id: Ibfe1516b88ed8b60b8134c330e6b0569d52cbb5b Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1217077 (cherry picked from commit c912f0349d24fde033dbcd9874948ff14ad89a43) Reviewed-on: http://git-master/r/1221264 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-09-15 21:58:53 -07:00
Deepak Nibade	f919aab509	gpu: nvgpu: add safety for vidmem addresses Add new API set_vidmem_page_alloc() which sets BIT(0) in sg_dma_address() only for vidmem allocation Add and use new API get_vidmem_page_alloc() which receives scatterlist and returns pointer to vidmem allocation i.e. struct gk20a_page_alloc *alloc In this API, check if BIT(0) is set or not in sg_dma_address() before converting it to allocation address In gk20a_mm_smmu_vaddr_translate(), ensure that the address is pure IOVA address by verifying that BIT(0) is not set in that address Jira DNVGPU-22 Change-Id: Ib53ff4b63ac59a8d870bc01d0af59839c6143334 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1216142 (cherry picked from commit 03c9fbdaa40746dc43335cd8fbe9f97ef2ef50c9) Reviewed-on: http://git-master/r/1219705 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-09-15 12:24:45 -07:00
Deepak Nibade	f036594190	gpu: nvgpu: move all instance blocks to vidmem Use gk20a_gmmu_alloc() in gk20a_alloc_inst_block() so that we always try to allocate all inst blocks in vidmem first Also use common API gk20a_alloc_inst_block() in channel_gk20a_alloc_inst() as well Jira DNVGPU-22 Change-Id: I6c47c19aae1189d7e57f47a51d21a32e2df53c1f Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1216140 (cherry picked from commit 6c84961a50eb8a8b080b2db08f87e58143f5a6e8) Reviewed-on: http://git-master/r/1219704 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-09-15 12:24:38 -07:00
Konsta Holtta	97512aecb6	gpu: nvgpu: fix inst block leak for vidmem Test for size, not cpu_va, to check for buffer validity before attempting to free. Jira DNVGPU-22 Change-Id: I416c0963bf4e1819aa2f8d200c69a2d989524f83 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1215575 (cherry picked from commit ce0077feca55bfb5665c82972598a075abd8f2a0) Reviewed-on: http://git-master/r/1219702 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-09-15 12:24:24 -07:00
Deepak Nibade	130dd3dd3e	gpu: nvgpu: use get_base_addr() to get inst_block address Since inst_block could reside either in sysmem or vidmem, use gk20a_mem_get_base_addr() to get it's base address Jira DNVGPU-22 Change-Id: Ic9b4370e0a88b585483e78ea81df0ec6ff799487 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1212702 (cherry picked from commit ecdffa7664f48dba0bcbd15b1340af5bf3b45802) Reviewed-on: http://git-master/r/1219700 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2016-09-15 12:24:10 -07:00

1 2 3 4 5 ...

254 Commits