Account NvMap-allocated memory into both RSS and cgroup (kmemcg) tracking
so that the kernel can make efficient OOM-kill decisions during memory
pressure.
NvMap allocates memory via kernel APIs such as alloc_pages, so this kernel
memory is not accounted to the process that requested the allocation.
Hence, on OOM, the OOM killer never kills the process that allocated
memory via NvMap, even though that process might be holding most of the
memory.
Solve this issue using the following approach (a sketch follows the list):
- Use the __GFP_ACCOUNT and __GFP_NORETRY flags.
- __GFP_NORETRY keeps this allocation path from entering the OOM path, so
the allocation itself never triggers the OOM killer.
- __GFP_ACCOUNT causes the allocation to be accounted to kmemcg. Any
allocation done by NvMap is therefore accounted to kmemcg, and cgroups
can be used to define memory limits.
- Add RSS counting for the process that allocates via NvMap, so that the
OOM score of that process gets updated and the OOM killer can pick the
process based on its OOM score.
- Every process that holds a reference to an NvMap handle has the memory
size accounted into its RSS. On releasing the reference to the handle,
the RSS is reduced.
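A minimal sketch of the two pieces above (not the actual NvMap code; the
helper names and the use of the MM_ANONPAGES counter are assumptions made
for illustration):

  #include <linux/gfp.h>
  #include <linux/mm.h>

  /* Allocation side: charge the page to kmemcg (__GFP_ACCOUNT) and
   * never enter the OOM path from this allocation (__GFP_NORETRY). */
  static struct page *nvmap_alloc_page_accounted(void)
  {
      return alloc_pages(GFP_KERNEL | __GFP_ACCOUNT | __GFP_NORETRY, 0);
  }

  /* RSS side: when a process takes a reference to a handle, account the
   * handle size into its RSS; undo it when the reference is released. */
  static void nvmap_account_rss(struct mm_struct *mm, long nr_pages,
                                bool add)
  {
      add_mm_counter(mm, MM_ANONPAGES, add ? nr_pages : -nr_pages);
  }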
Bug 5222690
Change-Id: I3fa9b76ec9fc8d7f805111cb96e11e2ab1db42ce
Signed-off-by: Ketan Patil <ketanp@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nv-oot/+/3427871
(cherry picked from commit 2ca91098aa)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nv-oot/+/3453101
(cherry picked from commit cfe6242c8c)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nv-oot/+/3454642
Reviewed-by: Amulya Yarlagadda <ayarlagadda@nvidia.com>
Tested-by: Amulya Yarlagadda <ayarlagadda@nvidia.com>
GVS: buildbot_gerritrpt <buildbot_gerritrpt@nvidia.com>
Consider the following execution scenario; it can lead to a data race:
Thread 1                    Thread 2
-------------------------------------------------------
NvRmMemHandleAllocAttr
NvRmMemGetFd
NvRmMemHandleFromFd         close(fd), NvRmMemHandleFree
When thread 1 executes NvRmMemHandleFromFd while thread 2 closes the fd
and frees the handle, the following sequence leads to accessing a freed
dma-buf and may cause a dereference issue.
- TH1: Get the handle from the fd and increment the handle's refcount.
- TH2: Close the fd and start executing NvRmMemHandleFree.
- TH2: Decrement the ref's dup count; it becomes 0, so the ref is freed,
and the dma-buf as well. But because the handle's refcount was
incremented in step 1, the handle is not freed.
- TH1: Resume the HandleFromFd path and call nvmap_duplicate_handle. The
ref is already freed, so a new ref is generated and the dma-buf's
refcount is incremented; but as the dma-buf is already freed, accessing
it leads to a dereference issue. Hence, add a NULL check here and return
an error value in this scenario (see the sketch below).
Also, add checks for the return value of nvmap_handle_get at the places
where they were missing.
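A minimal sketch of the guard described above; the field name h->dmabuf
and the helper nvmap_handle_put (assumed as the counterpart of
nvmap_handle_get) are illustrative, and the exact placement inside
nvmap_duplicate_handle may differ:

  /* In the duplicate path: bail out if the dma-buf was already released
   * by the freeing thread. */
  if (!h->dmabuf) {
      nvmap_handle_put(h);        /* drop the refcount taken in TH1 */
      return ERR_PTR(-EINVAL);    /* error instead of use-after-free */
  }
  get_dma_buf(h->dmabuf);

  /* Callers of nvmap_handle_get likewise check for failure instead of
   * assuming the handle is still alive: */
  h = nvmap_handle_get(h);
  if (!h)
      return ERR_PTR(-EINVAL);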
Bug 4214453
Change-Id: Ib6ef66b4a7126bef2ed1dbb48643445a4ded1bab
Signed-off-by: Ketan Patil <ketanp@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nv-oot/+/2994962
(cherry picked from commit 7ba6a3abfe65f7cffeeb4004cf0868468121fc32)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nv-oot/+/2994440
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Add more checks in the nvmap code to avoid any possible races.
- Update the is_nvmap_id_ro and is_nvmap_dmabuf_fd_ro functions so that
they return an error value on error conditions, and update their callers
to handle those error values (see the sketch after this list).
- Move all trace statements from the end of the function to before the
handle refcount or dup count is decremented; this makes sure we never
dereference a freed handle/reference/dma-buf.
- Increment the ref's dup count wherever a data race is possible, and
decrement it accordingly towards the end of the function.
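A minimal sketch of the error-returning convention; only the calling
convention (negative errno on failure, 0/1 otherwise) reflects the change
described above, while the lookup helper nvmap_ref_from_id and the ref
fields are hypothetical:

  #include <linux/err.h>

  static int is_nvmap_id_ro(struct nvmap_client *client, u32 id)
  {
      struct nvmap_handle_ref *ref;

      ref = nvmap_ref_from_id(client, id);   /* hypothetical lookup */
      if (IS_ERR_OR_NULL(ref))
          return ref ? PTR_ERR(ref) : -EINVAL;

      return ref->is_ro ? 1 : 0;
  }

  /* Caller side: a negative value is an error, not "read-write". */
  ret = is_nvmap_id_ro(client, id);
  if (ret < 0)
      return ret;
  is_ro = ret;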
Bug 4253911
Change-Id: I50fc7cc98ebbf3c50025bc2f9ca32882138fb272
Signed-off-by: Ketan Patil <ketanp@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nv-oot/+/2972602
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Add support for the Serial Id feature, which will be used by Nsight for
buffer-tracking purposes. This feature expects a unique serial id per
buffer, even if the buffer is shared across multiple client processes.
Add the following code (a sketch follows the list):
- Create a new global counter field for the serial id in the nvmap
device. Initialize it to 0 when the nvmap device is initialized.
- Introduce a new serial_id field in the nvmap_handle struct.
- When an nvmap_handle is created, assign its serial_id field the global
counter's value, and increment the global counter.
- During NvRmMemQueryHandleParameters, return the serial_id associated
with the handle.
- Do not decrement the serial_id counter, even after the handle is freed.
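A minimal sketch of the scheme; the struct layouts and the nvmap_dev name
are illustrative, not the exact nvmap definitions:

  struct nvmap_device {
      /* ... */
      atomic64_t serial_id_counter;  /* starts at 0, only ever grows */
  };

  struct nvmap_handle {
      /* ... */
      u64 serial_id;   /* unique per buffer, stable across sharing */
  };

  /* On handle creation: take the current counter value, then advance the
   * counter. It is never decremented, so ids are never reused. */
  handle->serial_id = atomic64_fetch_inc(&nvmap_dev->serial_id_counter);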
Bug 4138373
Change-Id: Ic1fe22b082eefb352986f8fa44d4c38d186a366f
Signed-off-by: Ketan Patil <ketanp@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nv-oot/+/2918510
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
When one process is trying to duplicate an RO handle while another
process is trying to free the same RO handle, a race can occur: the
second process can decrement the dma-buf's refcount, which may reach 0.
The first process can then call get_dma_buf on it, leading to a NULL
pointer dereference and ultimately a kernel panic. Fix this by taking an
extra dma-buf refcount before duplicating the handle and dropping it once
the duplication is complete, as sketched below.
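A minimal sketch of the pattern; the duplication call, its arguments and
the dmabuf field name are illustrative, only the get_dma_buf/dma_buf_put
pairing around the duplication reflects the fix:

  /* Hold an extra dma-buf reference across the duplication so that a
   * concurrent free cannot drop the refcount to zero underneath us. */
  get_dma_buf(h->dmabuf);
  ref = nvmap_duplicate_handle(client, h, true /* is_ro */);
  dma_buf_put(h->dmabuf);   /* drop the extra reference taken above */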
Bug 3991243
Change-Id: I99901ce19d8a5d23c5192cb10a17efd2ebaf9d3a
Signed-off-by: Ketan Patil <ketanp@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2865519
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Krishna Reddy <vdumpa@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
There is a potential data race for the RO dma-buf in the following
scenario:
---------------------------------------------------------------------
Process 1             | Process 2             | Process 3
---------------------------------------------------------------------
AllocAttr handle H1   |                       |
MemMap(H1)            |                       |
AllocAttr(H2)         |                       |
MemMap(H2)            |                       |
id1 = GetSciIpcId(H1) |                       |
id2 = GetSciIpcId(H2) | H3=HandleFromSciIpcId |
id3 = GetSciIpcId(H1) |   (id1, RO)           | H4=HandleFromSciIpcId
MemUnmap(H2)          | QueryHandleParams(H3) |   (id2, RO)
MemUnmap(H1)          | MemMap(H3)            | QueryHandleParams(H4)
HandleFree(H2)        | MemUnmap(H3)          | MemMap(H4)
HandleFree(H1)        | HandleFree(H3)        | H5=HandleFromSciIpcId
                      |                       |   (id3, RO)
                      |                       | QueryHandleParams(H5)
                      |                       | MemMap(H5)
                      |                       | MemUnmap(H4)
                      |                       | MemUnmap(H5)
                      |                       | HandleFree(H4)
                      |                       | HandleFree(H5)
---------------------------------------------------------------------
The race happens between HandleFree(H3) in process 2 and
HandleFromSciIpcId(id3, RO) in process 3. Process 2 tries to free H3, and
nvmap_free_handle decrements the RO dma-buf's refcount so that it reaches
0, but nvmap_dmabuf_release is not called immediately. Because of that,
process 3 gets a false result for the following check:
if (is_ro && h->dmabuf_ro == NULL)
This results in a call to nvmap_duplicate_handle; meanwhile,
nvmap_dmabuf_release runs and sets h->dmabuf_ro to NULL. Hence
get_dma_buf crashes with a NULL pointer dereference.
Fix this issue with the following approach (a sketch follows the list):
- Before using dmabuf_ro, take the handle->lock, then check that
dmabuf_ro is not NULL.
- If it is not NULL, call get_file_rcu on the file associated with the
RO dma-buf and check the return value.
- If the return value is false, the dma-buf's refcount is zero and the
dma-buf is going away. So wait until dmabuf_ro is set to NULL, and then
create a new dma-buf for RO.
- Otherwise, use the existing RO dma-buf and drop the refcount taken
with get_file_rcu.
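A minimal sketch of that control flow, assuming the bool-returning form
of get_file_rcu and a mutex for handle->lock; the function name and
nvmap_make_ro_dmabuf are hypothetical placeholders, not the actual NvMap
helpers:

  static struct dma_buf *nvmap_get_ro_dmabuf(struct nvmap_handle *h)
  {
      struct dma_buf *dmabuf;

      for (;;) {
          mutex_lock(&h->lock);
          dmabuf = h->dmabuf_ro;
          if (!dmabuf) {
              /* Release already cleared dmabuf_ro: safe to create a
               * fresh RO dma-buf now. */
              dmabuf = nvmap_make_ro_dmabuf(h);
              mutex_unlock(&h->lock);
              return dmabuf;
          }
          if (get_file_rcu(dmabuf->file)) {
              /* The RO dma-buf is still alive: reuse it and drop the
               * file reference taken by get_file_rcu above. */
              fput(dmabuf->file);
              mutex_unlock(&h->lock);
              return dmabuf;
          }
          /* Refcount hit zero but nvmap_dmabuf_release has not run yet:
           * back off until dmabuf_ro becomes NULL. */
          mutex_unlock(&h->lock);
          cpu_relax();
      }
  }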
Bug 3741751
Change-Id: I8987efebc476a794b240ca968b7915b4263ba664
Signed-off-by: Ketan Patil <ketanp@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2850394
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Krishna Reddy <vdumpa@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>