video: tegra: nvmap: Add multithreaded cache flush support

On TOT, NvMap does page by page cache flush i.e. it takes virtual
address of each page present in the buffer and then perform cache
flush on it using dcache_by_line_op. This result in very poor
performance for larger buffers. ~70% of the time taken by
NvRmMemHandleAllocAttr is consumed in cache flush.
Address this perf issue using multithreaded cache flush
- Use a threshold value of 32768 pages which is derived from perf
experiments and as per discussion with cuda as per usecases.
- When the cache flush request of >= 32768 pages is made, then vmap
pages to map them in contiguous VA space and create n number of kernel
threads; where n indicate the number of online CPUs.
- Divide the above VA range among the threads and each thread would do
cache flush on the VA range assigned to it.

This logic in resulting into following % improvement for alloc tests.
-----------------------------------
Buffer Size in MB | % improvement |
----------------------------------|
128               |   52          |
256               |   56          |
512               |   57          |
1024              |   58          |
1536              |   57          |
2048              |   58          |
2560              |   57          |
3072              |   58          |
3584              |   58          |
4096              |   58          |
4608              |   58          |
5120              |   58          |
-----------------------------------

Bug 4628529

Change-Id: I803ef5245ff9283fdc3afc497a6b642c97e89c06
Signed-off-by: Ketan Patil <ketanp@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nv-oot/+/3187871
Reviewed-by: Krishna Reddy <vdumpa@nvidia.com>
GVS: buildbot_gerritrpt <buildbot_gerritrpt@nvidia.com>
This commit is contained in:
Ketan Patil
2024-08-05 04:09:13 +00:00
committed by Jon Hunter
parent a57d56284d
commit 9feb2a4347
3 changed files with 114 additions and 11 deletions

View File

@@ -1,5 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/* SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
/* SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
#ifndef __NVMAP_ALLOC_INT_H
#define __NVMAP_ALLOC_INT_H
@@ -14,6 +14,19 @@
#define NVMAP_PP_BIG_PAGE_SIZE (0x10000)
#endif /* CONFIG_ARM64_4K_PAGES */
/*
* Indicate the threshold number of pages after which
* the multithreaded cache flush will be used.
*/
#define THRESHOLD_PAGES_CACHE_FLUSH 32768
struct nvmap_cache_thread {
pid_t thread_id;
void *va_start;
size_t size;
struct task_struct *task;
};
struct dma_coherent_mem_replica {
void *virt_base;
dma_addr_t device_base;