gpu: nvgpu: pd_cache enablement for >4k allocations in qnx

Mapping large buffers into the GMMU ends up requiring many
pages for the PTE tables. Allocating these one at a time
can become a performance bottleneck, particularly in the
virtualized case.

This change adds the following:

 - As the TLB invalidation path doesn't have access to mem_off,
   allow top-level allocations via alloc_cache_direct().
 - Define NVGPU_PD_CACHE_SIZE, the allocation size for a new slab
   in the PD cache, effectively set to 64KB.
 - Use the PD cache for any allocation smaller than NVGPU_PD_CACHE_SIZE.
   When freeing cached entries, avoid prefetch errors by invalidating
   the entry (memset to 0). A sketch of this policy follows the list.
 - On contiguous allocation failure, fall back to direct allocation
   of smaller chunks.
 - Unit test changes.
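
For reference, below is a minimal, self-contained sketch of the slab
policy described above. The names (pd_slab, pd_alloc, pd_free) and the
bitmap bookkeeping are illustrative assumptions, not the actual nvgpu
pd_cache code; the real implementation works on DMA-backed nvgpu_mem
buffers rather than plain heap memory.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NVGPU_PD_CACHE_SIZE (64U * 1024U)  /* 64KB slab, as in the commit */

/*
 * One slab: a contiguous NVGPU_PD_CACHE_SIZE buffer carved into
 * equal-sized PD entries, tracked with a simple free bitmap.
 */
struct pd_slab {
	uint8_t *mem;       /* NVGPU_PD_CACHE_SIZE bytes of backing memory */
	uint32_t entry_size;
	uint32_t nr_entries;
	uint64_t free_map;  /* 1 bit per entry; enough for entries >= 1KB */
};

static struct pd_slab *pd_slab_new(uint32_t entry_size)
{
	struct pd_slab *s = calloc(1, sizeof(*s));

	if (s == NULL)
		return NULL;
	s->mem = calloc(1, NVGPU_PD_CACHE_SIZE);
	if (s->mem == NULL) {
		free(s);
		return NULL;
	}
	s->entry_size = entry_size;
	s->nr_entries = NVGPU_PD_CACHE_SIZE / entry_size;
	s->free_map = (s->nr_entries >= 64U) ?
		~0ULL : (1ULL << s->nr_entries) - 1ULL;
	return s;
}

/*
 * Allocation policy from the commit: anything smaller than
 * NVGPU_PD_CACHE_SIZE comes out of a slab; anything larger (or a
 * top-level PD) would go through direct allocation instead.
 */
static void *pd_alloc(struct pd_slab *s, uint32_t bytes, uint32_t *mem_off)
{
	uint32_t i;

	if (bytes >= NVGPU_PD_CACHE_SIZE)
		return NULL; /* caller falls back to direct allocation */

	for (i = 0; i < s->nr_entries; i++) {
		if (s->free_map & (1ULL << i)) {
			s->free_map &= ~(1ULL << i);
			*mem_off = i * s->entry_size;
			return s->mem + *mem_off;
		}
	}
	return NULL; /* slab full: a real cache would grab a new slab */
}

/*
 * Free path: invalidate the entry (memset to 0) before returning it
 * to the slab, so stale PDEs/PTEs can't be prefetched by the GPU.
 */
static void pd_free(struct pd_slab *s, uint32_t mem_off)
{
	memset(s->mem + mem_off, 0, s->entry_size);
	s->free_map |= 1ULL << (mem_off / s->entry_size);
}

int main(void)
{
	struct pd_slab *s = pd_slab_new(4096U);
	uint32_t off = 0U;
	void *pd;

	if (s == NULL)
		return 1;
	pd = pd_alloc(s, 4096U, &off);
	printf("entry at offset %u (%p)\n", off, pd);
	pd_free(s, off);
	free(s->mem);
	free(s);
	return 0;
}

Note how pd_free() zeroes the entry before marking it free again,
mirroring the prefetch-error avoidance above, and how pd_alloc()
rejects anything >= NVGPU_PD_CACHE_SIZE so the caller can fall back
to a direct allocation.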

Bug 200649243

Change-Id: I0a667af0ba01d9147c703e64fc970880e52a8fbc
Signed-off-by: dt <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2404371
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
commit a331fd4b3a (parent 94bc3a8135)
Author:    Peter Daifuku
Date:      2020-08-26 16:25:36 -07:00
Committer: Alex Waterman

16 changed files with 122 additions and 22 deletions

@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2018-2019, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
  * copy of this software and associated documentation files (the "Software"),
@@ -321,7 +321,7 @@ int test_pd_free_empty_pd(struct unit_module *m, struct gk20a *g,
 	/* And now direct frees. */
 	memset(&pd, 0U, sizeof(pd));
-	err = nvgpu_pd_alloc(&vm, &pd, PAGE_SIZE);
+	err = nvgpu_pd_alloc(&vm, &pd, NVGPU_PD_CACHE_SIZE);
 	if (err != 0) {
 		unit_return_fail(m, "PD alloc failed");
 	}
@@ -610,7 +610,7 @@ static int do_test_pd_cache_packing_size(struct unit_module *m, struct gk20a *g,
 {
 	int err;
 	u32 i;
-	u32 n = PAGE_SIZE / pd_size;
+	u32 n = NVGPU_PD_CACHE_SIZE / pd_size;
 	struct nvgpu_gmmu_pd pds[n], pd;
 	struct nvgpu_posix_fault_inj *dma_fi =
 		nvgpu_dma_alloc_get_fault_injection();
@@ -667,7 +667,7 @@ static int do_test_pd_reusability(struct unit_module *m, struct gk20a *g,
 {
 	int err = UNIT_SUCCESS;
 	u32 i;
-	u32 n = PAGE_SIZE / pd_size;
+	u32 n = NVGPU_PD_CACHE_SIZE / pd_size;
 	struct nvgpu_gmmu_pd pds[n];
 	struct nvgpu_posix_fault_inj *dma_fi =
 		nvgpu_dma_alloc_get_fault_injection();