gpu: nvgpu: userd slab allocator

We had to force allocation of physically contiguous memory for USERD
in nvlink case, as a channel's USERD address is computed as an offset
from fifo->userd address, and nvlink bypasses SMMU.

With 4096 channels, it can become difficult to allocate 2MB of
physically contiguous sysmem for USERD on a busy system.

PBDMA does not require any sort of packing or contiguous USERD
allocation, as each channel has a direct pointer to that channel's
512B USERD region. When BAR1 is supported we only need the GPU VAs
to be contiguous, to setup the BAR1 inst block.

- Add slab allocator for USERD.
- Slabs are allocated in SYSMEM, using PAGE_SIZE for slab size.
- Contiguous channels share the same page (16 channels per slab).
- ch->userd_mem points to related nvgpu_mem descriptor
- ch->userd_offset is the offset from the beginning of the slab
- Pre-allocate GPU VAs for the whole BAR1
- Add g->ops.mm.bar1_map() method
- gk20a_mm_bar1_map() uses fixed mapping in BAR1 region
- vgpu_mm_bar1_map() passes the offset in TEGRA_VGPU_CMD_MAP_BAR1
- TEGRA_VGPU_CMD_MAP_BAR1 is called for each slab.

Bug 2422486
Bug 200474793

Change-Id: I202699fe55a454c1fc6d969e7b6196a46256d704
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1959032
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
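
The slab layout described in the message can be sketched as plain index arithmetic. This is a minimal standalone model, not the driver's code: `struct userd_loc`, `userd_channels_per_slab()` and `userd_locate()` are hypothetical names, and the slab and entry sizes are passed in as parameters rather than read from `PAGE_SIZE` or the fifo state. The `offset` field plays the role the message gives to ch->userd_offset, and `slab` selects which per-slab descriptor ch->userd_mem would point to.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of where a channel's USERD entry lives. */
struct userd_loc {
	uint32_t slab;   /* index of the slab (one page) holding the entry */
	uint32_t offset; /* byte offset of the entry from the slab start */
};

/* Number of USERD entries that fit in one slab. */
static uint32_t userd_channels_per_slab(uint32_t slab_size,
					uint32_t entry_size)
{
	assert(entry_size != 0 && slab_size >= entry_size);
	return slab_size / entry_size;
}

/* Map a channel id to its slab index and offset inside that slab. */
static struct userd_loc userd_locate(uint32_t chid, uint32_t slab_size,
				     uint32_t entry_size)
{
	uint32_t per_slab = userd_channels_per_slab(slab_size, entry_size);
	struct userd_loc loc;

	loc.slab = chid / per_slab;
	loc.offset = (chid % per_slab) * entry_size;
	return loc;
}
```

With 512B entries, the channels-per-slab value follows directly from the slab size: a 4096-byte slab holds 8 entries and an 8192-byte slab holds 16, so the figure of 16 quoted in the message depends on the page size in use. Because each slab is a single page, no allocation larger than one page needs to be physically contiguous.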
2025-12-24 02:22:34 +03:00 · 2018-11-20 16:34:21 -08:00
parent 0eb555c5c0
commit 7e68e5c83d
22 changed files with 264 additions and 147 deletions
--- a/drivers/gpu/nvgpu/os/linux/vgpu/vgpu_linux.c
+++ b/drivers/gpu/nvgpu/os/linux/vgpu/vgpu_linux.c
@@ -507,10 +507,11 @@ int vgpu_remove(struct platform_device *pdev)

 bool vgpu_is_reduced_bar1(struct gk20a *g)
 {
-	struct fifo_gk20a *f = &g->fifo;
 	struct nvgpu_os_linux *l = nvgpu_os_linux_from_gk20a(g);
+	struct fifo_gk20a *f = &g->fifo;
+	u32 size = f->num_channels * f->userd_entry_size;

-	return resource_size(l->bar1_mem) == (resource_size_t)f->userd.size;
+	return resource_size(l->bar1_mem) == size;
 }

 int vgpu_tegra_suspend(struct device *dev)
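
The hunk above stops reading the size of a single USERD buffer (f->userd.size) and instead derives the total from the channel count and per-channel entry size, presumably because USERD is now split across slabs. A minimal standalone sketch of that comparison follows, assuming plain integer inputs in place of the driver's structures; `is_reduced_bar1()` is a hypothetical helper, and the 64-bit widening is an addition of this sketch, not something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: BAR1 counts as "reduced" when it is exactly large
 * enough to hold one USERD entry per channel.
 */
static bool is_reduced_bar1(uint64_t bar1_size, uint32_t num_channels,
			    uint32_t userd_entry_size)
{
	/* Widen before multiplying so the product cannot wrap at 32 bits. */
	uint64_t userd_total = (uint64_t)num_channels * userd_entry_size;

	return bar1_size == userd_total;
}
```

With the figures from the commit message (4096 channels, 512B per entry), the expected total is 2MB, so a 2MB BAR1 aperture would be reported as reduced and any other size would not.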