gpu: nvgpu: fix l2_flush errors during rmmod

The function gk20a_mm_l2_flush incorrectly returns an error value when it
skips the L2 flush because the hardware is powered off. As a result, the
following errors are printed even though this behavior is expected:

gv11b_mm_l2_flush:43 [ERR] gk20a_mm_l2_flush failed
nvgpu_gmmu_unmap_locked:1043 [ERR] gk20a_mm_l2_flush[1] failed

These errors come from the following paths:

1) gk20a_remove -> gk20a_free_cb -> gk20a_remove_support ->
   nvgpu_pmu_remove_support -> nvgpu_pmu_pg_deinit -> nvgpu_dma_unmap_free
2) gk20a_remove -> gk20a_free_cb -> gk20a_remove_support ->
   nvgpu_remove_mm_support -> gv11b_mm_mmu_fault_info_mem_destroy ->
   nvgpu_dma_unmap_free

Since these paths are not part of the power-on/power-off sequence, it is
okay to skip the flush when the hardware has already been powered off.

Fixed the userspace unit tests by allocating g->mm.bar1.vm to prevent a
NULL pointer access in gv11b_mm_l2_flush -> tlb_invalidate.

Jira LS-77

Change-Id: I3ca71f5118daf4b2eeacfe5bf83d94317f29d446
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2523751
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2025-12-22 09:12:24 +03:00 · 2021-05-03 16:33:00 +05:30
parent 74deaae0bf
commit 096f4ef055
10 changed files with 116 additions and 14 deletions
--- a/userspace/units/mm/gmmu/page_table/page_table.c
+++ b/userspace/units/mm/gmmu/page_table/page_table.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2018-2020, NVIDIA CORPORATION.  All rights reserved.
+ * Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
@@ -350,6 +350,19 @@ static int init_mm(struct unit_module *m, struct gk20a *g)
 	g->ops.mm.get_default_va_sizes(NULL, &mm->channel.user_size,
 		&mm->channel.kernel_size);

+	mm->bar1.aperture_size = bar1_aperture_size_mb_gk20a() << 20;
+	mm->bar1.vm = nvgpu_vm_init(g,
+			g->ops.mm.gmmu.get_default_big_page_size(),
+			low_hole,
+			0ULL,
+			nvgpu_safe_sub_u64(mm->bar1.aperture_size, low_hole),
+			0ULL,
+			true, false, false,
+			"bar1");
+	if (mm->bar1.vm == NULL) {
+		unit_return_fail(m, "nvgpu_vm_init failed\n");
+	}
+
 	mm->pmu.vm = nvgpu_vm_init(g,
 				   g->ops.mm.gmmu.get_default_big_page_size(),
 				   low_hole,
@@ -399,6 +412,7 @@ int test_nvgpu_gmmu_clean(struct unit_module *m, struct gk20a *g, void *args)
 {
 	g->log_mask = 0;
 	nvgpu_vm_put(g->mm.pmu.vm);
+	nvgpu_vm_put(g->mm.bar1.vm);

 	return UNIT_SUCCESS;
 }
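The commit message describes the intended contract. When the hardware is powered off, skipping the L2 flush is expected during teardown and should not be reported as a failure. Genuine flush failures should still be logged. The C sketch below illustrates that contract in isolation. It is not the nvgpu implementation. `struct gpu_sketch`, `l2_flush_sketch`, `unmap_sketch`, and the `powered_on`/`flush_times_out` fields are hypothetical names introduced only for this example, and `-ETIMEDOUT` is an illustrative error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for driver state; the real nvgpu
 * struct gk20a is far larger and is not modelled here. */
struct gpu_sketch {
	bool powered_on;      /* assumption: simple "hardware is on" flag */
	bool flush_times_out; /* assumption: simulates a real HW failure */
	int flushes_done;     /* counts flushes actually issued */
};

/* Sketch of the corrected behavior: a skipped flush on powered-off
 * hardware is reported as success (0), not as an error. */
static int l2_flush_sketch(struct gpu_sketch *g)
{
	assert(g != NULL);

	if (!g->powered_on) {
		/* Expected during teardown (e.g. rmmod): skip silently. */
		return 0;
	}

	if (g->flush_times_out) {
		/* A genuine failure is still propagated to the caller. */
		return -ETIMEDOUT;
	}

	g->flushes_done++; /* stands in for programming the flush */
	return 0;
}

/* Sketch of a caller on an unmap path: it logs only real failures,
 * so a powered-off teardown no longer prints spurious [ERR] lines. */
static int unmap_sketch(struct gpu_sketch *g)
{
	int err = l2_flush_sketch(g);

	if (err != 0) {
		fprintf(stderr, "[ERR] l2_flush_sketch failed: %d\n", err);
	}
	return err;
}
```

Under these assumptions, a powered-off GPU returns 0 without touching the flush counter, so `unmap_sketch` stays quiet during teardown. A powered-on GPU whose flush times out still propagates and logs the error.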