gpu: nvgpu: ga10b: Correct VAB implementation

This patch performs the following improvements for VAB:
1) It avoids an infinite loop when collecting VAB information.
   Previously, nvgpu incorrectly assumed that the valid bit would
   be eventually set for the checker when polling. It may not be set
   if a VAB-related fault has occurred.
2) It handles the VAB_ERROR mmu fault which may be caused for various
   reasons: invalid vab buffer address, tracking in protected mode,
   etc. The recovery sequence is to set the vab buffer size to 0 and
   then to the original size. This clears the VAB_ERROR bit. After
   reseting, the old register values are again set in the recovery
   code sequence.
3) Use correct number of VAB buffers. There's only one VAB buffer on
   ga10b, not two.
4) Simplify logic.

Bug 3374805
Bug 3465734
Bug 3473147

Change-Id: I716f460ef37cb848ddc56a64c6f83024c4bb9811
Signed-off-by: Martin Radev <mradev@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2621290
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
This commit is contained in:
Martin Radev
2021-10-27 16:12:12 +03:00
committed by mobile promotions
parent b2db6a8453
commit b67a3cd053
13 changed files with 268 additions and 155 deletions

View File

@@ -818,7 +818,7 @@ static int nvgpu_prof_ioctl_vab_flush(struct nvgpu_profiler_object *prof,
{
int err;
struct gk20a *g = prof->g;
u64 *user_data = nvgpu_kzalloc(g, arg->buffer_size);
u8 *user_data = nvgpu_kzalloc(g, arg->buffer_size);
err = gk20a_busy(g);
if (err != 0) {