gpu: nvgpu: decouple channel watchdog dependencies

The channel code needs the watchdog code and vice versa. Cut this
circular dependency with a few simplifications so that the watchdog
wouldn't depend on so much.

When calling watchdog APIs that cause stores or comparisons of channel
progress, provide a snapshot of the current progress instead of a whole
channel pointer. struct nvgpu_channel_wdt_state is added as an interface
for this to track gp_get and pb_get.

When periodically checking the watchdog state, make the channel code ask
whether a hang has been detected and abort the channel from within
channel code instead of asking the watchdog to abort the channel. The
debug dump verbosity flag is also moved back to the channel data.

Move the functionality to restart all channels' watchdogs to channel
code from watchdog code. Looping over active channels is not a good
feature for the watchdog; it's better for the channel handling to just
use the watchdog as a tracking tool.

Move a few unserviceable checks up in the stack to the callers of the
wdt code. They're a kludge but this will do for now and demonstrates
what needs to be eventually fixed.

This does not leave much code in the watchdog unit. Now the purpose of
the watchdog is to only isolate the logic to couple a timer and progress
snapshots with careful locking to start and stop the tracking.

Jira NVGPU-5582

Change-Id: I7c728542ff30d88b1414500210be3fbaf61e6e8a
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2369820
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This commit is contained in:
Konsta Hölttä
2020-08-12 18:10:40 +03:00
committed by Alex Waterman
parent 281006ae7d
commit e8201d6ce3
6 changed files with 214 additions and 125 deletions

View File

@@ -322,7 +322,7 @@ static int gk20a_channel_set_wdt_status(struct nvgpu_channel *ch,
if (set_timeout)
nvgpu_channel_wdt_set_limit(ch->wdt, args->timeout_ms);
nvgpu_channel_wdt_set_debug_dump(ch->wdt, !disable_dump);
nvgpu_channel_set_wdt_debug_dump(ch, !disable_dump);
return 0;
#else