Commit Graph

10 Commits

Author SHA1 Message Date
Alex Waterman
fba96fdc09 gpu: nvgpu: Replace nvgpu_engine_info with nvgpu_device
Delete the struct nvgpu_engine_info as it's essentially identical to
struct nvgpu_device. Duplicating data structures is not ideal as it's
terribly confusing what does what.

Update all uses of nvgpu_engine_info to use struct nvgpu_device. This
is often a fairly straight forward replacement. Couple of places though
where things got interesting:

  - The enum_type that engine_info uses is defined in engines.h and
    has a bit of SW abstraction - in particular the GRCE type. The only
    place this seemed to be actually relevant (the IOCTL providing device
    info to userspace) the GRCE engines can be worked out by comparing
    runlist ID.
  - Addition of masks based on intr_id and reset_id; those can be
    computed easily enough using BIT32() but this is an area that
    could be improved on.

This reaches into a lot of extraneous code that traverses the fifo
active engines list and dramtically simplifies this. Now, instead of
having to go through a table of engine IDs that point to the list of
all host engines, the active engine list is just a list of pointers to
valid engines. It's now trivial to do a for-all-active-engines type
loop. This could even be turned into a generic macro or otherwise
abstracted in the future.

JIRA NVGPU-5421

Change-Id: I3a810deb55a7dd8c09836fd2dae85d3e28eb23cf
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319895
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Tejal Kudav
ab2b0b5949 gpu: nvgpu: Set unserviceable flag early during RC
During recovery, we set ch->unserviceable at the end after we preempt
the TSG and reset the engines. It might be too late and user-space
might submit more work to the broken channel which is not desirable.
Move setting this unserviceable flag right at the start
of recovery sequence.
Another thread doing a submit can still read the unserviceable flag
just before it is set here, leaving that submit stuck if recovery
completes before the submit thread advances enough to set up a post
fence visible for other threads. This could be fixed with a big lock
or with a double check at the end of the submit code after the job
data has been made visible.
We still release the fences, semaphore and error notifier wait queues
at the end; so user-space would not trigger channel unbind while
channel is being recovered.

Also, change the handle_mmu_fault APIs to return void as the
debug_dump return value is not used in any of the caller APIs.

JIRA NVGPU-5843

Change-Id: Ib42c2816dd1dca542e4f630805411cab75fad90e
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2385256
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
359fc24aaf gpu: nvgpu: Rework engine management to work with vGPU
Currently the vGPU engine management rewrites a lot of the common
device agnostic engine management code.

With the new top HAL parsing one device at a time, it is now more
easily possible to tie the vGPU into the new common device framework
by implementing the top HAL but with the vGPU engine list backend.

This lets the vGPU inherit all the common engine and device
management code. By doing so the vGPU HAL need only implement a
trivial and simple HAL.

This also gets us a step closer to merging all of the CE init
code: logically it just iterates through all CE engines whatever
they may be. The only reason this differs between chips is because
of the swap from CE0-2 to LCEs in the Pascal generation. This could
be abstracted by the unit code easily enough.

Also, the pbdma_id for each engine has to be added to the device
struct. Eventually this was going to happen anyway, since the
device struct will soon replace the nvgpu_engine_info struct.
It's a little bit of an abuse but might be worth it long term. If
not, it should not be difficult to replace uses of dev->pbdma_id
with a proper lookup of PBDMA ID based on the device info.

JIRA NVGPU-5421

Change-Id: Ie8dcd3b0150184d58ca0f78940c2e7ca72994e64
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2351877
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
194fac7f3c gpu: nvgpu: Remove clutter in engine code
Remove the get_mask_on_id() HAL and replace it's usage with the
global nvgpu_engine_get_mask_on_id() function. There's no need
to have this function as a HAL.

JIRA NVGPU-5420

Change-Id: I4fc843beff8e65806da26a0addc83fa218d390ac
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2361315
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Philip Elcan
06fd513e1e gpu: nvgpu: move common.unit into common.mc
nvgpu.common.unit was just an enum used for passing to nvgpu.common.mc
APIs. So, move the enum into mc.h, and replace the include of unit.h
with mc.h where appropriate. And update the yaml arch.

JIRA NVGPU-4144

Change-Id: I210ea4d3b49cd494e43add1b52f3fbcdb020a1e3
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2216106
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:10:29 -06:00
Seema Khowala
39070c653f gpu: nvgpu: move FIFO_INVAL_* out of fifo_gk20a.h
Move and rename
FIFO_INVAL_ENGINE_ID -> NVGPU_INVALID_ENG_ID
FIFO_INVAL_TSG_ID -> NVGPU_INVALID_TSG_ID
FIFO_INVAL_RUNLIST_ID -> NVGPU_INVALID_RUNLIST_ID
FIFO_INVAL_SYNCPT_ID -> NVGPU_INVALID_SYNCPT_ID
FIFO_INVAL_CHANNEL_ID -> NVGPU_INVALID_CHANNEL_ID

JIRA NVGPU-2012

Change-Id: Ic4cc16ece64d85e22f16e4d28dcfd0c187bb65f3
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2109011
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-02 23:40:26 -07:00
Thomas Fleury
258a6141fd gpu: nvgpu: rename runlist functions
Renamed:
- gk20a_runlist_reload -> nvgpu_runlist_reload
- gk20a_fifo_interleave_level_name -> nvgpu_runlist_interleave_level_name
- gk20a_runlist_update_for_channel -> nvgpu_runlist_update_for_channel
- nvgpu_fifo_lock_active_runlists -> nvgpu_runlist_lock_active_runlists
- nvgpu_fifo_unlock_active_runlists -> nvgpu_runlist_unlock_active_runlists
- nvgpu_fifo_get_runlists_mask -> nvgpu_runlist_get_runlists_mask
- nvgpu_fifo_unlock_runlists -> nvgpu_runlist_unlock_runlists
- gk20a_runlist_update -> nvgpu_runlist_update

Jira NVGPU-3198

Change-Id: Ifc5ad2aae546614667c174643ee07283d2716adc
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2108029
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-30 12:46:02 -07:00
Seema Khowala
60633ca551 gpu: nvgpu: move gv11b rc code to rc_gv11b.c
Move chip specific recovery code for volta onwards
architecture to hal/rc/rc_gv11b.c

Rename
fifo.teardown_ch_tsg -> fifo.recover
gk20a_runlist_update_locked -> nvgpu_runlist_update_locked

Remove
Unused h/w headers from fifo_gv11b.c

Use local variable f instead of g->fifo

JIRA NVGPU-1314

Change-Id: Ia535bbe4780e7241fdd911a8f577c6b98cf0fe53
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2102897
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-24 20:23:06 -07:00
Nicolas Benech
0435ca4eb3 gpu: nvgpu: fix MISRA 17.7 in nvgpu.common.hal.fifo.*
MISRA Rule-17.7 requires the return value of all functions to be
used. Fix is either to use the return value or change the function
to return void. This patch contains fixes for all 17.7 violations
in the following units:
- nvgpu.common.hal.fifo.runlist
- nvgpu.common.hal.fifo.fifo

JIRA NVGPU-3039

Change-Id: I9483f5cb623cfe36d6b26e41c33f124c24710c08
Signed-off-by: Nicolas Benech <nbenech@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2098765
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-19 19:04:05 -07:00
Seema Khowala
da9dee85e2 gpu: nvgpu: move mmu fault handling to hal/fifo
Move chip specific mmu fault handling from
fifo_gk20a.c to hal/fifo/mmu_fault_gk20a.c

Move gk20a_teardown_ch_tsg to hal/rc/rc_gk20a.c

JIRA NVGPU-1314

Change-Id: Idf88b1c312bc9f46c2508f2c63e948d71d622297
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2094051
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-18 15:56:08 -07:00