Commit Graph

16 Commits

Author SHA1 Message Date
Vedashree Vidwans
26fc64fb0b gpu: nvgpu: update common.mc function and docs
- Update documentation for common.mc and gops_mc functions.
- Rename test_setup_env and test_free_env to test_mc_setup_env and
test_mc_free_env respectively. This will make sure that mc test has
independent setup and free functions.
- Add doxygen comments for mc.enable and mc.disable.
- Modify MC unit test description.

Jira NVGPU-6240

Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Change-Id: I87291ee5f90b8e3c29c475c00a78c7855de5740e
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2457183
(cherry picked from commit c62ff36f87878a8a7513bef06e111117d96c61c8)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2480602
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-03-04 11:04:15 -08:00
Vedashree Vidwans
e0dd79cd43 gpu: nvgpu: rearch mc reset and enable hals
Remove current mc hals
- mc.reset()
- mc.enable()
- mc.disable()
- mc.reset_mask()
- mc.reset_engine()
- mc.reset_engine_enable()

Add new mc hals
- mc.enable_units(g, units, enable)
  > enable/disable given unit(s)
- mc.enable_dev(g, dev, enable)
  > enable/disable engine represented by given device pointer
- mc.enable_devtype(g, devtype)
  > enable/disable all engines of given devtype

Move common mc intr functions to common/mc/mc_intr.c.
Add below common mc functions
- nvgpu_mc_reset_units(g, units)
  > reset given logical OR of nvgpu unit bitmap
- nvgpu_mc_reset_dev(g, dev)
  > reset given single engine via dev
  > if engine is graphics, reset gpcs for nvgpu_next
- nvgpu_mc_reset_devtype(g, devtype)
  > reset all engines of given devtype
  > if devtype is graphics, reset gpcs for nvgpu_next

Bug 200648985
Bug 3109773

Change-Id: Idc67a14a0a7cde83de44fbfbec13007fead3ed5c
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2408523
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
fbb6a5bc1c gpu: nvgpu: Remove fifo->pbdma_map
The FIFO pbdma map is an array of bit maps that link PBDMAs to runlists.
This array allows other software to query what PBDMA(s) serves a given
runlist. The PBDMA map is read verbatim from an array of host registers.
These registers are stored in a kmalloc()'ed array.

This causes a problem for the device management code. The device
management initialization executes well before the rest of the FIFO
PBDMA initialization occurs. Thus, if the device management code
queries the PBDMA mapping for a given device/runlist, the mapping has
yet to be populated.

In the next patches in this series the engine management code is subsumed
into the device management code. In other words the device struct is
reused by the engine management and all host SW does is pull pointers to
the host managed devices from the device manager. This means that all
engine initialization that used to be done on top of the device
management needs to move to the device code.

So, long story short, the PBDMA map needs to be read from the registers
directly, instead of an array that gets allocated long after the device
code has run.

This patch removes the pbdma map array, deletes two HALs that managed
that, and instead provides a new HAL to query this map directly from
the registers so that the device code can use it.

JIRA NVGPU-5421

Change-Id: I5966d440903faee640e3b41494d2caf4cd177b6d
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2361134
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Seema Khowala
db30ea3362 gpu: nvgpu: move mc_intr_pbus from stall (intr_0) to nonstall (intr_1) tree
Nvgpu does not support nested interrupts and as a result priv/pbus
interrupt do not reach cpu while other interrupts on intr_0 (stall)
tree are being processed. This issue is not specific to priv/pbus
but since pbus errors are critical, it is important to detect it
early on.

Below is the snippet from one of the failing logs where nvgpu
is doing recovery to process gr interrupt.
Right after GR engine is reset (PGRAPH of PMC_ENABLE), failing priv
accesses should have triggered pbus interrupt but it does not reach cpu
until gr interrupt is handled. Any interrupt that requires recovery will
take longer to finish isr as recovery is done as part of isr.
Also intr_0 (stall) interrupts are paused while stall interrupt is being
processed.

gm20b_gr_falcon_bind_instblk:147  [ERR]  arbiter idle timeout, status: badf1020
gm20b_gr_falcon_wait_for_fecs_arb_idle:125  [ERR]  arbiter idle timeout, fecs ctxsw status: 0xbadf1020

Fix to detect pbus intr while other stall interrupts are being processed
is to move pbus intr enable/disable/clear/handle to nonstall (intr_1)
tree. Configure pbus_intr_en_1 to route pbus to nostall tree.
Priv interrupts cannot be moved to nonstall (intr_1) tree due
to h/w not supporting this.

In Turing, moving pbus intr to nonstall is not feasible as mc_intr(1)
tree is deprecated. Add Turing specific stall intr handler hals with
original logic to route pbus intr to mc_intr(0).

JIRA NVGPU-25
Bug 200603566

Change-Id: I36fc376800802f20a0ea581b4f787bcc6c73ec7e
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2354192
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Alex Waterman
59eb714c48 unit: Disable some unit tests for device work
Fix what unit tests can be easily fixed, but disable some others. It's
not clear why the MM related tests started failing - there's really zero
reason for this. The list of disable tests are primarily engine related
but there are some others that get inflenced by the device and engine
structure.

  test_poweroff.init_poweroff=2
  test_is_stall_and_eng_intr_pending.intr_is_stall_and_eng_intr_pending=2
  test_isr_nonstall.isr_nonstall=2
  test_isr_stall.isr_stall=2
  test_engine_enum_from_type.enum_from_type=2
  test_engine_find_busy_doing_ctxsw.find_busy_doing_ctxsw=2
  test_engine_get_active_eng_info.get_active_eng_info=2
  test_engine_get_fast_ce_runlist_id.get_fast_ce_runlist_id=2
  test_engine_get_gr_runlist_id.get_gr_runlist_id=2
  test_engine_get_mask_on_id.get_mask_on_id=2
  test_engine_get_runlist_busy_engines.get_runlist_busy_engines=2
  test_engine_ids.ids=2
  test_engine_init_info.init_info=2
  test_engine_interrupt_mask.interrupt_mask=2
  test_engine_is_valid_runlist_id.is_valid_runlist_id=2
  test_engine_mmu_fault_id.mmu_fault_id=2
  test_engine_mmu_fault_id_veid.mmu_fault_id_veid=2
  test_engine_setup_sw.setup_sw=2
  test_engine_status.status=2
  test_fifo_init_support.init_support=2
  test_fifo_remove_support.remove_support=2
  test_gp10b_engine_init_ce_info.engine_init_ce_info=2
  test_nvgpu_mem_iommu_translate.mem_iommu_translate=2
  test_nvgpu_mem_phys_ops.nvgpu_mem_phys_ops=2

And delete unit tests for functions that no longer exist:

  test_device_info_parse_enum.top_device_info_parse_enum
  test_get_device_info.top_get_device_info
  test_get_num_engine_type_entries.top_get_num_engine_type_entries
  test_is_engine_ce.top_is_engine_ce
  test_is_engine_gr.top_is_engine_gr

JIRA NVGPU-5421

Change-Id: I343c0b1ea44c472b22356c896672153fc889ffc0
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2355300
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
319520ff57 gpu: nvgpu: Add a new device manager unit
This adds a new device management unit in the common code responsible
for facilitating the parsing of the GPU top device list and providing
that info to other units in nvgpu.

The basic idea is to read this list once from HW and store it in a
set of lists corresponding to each device type (graphics, LCE, etc).
Many of the HALs in top can be deleted and instead implemented using
common code parsing the SW representation.

Every time the driver queries the device list it does so using a
device type and instance ID. This is common code. The HAL is responsible
for populating the device list in such a way that the driver can
query it in a chip agnostic manner.

Also delete some of the unit tests for functions that no longer
exist. This code will require new unit tests in time; those should be
quite simple to write once unit testing is needed.

JIRA NVGPU-5421

Change-Id: Ie41cd255404b90ae0376098a2d6e9f9abdd3f5ea
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319649
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
2a3bb9107f gpu: nvgpu: rename <nvgpu/top.h> to <nvgpu/device.h>
top.h is a description of "devices" available on the GPU. As such
rename this header to device.h.

device.h will ultimately be a unit of actual C code that will rely
on the top HAL to fill a device list.

JIRA NVGPU-5421

Change-Id: If6e4a537d2209e429a678761a34713723da7a00a
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319648
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
5f0fdf085c nvgpu: unit: Add new mock register framework
Many tests used various incarnations of the mock register framework.
This was based on a dump of gv11b registers. Tests that greatly
benefitted from having generally sane register values all rely
heavily on this framework.

However, every test essentially did their own thing. This was not
efficient and has caused a some issues in cleaning up the device and
host code.

Therefore introduce a much leaner and simplified register framework.
All unit tests now automatically get a good subset of the gv11b
registers auto-populated. As part of this also populate the HAL with
a nvgpu_detect_chip() call. Many tests can now _probably_ have all
their HAL init (except dummy HAL stuff) deleted. But this does
require a few fixups here and there to set HALs to NULL where tests
expect HALs to be NULL by default.

Where necessary HALs are cleared with a memset to prevent unwanted
code from executing.

Overall, this imposes a far smaller burden on tests to initialize
their environments.

Something to consider for the future, though, is how to handle
supporting multiple chips in the unit test world.

JIRA NVGPU-5422

Change-Id: Icf1a63f728e9c5671ee0fdb726c235ffbd2843e2
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2335334
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Seema Khowala
aff5497907 gpu: nvgpu: add intr_unit_bitmask i/p param for fb.intr.isr
tu104 onwards, fb interrupt status/enable/disable moved from
fb_niso_intr_* reg to fb_*vector* registers.
At the top level, fb interrupt status/enable/disable is done
using hub_intr bit in mc_intr registers.

Starting nvgpu-next, this has changed.

JIRA NVGPU-5032

Change-Id: Ib54170b055b83e2696312c811c2e3ba678749359
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2330867
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Sagar Kamble
57f3968cb9 gpu: nvgpu: mc: address code inspection gaps
Address following issues uncovered during inspection:
1. Remove the doxygen comment from nvgpu_wait_for_deferred_interrupts
   definition.
2. Use NVGPU_MC_INTR_STALLING instead of hardcoding the index.
3. Define doxygen groups NVGPU_MC_UNIT_ENUMS,
   NVGPU_MC_INTR_TYPE_DEFINES, NVGPU_MC_INTR_UNIT_DEFINES and
   NVGPU_MC_INTR_ENABLE_DEFINES.
4. Update the doxygen comments.
5. Fix the cleanup, typo in the description of the test
   test_wait_for_deferred_interrupts.

JIRA NVGPU-4795

Change-Id: Ifc6756832aabf9dd42ee174eb1373495e6d38c86
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2287627
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
shashank singh
d34bad0a27 nvgpu: gpu: simplify waiting logic for interrupt handler
The atomic counter in interrupt handler can overflow and result in
calling of BUG() which will crash the process. The equivalent
functionality can be implemented with just setting an atomic variable at
start of handler and resetting at end of handler. The wait can be longer
in case there is constant interrupts coming but ultimately it will end.
Generally the wait path is not time critical so it should not be an
issue. Also, fix the unit tests for mc.

Change-Id: I9b8a236f72e057e89a969d2e98d4d3f9be81b379
Signed-off-by: shashank singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2247819
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:10:29 -06:00
Philip Elcan
2b86f65477 gpu: nvgpu: mc: cleanup SWVR traceability
Cleanup issues with traceability for common.mc:
- Move these declarations under macros or @cond as they are either
  non-fusa or private functions to the unit:
  - gm20b_mc_is_enabled
  - mc_gp10b_log_pending_intrs
  - mc_gp10b_ltc_isr
  - gv11b_mc_is_intr_hub_pending
- Fix typo in SWUTS for gv11b_mc_is_stall_and_eng_intr_pending

JIRA NVGPU-4818

Change-Id: I53a332627772e4d793430159ac1924c8f9ce8c1c
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2280640
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:10:29 -06:00
Philip Elcan
12df00c943 gpu: nvgpu: unit: update SWUTS for mc
Update targets for mc to use new gops_mc.func syntax.
Update Test Type as appropriate.

JIRA NVGPU-4818

Change-Id: I03d68743aa802a391d6782076607266939a4a133
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2279483
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:10:29 -06:00
Nicolas Benech
b682091b13 gpu: nvgpu: SWUTS: clean up test types
Apply the following changes to test types:
* "Init" --> "Other (setup)"
* "Coverage" --> Removed since it's implied for all tests
* "Feature based" --> "Feature"
* "Boundary Value analysis" and "Boundary values based" --> "Boundary values"
* "Error guessing based" --> "Error guessing"

JIRA NVGPU-3510

Change-Id: I3a9c0c59e6ad806f3479caa5e9a62f4d89f76923
Signed-off-by: Nicolas Benech <nbenech@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2265670
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:10:29 -06:00
Sagar Kamble
fba516ffae gpu: nvgpu: enable PMU ECC interrupt early
PMU IRQs were not enabled assuming entire functionality for LS PMU.
Debugging early init issues of PMU falcon ECC errors triggered
during nvgpu power-on will be cumbersome if interrupts are not
enabled early. FMEA analysis of the nvgpu init path also
requires this interrupt be enabled earlier.

Hence, Enable the PMU ECC IRQ early during nvgpu_finalize_poweron.
pmu_enable_irq is updated to enable interrupts differently for
safety and non-safety. PMU interrupts disabling is moved out
of nvgpu_pmu_destroy to nvgpu_prepare_poweroff. Prepared new
wrapper API nvgpu_pmu_enable_irq.

PMU ECC init and isr mutex init is moved to the beginning of
nvgpu_pmu_early_init as for safety, ls pmu code path is
disabled. Fixed the pmu_early_init dependent and mc
interrupt related unit tests.

Update the doxygen for changed functions.

JIRA NVGPU-4439

Change-Id: I1a1e792d2ad2cc7a926c8c1456d4d0d6d1f14d1a
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2251732
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:10:29 -06:00
Philip Elcan
5f0d1f39c2 gpu: nvgpu: unit: create mc unit test
JIRA NVGPU-2224

Change-Id: Ic433e8bc2ac583c1735203d1b5f0fd61942c33d4
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2257128
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:10:29 -06:00