linux-nvgpu

mirror of git://nv-tegra.nvidia.com/linux-nvgpu.git synced 2025-12-23 18:16:01 +03:00

Author	SHA1	Message	Date
Vedashree Vidwans	26fc64fb0b	gpu: nvgpu: update common.mc function and docs - Update documentation for common.mc and gops_mc functions. - Rename test_setup_env and test_free_env to test_mc_setup_env and test_mc_free_env respectively. This will make sure that mc test has independent setup and free functions. - Add doxygen comments for mc.enable and mc.disable. - Modify MC unit test description. Jira NVGPU-6240 Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Change-Id: I87291ee5f90b8e3c29c475c00a78c7855de5740e Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2457183 (cherry picked from commit c62ff36f87878a8a7513bef06e111117d96c61c8) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2480602 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-03-04 11:04:15 -08:00
Vedashree Vidwans	e0dd79cd43	gpu: nvgpu: rearch mc reset and enable hals Remove current mc hals - mc.reset() - mc.enable() - mc.disable() - mc.reset_mask() - mc.reset_engine() - mc.reset_engine_enable() Add new mc hals - mc.enable_units(g, units, enable) > enable/disable given unit(s) - mc.enable_dev(g, dev, enable) > enable/disable engine represented by given device pointer - mc.enable_devtype(g, devtype) > enable/disable all engines of given devtype Move common mc intr functions to common/mc/mc_intr.c. Add below common mc functions - nvgpu_mc_reset_units(g, units) > reset given logical OR of nvgpu unit bitmap - nvgpu_mc_reset_dev(g, dev) > reset given single engine via dev > if engine is graphics, reset gpcs for nvgpu_next - nvgpu_mc_reset_devtype(g, devtype) > reset all engines of given devtype > if devtype is graphics, reset gpcs for nvgpu_next Bug 200648985 Bug 3109773 Change-Id: Idc67a14a0a7cde83de44fbfbec13007fead3ed5c Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2408523 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	fbb6a5bc1c	gpu: nvgpu: Remove fifo->pbdma_map The FIFO pbdma map is an array of bit maps that link PBDMAs to runlists. This array allows other software to query what PBDMA(s) serves a given runlist. The PBDMA map is read verbatim from an array of host registers. These registers are stored in a kmalloc()'ed array. This causes a problem for the device management code. The device management initialization executes well before the rest of the FIFO PBDMA initialization occurs. Thus, if the device management code queries the PBDMA mapping for a given device/runlist, the mapping has yet to be populated. In the next patches in this series the engine management code is subsumed into the device management code. In other words the device struct is reused by the engine management and all host SW does is pull pointers to the host managed devices from the device manager. This means that all engine initialization that used to be done on top of the device management needs to move to the device code. So, long story short, the PBDMA map needs to be read from the registers directly, instead of an array that gets allocated long after the device code has run. This patch removes the pbdma map array, deletes two HALs that managed that, and instead provides a new HAL to query this map directly from the registers so that the device code can use it. JIRA NVGPU-5421 Change-Id: I5966d440903faee640e3b41494d2caf4cd177b6d Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2361134 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Seema Khowala	db30ea3362	gpu: nvgpu: move mc_intr_pbus from stall (intr_0) to nonstall (intr_1) tree Nvgpu does not support nested interrupts and as a result priv/pbus interrupt do not reach cpu while other interrupts on intr_0 (stall) tree are being processed. This issue is not specific to priv/pbus but since pbus errors are critical, it is important to detect it early on. Below is the snippet from one of the failing logs where nvgpu is doing recovery to process gr interrupt. Right after GR engine is reset (PGRAPH of PMC_ENABLE), failing priv accesses should have triggered pbus interrupt but it does not reach cpu until gr interrupt is handled. Any interrupt that requires recovery will take longer to finish isr as recovery is done as part of isr. Also intr_0 (stall) interrupts are paused while stall interrupt is being processed. gm20b_gr_falcon_bind_instblk:147 [ERR] arbiter idle timeout, status: badf1020 gm20b_gr_falcon_wait_for_fecs_arb_idle:125 [ERR] arbiter idle timeout, fecs ctxsw status: 0xbadf1020 Fix to detect pbus intr while other stall interrupts are being processed is to move pbus intr enable/disable/clear/handle to nonstall (intr_1) tree. Configure pbus_intr_en_1 to route pbus to nostall tree. Priv interrupts cannot be moved to nonstall (intr_1) tree due to h/w not supporting this. In Turing, moving pbus intr to nonstall is not feasible as mc_intr(1) tree is deprecated. Add Turing specific stall intr handler hals with original logic to route pbus intr to mc_intr(0). JIRA NVGPU-25 Bug 200603566 Change-Id: I36fc376800802f20a0ea581b4f787bcc6c73ec7e Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2354192 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Alex Waterman	59eb714c48	unit: Disable some unit tests for device work Fix what unit tests can be easily fixed, but disable some others. It's not clear why the MM related tests started failing - there's really zero reason for this. The list of disable tests are primarily engine related but there are some others that get inflenced by the device and engine structure. test_poweroff.init_poweroff=2 test_is_stall_and_eng_intr_pending.intr_is_stall_and_eng_intr_pending=2 test_isr_nonstall.isr_nonstall=2 test_isr_stall.isr_stall=2 test_engine_enum_from_type.enum_from_type=2 test_engine_find_busy_doing_ctxsw.find_busy_doing_ctxsw=2 test_engine_get_active_eng_info.get_active_eng_info=2 test_engine_get_fast_ce_runlist_id.get_fast_ce_runlist_id=2 test_engine_get_gr_runlist_id.get_gr_runlist_id=2 test_engine_get_mask_on_id.get_mask_on_id=2 test_engine_get_runlist_busy_engines.get_runlist_busy_engines=2 test_engine_ids.ids=2 test_engine_init_info.init_info=2 test_engine_interrupt_mask.interrupt_mask=2 test_engine_is_valid_runlist_id.is_valid_runlist_id=2 test_engine_mmu_fault_id.mmu_fault_id=2 test_engine_mmu_fault_id_veid.mmu_fault_id_veid=2 test_engine_setup_sw.setup_sw=2 test_engine_status.status=2 test_fifo_init_support.init_support=2 test_fifo_remove_support.remove_support=2 test_gp10b_engine_init_ce_info.engine_init_ce_info=2 test_nvgpu_mem_iommu_translate.mem_iommu_translate=2 test_nvgpu_mem_phys_ops.nvgpu_mem_phys_ops=2 And delete unit tests for functions that no longer exist: test_device_info_parse_enum.top_device_info_parse_enum test_get_device_info.top_get_device_info test_get_num_engine_type_entries.top_get_num_engine_type_entries test_is_engine_ce.top_is_engine_ce test_is_engine_gr.top_is_engine_gr JIRA NVGPU-5421 Change-Id: I343c0b1ea44c472b22356c896672153fc889ffc0 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2355300 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	319520ff57	gpu: nvgpu: Add a new device manager unit This adds a new device management unit in the common code responsible for facilitating the parsing of the GPU top device list and providing that info to other units in nvgpu. The basic idea is to read this list once from HW and store it in a set of lists corresponding to each device type (graphics, LCE, etc). Many of the HALs in top can be deleted and instead implemented using common code parsing the SW representation. Every time the driver queries the device list it does so using a device type and instance ID. This is common code. The HAL is responsible for populating the device list in such a way that the driver can query it in a chip agnostic manner. Also delete some of the unit tests for functions that no longer exist. This code will require new unit tests in time; those should be quite simple to write once unit testing is needed. JIRA NVGPU-5421 Change-Id: Ie41cd255404b90ae0376098a2d6e9f9abdd3f5ea Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319649 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	2a3bb9107f	gpu: nvgpu: rename <nvgpu/top.h> to <nvgpu/device.h> top.h is a description of "devices" available on the GPU. As such rename this header to device.h. device.h will ultimately be a unit of actual C code that will rely on the top HAL to fill a device list. JIRA NVGPU-5421 Change-Id: If6e4a537d2209e429a678761a34713723da7a00a Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319648 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	5f0fdf085c	nvgpu: unit: Add new mock register framework Many tests used various incarnations of the mock register framework. This was based on a dump of gv11b registers. Tests that greatly benefitted from having generally sane register values all rely heavily on this framework. However, every test essentially did their own thing. This was not efficient and has caused a some issues in cleaning up the device and host code. Therefore introduce a much leaner and simplified register framework. All unit tests now automatically get a good subset of the gv11b registers auto-populated. As part of this also populate the HAL with a nvgpu_detect_chip() call. Many tests can now _probably_ have all their HAL init (except dummy HAL stuff) deleted. But this does require a few fixups here and there to set HALs to NULL where tests expect HALs to be NULL by default. Where necessary HALs are cleared with a memset to prevent unwanted code from executing. Overall, this imposes a far smaller burden on tests to initialize their environments. Something to consider for the future, though, is how to handle supporting multiple chips in the unit test world. JIRA NVGPU-5422 Change-Id: Icf1a63f728e9c5671ee0fdb726c235ffbd2843e2 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2335334 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Seema Khowala	aff5497907	gpu: nvgpu: add intr_unit_bitmask i/p param for fb.intr.isr tu104 onwards, fb interrupt status/enable/disable moved from fb_niso_intr_* reg to fb_vector registers. At the top level, fb interrupt status/enable/disable is done using hub_intr bit in mc_intr registers. Starting nvgpu-next, this has changed. JIRA NVGPU-5032 Change-Id: Ib54170b055b83e2696312c811c2e3ba678749359 Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2330867 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Sagar Kamble	57f3968cb9	gpu: nvgpu: mc: address code inspection gaps Address following issues uncovered during inspection: 1. Remove the doxygen comment from nvgpu_wait_for_deferred_interrupts definition. 2. Use NVGPU_MC_INTR_STALLING instead of hardcoding the index. 3. Define doxygen groups NVGPU_MC_UNIT_ENUMS, NVGPU_MC_INTR_TYPE_DEFINES, NVGPU_MC_INTR_UNIT_DEFINES and NVGPU_MC_INTR_ENABLE_DEFINES. 4. Update the doxygen comments. 5. Fix the cleanup, typo in the description of the test test_wait_for_deferred_interrupts. JIRA NVGPU-4795 Change-Id: Ifc6756832aabf9dd42ee174eb1373495e6d38c86 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2287627 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
shashank singh	d34bad0a27	nvgpu: gpu: simplify waiting logic for interrupt handler The atomic counter in interrupt handler can overflow and result in calling of BUG() which will crash the process. The equivalent functionality can be implemented with just setting an atomic variable at start of handler and resetting at end of handler. The wait can be longer in case there is constant interrupts coming but ultimately it will end. Generally the wait path is not time critical so it should not be an issue. Also, fix the unit tests for mc. Change-Id: I9b8a236f72e057e89a969d2e98d4d3f9be81b379 Signed-off-by: shashank singh <shashsingh@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2247819 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:10:29 -06:00
Philip Elcan	2b86f65477	gpu: nvgpu: mc: cleanup SWVR traceability Cleanup issues with traceability for common.mc: - Move these declarations under macros or @cond as they are either non-fusa or private functions to the unit: - gm20b_mc_is_enabled - mc_gp10b_log_pending_intrs - mc_gp10b_ltc_isr - gv11b_mc_is_intr_hub_pending - Fix typo in SWUTS for gv11b_mc_is_stall_and_eng_intr_pending JIRA NVGPU-4818 Change-Id: I53a332627772e4d793430159ac1924c8f9ce8c1c Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2280640 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:10:29 -06:00
Philip Elcan	12df00c943	gpu: nvgpu: unit: update SWUTS for mc Update targets for mc to use new gops_mc.func syntax. Update Test Type as appropriate. JIRA NVGPU-4818 Change-Id: I03d68743aa802a391d6782076607266939a4a133 Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2279483 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Nicolas Benech	b682091b13	gpu: nvgpu: SWUTS: clean up test types Apply the following changes to test types: * "Init" --> "Other (setup)" * "Coverage" --> Removed since it's implied for all tests * "Feature based" --> "Feature" * "Boundary Value analysis" and "Boundary values based" --> "Boundary values" * "Error guessing based" --> "Error guessing" JIRA NVGPU-3510 Change-Id: I3a9c0c59e6ad806f3479caa5e9a62f4d89f76923 Signed-off-by: Nicolas Benech <nbenech@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2265670 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	fba516ffae	gpu: nvgpu: enable PMU ECC interrupt early PMU IRQs were not enabled assuming entire functionality for LS PMU. Debugging early init issues of PMU falcon ECC errors triggered during nvgpu power-on will be cumbersome if interrupts are not enabled early. FMEA analysis of the nvgpu init path also requires this interrupt be enabled earlier. Hence, Enable the PMU ECC IRQ early during nvgpu_finalize_poweron. pmu_enable_irq is updated to enable interrupts differently for safety and non-safety. PMU interrupts disabling is moved out of nvgpu_pmu_destroy to nvgpu_prepare_poweroff. Prepared new wrapper API nvgpu_pmu_enable_irq. PMU ECC init and isr mutex init is moved to the beginning of nvgpu_pmu_early_init as for safety, ls pmu code path is disabled. Fixed the pmu_early_init dependent and mc interrupt related unit tests. Update the doxygen for changed functions. JIRA NVGPU-4439 Change-Id: I1a1e792d2ad2cc7a926c8c1456d4d0d6d1f14d1a Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2251732 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Philip Elcan	5f0d1f39c2	gpu: nvgpu: unit: create mc unit test JIRA NVGPU-2224 Change-Id: Ic433e8bc2ac583c1735203d1b5f0fd61942c33d4 Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2257128 GVS: Gerrit_Virtual_Submit Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00

16 Commits