linux-nvgpu

mirror of git://nv-tegra.nvidia.com/linux-nvgpu.git synced 2025-12-22 09:12:24 +03:00

Author	SHA1	Message	Date
Sagar Kamble	4b73eb8a43	gpu: nvgpu: add BVEC test for LTC isr Add BVEC tests for following common.ltc unit API: gops_ltc_intr.isr Add unit test for boundary value check for ltc parameter of the LTC isr. JIRA NVGPU-6398 Change-Id: I0e075a3244d969d11faa4fd99e7e364218da6e30 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2549802 (cherry picked from commit 3133a7173b0853a699e4ebf2fc50e866e3ac6211) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2623636 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Shashank Singh <shashsingh@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-07-14 08:58:47 -07:00
Tejal Kudav	b80b2bdab8	gpu: nvgpu: Add CE interrupt handling a. LAUNCH_ERR - Userspace error. - Triggered due to faulty launch. - Handle using recovery to reset CE engine and teardown the faulty channel. b. An INVALID_CONFIG - - Triggered when LCE is mapped to floorswept PCE. - On iGPU, we use the default PCE 2 LCE HW mapping. The default mapping can be read from NV_CE_PCE2LCE_CONFIG INIT value in CE refmanual. - NvGPU driver configures the mapping on dGPUs (currently only on Turing). - So, this interrupt can only be triggered if there is kernel or HW error - Recovery ( which is killing the context + engine reset) will not help resolve this error. - Trigger Quiesce as part of handling. c. A MTHD_BUFFER_FAULT - - NvGPU driver allocates fault buffers for all TSGs or contexts, maps them in BAR2 VA space and writes the VA into channel instance block. - Can be triggered only due to kernel bug - Recovery will not help, need quiesce d. FBUF_CRC_FAIL - Triggered when the CRC entry read from the method fault buffer does not match the computed CRC from the methods contained in the buffer. - This indicates memory corruption and is a fatal interrupt which at least requires the LCE to be reset before operations can start again, if not the entire GPU. - Better to quiesce on memory corruption CE Engine reset (via recovery) will not help. e. FBUF_MAGIC_CHK_FAIL - Triggered when the MAGIC_NUM entry read from the method fault buf does not match NV_CE_MTHD_BUFFER_GLOBAL_HDR_MAGIC_NUM_VAL - This indicates memory corruption and is a fatal interrupt - Better to quiesce on memory corruption f. STALLING_DEBUG - Only triggered with SW write for debug purposes - Debug interrupt, currently ignored Move launch error handling from GP10b to GV11b HAL as - 1. LAUNCHERR_REPORT errcode METHOD_BUFFER_ACCESS_FAULT is not defined on Pascal 2. We do not support GP10b on dev-main ToT JIRA NVGPU-8102 Change-Id: Idc84119bc23b5e85f3479fe62cc8720e98b627a5 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678893 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-14 17:12:14 -07:00
Tejal Kudav	3fe70bf86e	gpu: nvgpu: Update CE Intr code as per Orin HSIs Below CE interrupts do not have any users(usecases) on safety build; disable them only on safety build. 1. BLOCKPIPE stall intr: Not used by GFX(VKSC) and CUDA on safety. 2. NONBLOCK_PIPE nonstall intr: Non-stall intrs are not supported on safety build. Also, this one is not used by GFX(VKSC) and CUDA. 3. STALLING_DEBUG intr: Added in Orin tree. It is only needed for debugging. Disable on safety build as there is no current usage in driver. 4. POISON_ERROR intr: Poison is a fault containment and not supported on GA10b. 5. INVALID_CONFIG intr: Floor sweeping not supported on functional safety SKU. Bug 3548082 Change-Id: I8d97ccb38f138b2c04a780e1c255a64d28723405 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671927 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-08 11:41:26 -08:00
Shashank Singh	5ec241a1d8	gpu: nvgpu: remove non stall intr from top handler for safety On safety nonstall interrupt is not used and should be compiled out to rule out any chance of interference with safety code. Remove top handler support of nonstall interrupt for safety which is currently not applicable to linux. Jira NVGPU-7066 Jira NVGPU-4078 Change-Id: I278efc8da6ddd0f22c128af6630cfd1b20ba4784 Signed-off-by: Shashank Singh <shashsingh@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2589006 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671586 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-02-21 02:31:38 -08:00
Sagar Kamble	cdfbd4313b	gpu: nvgpu: add BVEC tests for common.mc unit Add BVEC tests for following common.mc unit APIs: 1. nvgpu_mc_intr_stall_unit_config 2. nvgpu_mc_intr_nonstall_unit_config 3. mc.reset_mask Changed the WARN to nvgpu_err in mc.reset_mask. Invalid inputs are handled properly. Updated the MC unit test logic w.r.t mc_intr_en_r, mc_intr_en_set_r and mc_intr_en_clear_r semantics. JIRA NVGPU-6399 Change-Id: I6a3ae42ac37cd6b586f6c71de338595e6cb04a37 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2542591 (cherry picked from commit b9908c979e8964a216141cc6ed475c7de2f2cc0b) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2623631 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-11-14 01:30:25 -08:00
Tejal Kudav	b33079d47e	gpu: nvgpu: Move intr data members from MC to CIC Move interrupt specific data-members from common.mc to common.cic Some of these data members like sw_irq_stall_last_handled_cond need To be initialized much earlier during the OS specific init/probe stage. Also, some more members from struct nvgpu_interrupts(like stall_size, stall_lines[]), which will soon be moved to CIC will also need to be initialized early during the OS specific probe stage. However, the chip specific LUT can only be initialized after the hal_init stage where the HALs are all initialized. Split the CIC init to accommodate the above initialization requirements. JIRA NVGPU-6899 Change-Id: I9333db4cde59bb0aa8f6eb9f8472f00369817a5d Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2552535 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-19 18:06:28 -07:00
tkudav	0526e7eaa9	gpu: nvgpu: Create CIC-mon and CIC-rm subunits common.cic unit is divided into common.cic.mon and common.cic.rm based on rm and mon process split. CIC-mon subunit includes the code which is utilized in critical interrupt handling path like initialization, error detection and error reporting path. CIC-rm subunit includes the code corresponding to rest of interrupt handling(like collecting error debug data from registers) and ISR status management (status of deferred interrupts). Split the CIC APIs and data-members into above two subunits. JIRA NVGPU-6899 Change-Id: I151b59105ff570607c4a62e974785e9c1323ef69 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2551897 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-02 09:57:56 -07:00
Tejal Kudav	9f43914933	gpu: nvgpu: Move Intr handling common code to CIC CIC (Central Interrupt controller) will be responsible for the interrupt handling. common.cic unit is the placeholder for all interrupt related code. Move interrupt related defines and Public APIs present in common.mc to common.cic. Note: The common.mc interrupts related struct definitions are not moved as part of this patch. Adapt the code to use interrupt handling related defines and public APIs migrated from common.mc to common.cic JIRA NVGPU-6899 Change-Id: I747e2b556c0dd66d58d74ee5bb36768b9370d276 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2535618 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-05-31 19:37:31 -07:00
Vedashree Vidwans	26fc64fb0b	gpu: nvgpu: update common.mc function and docs - Update documentation for common.mc and gops_mc functions. - Rename test_setup_env and test_free_env to test_mc_setup_env and test_mc_free_env respectively. This will make sure that mc test has independent setup and free functions. - Add doxygen comments for mc.enable and mc.disable. - Modify MC unit test description. Jira NVGPU-6240 Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Change-Id: I87291ee5f90b8e3c29c475c00a78c7855de5740e Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2457183 (cherry picked from commit c62ff36f87878a8a7513bef06e111117d96c61c8) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2480602 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-03-04 11:04:15 -08:00
Vedashree Vidwans	e0dd79cd43	gpu: nvgpu: rearch mc reset and enable hals Remove current mc hals - mc.reset() - mc.enable() - mc.disable() - mc.reset_mask() - mc.reset_engine() - mc.reset_engine_enable() Add new mc hals - mc.enable_units(g, units, enable) > enable/disable given unit(s) - mc.enable_dev(g, dev, enable) > enable/disable engine represented by given device pointer - mc.enable_devtype(g, devtype) > enable/disable all engines of given devtype Move common mc intr functions to common/mc/mc_intr.c. Add below common mc functions - nvgpu_mc_reset_units(g, units) > reset given logical OR of nvgpu unit bitmap - nvgpu_mc_reset_dev(g, dev) > reset given single engine via dev > if engine is graphics, reset gpcs for nvgpu_next - nvgpu_mc_reset_devtype(g, devtype) > reset all engines of given devtype > if devtype is graphics, reset gpcs for nvgpu_next Bug 200648985 Bug 3109773 Change-Id: Idc67a14a0a7cde83de44fbfbec13007fead3ed5c Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2408523 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	fbb6a5bc1c	gpu: nvgpu: Remove fifo->pbdma_map The FIFO pbdma map is an array of bit maps that link PBDMAs to runlists. This array allows other software to query what PBDMA(s) serves a given runlist. The PBDMA map is read verbatim from an array of host registers. These registers are stored in a kmalloc()'ed array. This causes a problem for the device management code. The device management initialization executes well before the rest of the FIFO PBDMA initialization occurs. Thus, if the device management code queries the PBDMA mapping for a given device/runlist, the mapping has yet to be populated. In the next patches in this series the engine management code is subsumed into the device management code. In other words the device struct is reused by the engine management and all host SW does is pull pointers to the host managed devices from the device manager. This means that all engine initialization that used to be done on top of the device management needs to move to the device code. So, long story short, the PBDMA map needs to be read from the registers directly, instead of an array that gets allocated long after the device code has run. This patch removes the pbdma map array, deletes two HALs that managed that, and instead provides a new HAL to query this map directly from the registers so that the device code can use it. JIRA NVGPU-5421 Change-Id: I5966d440903faee640e3b41494d2caf4cd177b6d Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2361134 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Seema Khowala	db30ea3362	gpu: nvgpu: move mc_intr_pbus from stall (intr_0) to nonstall (intr_1) tree Nvgpu does not support nested interrupts and as a result priv/pbus interrupt do not reach cpu while other interrupts on intr_0 (stall) tree are being processed. This issue is not specific to priv/pbus but since pbus errors are critical, it is important to detect it early on. Below is the snippet from one of the failing logs where nvgpu is doing recovery to process gr interrupt. Right after GR engine is reset (PGRAPH of PMC_ENABLE), failing priv accesses should have triggered pbus interrupt but it does not reach cpu until gr interrupt is handled. Any interrupt that requires recovery will take longer to finish isr as recovery is done as part of isr. Also intr_0 (stall) interrupts are paused while stall interrupt is being processed. gm20b_gr_falcon_bind_instblk:147 [ERR] arbiter idle timeout, status: badf1020 gm20b_gr_falcon_wait_for_fecs_arb_idle:125 [ERR] arbiter idle timeout, fecs ctxsw status: 0xbadf1020 Fix to detect pbus intr while other stall interrupts are being processed is to move pbus intr enable/disable/clear/handle to nonstall (intr_1) tree. Configure pbus_intr_en_1 to route pbus to nostall tree. Priv interrupts cannot be moved to nonstall (intr_1) tree due to h/w not supporting this. In Turing, moving pbus intr to nonstall is not feasible as mc_intr(1) tree is deprecated. Add Turing specific stall intr handler hals with original logic to route pbus intr to mc_intr(0). JIRA NVGPU-25 Bug 200603566 Change-Id: I36fc376800802f20a0ea581b4f787bcc6c73ec7e Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2354192 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Alex Waterman	59eb714c48	unit: Disable some unit tests for device work Fix what unit tests can be easily fixed, but disable some others. It's not clear why the MM related tests started failing - there's really zero reason for this. The list of disable tests are primarily engine related but there are some others that get inflenced by the device and engine structure. test_poweroff.init_poweroff=2 test_is_stall_and_eng_intr_pending.intr_is_stall_and_eng_intr_pending=2 test_isr_nonstall.isr_nonstall=2 test_isr_stall.isr_stall=2 test_engine_enum_from_type.enum_from_type=2 test_engine_find_busy_doing_ctxsw.find_busy_doing_ctxsw=2 test_engine_get_active_eng_info.get_active_eng_info=2 test_engine_get_fast_ce_runlist_id.get_fast_ce_runlist_id=2 test_engine_get_gr_runlist_id.get_gr_runlist_id=2 test_engine_get_mask_on_id.get_mask_on_id=2 test_engine_get_runlist_busy_engines.get_runlist_busy_engines=2 test_engine_ids.ids=2 test_engine_init_info.init_info=2 test_engine_interrupt_mask.interrupt_mask=2 test_engine_is_valid_runlist_id.is_valid_runlist_id=2 test_engine_mmu_fault_id.mmu_fault_id=2 test_engine_mmu_fault_id_veid.mmu_fault_id_veid=2 test_engine_setup_sw.setup_sw=2 test_engine_status.status=2 test_fifo_init_support.init_support=2 test_fifo_remove_support.remove_support=2 test_gp10b_engine_init_ce_info.engine_init_ce_info=2 test_nvgpu_mem_iommu_translate.mem_iommu_translate=2 test_nvgpu_mem_phys_ops.nvgpu_mem_phys_ops=2 And delete unit tests for functions that no longer exist: test_device_info_parse_enum.top_device_info_parse_enum test_get_device_info.top_get_device_info test_get_num_engine_type_entries.top_get_num_engine_type_entries test_is_engine_ce.top_is_engine_ce test_is_engine_gr.top_is_engine_gr JIRA NVGPU-5421 Change-Id: I343c0b1ea44c472b22356c896672153fc889ffc0 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2355300 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	319520ff57	gpu: nvgpu: Add a new device manager unit This adds a new device management unit in the common code responsible for facilitating the parsing of the GPU top device list and providing that info to other units in nvgpu. The basic idea is to read this list once from HW and store it in a set of lists corresponding to each device type (graphics, LCE, etc). Many of the HALs in top can be deleted and instead implemented using common code parsing the SW representation. Every time the driver queries the device list it does so using a device type and instance ID. This is common code. The HAL is responsible for populating the device list in such a way that the driver can query it in a chip agnostic manner. Also delete some of the unit tests for functions that no longer exist. This code will require new unit tests in time; those should be quite simple to write once unit testing is needed. JIRA NVGPU-5421 Change-Id: Ie41cd255404b90ae0376098a2d6e9f9abdd3f5ea Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319649 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	2a3bb9107f	gpu: nvgpu: rename <nvgpu/top.h> to <nvgpu/device.h> top.h is a description of "devices" available on the GPU. As such rename this header to device.h. device.h will ultimately be a unit of actual C code that will rely on the top HAL to fill a device list. JIRA NVGPU-5421 Change-Id: If6e4a537d2209e429a678761a34713723da7a00a Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319648 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	5f0fdf085c	nvgpu: unit: Add new mock register framework Many tests used various incarnations of the mock register framework. This was based on a dump of gv11b registers. Tests that greatly benefitted from having generally sane register values all rely heavily on this framework. However, every test essentially did their own thing. This was not efficient and has caused a some issues in cleaning up the device and host code. Therefore introduce a much leaner and simplified register framework. All unit tests now automatically get a good subset of the gv11b registers auto-populated. As part of this also populate the HAL with a nvgpu_detect_chip() call. Many tests can now _probably_ have all their HAL init (except dummy HAL stuff) deleted. But this does require a few fixups here and there to set HALs to NULL where tests expect HALs to be NULL by default. Where necessary HALs are cleared with a memset to prevent unwanted code from executing. Overall, this imposes a far smaller burden on tests to initialize their environments. Something to consider for the future, though, is how to handle supporting multiple chips in the unit test world. JIRA NVGPU-5422 Change-Id: Icf1a63f728e9c5671ee0fdb726c235ffbd2843e2 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2335334 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Seema Khowala	aff5497907	gpu: nvgpu: add intr_unit_bitmask i/p param for fb.intr.isr tu104 onwards, fb interrupt status/enable/disable moved from fb_niso_intr_* reg to fb_vector registers. At the top level, fb interrupt status/enable/disable is done using hub_intr bit in mc_intr registers. Starting nvgpu-next, this has changed. JIRA NVGPU-5032 Change-Id: Ib54170b055b83e2696312c811c2e3ba678749359 Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2330867 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Sagar Kamble	57f3968cb9	gpu: nvgpu: mc: address code inspection gaps Address following issues uncovered during inspection: 1. Remove the doxygen comment from nvgpu_wait_for_deferred_interrupts definition. 2. Use NVGPU_MC_INTR_STALLING instead of hardcoding the index. 3. Define doxygen groups NVGPU_MC_UNIT_ENUMS, NVGPU_MC_INTR_TYPE_DEFINES, NVGPU_MC_INTR_UNIT_DEFINES and NVGPU_MC_INTR_ENABLE_DEFINES. 4. Update the doxygen comments. 5. Fix the cleanup, typo in the description of the test test_wait_for_deferred_interrupts. JIRA NVGPU-4795 Change-Id: Ifc6756832aabf9dd42ee174eb1373495e6d38c86 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2287627 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
shashank singh	d34bad0a27	nvgpu: gpu: simplify waiting logic for interrupt handler The atomic counter in interrupt handler can overflow and result in calling of BUG() which will crash the process. The equivalent functionality can be implemented with just setting an atomic variable at start of handler and resetting at end of handler. The wait can be longer in case there is constant interrupts coming but ultimately it will end. Generally the wait path is not time critical so it should not be an issue. Also, fix the unit tests for mc. Change-Id: I9b8a236f72e057e89a969d2e98d4d3f9be81b379 Signed-off-by: shashank singh <shashsingh@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2247819 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:10:29 -06:00
Sagar Kamble	fba516ffae	gpu: nvgpu: enable PMU ECC interrupt early PMU IRQs were not enabled assuming entire functionality for LS PMU. Debugging early init issues of PMU falcon ECC errors triggered during nvgpu power-on will be cumbersome if interrupts are not enabled early. FMEA analysis of the nvgpu init path also requires this interrupt be enabled earlier. Hence, Enable the PMU ECC IRQ early during nvgpu_finalize_poweron. pmu_enable_irq is updated to enable interrupts differently for safety and non-safety. PMU interrupts disabling is moved out of nvgpu_pmu_destroy to nvgpu_prepare_poweroff. Prepared new wrapper API nvgpu_pmu_enable_irq. PMU ECC init and isr mutex init is moved to the beginning of nvgpu_pmu_early_init as for safety, ls pmu code path is disabled. Fixed the pmu_early_init dependent and mc interrupt related unit tests. Update the doxygen for changed functions. JIRA NVGPU-4439 Change-Id: I1a1e792d2ad2cc7a926c8c1456d4d0d6d1f14d1a Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2251732 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Philip Elcan	5f0d1f39c2	gpu: nvgpu: unit: create mc unit test JIRA NVGPU-2224 Change-Id: Ic433e8bc2ac583c1735203d1b5f0fd61942c33d4 Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2257128 GVS: Gerrit_Virtual_Submit Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00

21 Commits