linux-nvgpu

mirror of git://nv-tegra.nvidia.com/linux-nvgpu.git synced 2025-12-24 10:34:43 +03:00

Author	SHA1	Message	Date
Sagar Kamble	40064ef1ec	gpu: nvgpu: fix ecc counter free ECC counter structures are freed without removing the node from the stats_list. This can lead to invalid access due to dangling pointers. Update the ecc counter free logic to set them to NULL upon free, to remove them from stats_list and free them by validation. Also updated some of the ecc init paths where error was not propa- gated to callers and full ecc counters deallocation was not done. Now, calling unit ecc_free from any context (with counters alloc- ated or not) is harmless as requisite checks are in place. bug 3326612 bug 3345977 Change-Id: I05eb6ed226cff9197ad37776912da9dcb7e0716d Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2565264 Tested-by: Ashish Mhetre <amhetre@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-08-11 01:55:08 -07:00
tkudav	0526e7eaa9	gpu: nvgpu: Create CIC-mon and CIC-rm subunits common.cic unit is divided into common.cic.mon and common.cic.rm based on rm and mon process split. CIC-mon subunit includes the code which is utilized in critical interrupt handling path like initialization, error detection and error reporting path. CIC-rm subunit includes the code corresponding to rest of interrupt handling(like collecting error debug data from registers) and ISR status management (status of deferred interrupts). Split the CIC APIs and data-members into above two subunits. JIRA NVGPU-6899 Change-Id: I151b59105ff570607c4a62e974785e9c1323ef69 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2551897 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-02 09:57:56 -07:00
Tejal Kudav	9f43914933	gpu: nvgpu: Move Intr handling common code to CIC CIC (Central Interrupt controller) will be responsible for the interrupt handling. common.cic unit is the placeholder for all interrupt related code. Move interrupt related defines and Public APIs present in common.mc to common.cic. Note: The common.mc interrupts related struct definitions are not moved as part of this patch. Adapt the code to use interrupt handling related defines and public APIs migrated from common.mc to common.cic JIRA NVGPU-6899 Change-Id: I747e2b556c0dd66d58d74ee5bb36768b9370d276 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2535618 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-05-31 19:37:31 -07:00
Tejal Kudav	4dcfbc19de	gpu: nvgpu: Trigger quiesce on spurious FBPA intr In Bug 200588835, the spurious FBPA interrupts are seen on couple of boards. These interrupts were found to be EDC (Error detection and Correction) interrupts which are triggered due to ECC errors. The EDC registers are not exposed to the driver, so the interrupt status register cannot be cleared; resulting in interrupt storm. Also, it was concluded that only bad HW can cause this failure scenario. So, in the ISR for FBPA interrupts, get the GPU into quiesce state as we don't expect the GPU to be in usable state post such unrecoverable errors. Adapt the quiesce code for Linux build too. 1. On Linux, we cannot exit the nvgpu process after quiesce like we do on QNX. So, add nvgpu_disable_irqs() call to quiesce implementation which is done as part of process exit handler on QNX. Masking interrupts which is already done as part of quiesce would be sufficient in most cases, but to be fail-safe disable_irqs too. 3. Also, the IOCTL code looks at g->sw_ready, hence add nvgpu_start_gpu_idle() to set g->sw_ready to false along with setting NVGPU_DRIVER_IS_DYING = true. We expect the nvgpu_sw_quiesce() call to finish before quiesce thread wakes up from 50ms sleep. Hence, critical step like nvgpu_start_gpu_idle() is added to nvgpu_sw_quiesce(), whereas the somewhat redundant disable IRQs call is added to quiesce thread. nvgpu_fifo_quiesce() was called twice by mistake; remove one of the them. Bug 2919899 Bug 200588835 Change-Id: I9beec688c2e1c0d8dfc1327ddf122684576f8684 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2354537 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Sagar Kamble	a8c9c800cd	gpu: nvgpu: reorganization of MC interrupts control Previously, unit interrupt enabling/disabling and corresponding MC level interrupt enabling/disabling was not done at the same time. With this change, stall and nonstall interrupt for units are programmed at MC level along with individual unit interrupts. Kept access to MC interrupt registers through mc.intr_lock spinlock. For doing this separated CE and GR interrupt mask functions. mc.intr_enable is only used when there is global interrupt control to be set. Removed mc_gp10b.c as mc_gp10b_intr_enable is now removed. Removed following functions - mc_gv100_intr_enable, mc_gv11b_intr_enable & intr_tu104_enable. Removed intr_pmu_unit_config as we can use the generic unit interrupt control function. JIRA NVGPU-4336 Change-Id: Ibd296d4a60fda6ba930f18f518ee56ab3f9dacad Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2196178 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	daf5475f50	gpu: nvgpu: split ecc support per GPU HW unit To enable ecc interrupts early during nvgpu_finalize_poweron, ecc support has to be enabled early. ecc support was being initialized together for GR, LTC, PMU, FB units late in the poweron sequence. Move the ecc init for each unit to respective unit's init functions. And separate out the hal ecc functions from GR ecc unit to respective hal units. JIRA NVGPU-4336 Change-Id: I2c42fb6ba3192dece00be61411c64a56ce16740a Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2239153 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Prateek Sethi	bf5f86b354	gpu: nvgpu: fix boot time process crash In poweron sequence the mc interrupts are being enabled before initializing ecc support. That means ecc handler can be triggered if there is ecc error occur and can cause Segmentation Fault. To fix this issue adding a check in tu104_fbpa_handle_ecc_intr() Bug 2540926 Change-Id: I277a8946d797bfcaac353d27f4eadf0c7ebbadfa Signed-off-by: Prateek Sethi <prsethi@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2125568 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-06-02 21:04:20 -07:00
Philip Elcan	415e427d41	gpu: nvgpu: create nvgpu.common.hal.fbpa unit Move chip specific fbpa files to hal/fbpa. Update Makefiles and include directives to make new locations. JIRA NVGPU-3257 Change-Id: Ifa4eebcd5ac8be620027400e75c199e4cf38bd80 Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2107481 Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-04-29 14:38:35 -07:00

8 Commits