mirror of
git://nv-tegra.nvidia.com/linux-nv-oot.git
synced 2025-12-22 17:25:35 +03:00
Issue: When an AER error is received during the PCIe driver (C2C) probe, the EDMA driver de-init and the AER handling run at the same time. The PCIe driver probe runs inside the device lock, and so does the AER handling. The probe path blocks in synchronize_irq(), called by the EDMA driver de-init, while still holding the device lock, and the AER ISR thread blocks waiting for that same device lock, resulting in a deadlock.

Trace at root port (RP) hot-plug:
[ 605.149061] INFO: task irq/344-tegra_p:394 blocked for more than 120 seconds.
<3>[ 605.149066] Not tainted 5.10.120-rt70-tegra #1
<3>[ 605.149068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[ 605.149070] task:irq/344-tegra_p state:D
<6>[ 605.149071] sched_debug_info: last_run_cpu 0 wake_cpu 0
<6>[ 605.149073] on_cpu 0 on_rq 0 migrate_dis 0
<4>[ 605.149074] stack: 0 pid: 394 ppid: 2 flags:0x00000028
<6>[ 605.149077] Call trace:
<6>[ 605.149078] __switch_to+0xc8/0x120
<6>[ 605.149090] __schedule+0x334/0x930
<6>[ 605.149095] schedule+0x64/0x120
<6>[ 605.149097] synchronize_irq+0x8c/0xc0
<6>[ 605.149101] edma_stop+0x1ac/0x320
<6>[ 605.149105] tegra_pcie_edma_deinit+0x60/0x170
<6>[ 605.149106] nvscic2c_pcie_epc_probe+0x3ec/0x4a0 [nvscic2c_pcie_epc]
<6>[ 605.149120] pci_device_probe+0xe8/0x1a0
<6>[ 605.149123] really_probe+0xf8/0x3d0
<6>[ 605.149126] driver_probe_device+0x60/0xc0
<6>[ 605.149128] __device_attach_driver+0x8c/0xd0
<6>[ 605.149130] bus_for_each_drv+0x8c/0xe0
<6>[ 605.149133] __device_attach+0xf8/0x160
<6>[ 605.149135] device_attach+0x28/0x40
<6>[ 605.149137] pci_bus_add_device+0x5c/0xc0
<6>[ 605.149141] pci_bus_add_devices+0x40/0x90
<6>[ 605.149142] pci_bus_add_devices+0x6c/0x90
<6>[ 605.149143] pci_host_probe+0x50/0xd0
<6>[ 605.149145] dw_pcie_host_init+0x1c0/0x420
<6>[ 605.149148] tegra_pcie_config_rp+0x78/0x250
<6>[ 605.149151] tegra_pcie_prsnt_irq+0xb0/0x120

Trace of the AER ISR at the same time:
<6>[ 605.149335] Call trace:
<6>[ 605.149336] __switch_to+0xc8/0x120
<6>[ 605.149339] __schedule+0x334/0x930
<6>[ 605.149342] schedule+0x64/0x120
<6>[ 605.149344] __rt_mutex_slowlock+0xc4/0x150
<6>[ 605.149346] rt_mutex_slowlock_locked+0xb0/0x230
<6>[ 605.149348] rt_mutex_slowlock+0x88/0xf0
<6>[ 605.149350] __rt_mutex_lock_state+0x64/0xa0
<6>[ 605.149352] _mutex_lock_blk_flush+0x58/0x80
<6>[ 605.149356] _mutex_lock+0x28/0x40
<6>[ 605.149358] report_error_detected+0x34/0x120
<6>[ 605.149361] report_frozen_detected+0x30/0x40
<6>[ 605.149363] pci_walk_bus+0x68/0xc0
<6>[ 605.149366] pcie_do_recovery+0x154/0x1d0
<6>[ 605.149368] aer_process_err_devices+0xec/0x110
<6>[ 605.149372] aer_isr+0x154/0x1d0
<6>[ 605.149374] irq_thread_fn+0x34/0xa0
<6>[ 605.149376] irq_thread+0x188/0x280
<6>[ 605.149379] kthread+0x16c/0x1a0
<6>[ 605.149381] ret_from_fork+0x10/0x18

Fix: In the EDMA de-init, call synchronize_irq() only if there are descriptors still pending processing. This acts as a workaround (WAR) to avoid the deadlock during probe and during any other shared interrupt handling within the device lock scope.

Bug 4414241

Change-Id: Ie389ebc3b32d6a1121c154ab60d08aa6c3c53e36
Signed-off-by: Nagarjuna Kristam <nkristam@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nv-oot/+/3042451
(cherry picked from commit 2453fa2e09eafd23570f25091c5c1f9d92ed2aa4)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nv-oot/+/3047403
Reviewed-by: Bitan Biswas <bbiswas@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
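The fix in this commit (calling synchronize_irq() in the EDMA de-init only when descriptors are pending) can be illustrated with a minimal userspace C model. This is a sketch, not the actual driver change: every name here (`struct edma_model`, `model_has_pending_desc()`, `model_synchronize_irq()`, `model_edma_stop()`) is hypothetical and does not reflect the real tegra_pcie_edma code. It assumes "pending" means descriptors submitted to a channel but not yet completed, and `model_synchronize_irq()` only counts calls instead of blocking the way the kernel's synchronize_irq() does.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_CHANS 4

/* Hypothetical per-channel state: descriptors submitted to the channel
 * that have not completed yet. */
struct edma_chan_model {
	unsigned int pending_desc;
};

/* Hypothetical EDMA instance; NOT the real tegra_pcie_edma structures. */
struct edma_model {
	struct edma_chan_model chan[MODEL_MAX_CHANS];
	int nr_chans;
	int sync_irq_calls; /* how many times de-init waited on the IRQ */
};

/* Stand-in for synchronize_irq(). In the kernel, synchronize_irq() waits
 * until running handlers for the IRQ have returned; this model only counts. */
static void model_synchronize_irq(struct edma_model *e)
{
	e->sync_irq_calls++;
}

/* Returns true if any channel still has descriptors in flight. */
static bool model_has_pending_desc(const struct edma_model *e)
{
	assert(e->nr_chans >= 0 && e->nr_chans <= MODEL_MAX_CHANS);
	for (int i = 0; i < e->nr_chans; i++) {
		if (e->chan[i].pending_desc > 0)
			return true;
	}
	return false;
}

/* De-init sketch: wait for in-flight interrupt handling only when there is
 * pending work. With nothing pending the wait is skipped, so a caller holding
 * the device lock does not block on an interrupt thread that may itself be
 * waiting for that lock. When work IS pending, the model still waits. */
static void model_edma_stop(struct edma_model *e)
{
	if (model_has_pending_desc(e))
		model_synchronize_irq(e);
	/* ... stop channels and release resources (omitted in this model) ... */
}
```

For example, stopping an instance whose channels are all idle leaves `sync_irq_calls` at 0, while an instance with a pending descriptor on any channel records exactly one wait. In this model the wait still happens whenever work is pending, which is consistent with the commit describing the change as a workaround rather than a complete fix.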