pci: tegra-edma: add check for irq sync

Issue:
When an AER error is received while the PCIe client driver (C2C) is
probing, EDMA driver de-init and AER handling run at the same time. The
PCIe driver probe runs under the device lock, and AER handling takes
the same lock. synchronize_irq() called by the EDMA driver waits for
the in-flight AER handler, which is itself blocked on the device lock
held by probe, so the two deadlock.

Trace at RP hot-plug:
[  605.149061] INFO: task irq/344-tegra_p:394 blocked for more than 120 seconds.
<3>[  605.149066]       Not tainted 5.10.120-rt70-tegra #1
<3>[  605.149068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[  605.149070] task:irq/344-tegra_p state:D
<6>[  605.149071] sched_debug_info: last_run_cpu 0 wake_cpu 0
<6>[  605.149073]  on_cpu 0 on_rq 0 migrate_dis 0
<4>[  605.149074]  stack:    0 pid:  394 ppid:     2 flags:0x00000028
<6>[  605.149077] Call trace:
<6>[  605.149078]  __switch_to+0xc8/0x120
<6>[  605.149090]  __schedule+0x334/0x930
<6>[  605.149095]  schedule+0x64/0x120
<6>[  605.149097]  synchronize_irq+0x8c/0xc0
<6>[  605.149101]  edma_stop+0x1ac/0x320
<6>[  605.149105]  tegra_pcie_edma_deinit+0x60/0x170
<6>[  605.149106]  nvscic2c_pcie_epc_probe+0x3ec/0x4a0 [nvscic2c_pcie_epc]
<6>[  605.149120]  pci_device_probe+0xe8/0x1a0
<6>[  605.149123]  really_probe+0xf8/0x3d0
<6>[  605.149126]  driver_probe_device+0x60/0xc0
<6>[  605.149128]  __device_attach_driver+0x8c/0xd0
<6>[  605.149130]  bus_for_each_drv+0x8c/0xe0
<6>[  605.149133]  __device_attach+0xf8/0x160
<6>[  605.149135]  device_attach+0x28/0x40
<6>[  605.149137]  pci_bus_add_device+0x5c/0xc0
<6>[  605.149141]  pci_bus_add_devices+0x40/0x90
<6>[  605.149142]  pci_bus_add_devices+0x6c/0x90
<6>[  605.149143]  pci_host_probe+0x50/0xd0
<6>[  605.149145]  dw_pcie_host_init+0x1c0/0x420
<6>[  605.149148]  tegra_pcie_config_rp+0x78/0x250
<6>[  605.149151]  tegra_pcie_prsnt_irq+0xb0/0x120

Trace of the AER ISR at the same time:
<6>[  605.149335] Call trace:
<6>[  605.149336]  __switch_to+0xc8/0x120
<6>[  605.149339]  __schedule+0x334/0x930
<6>[  605.149342]  schedule+0x64/0x120
<6>[  605.149344]  __rt_mutex_slowlock+0xc4/0x150
<6>[  605.149346]  rt_mutex_slowlock_locked+0xb0/0x230
<6>[  605.149348]  rt_mutex_slowlock+0x88/0xf0
<6>[  605.149350]  __rt_mutex_lock_state+0x64/0xa0
<6>[  605.149352]  _mutex_lock_blk_flush+0x58/0x80
<6>[  605.149356]  _mutex_lock+0x28/0x40
<6>[  605.149358]  report_error_detected+0x34/0x120
<6>[  605.149361]  report_frozen_detected+0x30/0x40
<6>[  605.149363]  pci_walk_bus+0x68/0xc0
<6>[  605.149366]  pcie_do_recovery+0x154/0x1d0
<6>[  605.149368]  aer_process_err_devices+0xec/0x110
<6>[  605.149372]  aer_isr+0x154/0x1d0
<6>[  605.149374]  irq_thread_fn+0x34/0xa0
<6>[  605.149376]  irq_thread+0x188/0x280
<6>[  605.149379]  kthread+0x16c/0x1a0
<6>[  605.149381]  ret_from_fork+0x10/0x18

Fix:
Call synchronize_irq() in EDMA de-init only if descriptors are still
pending processing. This is a workaround (WAR) that avoids the deadlock
during probe and during any other shared-interrupt handling that runs
under the device lock.
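
The pending-descriptor check can be sketched in user space as follows.
This is only an illustrative model, not the driver code: struct
chan_model and needs_irq_sync() are hypothetical names, and it assumes
that w_idx and r_idx are the ring write and read indices, so that a
channel has outstanding descriptors exactly when they differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the per-channel ring state. */
struct chan_model {
	unsigned int w_idx;	/* assumed: next slot to be submitted */
	unsigned int r_idx;	/* assumed: next slot to be completed */
};

/*
 * Returns true if any channel still has descriptors in flight. In this
 * model that is the only case in which de-init would go on to call
 * synchronize_irq(); an idle EDMA skips the call entirely.
 */
static bool needs_irq_sync(const struct chan_model *ch, size_t nr)
{
	bool sync_irq = false;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (ch[i].w_idx != ch[i].r_idx)
			sync_irq = true;
	}

	return sync_irq;
}
```

During probe no transfers have been submitted yet, so every channel has
w_idx == r_idx, the helper returns false, and de-init never waits on the
shared interrupt while the device lock is held.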

Bug 4414241

Change-Id: Ie389ebc3b32d6a1121c154ab60d08aa6c3c53e36
Signed-off-by: Nagarjuna Kristam <nkristam@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nv-oot/+/3042451
(cherry picked from commit 2453fa2e09eafd23570f25091c5c1f9d92ed2aa4)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nv-oot/+/3047403
Reviewed-by: Bitan Biswas <bbiswas@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Nagarjuna Kristam
2023-12-29 13:25:08 +05:30
committed by mobile promotions
parent 0be6f60c9e
commit e94216eeb9


@@ -2,7 +2,7 @@
 /*
  * PCIe EDMA Library Framework
  *
- * Copyright (C) 2021-2023 NVIDIA Corporation. All rights reserved.
+ * Copyright (C) 2021-2024 NVIDIA Corporation. All rights reserved.
  */
 #include <linux/module.h>
@@ -751,6 +751,7 @@ static void edma_stop(struct edma_prv *prv, edma_xfer_status_t st)
 	struct edma_chan *chan[2], *ch;
 	int i, j;
 	u32 mode_cnt[2] = {DMA_WR_CHNL_NUM, DMA_RD_CHNL_NUM};
+	bool sync_irq = false;
 	chan[0] = &prv->tx[0];
 	chan[1] = &prv->rx[0];
@@ -767,13 +768,16 @@ static void edma_stop(struct edma_prv *prv, edma_xfer_status_t st)
 			/** wait until exisitng xfer submit completed */
 			mutex_lock(&ch->lock);
 			mutex_unlock(&ch->lock);
+			if (ch->w_idx != ch->r_idx)
+				sync_irq = true;
 		}
 	}
 	edma_hw_deinit(prv, false);
 	edma_hw_deinit(prv, true);
-	synchronize_irq(prv->irq);
+	if (sync_irq)
+		synchronize_irq(prv->irq);
 	for (j = 0; j < 2; j++) {
 		for (i = 0; i < mode_cnt[j]; i++) {