Commit 16b55b1

ThinhTrTran authored and kuba-moo committed
net/tg3: fix race condition in tg3_reset_task()

When an EEH error is encountered by a PCI adapter, the EEH driver
modifies the PCI channel's state as shown below:

   enum {
      /* I/O channel is in normal state */
      pci_channel_io_normal = (__force pci_channel_state_t) 1,

      /* I/O to channel is blocked */
      pci_channel_io_frozen = (__force pci_channel_state_t) 2,

      /* PCI card is dead */
      pci_channel_io_perm_failure = (__force pci_channel_state_t) 3,
   };

If the same EEH error then causes the tg3 driver's transmit timeout
logic to execute, the tg3_tx_timeout() function schedules a reset
task via tg3_reset_task_schedule(), which may cause a race condition
between the tg3 and EEH driver as both attempt to recover the HW via
a reset action.

EEH driver gets error event
--> eeh_set_channel_state()
    and set device to one of
    error state above           scheduler: tg3_reset_task() get
                                returned error from tg3_init_hw()
                             --> dev_close() shuts down the interface
tg3_io_slot_reset() and
tg3_io_resume() fail to
reset/resume the device

To resolve this issue, we avoid the race condition by checking the PCI
channel state in the tg3_reset_task() function and skip the tg3 driver
initiated reset when the PCI channel is not in the normal state. (The
driver has no access to tg3 device registers at this point and cannot
even complete the reset task successfully without external assistance.)
We'll leave the reset procedure to be managed by the EEH driver which
calls the tg3_io_error_detected(), tg3_io_slot_reset() and
tg3_io_resume() functions as appropriate.

Adding the same checking in tg3_dump_state() to avoid dumping all
device registers when the PCI channel is not in the normal state.

Signed-off-by: Thinh Tran <[email protected]>
Tested-by: Venkata Sai Duggi <[email protected]>
Reviewed-by: David Christensen <[email protected]>
Reviewed-by: Michael Chan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

1 parent 830139e commit 16b55b1

drivers/net/ethernet/broadcom/tg3.c

Lines changed: 10 additions & 1 deletion
@@ -6474,6 +6474,14 @@ static void tg3_dump_state(struct tg3 *tp)
 	int i;
 	u32 *regs;
 
+	/* If it is a PCI error, all registers will be 0xffff,
+	 * we don't dump them out, just report the error and return
+	 */
+	if (tp->pdev->error_state != pci_channel_io_normal) {
+		netdev_err(tp->dev, "PCI channel ERROR!\n");
+		return;
+	}
+
 	regs = kzalloc(TG3_REG_BLK_SIZE, GFP_ATOMIC);
 	if (!regs)
 		return;
@@ -11259,7 +11267,8 @@ static void tg3_reset_task(struct work_struct *work)
 	rtnl_lock();
 	tg3_full_lock(tp, 0);
 
-	if (tp->pcierr_recovery || !netif_running(tp->dev)) {
+	if (tp->pcierr_recovery || !netif_running(tp->dev) ||
+	    tp->pdev->error_state != pci_channel_io_normal) {
 		tg3_flag_clear(tp, RESET_TASK_PENDING);
 		tg3_full_unlock(tp);
 		rtnl_unlock();
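
The guard added to tg3_reset_task() can be sketched outside the kernel as follows. This is a minimal user-space model and not the driver code: struct mock_tg3, the MOCK_IO_* constants, mock_should_skip_reset() and mock_reset_task() are hypothetical names invented for illustration, the locking (rtnl_lock(), tg3_full_lock()) is omitted, and the hardware reset is reduced to a counter. It is only meant to show that a non-normal channel state now makes the reset task back off and leave recovery to the EEH driver.

```c
#include <stdbool.h>

/* Mirrors the three channel states quoted in the commit message
 * (hypothetical names; the kernel uses pci_channel_io_*).
 */
enum mock_channel_state {
	MOCK_IO_NORMAL = 1,		/* I/O channel is in normal state */
	MOCK_IO_FROZEN = 2,		/* I/O to channel is blocked */
	MOCK_IO_PERM_FAILURE = 3,	/* PCI card is dead */
};

/* Hypothetical stand-in for the pieces of driver state the guard reads. */
struct mock_tg3 {
	bool pcierr_recovery;		/* PCI error recovery already in progress */
	bool netif_running;		/* interface is up */
	enum mock_channel_state error_state;
	bool reset_task_pending;	/* models the RESET_TASK_PENDING flag */
	int hw_resets;			/* counts driver-initiated resets */
};

/* True when the driver-initiated reset must be skipped. The third
 * clause is the condition this commit adds.
 */
static bool mock_should_skip_reset(const struct mock_tg3 *tp)
{
	return tp->pcierr_recovery || !tp->netif_running ||
	       tp->error_state != MOCK_IO_NORMAL;
}

/* Simplified model of the reset task: back off early, or "reset". */
static void mock_reset_task(struct mock_tg3 *tp)
{
	if (mock_should_skip_reset(tp)) {
		/* Leave recovery to the EEH path; just clear the flag. */
		tp->reset_task_pending = false;
		return;
	}

	tp->hw_resets++;		/* stands in for the halt/re-init sequence */
	tp->reset_task_pending = false;
}
```

In this model, calling mock_reset_task() on a device whose error_state is MOCK_IO_FROZEN clears the pending flag and leaves hw_resets unchanged, while the same call with MOCK_IO_NORMAL performs one reset.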
