Commit 2ca1c94
khfeng authored and kuba-moo committed
tg3: Disable tg3 device on system reboot to avoid triggering AER
Commit d60cd06 ("PM: ACPI: reboot: Use S5 for reboot") caused a reboot hang on one Dell server, so the commit was reverted.

Someone managed to collect the AER log and it's caused by MSI:

[ 148.762067] ACPI: Preparing to enter system sleep state S5
[ 148.794638] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
[ 148.803731] {1}[Hardware Error]: event severity: recoverable
[ 148.810191] {1}[Hardware Error]: Error 0, type: fatal
[ 148.816088] {1}[Hardware Error]: section_type: PCIe error
[ 148.822391] {1}[Hardware Error]: port_type: 0, PCIe end point
[ 148.829026] {1}[Hardware Error]: version: 3.0
[ 148.834266] {1}[Hardware Error]: command: 0x0006, status: 0x0010
[ 148.841140] {1}[Hardware Error]: device_id: 0000:04:00.0
[ 148.847309] {1}[Hardware Error]: slot: 0
[ 148.852077] {1}[Hardware Error]: secondary_bus: 0x00
[ 148.857876] {1}[Hardware Error]: vendor_id: 0x14e4, device_id: 0x165f
[ 148.865145] {1}[Hardware Error]: class_code: 020000
[ 148.870845] {1}[Hardware Error]: aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00010000
[ 148.879842] {1}[Hardware Error]: aer_uncor_severity: 0x000ef030
[ 148.886575] {1}[Hardware Error]: TLP Header: 40000001 0000030f 90028090 00000000
[ 148.894823] tg3 0000:04:00.0: AER: aer_status: 0x00100000, aer_mask: 0x00010000
[ 148.902795] tg3 0000:04:00.0: AER: [20] UnsupReq (First)
[ 148.910234] tg3 0000:04:00.0: AER: aer_layer=Transaction Layer, aer_agent=Requester ID
[ 148.918806] tg3 0000:04:00.0: AER: aer_uncor_severity: 0x000ef030
[ 148.925558] tg3 0000:04:00.0: AER: TLP Header: 40000001 0000030f 90028090 00000000

The MSI is probably raised by incoming packets, so power down the device and disable bus mastering to stop the traffic, as a user confirmed this approach works.

In addition to that, be extra safe and cancel the reset task if it's running.
Cc: Josef Bacik <[email protected]>
Link: https://lore.kernel.org/all/b8db79e6857c41dab4ef08bdf826ea7c47e3bafc.1615947283.git.josef@toxicpanda.com/
BugLink: https://bugs.launchpad.net/bugs/1917471
Signed-off-by: Kai-Heng Feng <[email protected]>
Reviewed-by: Michael Chan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
1 parent 7498a45 commit 2ca1c94
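
The AER record quoted in the commit message reports `aer_uncor_status: 0x00100000` with `aer_uncor_mask: 0x00010000`, and the kernel decodes this as `[20] UnsupReq`. The following is a minimal userspace sketch of that decoding, not the kernel's implementation. The helper names `aer_uncor_bit_name` and `aer_lowest_unmasked` are hypothetical, and the table lists only a subset of the uncorrectable error bits defined by the PCIe specification, with descriptive labels rather than the kernel's abbreviations.

```c
#include <stddef.h>
#include <stdint.h>

/* Subset of bit positions in the PCIe AER Uncorrectable Error Status
 * register. Labels are descriptive, not the kernel's short strings. */
struct aer_uncor_bit {
	unsigned int bit;
	const char *name;
};

static const struct aer_uncor_bit aer_uncor_bits[] = {
	{  4, "Data Link Protocol Error" },
	{  5, "Surprise Down Error" },
	{ 12, "Poisoned TLP" },
	{ 13, "Flow Control Protocol Error" },
	{ 14, "Completion Timeout" },
	{ 15, "Completer Abort" },
	{ 16, "Unexpected Completion" },
	{ 17, "Receiver Overflow" },
	{ 18, "Malformed TLP" },
	{ 19, "ECRC Error" },
	{ 20, "Unsupported Request" },
};

/* Hypothetical helper: return the label for a status bit, or NULL if the
 * bit is not in the table above. */
static const char *aer_uncor_bit_name(unsigned int bit)
{
	size_t i;

	for (i = 0; i < sizeof(aer_uncor_bits) / sizeof(aer_uncor_bits[0]); i++)
		if (aer_uncor_bits[i].bit == bit)
			return aer_uncor_bits[i].name;
	return NULL;
}

/* Hypothetical helper: return the lowest status bit that is set and not
 * masked, or -1 if every reported error is masked. */
static int aer_lowest_unmasked(uint32_t status, uint32_t mask)
{
	uint32_t pending = status & ~mask;
	int bit;

	for (bit = 0; bit < 32; bit++)
		if (pending & (UINT32_C(1) << bit))
			return bit;
	return -1;
}
```

With the values from the log, `aer_lowest_unmasked(0x00100000, 0x00010000)` yields bit 20, which the table labels as Unsupported Request. The mask value covers only bit 16 (Unexpected Completion), so the reported error is not masked.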

1 file changed, 6 additions and 2 deletions

drivers/net/ethernet/broadcom/tg3.c

@@ -18076,16 +18076,20 @@ static void tg3_shutdown(struct pci_dev *pdev)
 	struct net_device *dev = pci_get_drvdata(pdev);
 	struct tg3 *tp = netdev_priv(dev);
 
+	tg3_reset_task_cancel(tp);
+
 	rtnl_lock();
+
 	netif_device_detach(dev);
 
 	if (netif_running(dev))
 		dev_close(dev);
 
-	if (system_state == SYSTEM_POWER_OFF)
-		tg3_power_down(tp);
+	tg3_power_down(tp);
 
 	rtnl_unlock();
+
+	pci_disable_device(pdev);
 }
 
 /**
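
The order of operations in the patched `tg3_shutdown()` can be modelled in userspace to make the change easier to follow. This is only a sketch under stated assumptions: `struct mock_nic`, `enum step`, `record()` and `mock_shutdown()` are hypothetical stand-ins that log which step would run, and none of them touch real hardware or kernel APIs. The comments name the kernel call each step stands for.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the steps tg3_shutdown() performs after the patch. */
enum step {
	STEP_CANCEL_RESET,	/* tg3_reset_task_cancel(tp) */
	STEP_RTNL_LOCK,		/* rtnl_lock() */
	STEP_DETACH,		/* netif_device_detach(dev) */
	STEP_CLOSE,		/* dev_close(dev), only if the interface is up */
	STEP_POWER_DOWN,	/* tg3_power_down(tp), now unconditional */
	STEP_RTNL_UNLOCK,	/* rtnl_unlock() */
	STEP_PCI_DISABLE,	/* pci_disable_device(pdev), stops bus mastering */
};

#define MAX_STEPS 8

/* Hypothetical stand-in for the device state; records the call order. */
struct mock_nic {
	bool running;
	enum step log[MAX_STEPS];
	size_t nsteps;
};

static void record(struct mock_nic *nic, enum step s)
{
	if (nic->nsteps < MAX_STEPS)
		nic->log[nic->nsteps++] = s;
}

/* Mirrors the control flow of the patched function, step by step. */
static void mock_shutdown(struct mock_nic *nic)
{
	record(nic, STEP_CANCEL_RESET);

	record(nic, STEP_RTNL_LOCK);
	record(nic, STEP_DETACH);

	if (nic->running) {
		record(nic, STEP_CLOSE);
		nic->running = false;
	}

	/* Before the patch this ran only when powering off, not on reboot. */
	record(nic, STEP_POWER_DOWN);
	record(nic, STEP_RTNL_UNLOCK);

	record(nic, STEP_PCI_DISABLE);
}
```

Calling `mock_shutdown()` on a running device records seven steps, ending with the PCI disable step. On an idle device the close step is skipped and six steps are recorded. In both cases the power-down step runs, which is the behavioural change the diff introduces for the reboot path.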