Commit 19873ee

dcui authored and liuw committed
Drivers: hv: vmbus: hibernation: do not hang forever in vmbus_bus_resume()
After we Stop and later Start a VM that uses Accelerated Networking (NIC SR-IOV), currently the VF vmbus device's Instance GUID can change, so after vmbus_bus_resume() -> vmbus_request_offers(), vmbus_onoffer() can not find the original vmbus channel of the VF, and hence we can't complete() vmbus_connection.ready_for_resume_event in check_ready_for_resume_event(), and the VM hangs in vmbus_bus_resume() forever.

Fix the issue by adding a timeout, so the resuming can still succeed, and the saved state is not lost, and according to my test, the user can disable Accelerated Networking and then will be able to SSH into the VM for further recovery. Also prevent the VM in question from suspending again.

The host will be fixed so in future the Instance GUID will stay the same across hibernation.

Fixes: d8bd2d4 ("Drivers: hv: vmbus: Resume after fixing up old primary channels")
Signed-off-by: Dexuan Cui <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
1 parent b46b4a8 commit 19873ee

1 file changed, +7 -2 lines changed

drivers/hv/vmbus_drv.c

Lines changed: 7 additions & 2 deletions
@@ -2387,7 +2387,10 @@ static int vmbus_bus_suspend(struct device *dev)
 	if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)
 		wait_for_completion(&vmbus_connection.ready_for_suspend_event);
 
-	WARN_ON(atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0);
+	if (atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0) {
+		pr_err("Can not suspend due to a previous failed resuming\n");
+		return -EBUSY;
+	}
 
 	mutex_lock(&vmbus_connection.channel_mutex);
 
@@ -2463,7 +2466,9 @@ static int vmbus_bus_resume(struct device *dev)
 
 	vmbus_request_offers();
 
-	wait_for_completion(&vmbus_connection.ready_for_resume_event);
+	if (wait_for_completion_timeout(
+		&vmbus_connection.ready_for_resume_event, 10 * HZ) == 0)
+		pr_err("Some vmbus device is missing after suspending?\n");
 
 	/* Reset the event for the next suspend. */
 	reinit_completion(&vmbus_connection.ready_for_suspend_event);
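
The interaction between the two hunks may be easier to follow in a small userspace model. The sketch below is not kernel code and is not part of this commit: struct bus_model and every *_model function are hypothetical names, the kernel completion is approximated by an atomic flag polled against a deadline, and the timeout is a parameter rather than the 10 * HZ used in the patch. It only illustrates the control flow: resume gives up waiting after a bounded time and still returns success, and a later suspend is refused with -EBUSY while the fixup counter is nonzero.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical userspace stand-in for the relevant vmbus_connection fields. */
struct bus_model {
	atomic_bool ready_for_resume;       /* models ready_for_resume_event */
	atomic_int nr_chan_fixup_on_resume; /* channels still awaiting an offer */
};

static void bus_model_init(struct bus_model *bus, int nr_fixup)
{
	atomic_init(&bus->ready_for_resume, false);
	atomic_init(&bus->nr_chan_fixup_on_resume, nr_fixup);
}

static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Bounded wait: true if the flag was set in time, false on timeout. */
static bool wait_ready_timeout_model(struct bus_model *bus, long timeout_ms)
{
	const struct timespec tick = { .tv_sec = 0, .tv_nsec = 1000000 };
	long long deadline;

	assert(timeout_ms >= 0);
	deadline = now_ms() + timeout_ms;
	while (!atomic_load(&bus->ready_for_resume)) {
		if (now_ms() >= deadline)
			return false;
		nanosleep(&tick, NULL); /* poll roughly once per millisecond */
	}
	return true;
}

/* Models an offer matching a saved channel; the last match signals readiness. */
static void offer_matched_model(struct bus_model *bus)
{
	if (atomic_fetch_sub(&bus->nr_chan_fixup_on_resume, 1) == 1)
		atomic_store(&bus->ready_for_resume, true);
}

/* Resume logs the problem on timeout but still succeeds. */
static int bus_resume_model(struct bus_model *bus, long timeout_ms)
{
	if (!wait_ready_timeout_model(bus, timeout_ms))
		fprintf(stderr, "Some device is missing after suspending?\n");
	return 0;
}

/* Suspend is refused while a previous resume left channels unmatched. */
static int bus_suspend_model(struct bus_model *bus)
{
	if (atomic_load(&bus->nr_chan_fixup_on_resume) != 0) {
		fprintf(stderr, "Can not suspend due to a previous failed resuming\n");
		return -EBUSY;
	}
	return 0;
}
```

With one channel left unmatched, bus_resume_model() returns 0 after the timeout and bus_suspend_model() then returns -EBUSY. Once offer_matched_model() has accounted for the missing channel, the wait completes immediately and suspend is allowed again.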
