Commit 19f9b37

notartom authored and SeanMooney committed
Revert resize: wait for events according to hybrid plug
Since 4817165, when reverting a resized instance back to the source host, the libvirt driver waits for vif-plugged events when spawning the instance. When called from finish_revert_resize() in the source compute manager, libvirt's finish_revert_migration() does not pass vifs_already_plugged to _create_domain_and_network(), making the latter use the default False value.

When the source compute manager calls network_api.migrate_instance_finish() in finish_revert_resize(), this updates the port binding back to the source host. If Neutron is configured to use OVS hybrid plug, it will send the vif-plugged event immediately after completing this request. This happens before the virt driver's finish_revert_migration() method is called. This causes the wait in the libvirt driver to time out because the event is received before Nova starts waiting for it.

The neutron ovs l2 agent sends vif-plugged events when two conditions are met. First the port must be bound to the host managed by the l2 agent and second, the agent must have completed configuring the port on ovs. This involves assigning the port a local VLAN for tenant isolation, applying security group rules if required and applying QoS policies or other agent extensions like service function chaining.

During the boot process, we bind the port first to the host then plug the interface into ovs which triggers the l2 agent to configure it resulting in the emission of the vif-plugged event. In the revert case, as noted above, since the vif is already plugged on the source node when hybrid-plug is used, binding the port to the source node fulfils the second condition to send the vif-plugged event.

Events sent immediately after port binding update are hereafter known as "bind-time" events. For ports that do not use OVS hybrid plug, Neutron will continue to send vif-plugged events only when Nova actually plugs the VIF. These types of events are hereafter known as "plug-time" events.

OVS hybrid plug is a per agent setting, so for a particular host, bind-time events are an all-or-nothing thing for the ovs backend: either all VIF_TYPE=ovs ports have them, or no ovs ports have them. In general, a host will only have one network backend. The only exception to this is SR-IOV. SR-IOV is commonly deployed on the same host as other network backends such as OVS or linuxbridge. SR-IOV ports with VNIC_TYPE=direct-physical will always have only bind-time events. If an instance mixes OVS ports with hybrid-plug=False with direct physical ports, it will have both kinds of events.

This patch adds functions to the NetworkInfo model that return what kinds of events each VIF has. These are then used in the migration revert logic to decide when to wait for external events: in the compute manager, when binding the port, for bind-time events, and/or in libvirt, when plugging the VIFs, for plug-time events.

Closes-bug: 1832028
Co-Authored-By: Sean Mooney [email protected]
Change-Id: I51cdcae67be8c68a55bc939de4ea0aba2361dcc4
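
The race described in the commit message can be reproduced in miniature. The sketch below is a toy model, not Nova code: `EventBus`, `update_port_binding`, `revert_waiting_too_late` and `revert_waiting_first` are hypothetical names introduced only for illustration. It assumes, as the message states for OVS hybrid plug, that the event is emitted as soon as the binding is updated, and that an event with no registered waiter is simply dropped.

```python
import threading


class EventBus:
    """Toy stand-in for external-event delivery (hypothetical, not Nova's API)."""

    def __init__(self):
        self._waiters = {}

    def prepare(self, event):
        # Register interest in an event and return an object to wait on.
        waiter = threading.Event()
        self._waiters[event] = waiter
        return waiter

    def deliver(self, event):
        # An event nobody registered for is dropped, which is the failure mode.
        waiter = self._waiters.pop(event, None)
        if waiter is None:
            return False
        waiter.set()
        return True


def update_port_binding(bus, port_id):
    # Assumption: with hybrid plug the event fires at bind time.
    bus.deliver(('network-vif-plugged', port_id))


def revert_waiting_too_late(bus, port_id, timeout):
    # Old ordering: bind first, only start waiting afterwards.
    update_port_binding(bus, port_id)
    waiter = bus.prepare(('network-vif-plugged', port_id))
    return waiter.wait(timeout)


def revert_waiting_first(bus, port_id, timeout):
    # Ordering used by the patch: register the waiter, then bind.
    waiter = bus.prepare(('network-vif-plugged', port_id))
    update_port_binding(bus, port_id)
    return waiter.wait(timeout)
```

In this model `revert_waiting_too_late` times out because the event arrives before anything is listening, while `revert_waiting_first` returns as soon as the binding update delivers the event.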
1 parent a628d2f commit 19f9b37

7 files changed: +240 -34 lines changed
nova/compute/manager.py

Lines changed: 49 additions & 14 deletions
@@ -4164,6 +4164,50 @@ def revert_resize(self, context, instance, migration):
             self.compute_rpcapi.finish_revert_resize(context, instance,
                     migration, migration.source_compute)

+    def _finish_revert_resize_network_migrate_finish(self, context, instance,
+                                                     migration):
+        """Causes port binding to be updated. In some Neutron or port
+        configurations - see NetworkModel.get_bind_time_events() - we
+        expect the vif-plugged event from Neutron immediately and wait for it.
+        The rest of the time, the event is expected further along in the
+        virt driver, so we don't wait here.
+
+        :param context: The request context.
+        :param instance: The instance undergoing the revert resize.
+        :param migration: The Migration object of the resize being reverted.
+        :raises: eventlet.timeout.Timeout or
+                 exception.VirtualInterfacePlugException.
+        """
+        network_info = instance.get_network_info()
+        events = []
+        deadline = CONF.vif_plugging_timeout
+        if deadline and utils.is_neutron() and network_info:
+            events = network_info.get_bind_time_events()
+            if events:
+                LOG.debug('Will wait for bind-time events: %s', events,
+                          instance=instance)
+        error_cb = self._neutron_failed_migration_callback
+        try:
+            with self.virtapi.wait_for_instance_event(instance, events,
+                                                      deadline=deadline,
+                                                      error_callback=error_cb):
+                # NOTE(hanrong): we need to change migration.dest_compute to
+                # source host temporarily.
+                # "network_api.migrate_instance_finish" will setup the network
+                # for the instance on the destination host. For revert resize,
+                # the instance will back to the source host, the setup of the
+                # network for instance should be on the source host. So set the
+                # migration.dest_compute to source host at here.
+                with utils.temporary_mutation(
+                        migration, dest_compute=migration.source_compute):
+                    self.network_api.migrate_instance_finish(context,
+                                                             instance,
+                                                             migration)
+        except eventlet.timeout.Timeout:
+            with excutils.save_and_reraise_exception():
+                LOG.error('Timeout waiting for Neutron events: %s', events,
+                          instance=instance)
+
     @wrap_exception()
     @reverts_task_state
     @wrap_instance_event(prefix='compute')
@@ -4211,17 +4255,8 @@ def finish_revert_resize(self, context, instance, migration):

             self.network_api.setup_networks_on_host(context, instance,
                                                     migration.source_compute)
-            # NOTE(hanrong): we need to change migration.dest_compute to
-            # source host temporarily. "network_api.migrate_instance_finish"
-            # will setup the network for the instance on the destination host.
-            # For revert resize, the instance will back to the source host, the
-            # setup of the network for instance should be on the source host.
-            # So set the migration.dest_compute to source host at here.
-            with utils.temporary_mutation(
-                    migration, dest_compute=migration.source_compute):
-                self.network_api.migrate_instance_finish(context,
-                                                         instance,
-                                                         migration)
+            self._finish_revert_resize_network_migrate_finish(
+                context, instance, migration)
             network_info = self.network_api.get_instance_nw_info(context,
                                                                  instance)

@@ -6439,8 +6474,8 @@ def pre_live_migration(self, context, instance, block_migration, disk,
         return migrate_data

     @staticmethod
-    def _neutron_failed_live_migration_callback(event_name, instance):
-        msg = ('Neutron reported failure during live migration '
+    def _neutron_failed_migration_callback(event_name, instance):
+        msg = ('Neutron reported failure during migration '
                'with %(event)s for instance %(uuid)s')
         msg_args = {'event': event_name, 'uuid': instance.uuid}
         if CONF.vif_plugging_is_fatal:
@@ -6518,7 +6553,7 @@ class _BreakWaitForInstanceEvent(Exception):
                 disk = None

             deadline = CONF.vif_plugging_timeout
-            error_cb = self._neutron_failed_live_migration_callback
+            error_cb = self._neutron_failed_migration_callback
             # In order to avoid a race with the vif plugging that the virt
             # driver does on the destination host, we register our events
             # to wait for before calling pre_live_migration. Then if the
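
The guard at the top of `_finish_revert_resize_network_migrate_finish()` decides whether there is anything to wait for at bind time. The following is a minimal standalone sketch of that decision, assuming VIFs are plain dicts rather than `network_model.VIF` objects; the function name `bind_time_events_to_wait_for` and its parameters are hypothetical and do not exist in Nova.

```python
def bind_time_events_to_wait_for(vifs, vif_plugging_timeout, using_neutron=True):
    """Return the (event name, port id) pairs to wait for while rebinding.

    Mirrors the condition in the patch: a zero timeout, a non-Neutron
    deployment, or an empty VIF list all mean "wait for nothing".
    """
    if not (vif_plugging_timeout and using_neutron and vifs):
        return []
    return [('network-vif-plugged', vif['id'])
            for vif in vifs
            # Assumption: hybrid plug is flagged under details['ovs_hybrid_plug'],
            # as in the unit tests of this commit.
            if vif.get('details', {}).get('ovs_hybrid_plug', False)]


# Example data shaped like the VIFs used in the commit's unit tests.
hybrid = {'id': 'hybrid-vif', 'details': {'ovs_hybrid_plug': True}}
normal = {'id': 'normal-vif', 'details': {'ovs_hybrid_plug': False}}
```

An empty result does not skip the binding update: in the patch the wait context manager is still entered with an empty event list, so `migrate_instance_finish()` runs either way.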

nova/network/model.py

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -458,6 +458,16 @@ def labeled_ips(self):
458458
'ips': ips}
459459
return []
460460

461+
def has_bind_time_event(self):
462+
"""When updating the port binding to a host that already has the
463+
instance in a shutoff state (in practice, this currently means
464+
reverting a resize or cold migration), the following Neutron/port
465+
configurations cause network-vif-plugged events to be sent as soon as
466+
the binding is updated:
467+
- OVS with hybrid plug
468+
"""
469+
return self.is_hybrid_plug_enabled()
470+
461471
def is_hybrid_plug_enabled(self):
462472
return self['details'].get(VIF_DETAILS_OVS_HYBRID_PLUG, False)
463473

@@ -515,6 +525,24 @@ def wait(self, do_raise=True):
     def json(self):
         return jsonutils.dumps(self)

+    def get_bind_time_events(self):
+        """When updating the port binding to a host that already has the
+        instance in a shutoff state (in practice, this currently means
+        reverting a resize or cold migration), return external events that are
+        sent as soon as the binding is updated.
+        """
+        return [('network-vif-plugged', vif['id'])
+                for vif in self if vif.has_bind_time_event()]
+
+    def get_plug_time_events(self):
+        """When updating the port binding to a host that already has the
+        instance in a shutoff state (in practice, this currently means
+        reverting a resize or cold migration), return external events that are
+        sent when the VIF is plugged.
+        """
+        return [('network-vif-plugged', vif['id'])
+                for vif in self if not vif.has_bind_time_event()]
+

 class NetworkInfoAsyncWrapper(NetworkInfo):
     """Wrapper around NetworkInfo that allows retrieving NetworkInfo
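
The two new `NetworkInfo` methods split an instance's VIFs into complementary sets. A self-contained sketch of that partition follows, using plain dicts in place of the real `VIF` class; `HYBRID_PLUG_KEY` and the module-level functions are illustrative stand-ins, with the key value taken from the `details={'ovs_hybrid_plug': ...}` usage in the commit's tests.

```python
# Assumed value of the details key, as used in the commit's unit tests.
HYBRID_PLUG_KEY = 'ovs_hybrid_plug'


def has_bind_time_event(vif):
    """True if Neutron is expected to send the event when the port is bound."""
    return bool(vif.get('details', {}).get(HYBRID_PLUG_KEY, False))


def get_bind_time_events(vifs):
    """Events expected as soon as the port binding is updated."""
    return [('network-vif-plugged', vif['id'])
            for vif in vifs if has_bind_time_event(vif)]


def get_plug_time_events(vifs):
    """Events expected only once the VIF is actually plugged."""
    return [('network-vif-plugged', vif['id'])
            for vif in vifs if not has_bind_time_event(vif)]


vifs = [
    {'id': 'hybrid-vif', 'details': {HYBRID_PLUG_KEY: True}},
    {'id': 'normal-vif', 'details': {HYBRID_PLUG_KEY: False}},
    {'id': 'bare-vif'},  # no details at all: treated as plug-time
]
bind_time = get_bind_time_events(vifs)
plug_time = get_plug_time_events(vifs)
```

Because one predicate drives both lists, every VIF lands in exactly one of them, so the compute manager and the virt driver never wait for the same event twice.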

nova/tests/unit/compute/test_compute.py

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -5891,7 +5891,9 @@ def fake_finish_revert_migration_driver(*args, **kwargs):
58915891
old_vm_state = vm_states.ACTIVE
58925892
else:
58935893
old_vm_state = vm_states.STOPPED
5894-
params = {'vm_state': old_vm_state}
5894+
params = {'vm_state': old_vm_state,
5895+
'info_cache': objects.InstanceInfoCache(
5896+
network_info=network_model.NetworkInfo([]))}
58955897
instance = self._create_fake_instance_obj(params)
58965898

58975899
self.stub_out('nova.virt.fake.FakeDriver.finish_migration', fake)
@@ -6041,7 +6043,9 @@ def test_finish_revert_resize_validate_source_compute(self):
60416043
def fake(*args, **kwargs):
60426044
pass
60436045

6044-
instance = self._create_fake_instance_obj()
6046+
params = {'info_cache': objects.InstanceInfoCache(
6047+
network_info=network_model.NetworkInfo([]))}
6048+
instance = self._create_fake_instance_obj(params)
60456049

60466050
self.stub_out('nova.virt.fake.FakeDriver.finish_migration', fake)
60476051
self.stub_out('nova.virt.fake.FakeDriver.finish_revert_migration',

nova/tests/unit/compute/test_compute_mgr.py

Lines changed: 75 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5054,6 +5054,81 @@ def test_notify_volume_usage_detach_no_block_stats(self):
50545054
self.context, fake_instance, fake_bdm)
50555055
block_stats.assert_called_once_with(fake_instance, 'vda')
50565056

5057+
def _test_finish_revert_resize_network_migrate_finish(self, vifs, events):
5058+
instance = fake_instance.fake_instance_obj(self.context)
5059+
instance.info_cache = objects.InstanceInfoCache(
5060+
network_info=network_model.NetworkInfo(vifs))
5061+
migration = objects.Migration(
5062+
source_compute='fake-source',
5063+
dest_compute='fake-dest')
5064+
5065+
def fake_migrate_instance_finish(context, instance, migration):
5066+
self.assertEqual(migration.source_compute, 'fake-source')
5067+
# NOTE(artom) This looks weird, but it's checking that the
5068+
# temporaty_mutation() context manager did its job.
5069+
self.assertEqual(migration.dest_compute, 'fake-source')
5070+
5071+
with test.nested(
5072+
mock.patch.object(self.compute.virtapi,
5073+
'wait_for_instance_event'),
5074+
mock.patch.object(self.compute.network_api,
5075+
'migrate_instance_finish',
5076+
side_effect=fake_migrate_instance_finish)
5077+
) as (mock_wait, mock_migrate_instance_finish):
5078+
self.compute._finish_revert_resize_network_migrate_finish(
5079+
self.context, instance, migration)
5080+
mock_wait.assert_called_once_with(
5081+
instance, events, deadline=CONF.vif_plugging_timeout,
5082+
error_callback=self.compute._neutron_failed_migration_callback)
5083+
mock_migrate_instance_finish.assert_called_once_with(
5084+
self.context, instance, migration)
5085+
5086+
def test_finish_revert_resize_network_migrate_finish_wait(self):
5087+
"""Test that we wait for bind-time events if we have a hybrid-plugged
5088+
VIF.
5089+
"""
5090+
self._test_finish_revert_resize_network_migrate_finish(
5091+
[network_model.VIF(id=uuids.hybrid_vif,
5092+
details={'ovs_hybrid_plug': True}),
5093+
network_model.VIF(id=uuids.normal_vif,
5094+
details={'ovs_hybrid_plug': False})],
5095+
[('network-vif-plugged', uuids.hybrid_vif)])
5096+
5097+
def test_finish_revert_resize_network_migrate_finish_dont_wait(self):
5098+
"""Test that we're not waiting for any events if we don't have any
5099+
hybrid-plugged VIFs.
5100+
"""
5101+
self._test_finish_revert_resize_network_migrate_finish(
5102+
[network_model.VIF(id=uuids.hybrid_vif,
5103+
details={'ovs_hybrid_plug': False}),
5104+
network_model.VIF(id=uuids.normal_vif,
5105+
details={'ovs_hybrid_plug': False})],
5106+
[])
5107+
5108+
def test_finish_revert_resize_network_migrate_finish_no_vif_timeout(self):
5109+
"""Test that we're not waiting for any events if vif_plugging_timeout
5110+
is 0.
5111+
"""
5112+
self.flags(vif_plugging_timeout=0)
5113+
self._test_finish_revert_resize_network_migrate_finish(
5114+
[network_model.VIF(id=uuids.hybrid_vif,
5115+
details={'ovs_hybrid_plug': True}),
5116+
network_model.VIF(id=uuids.normal_vif,
5117+
details={'ovs_hybrid_plug': True})],
5118+
[])
5119+
5120+
@mock.patch.object(utils, 'is_neutron', return_value=False)
5121+
def test_finish_revert_resize_network_migrate_finish_not_neutron(self, _):
5122+
"""Test that we're not waiting for any events if we're not using
5123+
Neutron.
5124+
"""
5125+
self._test_finish_revert_resize_network_migrate_finish(
5126+
[network_model.VIF(id=uuids.hybrid_vif,
5127+
details={'ovs_hybrid_plug': True}),
5128+
network_model.VIF(id=uuids.normal_vif,
5129+
details={'ovs_hybrid_plug': True})],
5130+
[])
5131+
50575132

50585133
class ComputeManagerBuildInstanceTestCase(test.NoDBTestCase):
50595134
def setUp(self):

nova/tests/unit/network/test_network_info.py

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@
1515
# under the License.
1616

1717
from oslo_config import cfg
18+
from oslo_utils.fixture import uuidsentinel as uuids
1819

1920
from nova import exception
2021
from nova.network import model
@@ -857,6 +858,21 @@ def test_injection_ipv6_with_lxc_no_gateway(self):
857858
libvirt_virt_type='lxc')
858859
self.assertEqual(expected, template)
859860

861+
def test_get_events(self):
862+
network_info = model.NetworkInfo([
863+
model.VIF(
864+
id=uuids.hybrid_vif,
865+
details={'ovs_hybrid_plug': True}),
866+
model.VIF(
867+
id=uuids.normal_vif,
868+
details={'ovs_hybrid_plug': False})])
869+
self.assertEqual(
870+
[('network-vif-plugged', uuids.hybrid_vif)],
871+
network_info.get_bind_time_events())
872+
self.assertEqual(
873+
[('network-vif-plugged', uuids.normal_vif)],
874+
network_info.get_plug_time_events())
875+
860876

861877
class TestNetworkMetadata(test.NoDBTestCase):
862878
def setUp(self):
