Commit 7a7a223

notartom and SeanMooney committed
Revert resize: wait for events according to hybrid plug
Since 4817165, when reverting a resized instance back to the source
host, the libvirt driver waits for vif-plugged events when spawning the
instance. When called from finish_revert_resize() in the source compute
manager, libvirt's finish_revert_migration() does not pass
vifs_already_plugged to _create_domain_and_network(), making the latter
use the default False value.

When the source compute manager calls
network_api.migrate_instance_finish() in finish_revert_resize(), this
updates the port binding back to the source host. If Neutron is
configured to use OVS hybrid plug, it will send the vif-plugged event
immediately after completing this request. This happens before the virt
driver's finish_revert_migration() method is called. This causes the
wait in the libvirt driver to time out because the event is received
before Nova starts waiting for it.

The neutron ovs l2 agent sends vif-plugged events when two conditions
are met. First the port must be bound to the host managed by the l2
agent and second, the agent must have completed configuring the port on
ovs. This involves assigning the port a local VLAN for tenant isolation,
applying security group rules if required and applying QoS policies or
other agent extensions like service function chaining.

During the boot process, we bind the port first to the host then plug
the interface into ovs which triggers the l2 agent to configure it
resulting in the emission of the vif-plugged event. In the revert case,
as noted above, since the vif is already plugged on the source node when
hybrid-plug is used, binding the port to the source node fulfils the
second condition to send the vif-plugged event.

Events sent immediately after port binding update are hereafter known as
"bind-time" events. For ports that do not use OVS hybrid plug, Neutron
will continue to send vif-plugged events only when Nova actually plugs
the VIF. These types of events are hereafter known as "plug-time"
events.

OVS hybrid plug is a per agent setting, so for a particular host,
bind-time events are an all-or-nothing thing for the ovs backend: either
all VIF_TYPE=ovs ports have them, or no ovs ports have them. In general,
a host will only have one network backend. The only exception to this is
SR-IOV. SR-IOV is commonly deployed on the same host as other network
backends such as OVS or linuxbridge. SR-IOV ports with
VNIC_TYPE=direct-physical will always have only bind-time events. If an
instance mixes OVS ports with hybrid-plug=False with direct physical
ports, it will have both kinds of events.

For same host resize reverts we do not update the binding host as the
host does not change, as such for same host resize we do not receive
bind time events. For same host revert we therefore do not wait for bind
time events in the compute manager.

This patch adds functions to the NetworkInfo model that return what
kinds of events each VIF has. These are then used in the migration
revert logic to decide when to wait for external events: in the compute
manager, when binding the port, for bind-time events, and/or in libvirt,
when plugging the VIFs, for plug-time events.

Closes-bug: #1832028
Closes-Bug: #1833902
Co-Authored-By: Sean Mooney <[email protected]>
Change-Id: I51673e58fc8d5f051df911630f6d7a928d123a5b
1 parent 2722cab commit 7a7a223

9 files changed: +351 -44 lines changed

nova/compute/manager.py

Lines changed: 48 additions & 14 deletions
@@ -4208,6 +4208,49 @@ def revert_resize(self, context, instance, migration):
             self.compute_rpcapi.finish_revert_resize(context, instance,
                     migration, migration.source_compute)
 
+    def _finish_revert_resize_network_migrate_finish(self, context, instance,
+                                                     migration):
+        """Causes port binding to be updated. In some Neutron or port
+        configurations - see NetworkModel.get_bind_time_events() - we
+        expect the vif-plugged event from Neutron immediately and wait for it.
+        The rest of the time, the event is expected further along in the
+        virt driver, so we don't wait here.
+
+        :param context: The request context.
+        :param instance: The instance undergoing the revert resize.
+        :param migration: The Migration object of the resize being reverted.
+        :raises: eventlet.timeout.Timeout or
+                 exception.VirtualInterfacePlugException.
+        """
+        network_info = instance.get_network_info()
+        events = []
+        deadline = CONF.vif_plugging_timeout
+        if deadline and utils.is_neutron() and network_info:
+            events = network_info.get_bind_time_events(migration)
+            if events:
+                LOG.debug('Will wait for bind-time events: %s', events)
+        error_cb = self._neutron_failed_migration_callback
+        try:
+            with self.virtapi.wait_for_instance_event(instance, events,
+                                                      deadline=deadline,
+                                                      error_callback=error_cb):
+                # NOTE(hanrong): we need to change migration.dest_compute to
+                # source host temporarily.
+                # "network_api.migrate_instance_finish" will setup the network
+                # for the instance on the destination host. For revert resize,
+                # the instance will back to the source host, the setup of the
+                # network for instance should be on the source host. So set
+                # the migration.dest_compute to source host at here.
+                with utils.temporary_mutation(
+                        migration, dest_compute=migration.source_compute):
+                    self.network_api.migrate_instance_finish(context,
+                                                             instance,
+                                                             migration)
+        except eventlet.timeout.Timeout:
+            with excutils.save_and_reraise_exception():
+                LOG.error('Timeout waiting for Neutron events: %s', events,
+                          instance=instance)
+
     @wrap_exception()
     @reverts_task_state
     @wrap_instance_event(prefix='compute')
@@ -4255,17 +4298,8 @@ def finish_revert_resize(self, context, instance, migration):
 
             self.network_api.setup_networks_on_host(context, instance,
                                                     migration.source_compute)
-            # NOTE(hanrong): we need to change migration.dest_compute to
-            # source host temporarily. "network_api.migrate_instance_finish"
-            # will setup the network for the instance on the destination host.
-            # For revert resize, the instance will back to the source host, the
-            # setup of the network for instance should be on the source host.
-            # So set the migration.dest_compute to source host at here.
-            with utils.temporary_mutation(
-                    migration, dest_compute=migration.source_compute):
-                self.network_api.migrate_instance_finish(context,
-                                                         instance,
-                                                         migration)
+            self._finish_revert_resize_network_migrate_finish(
+                context, instance, migration)
             network_info = self.network_api.get_instance_nw_info(context,
                                                                  instance)
 
@@ -6570,8 +6604,8 @@ def pre_live_migration(self, context, instance, block_migration, disk,
         return migrate_data
 
     @staticmethod
-    def _neutron_failed_live_migration_callback(event_name, instance):
-        msg = ('Neutron reported failure during live migration '
+    def _neutron_failed_migration_callback(event_name, instance):
+        msg = ('Neutron reported failure during migration '
                'with %(event)s for instance %(uuid)s')
         msg_args = {'event': event_name, 'uuid': instance.uuid}
         if CONF.vif_plugging_is_fatal:
@@ -6649,7 +6683,7 @@ class _BreakWaitForInstanceEvent(Exception):
             disk = None
 
         deadline = CONF.vif_plugging_timeout
-        error_cb = self._neutron_failed_live_migration_callback
+        error_cb = self._neutron_failed_migration_callback
         # In order to avoid a race with the vif plugging that the virt
         # driver does on the destination host, we register our events
         # to wait for before calling pre_live_migration. Then if the
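
The new helper in this file enters wait_for_instance_event() before it calls migrate_instance_finish(), so the waiter exists before the port binding update can produce a bind-time event. The following is a minimal standard-library sketch of why that ordering matters. EventWaiter, expect(), deliver() and update_binding() are hypothetical names invented for illustration and are not Nova's API; the real code uses eventlet and the compute manager's event machinery.

```python
import threading
from contextlib import contextmanager


class EventWaiter:
    """Toy external-event registry (hypothetical, not Nova code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}  # event key -> threading.Event

    def deliver(self, key):
        """Handle an incoming event; return False if nobody registered."""
        with self._lock:
            waiter = self._pending.get(key)
        if waiter is None:
            return False  # dropped: the event arrived before registration
        waiter.set()
        return True

    @contextmanager
    def expect(self, keys, deadline):
        """Register for keys, run the body, then wait for every key."""
        waiters = {}
        with self._lock:
            for key in keys:
                waiters[key] = self._pending[key] = threading.Event()
        try:
            yield
            for key, waiter in waiters.items():
                if not waiter.wait(timeout=deadline):
                    raise TimeoutError('timed out waiting for %s' % (key,))
        finally:
            with self._lock:
                for key in keys:
                    self._pending.pop(key, None)


def update_binding(registry, key):
    """Simulate a bind-time event: it fires as soon as binding is updated."""
    return registry.deliver(key)


registry = EventWaiter()
event = ('network-vif-plugged', 'port-1')

# Wrong order: trigger first, register later. The event is lost.
lost = not update_binding(registry, event)

# Right order: register first, then trigger inside the context manager.
with registry.expect([event], deadline=1.0):
    delivered = update_binding(registry, event)
```

With an empty list of keys, expect() returns without waiting, which mirrors the helper passing an empty events list when there are no bind-time events to wait for.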

nova/network/model.py

Lines changed: 25 additions & 0 deletions
@@ -458,6 +458,17 @@ def labeled_ips(self):
                     'ips': ips}
         return []
 
+    def has_bind_time_event(self, migration):
+        """Returns whether this VIF's network-vif-plugged external event will
+        be sent by Neutron at "bind-time" - in other words, as soon as the port
+        binding is updated. This is in the context of updating the port binding
+        to a host that already has the instance in a shutoff state - in
+        practice, this means reverting either a cold migration or a
+        non-same-host resize.
+        """
+        return (self.is_hybrid_plug_enabled() and not
+                migration.is_same_host())
+
     def is_hybrid_plug_enabled(self):
         return self['details'].get(VIF_DETAILS_OVS_HYBRID_PLUG, False)
 
@@ -515,6 +526,20 @@ def wait(self, do_raise=True):
     def json(self):
         return jsonutils.dumps(self)
 
+    def get_bind_time_events(self, migration):
+        """Returns whether any of our VIFs have "bind-time" events. See
+        has_bind_time_event() docstring for more details.
+        """
+        return [('network-vif-plugged', vif['id'])
+                for vif in self if vif.has_bind_time_event(migration)]
+
+    def get_plug_time_events(self, migration):
+        """Complementary to get_bind_time_events(), any event that does not
+        fall in that category is a plug-time event.
+        """
+        return [('network-vif-plugged', vif['id'])
+                for vif in self if not vif.has_bind_time_event(migration)]
+
 
 class NetworkInfoAsyncWrapper(NetworkInfo):
     """Wrapper around NetworkInfo that allows retrieving NetworkInfo
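
The two NetworkInfo methods added in this file split an instance's VIFs into complementary event lists. The sketch below illustrates that split with plain dataclasses. The Migration and VIF classes here are simplified stand-ins written for this example, and split_events() is a hypothetical helper that does not exist in Nova; only the 'ovs_hybrid_plug' detail key and the ('network-vif-plugged', id) tuple shape are taken from the diff.

```python
from dataclasses import dataclass, field


@dataclass
class Migration:
    """Simplified stand-in for nova.objects.Migration."""
    source_compute: str
    dest_compute: str

    def is_same_host(self):
        return self.source_compute == self.dest_compute


@dataclass
class VIF:
    """Simplified stand-in for nova.network.model.VIF."""
    id: str
    details: dict = field(default_factory=dict)

    def has_bind_time_event(self, migration):
        # Bind-time only applies to hybrid-plugged ports whose binding
        # host actually changes, that is, not on a same-host revert.
        hybrid = self.details.get('ovs_hybrid_plug', False)
        return hybrid and not migration.is_same_host()


def split_events(vifs, migration):
    """Hypothetical helper: return (bind_time, plug_time) event lists."""
    bind_time, plug_time = [], []
    for vif in vifs:
        target = bind_time if vif.has_bind_time_event(migration) else plug_time
        target.append(('network-vif-plugged', vif.id))
    return bind_time, plug_time


vifs = [VIF('hybrid', {'ovs_hybrid_plug': True}),
        VIF('normal', {'ovs_hybrid_plug': False})]

cross_host = split_events(vifs, Migration('host1', 'host2'))
same_host = split_events(vifs, Migration('host1', 'host1'))
```

In the cross-host case the hybrid-plugged VIF lands in the bind-time list and the other VIF in the plug-time list. In the same-host case both VIFs are plug-time, which matches the expectations in test_get_events().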

nova/objects/migration.py

Lines changed: 3 additions & 0 deletions
@@ -185,6 +185,9 @@ def instance(self):
     def instance(self, instance):
         self._cached_instance = instance
 
+    def is_same_host(self):
+        return self.source_compute == self.dest_compute
+
 
 @base.NovaObjectRegistry.register
 class MigrationList(base.ObjectListBase, base.NovaObject):

nova/tests/unit/compute/test_compute.py

Lines changed: 6 additions & 2 deletions
@@ -5892,7 +5892,9 @@ def fake_finish_revert_migration_driver(*args, **kwargs):
             old_vm_state = vm_states.ACTIVE
         else:
             old_vm_state = vm_states.STOPPED
-        params = {'vm_state': old_vm_state}
+        params = {'vm_state': old_vm_state,
+                  'info_cache': objects.InstanceInfoCache(
+                      network_info=network_model.NetworkInfo([]))}
         instance = self._create_fake_instance_obj(params)
 
         self.stub_out('nova.virt.fake.FakeDriver.finish_migration', fake)
@@ -6042,7 +6044,9 @@ def test_finish_revert_resize_validate_source_compute(self):
         def fake(*args, **kwargs):
             pass
 
-        instance = self._create_fake_instance_obj()
+        params = {'info_cache': objects.InstanceInfoCache(
+            network_info=network_model.NetworkInfo([]))}
+        instance = self._create_fake_instance_obj(params)
 
         self.stub_out('nova.virt.fake.FakeDriver.finish_migration', fake)
         self.stub_out('nova.virt.fake.FakeDriver.finish_revert_migration',

nova/tests/unit/compute/test_compute_mgr.py

Lines changed: 91 additions & 0 deletions
@@ -5070,6 +5070,97 @@ def test_notify_volume_usage_detach_no_block_stats(self):
                 self.context, fake_instance, fake_bdm)
             block_stats.assert_called_once_with(fake_instance, 'vda')
 
+    def _test_finish_revert_resize_network_migrate_finish(
+            self, vifs, events, migration=None):
+        instance = fake_instance.fake_instance_obj(self.context)
+        instance.info_cache = objects.InstanceInfoCache(
+            network_info=network_model.NetworkInfo(vifs))
+        if migration is None:
+            migration = objects.Migration(
+                source_compute='fake-source',
+                dest_compute='fake-dest')
+
+        def fake_migrate_instance_finish(context, instance, migration):
+            # NOTE(artom) This looks weird, but it's checking that the
+            # temporary_mutation() context manager did its job.
+            self.assertEqual(migration.dest_compute, migration.source_compute)
+
+        with test.nested(
+            mock.patch.object(self.compute.virtapi,
+                              'wait_for_instance_event'),
+            mock.patch.object(self.compute.network_api,
+                              'migrate_instance_finish',
+                              side_effect=fake_migrate_instance_finish)
+        ) as (mock_wait, mock_migrate_instance_finish):
+            self.compute._finish_revert_resize_network_migrate_finish(
+                self.context, instance, migration)
+            mock_wait.assert_called_once_with(
+                instance, events, deadline=CONF.vif_plugging_timeout,
+                error_callback=self.compute._neutron_failed_migration_callback)
+            mock_migrate_instance_finish.assert_called_once_with(
+                self.context, instance, migration)
+
+    def test_finish_revert_resize_network_migrate_finish_wait(self):
+        """Test that we wait for bind-time events if we have a hybrid-plugged
+        VIF.
+        """
+        self._test_finish_revert_resize_network_migrate_finish(
+            [network_model.VIF(id=uuids.hybrid_vif,
+                               details={'ovs_hybrid_plug': True}),
+             network_model.VIF(id=uuids.normal_vif,
+                               details={'ovs_hybrid_plug': False})],
+            [('network-vif-plugged', uuids.hybrid_vif)])
+
+    def test_finish_revert_resize_network_migrate_finish_same_host(self):
+        """Test that we're not waiting for any events if it's a same host
+        resize revert.
+        """
+        migration = objects.Migration(
+            source_compute='fake-source', dest_compute='fake-source')
+
+        self._test_finish_revert_resize_network_migrate_finish(
+            [network_model.VIF(id=uuids.hybrid_vif,
+                               details={'ovs_hybrid_plug': True}),
+             network_model.VIF(id=uuids.normal_vif,
+                               details={'ovs_hybrid_plug': False})],
+            [], migration=migration
+        )
+
+    def test_finish_revert_resize_network_migrate_finish_dont_wait(self):
+        """Test that we're not waiting for any events if we don't have any
+        hybrid-plugged VIFs.
+        """
+        self._test_finish_revert_resize_network_migrate_finish(
+            [network_model.VIF(id=uuids.hybrid_vif,
+                               details={'ovs_hybrid_plug': False}),
+             network_model.VIF(id=uuids.normal_vif,
+                               details={'ovs_hybrid_plug': False})],
+            [])
+
+    def test_finish_revert_resize_network_migrate_finish_no_vif_timeout(self):
+        """Test that we're not waiting for any events if vif_plugging_timeout
+        is 0.
+        """
+        self.flags(vif_plugging_timeout=0)
+        self._test_finish_revert_resize_network_migrate_finish(
+            [network_model.VIF(id=uuids.hybrid_vif,
+                               details={'ovs_hybrid_plug': True}),
+             network_model.VIF(id=uuids.normal_vif,
+                               details={'ovs_hybrid_plug': True})],
+            [])
+
+    @mock.patch.object(utils, 'is_neutron', return_value=False)
+    def test_finish_revert_resize_network_migrate_finish_not_neutron(self, _):
+        """Test that we're not waiting for any events if we're not using
+        Neutron.
+        """
+        self._test_finish_revert_resize_network_migrate_finish(
+            [network_model.VIF(id=uuids.hybrid_vif,
+                               details={'ovs_hybrid_plug': True}),
+             network_model.VIF(id=uuids.normal_vif,
+                               details={'ovs_hybrid_plug': True})],
+            [])
+
 
 class ComputeManagerBuildInstanceTestCase(test.NoDBTestCase):
     def setUp(self):

nova/tests/unit/network/test_network_info.py

Lines changed: 30 additions & 0 deletions
@@ -15,9 +15,11 @@
 #    under the License.
 
 from oslo_config import cfg
+from oslo_utils.fixture import uuidsentinel as uuids
 
 from nova import exception
 from nova.network import model
+from nova import objects
 from nova import test
 from nova.tests.unit import fake_network_cache_model
 from nova.virt import netutils
@@ -857,6 +859,34 @@ def test_injection_ipv6_with_lxc_no_gateway(self):
                                                   libvirt_virt_type='lxc')
         self.assertEqual(expected, template)
 
+    def test_get_events(self):
+        network_info = model.NetworkInfo([
+            model.VIF(
+                id=uuids.hybrid_vif,
+                details={'ovs_hybrid_plug': True}),
+            model.VIF(
+                id=uuids.normal_vif,
+                details={'ovs_hybrid_plug': False})])
+        same_host = objects.Migration(source_compute='fake-host',
+                                      dest_compute='fake-host')
+        diff_host = objects.Migration(source_compute='fake-host1',
+                                      dest_compute='fake-host2')
+        # Same-host migrations will have all events be plug-time.
+        self.assertItemsEqual(
+            [('network-vif-plugged', uuids.normal_vif),
+             ('network-vif-plugged', uuids.hybrid_vif)],
+            network_info.get_plug_time_events(same_host))
+        # Same-host migrations will have no bind-time events.
+        self.assertEqual([], network_info.get_bind_time_events(same_host))
+        # Diff-host migration + OVS hybrid plug = bind-time events
+        self.assertEqual(
+            [('network-vif-plugged', uuids.hybrid_vif)],
+            network_info.get_bind_time_events(diff_host))
+        # Diff-host migration + normal OVS = plug-time events
+        self.assertEqual(
+            [('network-vif-plugged', uuids.normal_vif)],
+            network_info.get_plug_time_events(diff_host))
+
 
 class TestNetworkMetadata(test.NoDBTestCase):
     def setUp(self):

nova/tests/unit/objects/test_migration.py

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -317,6 +317,14 @@ def test_get_by_uuid(self, mock_db_get):
317317
mig = objects.Migration.get_by_uuid(self.context, uuidsentinel.mig)
318318
self.assertEqual(uuidsentinel.mig, mig.uuid)
319319

320+
def test_is_same_host(self):
321+
same_host = objects.Migration(source_compute='fake-host',
322+
dest_compute='fake-host')
323+
diff_host = objects.Migration(source_compute='fake-host1',
324+
dest_compute='fake-host2')
325+
self.assertTrue(same_host.is_same_host())
326+
self.assertFalse(diff_host.is_same_host())
327+
320328

321329
class TestMigrationObject(test_objects._LocalTest,
322330
_TestMigrationObject):
