Skip to content

Commit 8c51a3a

Browse files
Balazs Gibizermarkgoddard
authored andcommitted
Enhance service restart in functional env
Bugfix Icaf1bae8cb040b939f916a19ce026031ddb84af7 showed that restarting a compute service in the functional env is unrealistic causing faults to slip through. During that bug fix only the minimal change was done in the functional env regarding compute service restart to reproduce the reported fault. However the restart of the compute service could be made even more realistic. This patch simulates a compute service restart in the functional env by stopping the original compute service and starting a totally new compute service for the same host and node. This way we can make sure that we get a brand new ComputeManager in the new service and no state can leak between the old and the new service. This change revealed another shortcoming of the functional env. In the real world the nova-compute service could be restarted without loosing any running servers on the compute host. But with the naive implementation of this change the compute service is re-created. This means that a new ComputeManager is instantiated that loads a new FakeDriver instance as well. That new FakeDriver instance then reports an empty hypervisor. This behavior is not totally unrealistic as it simulates such a compute host restart that cleans the hypervisor state as well (e.g. compute host redeployment). However this type of restart shows another bug in the code path that destroys and deallocates evacuated instance from the source host. Therefore this patch implements the compute service restart in a way that simulates only a service restart and not a full compute restart. A subsequent patch will add a test that uses the clean hypervisor case to reproduces the revealed bug. Related-Bug: #1724172 Change-Id: I9d6cd6259659a35383c0c9c21db72a9434ba86b1
1 parent 45e1e87 commit 8c51a3a

File tree

3 files changed

+50
-17
lines changed

3 files changed

+50
-17
lines changed

nova/test.py

Lines changed: 45 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -432,20 +432,29 @@ def start_service(self, name, host=None, **kwargs):
432432
ctxt = context.get_context()
433433
cell_name = kwargs.pop('cell', CELL1_NAME) or CELL1_NAME
434434
cell = self.cell_mappings[cell_name]
435-
hm = objects.HostMapping(context=ctxt,
436-
host=host or name,
437-
cell_mapping=cell)
438-
hm.create()
439-
self.host_mappings[hm.host] = hm
435+
if (host or name) not in self.host_mappings:
436+
# NOTE(gibi): If the HostMapping does not exists then this is
437+
# the first start of the service so we create the mapping.
438+
hm = objects.HostMapping(context=ctxt,
439+
host=host or name,
440+
cell_mapping=cell)
441+
hm.create()
442+
self.host_mappings[hm.host] = hm
440443
svc = self.useFixture(
441444
nova_fixtures.ServiceFixture(name, host, cell=cell, **kwargs))
442445

443446
return svc.service
444447

445-
def restart_compute_service(self, compute):
446-
"""Restart a compute service in a realistic way.
448+
def restart_compute_service(self, compute, keep_hypervisor_state=True):
449+
"""Stops the service and starts a new one to have realistic restart
447450
448451
:param:compute: the nova-compute service to be restarted
452+
:param:keep_hypervisor_state: If true then already defined instances
453+
will survive the compute service restart.
454+
If false then the new service will see
455+
an empty hypervisor
456+
:returns: a new compute service instance serving the same host and
457+
and node
449458
"""
450459

451460
# NOTE(gibi): The service interface cannot be used to simulate a real
@@ -455,12 +464,36 @@ def restart_compute_service(self, compute):
455464
# a stop start. The service.kill() call cannot help as it deletes
456465
# the service from the DB which is unrealistic and causes that some
457466
# operation that refers to the killed host (e.g. evacuate) fails.
458-
# So this helper method tries to simulate a better compute service
459-
# restart by cleaning up some of the internal state of the compute
460-
# manager.
467+
# So this helper method will stop the original service and then starts
468+
# a brand new compute service for the same host and node. This way
469+
# a new ComputeManager instance will be created and initialized during
470+
# the service startup.
461471
compute.stop()
462-
compute.manager._resource_tracker = None
463-
compute.start()
472+
473+
# this service was running previously so we have to make sure that
474+
# we restart it in the same cell
475+
cell_name = self.host_mappings[compute.host].cell_mapping.name
476+
477+
if keep_hypervisor_state:
478+
# NOTE(gibi): FakeDriver does not provide a meaningful way to
479+
# define some servers that exists already on the hypervisor when
480+
# the driver is (re)created during the service startup. This means
481+
# that we cannot simulate that the definition of a server
482+
# survives a nova-compute service restart on the hypervisor.
483+
# Instead here we save the FakeDriver instance that knows about
484+
# the defined servers and inject that driver into the new Manager
485+
# class during the startup of the compute service.
486+
old_driver = compute.manager.driver
487+
with mock.patch(
488+
'nova.virt.driver.load_compute_driver') as load_driver:
489+
load_driver.return_value = old_driver
490+
new_compute = self.start_service(
491+
'compute', host=compute.host, cell=cell_name)
492+
else:
493+
new_compute = self.start_service(
494+
'compute', host=compute.host, cell=cell_name)
495+
496+
return new_compute
464497

465498
def assertJsonEqual(self, expected, observed, message=''):
466499
"""Asserts that 2 complex data structures are json equivalent.

nova/tests/functional/regressions/test_bug_1794996.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -176,4 +176,4 @@ def test_evacuate_delete_server_restart_original_compute(self):
176176
self._delete_and_check_allocations(server)
177177

178178
# restart the source compute
179-
self.restart_compute_service(self.compute1)
179+
self.compute1 = self.restart_compute_service(self.compute1)

nova/tests/functional/test_servers.py

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2357,7 +2357,7 @@ def test_evacuate(self):
23572357
self.assertFlavorMatchesAllocation(self.flavor1, dest_allocation)
23582358

23592359
# restart the source compute
2360-
self.restart_compute_service(self.compute1)
2360+
self.compute1 = self.restart_compute_service(self.compute1)
23612361

23622362
self.admin_api.put_service(
23632363
source_compute_id, {'forced_down': 'false'})
@@ -2434,7 +2434,7 @@ def test_evacuate_forced_host(self):
24342434
self.assertFlavorMatchesAllocation(self.flavor1, dest_allocation)
24352435

24362436
# restart the source compute
2437-
self.restart_compute_service(self.compute1)
2437+
self.compute1 = self.restart_compute_service(self.compute1)
24382438
self.admin_api.put_service(
24392439
source_compute_id, {'forced_down': 'false'})
24402440

@@ -2520,7 +2520,7 @@ def fake_move_claim(*args, **kwargs):
25202520
self.assertFlavorMatchesAllocation(self.flavor1, source_allocation)
25212521

25222522
# restart the source compute
2523-
self.restart_compute_service(self.compute1)
2523+
self.compute1 = self.restart_compute_service(self.compute1)
25242524
self.admin_api.put_service(
25252525
source_compute_id, {'forced_down': 'false'})
25262526

@@ -2593,7 +2593,7 @@ def fake_rebuild(*args, **kwargs):
25932593
self.assertFlavorMatchesAllocation(self.flavor1, source_allocation)
25942594

25952595
# restart the source compute
2596-
self.restart_compute_service(self.compute1)
2596+
self.compute1 = self.restart_compute_service(self.compute1)
25972597
self.admin_api.put_service(
25982598
source_compute_id, {'forced_down': 'false'})
25992599

0 commit comments

Comments
 (0)