Skip to content

Commit b12f7eb

Browse files
melwittgibizer
authored andcommitted
Retry in CellDatabases fixture when global DB state changes
There is a NOTE in the CellDatabases code about an unlikely but possible race that can occur between taking the writer lock to set the last DB context manager and taking the reader lock to call target_cell(). When the race is detected, a RuntimeError is raised. We can handle the race by retrying setting the last DB context manager when the race is detected, as described in the NOTE. Closes-Bug: #1959677 Change-Id: I5c0607ce5910dce581ab9360cc7fc69ba9673f35 (cherry picked from commit 1c8122a) (cherry picked from commit 8756688)
1 parent 568769e commit b12f7eb

File tree

1 file changed

+46
-20
lines changed

1 file changed

+46
-20
lines changed

nova/tests/fixtures/nova.py

Lines changed: 46 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@
2121
import functools
2222
import logging as std_logging
2323
import os
24+
import time
2425
import warnings
2526

2627
import eventlet
@@ -431,6 +432,13 @@ def _wrap_target_cell(self, context, cell_mapping):
431432
# yield to do the actual work. We can do schedulable things
432433
# here and not exclude other threads from making progress.
433434
# If an exception is raised, we capture that and save it.
435+
# Note that it is possible that another thread has changed the
436+
# global state (step #2) after we released the writer lock but
437+
# before we acquired the reader lock. If this happens, we will
438+
# detect the global state change and retry step #2 a limited number
439+
# of times. If we happen to race repeatedly with another thread and
440+
# exceed our retry limit, we will give up and raise a RuntimeError,
441+
# which will fail the test.
434442
# 4. If we changed state in #2, we need to change it back. So we grab
435443
# a writer lock again and do that.
436444
# 5. Finally, if an exception was raised in #3 while state was
@@ -449,29 +457,47 @@ def _wrap_target_cell(self, context, cell_mapping):
449457

450458
raised_exc = None
451459

452-
with self._cell_lock.write_lock():
453-
if cell_mapping is not None:
454-
# This assumes the next local DB access is the same cell that
455-
# was targeted last time.
456-
self._last_ctxt_mgr = desired
460+
def set_last_ctxt_mgr():
461+
with self._cell_lock.write_lock():
462+
if cell_mapping is not None:
463+
# This assumes the next local DB access is the same cell
464+
# that was targeted last time.
465+
self._last_ctxt_mgr = desired
457466

458-
with self._cell_lock.read_lock():
459-
if self._last_ctxt_mgr != desired:
460-
# NOTE(danms): This is unlikely to happen, but it's possible
461-
# another waiting writer changed the state between us letting
462-
# it go and re-acquiring as a reader. If lockutils supported
463-
# upgrading and downgrading locks, this wouldn't be a problem.
464-
# Regardless, assert that it is still as we left it here
465-
# so we don't hit the wrong cell. If this becomes a problem,
466-
# we just need to retry the write section above until we land
467-
# here with the cell we want.
468-
raise RuntimeError('Global DB state changed underneath us')
467+
# Set last context manager to the desired cell's context manager.
468+
set_last_ctxt_mgr()
469469

470+
# Retry setting the last context manager if we detect that a writer
471+
# changed global DB state before we take the read lock.
472+
for retry_time in range(0, 3):
470473
try:
471-
with self._real_target_cell(context, cell_mapping) as ccontext:
472-
yield ccontext
473-
except Exception as exc:
474-
raised_exc = exc
474+
with self._cell_lock.read_lock():
475+
if self._last_ctxt_mgr != desired:
476+
# NOTE(danms): This is unlikely to happen, but it's
477+
# possible another waiting writer changed the state
478+
# between us letting it go and re-acquiring as a
479+
# reader. If lockutils supported upgrading and
480+
# downgrading locks, this wouldn't be a problem.
481+
# Regardless, assert that it is still as we left it
482+
# here so we don't hit the wrong cell. If this becomes
483+
# a problem, we just need to retry the write section
484+
# above until we land here with the cell we want.
485+
raise RuntimeError(
486+
'Global DB state changed underneath us')
487+
try:
488+
with self._real_target_cell(
489+
context, cell_mapping
490+
) as ccontext:
491+
yield ccontext
492+
except Exception as exc:
493+
raised_exc = exc
494+
# Leave the retry loop after calling target_cell
495+
break
496+
except RuntimeError:
497+
# Give other threads a chance to make progress, increasing the
498+
# wait time between attempts.
499+
time.sleep(retry_time)
500+
set_last_ctxt_mgr()
475501

476502
with self._cell_lock.write_lock():
477503
# Once we have returned from the context, we need

0 commit comments

Comments
 (0)