Skip to content

CDRIVER-4114: Reduce contention on the topology description #854

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 148 commits into from
Nov 10, 2021

Conversation

vector-of-bool
Copy link
Contributor

@vector-of-bool vector-of-bool commented Aug 27, 2021

This is a fairly substantial changeset, and may require an equally substantial
explanation.

The main mutable state of a mongoc_topology_t is all contained within the
description, which contains the known state of the associated server topology.
Other pieces of state might also be updated in parallel, but they can be managed
through regular atomic operations.

The purpose of opening this as a draft is to get eyes on the work to ensure I
have a proper understanding of how the topology's state is meant to be
maintained, and that the correct understanding is reflected in the code.

This work builds upon changes in #840 (new atomics) and #847 (new session
pooling).

Important draft TODOs and notes

  • This is still a draft. It may have speeling ,and grammar error .
  • Several new internal APIs have abbreviated names, which is uncommon in the
    rest of the codebase. This was purely to save my hands the strain of typing
    verbose names all the time 🙂. Anything can be renamed before this PR is
    merged.
  • Several doc comments are now outdated.
  • Very few usages of topology::mutex have been removed yet. These will be
    pruned later on once the topology-update-queue is fully implemented.
    Done.
  • Various C99 usage throughout.

A summary of the changes

  • The introduction of a "shared pointer" type. This is analogous to a C++
    shared_ptr in design and behavior:
    • A shared_ptr is created with a pointer than was previously allocated, and a
      destructor+deallocator function.
    • The shared pointer maintains auxiliary data that tell it how many live
      references remain, and the destructor+deallocator.
    • Copying and destroying different shared pointers that refer to the same
      object is thread-safe.
    • Copying a shared pointer object that is also being updated is not
      thread-safe, and must be done using the _atomic() functions.
    • When the reference count of a shared pointer reaches zero, the
      destructor+deallocator function is immediately invoked with the managed
      pointer (this occurs in the thread that decremented the reference count to
      zero).
    • Shared pointer objects are layout-compatible with a regular pointer. This is
      used in union mc_shared_tpl_descr.
  • The mongoc_topology_t::description member has been replaced with a
    union mc_shared_tpl_descr _shared_descr_ member. Users that wish to access
    the description must do so by calling an API that will return them a shared
    pointer to the description, which the caller must then release when it is
    finished. Acquiring this shared pointer is "taking" the description, and
    releasing it is "dropping" the description.
  • In the current code, the macro MC_DECL_TD_TAKE (VarName, Topology) is used
    to simultaneous declare a define mc_shared_tpl_descr VarName as taking the
    shared description of Topology. MC_TD_DROP(VarName) should be used to
    drop that pointer. A shared pointer can be rebound using
    tdptr = mc_tpld_take_ref(topology)
  • The topology referred-to by _shared_descr_ is never updated in-place.
    Instead, a copy is made of the description, updates are applied to that copy,
    and _shared_descr_ is atomically swapped with the modified copy. This
    prevents someone attempting to take the shared description from seeing a torn
    value.
  • Finding out when to use mc_tpld_take_ref() and mc_tpld_drop_ref() is a current
    main difficulty of this changeset.
    • When mc_tpld_take_ref() are called, the
      instantaneous state of the topology is captured into the resulting variable,
      and is guaranteed to remain unchanged when accessed through that shared
      pointer instance. If another shared pointer is taken, then there is no
      guarantee that the new shared pointer refers to the same topology.
    • Referring to points that call bson_mutex_{un}lock is a good guide on where
      to take and drop these references. Within a locked section, it can be
      assumed that the code expects a single view of the topology from the
      beginning to the end.
    • Functions that remark to "expect the caller to hold the topology mutex"
      means that they intend to see a consistent view of the topology during their
      execution (or it means the function is intending to modify the topology,
      but that's discussed later).
      • Because these functions need to refer to the topology description it would
        almost be correct to just take a shared pointer instead of assuming the
        caller holds a mutex, except this is wrong: The function expects a
        consistent view of the topology before and after it is finished
        execution. A caller may hold a shared pointer td1, call one of these
        functions, which will then take a shared pointer td2. There is no
        guarantee that td1 and td2 point to the same topology description, so
        the callee and caller may be working with different descriptions, whereas
        with the mutex they would have been able to assume they are equivalent.
      • The solution is that functions that "expect the topology mutex to be held"
        need to instead be given a handle to the topology description state that
        the caller is expecting them to work on. These functions have had a
        mongoc_topology_description_t* parameter added, and each caller will
        pass-down the topology description that they are keeping alive using a
        shared pointer. These functions that "expect the topology mutex to be
        held" will not refer to the shared pointer of the topology, and defer
        instead to the topology that was given to them by their caller.
  • At some points, functions are invoked that are expected to update the topology
    (e.g. topology scanning). In this case, the caller takes a shared pointer td
    and checks if it is valid/stale. If the td is okay, it continues with td.
    If not, it will invoke a scan operation and block. After the scanning
    completes, the caller must drop td and then rebind td by taking the
    topology description again. This allows the callers to see the updated
    topology after scanning has completed.
  • In the case of multithreaded scanning/monitoring, the topology
    description updates need to be queued and processed one at a time. If
    multiple threads were allowed to concurrently modify the topology, some
    topology updates might be lost. For example: If thread th1 clones the
    topology description and begins updating it, and then thread th2 takes
    another clone, quickly modifies it, and then swaps it back in place, then
    th1 might finish its modifications and write back the topology description
    based on the copy it took before th2 did its modifications, thus losing the
    modifications done by th2. (Solved below)

Update:

const for all-the-things

There was trouble in deciding where to take a copy of the shared topology
description, versus when we would need to duplicate the description so that it
could be modified and later swapped out. The solution was to make the
_shared_descr_ be a pointer-to-const. Thus, any caller of
mc_tpld_take_ref() would obtain a
shared-pointer-to-const mongoc_topology_description_t, and be unable to modify
the description or bind it to a pointer-to-non-const (I'll abbreviate as
mutable*). From this point, it was just a matter of finding everywhere that a
we attempt to create a mutable* and from a const* deciding:

  1. Should the mutable* become a const*?
  2. Are we actually attempting to modify the topology description?

In the case of [1], it was simply a matter of adding more const at the
use-site and in the function signatures where we are trying to pass the
const*. After modifying the callee signatures to take const*, it would often
lead to more sites where we attempt to form mutable* from a const*. In
every case, this would eventually result in:

  • The entire call-tree becomes const-correct, and we can know that any time we
    see const mongoc_topology_description_t*, we know that we are not going to
    attempt to modify it.
  • Eventually, adding a const would be impossible, because we are attempting to
    modify the topology. In this case, we backtrack and find where we originally
    took the shared-const* and mark that as a point where we need to perform a
    topology modification. That is, these sites become an instance of [2] above.

Additionally, mongoc_topology_description_t contains (contained) a member
mongoc_set_t* servers which contains a set of mongoc_server_description_t.
The servers pointer member is an owning-pointer, and the server descriptions
are themselves a part of the topology. Thus, when we have a
mongoc_topology_description_t const*, we should only access it's
mongoc_server_description_t instances through const* as well.

Because pointer members are shallow-const, the servers member was renamed to
_servers_, and any attempt to access the server descriptions of a topology
description now must go through two getter functions:

  • mc_tpld_servers(mongoc_topology_description_t*)
      -> mongoc_set_t*
  • mc_tpld_servers_const(mongoc_topology_description_t const*)
      -> mongoc_set_t const*

Because it is invalid to pass a const* to mc_tpld_servers, it is not
possible to obtain a mutable* for a mongoc_server_description_t when all you
have is a mongoc_topology_description_t const* (unless you do a cast. Don't do
a cast).

Note: There were not yet any const-aware mongoc_set functions, so those
were also added as part of this PR.

Modifying the Topology

Now we need to define the procedure to update the topology. There are a few
requirements:

  • Only one thread may update a topology description at any time. Everyone else
    needs to wait their turn.
  • Any number of threads can still inspect the topology while another thread is
    performing an update.
  • Threads should not see any "partial" updates to the topology. The topology
    update must appear atomic threads other than the one doing the updates.
  • After an update is committed, other threads that still have a reference to the
    old topology should be safe to keep using that old reference.

All of the above are addressed by a few new APIs:

struct mc_tpld_modification {
  mongoc_topology_description_t* new_td;
  mongoc_topology_t* topology;
};

mc_tpld_modification
mc_tpld_modify_begin(mongoc_topology_t* topo);

void
mc_tpld_modify_commit(mc_tpld_modification tdmod);

void
mc_tpld_modify_drop(mc_tpld_modification tdmod);

They work as such:

  • mc_tpld_modification: A composite object that contains state for the
    topology update. The most important member is new_td, which is a
    non-const pointer to a topology description. Through this pointer we
    should apply updates to the topology description. topology is pointer to
    the topology that owns the descriptions.
  • mc_tpld_modify_begin(tpl): Locks a mutex in tpl that is used to serialize
    modifications to the topology description. The current topology description is
    cloned and put into mc_tpld_modification::new_td on the returned object.
  • mc_tpld_modify_drop(tdmod): Drops any pending modifications defined by
    tdmod. Unlocks the topology modification mutex. Destroys and deallocates
    new_td.
  • mc_tpld_modify_commit(tdmod): Perform the atomic update of the topology
    description. After this function returns, the topology description from
    new_td is the new description for the whole topology. Unlocks the
    modification mutex. new_td is bound to the
    shared-pointer stored in topology.

A call to mc_tpld_modify_begin() must be later followed by a call to
either mc_tpld_modify_drop() or mc_tpld_modify_commit() (never both).

For example:

mc_shared_tpl_descr td = mc_tpld_take_ref(tpl); // [1]
if (needs_to_update(td.ptr)) {  // [2]
  mc_tpld_modification tdmod = mc_tpld_modify_begin(tpl);  // [3]
  if (needs_to_update(tdmod.new_td)) {   // [4]
    do_update_topology(tdmod.new_td);    // [5]
    mc_tpld_modification_commit(tdmod);  // [6]
  } else {
    mc_tpld_modification_drop(tdmod);    // [7]
  }
}
mc_tpld_drop_ref(&td);  // [8]

Explained:

  1. Take a reference to the topology description at that exact moment. td.ptr
    will remain valid until [8], and the pointed-to-description will remain
    unchanged regardless of any intermediate code.
  2. Check whether we need to update the topology. This may come in the form of an
    error during normal execution, rather than an explicit check for
    outdated-ness.
  3. Begin an update transaction.
  4. Check again!: The call to mc_tpld_modify_begin() may have been preempted
    by another thread that was in the middle of performing an update. The other
    thread may have already performed the update that we expected ourselves to
    do. It is not guaranteed that the td in the outer scope is the same as the
    tdmod.new_td, since new_td was cloned after securing the exclusive lock
    to perform a topology update.
  5. Pass the mutable* to the topology description to a function that will
    perform the updates. No other thread will have access to this pointer, so no
    further synchronization is necessary.
  6. Commit the update. Now other threads will see our updates the next time they
    pull the topology description.
  7. If needs_to_update(tdmod.new_td) returned false, abandon the update.
  8. td is still holding a reference to the topology description at the moment
    of the call to mc_tpld_take_ref in [1]. We always need to drop those
    references.

@kevinAlbs kevinAlbs self-requested a review October 29, 2021 21:24
Copy link
Contributor

@eramongodb eramongodb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to renaming of functions such as mongoc_topology_server_by_id -> mongoc_topology_description_server_by_id_const.

👍 for addition of srv_polling_mtx to clear up responsibility of tpld_modification_mtx.

There is a mix of east and west const throughout proposed changes. I recommend making the const formatting consistent (AFAIK west const is the current format standard in the C driver).

@@ -40,11 +40,11 @@ mongoc_topology_description_new_copy (

MONGOC_EXPORT (bool)
mongoc_topology_description_has_readable_server (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is public API. I think adding const is a non backwards breaking API / ABI change, so this is OK to change.

Please update the RST documentation mongoc_topology_description_has_readable_server.rst and mongoc_topology_description_has_writable_server.rst.

/**
* @brief Directly invalidate a server in the topology by its ID.
*
* This is useful for testing purposes, as it provides thread-safe direct
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest noting that this is only intended to be used for tests. Non test code should not call this since the error message is fixed.

Copy link
Collaborator

@kevinAlbs kevinAlbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I do not think open comments are blockers.

Several new internal APIs have abbreviated names, which is uncommon in the
rest of the codebase. This was purely to save my hands the strain of typing
verbose names all the time 🙂. Anything can be renamed before this PR is
merged.

I defer artistic freedom of naming to you. But I vote for consistency in the abbreviation. There are three different abbreviations for "topology description", consider choosing one. Here is one option:

  • MC_DECL_TD_TAKE => MC_DECL_TPLD_TAKE
  • _topo_descr_destroy_and_free => _tpld_destroy_and_free
  • mc_shared_tpld => (no change)

/**
* @brief This lock is held in order to serialize operations that modify the
* topology description. It *should not* be held while performing read-only
* operations on the topology.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer a comment saying which fields tpld_modification_mtx protects, including server_monitors and rtt_monitors. I do not feel strongly about it. I am OK resolving with no change.

If finding an alternative to a CV seems like worthwhile technical debt to address, please file a CDRIVER ticket.

@vector-of-bool vector-of-bool merged commit ded9ae5 into mongodb:master Nov 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants