Skip to content

CDRIVER-3653 use handshake metadata for connection checks #822

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jul 18, 2021

Conversation

kevinAlbs
Copy link
Collaborator

@kevinAlbs kevinAlbs commented Jul 10, 2021

Terms

  • handshake description is a server description constructed during connection handshake.
  • monitor description is a server description constructed and updated in server monitoring.

Motivation

  • libmongoc currently uses monitor descriptions when constructing commands (to check wire version, how to pass read preferences, etc.). This is racy, and the cause of bugs like CDRIVER-3404.
  • Supporting load balancer requires setting the monitor description to an empty LoadBalancer description. A load balancer can front many backing servers. Connections must track the handshake description to work correctly with load balancers.

Summary

  • Store handshake descriptions alongside their corresponding connections.
  • Update change stream and transactions wire version checks to check the handshake description instead of the monitor description.
  • Update auth commands to use the handshake description when constructing commands.
  • Add mongoc_client_get_handshake_description for wrapping drivers to obtain a handshake description.

Notes

  • As a happy consequence, change streams now send a killCursors command when resuming due to an error that marks a server unknown. Previously this would clear the wire version on the connection.

@kevinAlbs kevinAlbs changed the title [WIP] CDRIVER-3653 bind server connections from handshake to streams [WIP] CDRIVER-3653 use handshake metadata for connection checks Jul 12, 2021
@kevinAlbs kevinAlbs force-pushed the poc.3653 branch 3 times, most recently from 33e16ed to 508b015 Compare July 14, 2021 01:37
- Update change stream wire version check
- Update transaction wire version check
- Update auth commands to use handshake description to construct commands.
@kevinAlbs kevinAlbs changed the title [WIP] CDRIVER-3653 use handshake metadata for connection checks CDRIVER-3653 use handshake metadata for connection checks Jul 14, 2021
@kevinAlbs kevinAlbs requested a review from jmikola July 14, 2021 02:01
@kevinAlbs kevinAlbs marked this pull request as ready for review July 14, 2021 02:01

:symbol:`mongoc_client_get_handshake_description` is distinct from :symbol:`mongoc_client_get_server_description`. :symbol:`mongoc_client_get_server_description` returns a server description constructed from monitoring, which may differ from the server description constructed from the connection handshake.

:symbol:`mongoc_client_get_handshake_description` does not attempt to establish a connection to the server if a connection was not already established. It will complete authentication on a single-threaded client if the connection has not yet been used by an operation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second sentence sounds as if this can have two very different behaviors in single-threaded depending on whether auth is used. Is it true that without auth, this will never initialize SDAM, and with auth it will? If so, is there any way we can avoid that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true that without auth, this will never initialize SDAM, and with auth it will?

Regardless of auth mongoc_client_get_handshake_description will not establish a new connection or initialize SDAM. If a connection is not already established, this returns NULL.

But, if auth is enabled, this function will do I/O to complete the authentication handshake. This happens because the auth handshake does not occur as part of connection establishment in monitoring, and is deferred until the first call to mongoc_cluster_fetch_stream_single.

That did not seem problematic to me. But I think I could feasibly change this behavior to avoid completing the authentication handshake if that seems problematic. Adding an argument to mongoc_cluster_fetch_stream_single to skip authentication, and passing that through from mongoc_client_get_handshake_description seems feasible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, does this pertain to monitoring not using auth at all? But on single-threaded, since it's also the application socket auth ends up being completed?

I'm unsure if this current behavior would be problematic, but I think anything we can do to make it consistent between modes of operation would be welcome. It simplifies the documentation and can avoid a subtle impact (which we might not realize now) down the line.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, does this pertain to monitoring not using auth at all?

Yes. Monitoring, even for single-threaded, will not authenticate connections. In single-threaded, a connection completes authentication the first time it is used for an operation when mongoc_cluster_fetch_stream_single is called.

I'm unsure if this current behavior would be problematic, but I think anything we can do to make it consistent between modes of operation would be welcome. It simplifies the documentation and can avoid a subtle impact (which we might not realize now) down the line.

I agree it seems simpler to reason about mongoc_client_get_handshake_description if it also attempts to establish a connection.


:symbol:`mongoc_client_get_handshake_description` does not attempt to establish a connection to the server if a connection was not already established. It will complete authentication on a single-threaded client if the connection has not yet been used by an operation.

To establish a connection on a single-threaded client, ensure the server is selectable (the monitoring connection is the only connection made to the server). This can be done with a function like :symbol:`mongoc_client_select_server`, or any function which performs server selection.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of:

ensure the server is selectable (the monitoring connection is the only connection made to the server)

Would it be more clear to say that we should ensure SDAM is initialized? It seems odd to refer to "monitoring connection" when that is the only connection in single-threaded.

Should we advise users to either select the server or run some operation on it, since either will initialize SDAM? Wouldn't that also be the only way to get the server_id needed by this function in the first place? IIRC, those aren't allocated until SDAM is actually initialized.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be more clear to say that we should ensure SDAM is initialized?

Not quite. If SDAM was initialized, but a network error occurred later, there may no longer be an active connection to the server.

Should we advise users to either select the server or run some operation on it

Yes, either of those conditions would suffice to guarantee there is an active application-use connection in single-threaded. I have reworked this section to hopefully clarify.

Wouldn't that also be the only way to get the server_id needed by this function in the first place?

Yes, you would need to select a server to get an ID. But once you have the ID, to check that the connection still exists (and has not been disconnected due to a network error) you need to check these conditions again.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is going to be difficult for wrapping drivers to utilize. It may be simpler for this to always attempt to establish a connection if one does not exist. That way, callers do not need to do any special checking to ensure a connection is established.

Do you think that would simplify?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is going to be difficult for wrapping drivers to utilize. It may be simpler for this to always attempt to establish a connection if one does not exist. That way, callers do not need to do any special checking to ensure a connection is established.

I do think having mongoc_client_get_handshake_description establish a connection would be desirable. I can't think of a reason why we'd actually want an empty server description returned (unless the connection could not be established in which case it'd likely be an error).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me. I will proceed with that change. My original thinking was to have this function do no more than necessary. But ensuring a connection is established seems more awkward.

@@ -289,6 +279,19 @@ _make_cursor (mongoc_change_stream_t *stream)
goto cleanup;
}

server_stream = mongoc_cluster_stream_for_reads (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the change that responsible for fixing CDRIVER-4077? I'm curious what part of the diff in particular is responsible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not added an explicit test to ensure that CDRIVER-4077 is resolved. But, this is my current theory:

  1. The wire version check in _mongoc_client_kill_cursor constructs a new server stream. Before this fix, this would take the current server description from the shared topology description.
  2. If the server was marked Unknown in the topology description (e.g. from a "not primary" error) before the resume attempt occurred, that server description would be empty and Unknown. That wire version check would send OP_KILL_CURSORS instead of a killCursors command.

I have commented in CDRIVER-4077. When that is addressed, we should add a test for that specific behavior.

Copy link
Member

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one comment about docs. I still agree with changing mongoc_client_get_handshake_description to always initialize the connection if possible, as I think that'd be easier to work with (and consistency across modes of operation is always a plus from my end).

Also, note the outstanding typo fix in #822 (comment)

Copy link
Collaborator Author

@kevinAlbs kevinAlbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated mongoc_client_get_handshake_description to establish a connection and added tests to verify.

@kevinAlbs kevinAlbs requested a review from jmikola July 16, 2021 20:57
@codecov-commenter
Copy link

codecov-commenter commented Jul 16, 2021

Codecov Report

Merging #822 (10214c0) into master (68d0948) will increase coverage by 0.34%.
The diff coverage is 87.77%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #822      +/-   ##
==========================================
+ Coverage   56.79%   57.14%   +0.34%     
==========================================
  Files         257      258       +1     
  Lines       70573    70791     +218     
==========================================
+ Hits        40084    40454     +370     
+ Misses      30489    30337     -152     
Impacted Files Coverage Δ
src/libmongoc/src/mongoc/mongoc-client-session.c 9.58% <0.00%> (-0.05%) ⬇️
src/libmongoc/src/mongoc/mongoc-cluster-aws.c 0.00% <ø> (ø)
src/libmongoc/tests/test-mongoc-change-stream.c 6.39% <0.00%> (-0.03%) ⬇️
src/libmongoc/src/mongoc/mongoc-change-stream.c 32.33% <63.63%> (+4.30%) ⬆️
src/libmongoc/src/mongoc/mongoc-cluster.c 61.84% <80.35%> (+0.92%) ⬆️
src/libmongoc/tests/test-mongoc-client.c 28.23% <85.52%> (+3.03%) ⬆️
src/libmongoc/src/mongoc/mongoc-client.c 60.29% <85.71%> (+0.22%) ⬆️
...c/libmongoc/src/mongoc/mongoc-server-description.c 88.65% <100.00%> (+0.88%) ⬆️
src/libmongoc/src/mongoc/mongoc-topology-scanner.c 88.14% <100.00%> (+1.64%) ⬆️
src/libmongoc/tests/test-libmongoc.c 78.52% <100.00%> (+0.02%) ⬆️
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 68d0948...10214c0. Read the comment docs.

Copy link
Member

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question about mongoc_server_description_type but LGTM.

@@ -4080,8 +4080,8 @@ test_mongoc_client_get_handshake_hello_response_pooled (void)
handshake_sd = mongoc_client_get_handshake_description (
client, monitor_sd->id, NULL /* opts */, &error);
ASSERT_OR_PRINT (handshake_sd, error);
BSON_ASSERT (MONGOC_SERVER_UNKNOWN !=
mongoc_server_description_type (handshake_sd));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mongoc_server_description_type always returned a string, so was this original code a bug since MONGOC_SERVER_UNKNOWN is an enum value?

Separate question, is there any reason libmongoc doesn't define constants for these strings? I know the enum type is private. Is this related to just not wanting to declare possible server types in any sort of public ABI?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mongoc_server_description_type always returned a string, so was this original code a bug since MONGOC_SERVER_UNKNOWN is an enum value?

Yes, this was a bug since it was comparing the string to the enum value.

Separate question, is there any reason libmongoc doesn't define constants for these strings? I know the enum type is private. Is this related to just not wanting to declare possible server types in any sort of public ABI?

I am not aware of any reason since the API predates me. But that seems plausible. SDAM server types are entrenched and long lived enough to where I would not be opposed to defining string constants if that would be useful for wrapping drivers. Feel free to file a ticket or PR to make that change.

@kevinAlbs kevinAlbs merged commit 0da8f1a into mongodb:master Jul 18, 2021
@kevinAlbs kevinAlbs deleted the poc.3653 branch July 18, 2021 23:44
@kevinAlbs kevinAlbs restored the poc.3653 branch February 3, 2022 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants