-
Notifications
You must be signed in to change notification settings - Fork 455
CDRIVER-4193 Ensure OP_MSG for handshakes and fix RPC op_egress counters #1256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…nc_cmd_phase_send
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work. The comments in the /counters/rpc/op_egress
tests are much appreciated as a reference. LGTM with suggested changes to tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest changes verified by this patch.
Description
This PR resolves CDRIVER-4193, CDRIVER-4622, and CDRIVER-4265. Verified by this patch.
OP_MSG for Handshakes (CDRIVER-4193 and CDRIVER-4265)
Drivers are required to use the "hello" command with the OP_MSG protocol for initial connection handshakes when either a Stable API Version (
serverApi=<value>
) or Load Balancing mode (loadBalanced=true
) is requested by the client. They should not be using the "legacy hello" command with the OP_QUERY protocol if either of these options are set.Following CDRIVER-4629 (#1248 and #1251), initial connection handshakes performed by clients were verified as correctly selecting OP_MSG vs. OP_QUERY depending on
serverApi
(specifically inmongoc_cluster_run_command_private
).When using client pools, a server monitor thread is spawned, and if the server has a maxWireVersion equivalent to server version 4.4 or newer, an additional RTT thread is spawned. Although the monitor thread was correctly checking for
serverApi
during initial connection handshakes, the RTT thread was unconditionally using OP_QUERY for its initial connection handshakes, and both the monitor and RTT threads were unconditionally using OP_QUERY for their polling hellos. The new_server_monitor_send_and_recv
function is now used by both threads to perform correct OP_MSG/OP_QUERY selection.Subsequently, all sites that check for
serverApi
viamongoc_<object>_uses_server_api
were updated to also check forloadbalanced=true
via the newmongoc_<object>_uses_loadbalanced
function.CDRIVER-4622 RPC op_egress Counters
The
_mongoc_rpc_gather
function served the dual-purpose of translating RPC messages into an array ofiovec
structures that are used to write the messages to the streams, as well as incrementing the op_egress performance counter corresponding to the opcode of the RPC message to be written. This dual purpose led to overcounting within the topology scanner when multiple TCP connections are scheduled for each DNS result, but only the first successful TCP(+TLS) connection is used to actually write the RPC message.This PR redefines
_mongoc_rpc_gather
to perform RPC message translation only. The increment of RPC op_egress counters is now done via the new dedicated_mongoc_rpc_op_egress_inc
function. A temporary_mongoc_rpc_gather_no_inc
function was used to track this separation during refactors (see the individual "RPC egress increment" commits for details). Once all references to the old_mongoc_rpc_gather
were replaced with_mongoc_rpc_gather_no_inc
and calls to_mongoc_rpc_op_egress_inc
were placed where required, the_mongoc_rpc_gather_no_inc
was renamed to_mongoc_rpc_gather
, effectively substituting the old definition with the new non-incrementing definition./counters/rpc/op_egress Tests
The bulk of this PR (besides the addition of
mongoc_<object>_uses_loadbalanced
) is the addition of the/counters/rpc/op_egress
tests to validate correct behavior of the RPC op_egress counters and OP_MSG/OP_QUERY selection during refactors. With the exception of the fix of the RTT thread and the mock server, all refactors of old calls to_mongoc_rpc_gather
were synchronized with updates to the stack traces in the tests that document theconnect/command -> RPC op_egress counter
. These comments should hopefully also serve as a useful reference for developers to understand the C Driver's connection handling process.These tests should cover all code paths that lead to RPC op_egress counter increments. With the exception of the
/counters/rpc/op_egress/auth
tests, the RPC op_egress tests use the mock server to deterministically time and test counter values incremented during initial connection handshakes. The auth tests run against a live server, with a 4.4+ requirement for/op_msg
subtests that useserverApi
to force OP_MSG for initial connection handshakes.Documentation
The documentation for Performance Counters was updated with a note clarifying that
op_egress
counters do not account for connection or handshake/command failures.