Skip to content

GODRIVER-1825 Select the server with fewer in-progress operations. #867

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 18, 2022

Conversation

matthewdale
Copy link
Collaborator

@matthewdale matthewdale commented Mar 8, 2022

GODRIVER-1825

Update Topology.SelectServer() to pick 2 random servers from the list of selectable servers, and return the one with fewer in-progress operations to more evenly balance load across the selectable servers.

Changes:

  • Add an operationCount counter in Server that keeps track of the number of requests for connections. Note that operationCount is not decremented while a connection is pinned to a transaction until the transaction is committed or aborted.
  • Pick 2 random servers from the list of selectable servers in Topology.SelectServer() and return the one with fewer in-progress operations.
  • Add a prose test for the one described in operationCount-based Selection Within Latency Window.
  • Add a test runner for the the in_window server selection spec tests and sync all tests in that directory.

@matthewdale matthewdale force-pushed the godriver1825-server-load branch 2 times, most recently from fff2d40 to 4dcc245 Compare March 9, 2022 01:41
@matthewdale matthewdale requested a review from benjirewis March 9, 2022 01:43
@matthewdale matthewdale marked this pull request as ready for review March 9, 2022 01:43
@matthewdale matthewdale requested a review from kevinAlbs March 9, 2022 01:43
@matthewdale matthewdale force-pushed the godriver1825-server-load branch 2 times, most recently from 4dd7cd3 to d212fa8 Compare March 9, 2022 17:49
Copy link
Contributor

@kevinAlbs kevinAlbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you for the inline comments describing the rationale for spec tests in topology_test.go and the small deviations from the spec.

Suggested one test change and a comment fix up.

// came from this pool. Do this in checkIn() instead of checkInNoEvent() because the latter is
// called by createConnections() and is not necessarily a check-in of an in-use connection. Use
// an int64 instead of a uint64 to mitigate the impact of any possible bugs that could cause the
// uint64 to underflow, which would effectively make the server unselectable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the rationale for using an int64 rather than an uint64. I think that is a good precaution.

One suggested edit:

Suggested change
// uint64 to underflow, which would effectively make the server unselectable.
// uint64 to underflow, which would make the server much less selectable.

A server could still be selected if it has the lowest average latency.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, will update.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@matthewdale matthewdale changed the title GODRIVER-1825 Select the server with fewer in-use connections. GODRIVER-1825 Select the server with fewer in-progress operations. Mar 16, 2022
@matthewdale matthewdale force-pushed the godriver1825-server-load branch from 8be0f17 to 3d75312 Compare March 16, 2022 21:34
@matthewdale matthewdale requested a review from benjirewis March 16, 2022 21:36
@matthewdale matthewdale force-pushed the godriver1825-server-load branch from 3d75312 to 2abcf79 Compare March 16, 2022 23:07
@matthewdale matthewdale force-pushed the godriver1825-server-load branch 2 times, most recently from bf81967 to 9904e77 Compare March 17, 2022 03:41
@matthewdale matthewdale force-pushed the godriver1825-server-load branch from 9904e77 to 34cbc49 Compare March 17, 2022 05:04
@kevinAlbs kevinAlbs self-requested a review March 17, 2022 17:50
// Sleep for 100ms to allow all server state discovery to complete. We need both servers to
// be selectable when we start running the test or the distribution of selected servers will
// be skewed. Unfortunately there's not currently another signal we can block on.
time.Sleep(100 * time.Millisecond)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested offline, consider waiting for two ServerHeartbeatSucceededEvent events rather than sleep.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually used TopologyDescriptionChangedEvents to confirm that the topology has exactly 2 Mongos type instances before continuing to run the test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

connImpl, err := s.pool.checkOut(ctx)
// Increment the operation count before calling checkOut to make sure that all connection
// requests are included in the operation count, including those in the wait queue. If we got an
// error insted of a connection, immediately decrement the operation count.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// error insted of a connection, immediately decrement the operation count.
// error instead of a connection, immediately decrement the operation count.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, will update.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@matthewdale matthewdale force-pushed the godriver1825-server-load branch from ce7b1c2 to 0a56eba Compare March 18, 2022 04:39
@matthewdale matthewdale merged commit 630ff7c into mongodb:master Mar 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants