Skip to content

Fix CursorResourceManager.close #1440

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 12, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -182,7 +182,7 @@ void endOperation() {
void close() {
boolean doClose = withLock(lock, () -> {
State localState = state;
if (localState == State.OPERATION_IN_PROGRESS) {
if (localState.inProgress()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with the original code:

If AsyncCommandBatchCursor.close is called concurrently more than once while AsyncCommandBatchCursor.next is running (not necessarily the next method itself, but the asynchronous logic of this method), then the following state transitions were possible:

  1. One of the close calls observes State.OPERATION_IN_PROGRESS, and changes it to State.CLOSE_PENDING. So far so good.
  2. The other close observes State.CLOSE_PENDING and changes it to State.CLOSED, which breaks one of the invariant that are supposed to be maintained by CursorResourceManager: while an operation is in progress, state.inProgress() must be true, i.e., state must be either OPERATION_IN_PROGRESS or CLOSE_PENDING (implies OPERATION_IN_PROGRESS).

state = State.CLOSE_PENDING;
} else if (localState != State.CLOSED) {
state = State.CLOSED;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
Expand Down Expand Up @@ -299,6 +300,24 @@ void testLimitWithGetMore() {
assertTrue(cursor.isClosed());
}

@Test
void attemptJava5516() {
BsonDocument commandResult = executeFindCommand(5, 2);
cursor = new AsyncCommandBatchCursor<>(commandResult, 2, 0, DOCUMENT_DECODER,
null, connectionSource, connection);
assertNotNull(cursorNext());
// Calling `close` twice is the key to reproducing.
// It does not matter whether we call `close` twice from the same thread or not.
ForkJoinPool.commonPool().execute(() -> cursor.close());
ForkJoinPool.commonPool().execute(() -> cursor.close());
Copy link
Member Author

@stIncMale stIncMale Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue this PR fixes is 100% a bug. And it manifests itself with an error that is very similar to the user-reported error (following is the error I reproduced locally):

java.lang.AssertionError: null
	at com.mongodb.assertions.Assertions.assertTrue(Assertions.java:158)
// note that with this reproducer, sometimes the assertion on line 224 succeeds, and the one on line 225 fails
	at com.mongodb.internal.operation.CursorResourceManager.setServerCursor(CursorResourceManager.java:224)
	at com.mongodb.internal.operation.AsyncCommandBatchCursor.lambda$getMoreLoop$2(AsyncCommandBatchCursor.java:186)
	at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:47)
	at com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.lambda$executeAsync$0(DefaultServer.java:249)
	at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:47)
	at com.mongodb.internal.connection.CommandProtocolImpl.lambda$executeAsync$0(CommandProtocolImpl.java:88)
	at com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.lambda$sendAndReceiveAsync$1(DefaultConnectionPool.java:774)
	at com.mongodb.internal.connection.UsageTrackingInternalConnection.lambda$sendAndReceiveAsync$1(UsageTrackingInternalConnection.java:150)
	at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:47)
	at com.mongodb.internal.async.AsyncSupplier.lambda$finish$0(AsyncSupplier.java:65)
	at com.mongodb.internal.async.SingleResultCallback.complete(SingleResultCallback.java:67)
	at com.mongodb.internal.async.AsyncSupplier.lambda$onErrorIf$5(AsyncSupplier.java:122)
	at com.mongodb.internal.async.AsyncSupplier.lambda$finish$0(AsyncSupplier.java:65)
	at com.mongodb.internal.async.AsyncSupplier.lambda$finish$0(AsyncSupplier.java:65)
	at com.mongodb.internal.connection.InternalStreamConnection.lambda$sendCommandMessageAsync$7(InternalStreamConnection.java:613)
	at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback$MessageCallback.onResult(InternalStreamConnection.java:894)

However, none of this guarantees that the bug the PR fixes is the bug the user is facing, because theoretically, I can imagine other bugs leading to the same error. However, I was unable to find them in our code, so I am saying that the fix proposed in this PR is my current best solution to JAVA-5516.

try {
assertNotNull(cursorNext());
assertNotNull(cursorNext());
} catch (IllegalStateException e) {
// one of the expected outcomes, because we call `cursorNext` concurrently with `close`
}
}

@Test
@DisplayName("test limit with large documents")
void testLimitWithLargeDocuments() {
Expand Down