Skip to content

[netty-nio-client] Ensure in-use channels are not incorrectly closed #2883

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Dec 17, 2021

Conversation

Bennett-Lynch
Copy link
Contributor

@Bennett-Lynch Bennett-Lynch commented Dec 2, 2021

IdleConnectionReaperHandler and OldConnectionReaperHandler are responsible for closing idle/old channels, but only when they are not marked as IN_USE. NettyRequestExecutor is responsible for marking a channel as IN_USE as part of its logic to prepare a channel for a request. However, in rare circumstances, it may be possible for a channel to reach its idle time in between being acquired from the pool and having the request initiated (especially when the channel is acquired from a thread outside the EventLoop). We can make this more resilient by more aggressively marking the channel as IN_USE as part of the channel acquire logic, and ensuring this is run before the channel's health is queried by HealthCheckedChannelPool (which will acquire a new channel, if needed).

Previously, NettyRequestExecutor was responsible for setting IN_USE and HandlerRemovingChannelPool was responsible for clearing it. This PR introduces a simple ChannelPoolListener interface that allows implementing classes to register behavior based on channels being acquired/released (while also guaranteeing the listener callbacks are executed within the channel's EventLoop). We can leverage this interface to make the IN_USE behavior more symmetrical, by using a single ChannelPoolListener to both set/clear the flag. The ChannelPoolListener interface can additionally be used to simplify some of our other ChannelPool-wrapping classes (the ones that don't need to actually manipulate the pool control flow). So this commit also takes the opportunity to migrate the existing HandlerRemovingChannelPool to leverage this interface.

License

  • I confirm that this pull request can be released under the Apache 2 license

IdleConnectionReaperHandler and OldConnectionReaperHandler are responsible
for closing idle/old channels, but only when they are not marked as
"in use". NettyRequestExecutor is responsible for marking a channel
as "in use" as part of its logic to prepare a channel for a request.
However, in rare circumstances, it may be possible for a channel to reach
its idle time in between being acquired from the pool and having the request
initiated (especially when the channel is acquired from a thread outside
the EventLoop). We can make this more resilient by more aggressively
marking the channel as "in use" as part of the channel acquire logic.

Previously, NettyRequestExecutor was responsible for setting IN_USE and
HandlerRemovingChannelPool was responsible for clearing it. We can leverage
Netty's ChannelPoolHandler interface to make this behavior more symmetrical,
by using a single ChannelPoolHandler to both set/clear the flag, which is
also guaranteed by Netty to be executed by the Channel's EventLoop. The
ChannelPoolHandler interface can actually be used to simplify some of our
other ChannelPool-wrapping classes (the ones that don't need to actually
manipulate the pool control flow). So this commit also takes the opportunity
to migrate the existing HandlerRemovingChannelPool to leverage this interface.
@Bennett-Lynch Bennett-Lynch requested a review from a team as a code owner December 2, 2021 23:44
@dagnir
Copy link
Contributor

dagnir commented Dec 3, 2021

This approach looks good to me!

@pkgonan
Copy link

pkgonan commented Dec 14, 2021

@Bennett-Lynch @dagnir
Hi. When do yo expect release version or date of this bug fix ?

@Bennett-Lynch
Copy link
Contributor Author

@Bennett-Lynch @dagnir Hi. When do yo expect release version or date of this bug fix ?

Hi @pkgonan. I am actively working on it as a top priority. It should be released by the end of this week.

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 5 Code Smells

89.3% 89.3% Coverage
0.0% 0.0% Duplication

@Bennett-Lynch Bennett-Lynch merged commit 7e499db into aws:master Dec 17, 2021
@Bennett-Lynch
Copy link
Contributor Author

Change merged and will be part of the next release. Our internal testing shows that this change may reduce the occurrence of transient Channel was closed before it could be written to errors by as much as 98.5%.

@pkgonan
Copy link

pkgonan commented Dec 17, 2021

@Bennett-Lynch
Is 100% impossible?

@Bennett-Lynch
Copy link
Contributor Author

Bennett-Lynch commented Dec 17, 2021

@Bennett-Lynch Is 100% impossible?

The new error rate is almost immeasurably low. I suspect any remaining errors are likely outside of the SDK's control. Additionally, with retries enabled on IOExceptions (which includes the default retry policy), it should be exceedingly rare that this error would surface to the application layer. Please let us know if you see a similar improvement or not with the new change.

@Bennett-Lynch
Copy link
Contributor Author

This change was released today as part of version 2.17.101.

@Sangdol
Copy link

Sangdol commented Jan 24, 2022

Hello,

I've upgraded from SDK 1 to SDK 2 (2.17.100) and started getting errors such as this:

java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Server failed to send complete response. The channel was closed. This may have been done by the client (e.g. because the request was aborted), by the service (e.g. because there was a handshake error, the request took too long, or the client tried to write on a read-only socket), or by an intermediary party (e.g. because the channel was idle for too long).
at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:62)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51)
at java.base/java.util.concurrent.CompletableFuture.uniHandle(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:76)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeAttemptExecute(AsyncRetryableStage.java:103)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:181)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:159)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:76)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$null$0(MakeAsyncHttpRequestStage.java:104)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)

The error looks a bit different from the one that I found from the ticket #1615

Do you think my issue is also related to this fix?

Thanks.

@cgallom
Copy link

cgallom commented May 2, 2024

Hello,

I've upgraded from SDK 1 to SDK 2 (2.17.100) and started getting errors such as this:

java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Server failed to send complete response. The channel was closed. This may have been done by the client (e.g. because the request was aborted), by the service (e.g. because there was a handshake error, the request took too long, or the client tried to write on a read-only socket), or by an intermediary party (e.g. because the channel was idle for too long).
at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:62)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51)
at java.base/java.util.concurrent.CompletableFuture.uniHandle(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:76)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeAttemptExecute(AsyncRetryableStage.java:103)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:181)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:159)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:76)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$null$0(MakeAsyncHttpRequestStage.java:104)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)

The error looks a bit different from the one that I found from the ticket #1615

Do you think my issue is also related to this fix?

Thanks.

Hi, I have read this post and others like it and I notice that you never could check if the updates on aws-sdk-java-v2 worked to fix your bug. I am currently using version 2.10.41 and the error persists in this version. Did you find out how to fix this bug ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants