Skip to content

Validate that the response content-length header matches what the ser… #2957

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 20, 2022

Conversation

millems
Copy link
Contributor

@millems millems commented Jan 7, 2022

…vice actually returns.

Also update error messaging in the event of service-side connection closes, and add testing to protect these error messages against regression.

@millems millems requested a review from a team as a code owner January 7, 2022 19:40
Comment on lines +138 to +140
Long contentLengthHeader = ctx.channel().attr(RESPONSE_CONTENT_LENGTH).get();
Long actualContentLength = ctx.channel().attr(RESPONSE_DATA_READ).get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we consolidate this logic to a standalone handler, and that handler raises an exception if it detects this issue? The ResponseHandler would then propagate any channel exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Successes and errors are reported in multiple places in the netty client. I can see about how that works, but it's harder for me to guarantee that it will work because of how messed up our success/error reporting is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at it, I can already tell it might require me to rewrite our FullHttpResponse handling, since that currently handles does its own finalizeResponse call in parallel with this publisher to complete the future successfully...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Out of scope if it requires any significant refactoring. In general I would consider channel attribute usage a code smell and prefer to try to isolate behavior to standalone handlers, but understood if it's not feasible with the current setup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good feedback. I'll see what I can do...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it seems like FullResponseContentPublisher and PublisherAdapter do their own error handling, so I can't decorate them with something else and expect it to do anything. The onError just gets fired out to the customer's publisher and the netty client ignores it. I'd have to break up those two publishers to put their error handling downstream from the publishers themselves. It would be good to do that, but it's a significant refactor.

Comment on lines 153 to 154
return Optional.of(new IOException("Response had content-length of " + contentLengthHeader + " bytes, but only received "
+ actualContentLength + " before the connection was closed."));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if Transfer-Encoding: chunked is being used, but we don't receive the final (0 byte) chunk?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also what happens if an HTTP/2 channel is closed without an endOfStream flag? Does Netty throw an exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a different edge case we might need to add validation for. That's much rarer, since we rarely use chunked encoding. This particular code only goes into effect when the content-length is specified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe we also need to test the end-of-stream flag as well.

Comment on lines +22 to +28
/**
* Diagnostic information that may be useful to help with debugging during error scenarios.
*/
@SdkInternalApi
public class ChannelDiagnostics {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting way to expose more helpful information. I like the idea. However, if it grows in size over time, we may want to consider only emitting it if, e.g., DEBUG logging is enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree we need to be careful about it.

new FullResponseContentPublisher(channelContext,
bb, ef)));

Optional<IOException> contentLengthFailure = validateResponseContentLength(channelContext);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think try/catch may be more conventional than Optional here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

} finally {
Optional.ofNullable(fullContent).ifPresent(ByteBuf::release);
}
}
}

private Long responseContentLength(HttpResponse response) {
String length = response.headers().get("content-length");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: io.netty.handler.codec.http.HttpHeaderNames.CONTENT_LENGTH

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines 153 to 154
return Optional.of(new IOException("Response had content-length of " + contentLengthHeader + " bytes, but only received "
+ actualContentLength + " before the connection was closed."));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "but only received X bytes before"

nit: should we emit a different/special message if we received 0 bytes?

nit: should we wrap the IOException inside of Netty's PrematureChannelClosureException?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would need to be the other way around: IOException on the outside, so it's seen a retryable. PrematureChannelClosureException is a RuntimeException. Does that change your mind on it? Should I use that inside?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't our default retry logic check 1-level of causes?

What happens if Netty does actually throw a PrematureChannelClosureException itself? E.g., here. Do we retry on that? It seems like we would want to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does check 1 level of causes, but I was under the impression that the contract for ExecutableHttpRequest.call is that only IOException will be retried, not necessarily one level of exception. I don't know if I have the wrong impression, or if there's further wrapping that happens upstream before the retry logic that necessitates checking 1 level deeper.

@millems millems force-pushed the millem/verify-response-content-length branch 2 times, most recently from da0b052 to c832143 Compare January 7, 2022 22:43
@millems millems force-pushed the millem/verify-response-content-length branch 6 times, most recently from 26a01a3 to cd5c184 Compare January 19, 2022 00:20
@Override
public String toString() {
return ToString.builder("ChannelDiagnostics")
.add("creationTime", channelCreationTime)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would channel age be helpful here? It may be able to be inferred by subtracting creation time from a log timestamp time, but it seems age may typically be more relevant than creation time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I can remove creationTime and replace it with channelAge. I'd avoid both for what you mention - redundant information.

makeRequest();
try {
configureChannel();
if (tryConfigurePipeline()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make tryConfigurePipeline throw directly? This would wrap all of its exceptions with Failed to initiate request to.... Or do you want to reserve that for less explicit failure cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm not following.

I added this try/catch because exceptions raised within this runnable go unreported and the completable future returned to the customer never returns (likely causing a deadlock in the customer's code). This change could be in a separate commit, since it's unrelated/unneeded to this change.

I just added it here because I ran into this bug during implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tryConfigurePipeline raises several exceptions itself (by also invoking handleFailure). I was just wondering if we should wrap all of them with this new logic for the sake of consistency (by changing tryConfigurePipeline to simply throw). But perhaps that just unnecessarily nests the more pertinent error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I see. We can combine them, yeah. Good idea.

}

@Override
public String toString() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include the Channel itself here? Primarily for remote address information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Comment on lines +62 to +65
if (isAcquireTimeoutException(originalCause)) {
return new Throwable(getMessageForAcquireTimeoutException(), originalCause);
} else if (isTooManyPendingAcquiresException(originalCause)) {
return new Throwable(getMessageForTooManyAcquireOperationsError(), originalCause);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR specifically but I don't think we should instantiate a literal Throwable here. Would be more conventional to use Exception or RuntimeException for the generic case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, not sure why we use throwable

Comment on lines +118 to +119
validateResponseContentLength(channelContext);
finalizeResponse(requestContext, channelContext);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If validateResponseContentLength throws we do not call finalizeResponse. Likewise for streaming. Do we still release the channel in that scenario?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, exceptionCaught closes and releases the channel.

private static final Logger log = Logger.loggerFor(NettyUtils.class);

private NettyUtils() {
}

public static Throwable decorateException(Channel channel, Throwable originalCause) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We almost have enough exception-related logic in this class to consider a NettyExceptionUtils or similar.

import software.amazon.awssdk.utils.AttributeMap;
import software.amazon.awssdk.utils.Logger;

@RunWith(Parameterized.class)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Consider JUnit 5.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we on junit 5 for sure now? I honestly need to learn it, if so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both 4 and 5 work fine, but IMO we should try to ensure newly created tests are on 5 so that we don't create more future migration work. JUnit 5 parameterization is also a lot cleaner IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated it to junit 5. I don't like not being able to use the test parameter in the before-each statement, but otherwise it seems similar to me.

Comment on lines +94 to +93
return Arrays.asList(new TestCase(CloseTime.DURING_INIT, "The connection was closed during the request."),
new TestCase(CloseTime.BEFORE_SSL_HANDSHAKE, "The connection was closed during the request."),
new TestCase(CloseTime.DURING_SSL_HANDSHAKE, "The connection was closed during the request."),
new TestCase(CloseTime.BEFORE_REQUEST_PAYLOAD, "The connection was closed during the request."),
new TestCase(CloseTime.DURING_REQUEST_PAYLOAD, "The connection was closed during the request."),
new TestCase(CloseTime.BEFORE_RESPONSE_HEADERS, "The connection was closed during the request."),
new TestCase(CloseTime.BEFORE_RESPONSE_PAYLOAD, "Response had content-length"),
new TestCase(CloseTime.DURING_RESPONSE_PAYLOAD, "Response had content-length"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome test cases. Can we come up with more distinctive wording for the first 6 test cases, or out of scope of this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example what you mean?

Copy link
Contributor

@Bennett-Lynch Bennett-Lynch Jan 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this (numbers are not aligned to test cases):

  1. The connection was closed while it was being initialized.
  2. The connection was closed during the TLS handshake process.
  3. The connection was closed before the request could be sent.
  4. The connection was closed while sending the request body.
  5. The connection was closed while waiting to receive a response.
  6. The connection was closed while receiving the response body.

This type of wording could greatly help debugging certain failure scenarios.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test case already provides something similar through its toString. I feel like that should be enough without needing to maintain a string description.

Screen Shot 2022-01-19 at 11 47 36 AM

Copy link
Contributor

@Bennett-Lynch Bennett-Lynch Jan 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant can we change the user-facing wording used in the actual exception? I agree the test case enums are fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, we can talk offline. I tried super hard to do that originally (that's why these test cases are structured like that), and it just doesn't work like that. The client thinks it's waiting for a response when the service hasn't even finished the SSL handshake.

@Test
public void closeTimeHasCorrectMessage() {
server.closeTime = testCase.closeTime;
assertThat(captureException()).hasMessageContaining(testCase.errorMessage);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Consider assertThatThrownBy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want the assertions running against the Error throwables that could be raised by the logic in captureException.


private static class TestCase {
private CloseTime closeTime;
private String errorMessage;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Maybe errorSubstring

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@millems millems force-pushed the millem/verify-response-content-length branch 2 times, most recently from be87573 to d6754fe Compare January 19, 2022 19:18
public String toString() {
return ToString.builder("ChannelDiagnostics")
.add("channel", channel)
.add("channelAge", Duration.between(channelCreationTime, Instant.now()).getSeconds() + " seconds")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to just use the default Duration#toString format, even if it's not the prettiest. Flooring to seconds may lose a lot of granularity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@millems millems force-pushed the millem/verify-response-content-length branch 2 times, most recently from b116a36 to 8b63113 Compare January 19, 2022 19:47
…vice actually returns.

Also update error messaging in the event of service-side connection closes, and add testing to protect these error messages against regression.
@millems millems force-pushed the millem/verify-response-content-length branch from 8b63113 to fe78581 Compare January 19, 2022 19:50
@millems millems enabled auto-merge (squash) January 19, 2022 23:08
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 2 Code Smells

90.1% 90.1% Coverage
0.0% 0.0% Duplication

@millems millems merged commit f6fe8da into master Jan 20, 2022
aws-sdk-java-automation pushed a commit that referenced this pull request Jan 20, 2022
… the service actually returns. (#2957)"

This reverts commit f6fe8da.
millems added a commit that referenced this pull request Jan 21, 2022
millems added a commit that referenced this pull request Jan 21, 2022
* Validate that the response content-length header matches what the service actually returns. (#2957)

This reverts commit 6d93c8e.

* Do not validate content-length for 304 and HEAD responses.
@millems millems deleted the millem/verify-response-content-length branch October 19, 2022 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants