-
Notifications
You must be signed in to change notification settings - Fork 916
Validate that the response content-length header matches what the ser… #2957
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Long contentLengthHeader = ctx.channel().attr(RESPONSE_CONTENT_LENGTH).get(); | ||
Long actualContentLength = ctx.channel().attr(RESPONSE_DATA_READ).get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we consolidate this logic to a standalone handler, and that handler raises an exception if it detects this issue? The ResponseHandler
would then propagate any channel exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Successes and errors are reported in multiple places in the netty client. I can see about how that works, but it's harder for me to guarantee that it will work because of how messed up our success/error reporting is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at it, I can already tell it might require me to rewrite our FullHttpResponse handling, since that currently handles does its own finalizeResponse call in parallel with this publisher to complete the future successfully...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Out of scope if it requires any significant refactoring. In general I would consider channel attribute usage a code smell and prefer to try to isolate behavior to standalone handlers, but understood if it's not feasible with the current setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's good feedback. I'll see what I can do...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it seems like FullResponseContentPublisher and PublisherAdapter do their own error handling, so I can't decorate them with something else and expect it to do anything. The onError just gets fired out to the customer's publisher and the netty client ignores it. I'd have to break up those two publishers to put their error handling downstream from the publishers themselves. It would be good to do that, but it's a significant refactor.
return Optional.of(new IOException("Response had content-length of " + contentLengthHeader + " bytes, but only received " | ||
+ actualContentLength + " before the connection was closed.")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if Transfer-Encoding: chunked
is being used, but we don't receive the final (0 byte) chunk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also what happens if an HTTP/2 channel is closed without an endOfStream
flag? Does Netty throw an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a different edge case we might need to add validation for. That's much rarer, since we rarely use chunked encoding. This particular code only goes into effect when the content-length is specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, maybe we also need to test the end-of-stream flag as well.
/** | ||
* Diagnostic information that may be useful to help with debugging during error scenarios. | ||
*/ | ||
@SdkInternalApi | ||
public class ChannelDiagnostics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting way to expose more helpful information. I like the idea. However, if it grows in size over time, we may want to consider only emitting it if, e.g., DEBUG logging is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree we need to be careful about it.
new FullResponseContentPublisher(channelContext, | ||
bb, ef))); | ||
|
||
Optional<IOException> contentLengthFailure = validateResponseContentLength(channelContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think try/catch may be more conventional than Optional
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
} finally { | ||
Optional.ofNullable(fullContent).ifPresent(ByteBuf::release); | ||
} | ||
} | ||
} | ||
|
||
private Long responseContentLength(HttpResponse response) { | ||
String length = response.headers().get("content-length"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: io.netty.handler.codec.http.HttpHeaderNames.CONTENT_LENGTH
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
return Optional.of(new IOException("Response had content-length of " + contentLengthHeader + " bytes, but only received " | ||
+ actualContentLength + " before the connection was closed.")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "but only received X bytes before"
nit: should we emit a different/special message if we received 0 bytes?
nit: should we wrap the IOException
inside of Netty's PrematureChannelClosureException
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would need to be the other way around: IOException on the outside, so it's seen a retryable. PrematureChannelClosureException is a RuntimeException. Does that change your mind on it? Should I use that inside?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't our default retry logic check 1-level of causes?
What happens if Netty does actually throw a PrematureChannelClosureException
itself? E.g., here. Do we retry on that? It seems like we would want to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does check 1 level of causes, but I was under the impression that the contract for ExecutableHttpRequest.call
is that only IOException will be retried, not necessarily one level of exception. I don't know if I have the wrong impression, or if there's further wrapping that happens upstream before the retry logic that necessitates checking 1 level deeper.
da0b052
to
c832143
Compare
26a01a3
to
cd5c184
Compare
@Override | ||
public String toString() { | ||
return ToString.builder("ChannelDiagnostics") | ||
.add("creationTime", channelCreationTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would channel age be helpful here? It may be able to be inferred by subtracting creation time from a log timestamp time, but it seems age may typically be more relevant than creation time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 I can remove creationTime and replace it with channelAge. I'd avoid both for what you mention - redundant information.
makeRequest(); | ||
try { | ||
configureChannel(); | ||
if (tryConfigurePipeline()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make tryConfigurePipeline
throw directly? This would wrap all of its exceptions with Failed to initiate request to...
. Or do you want to reserve that for less explicit failure cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not following.
I added this try/catch because exceptions raised within this runnable go unreported and the completable future returned to the customer never returns (likely causing a deadlock in the customer's code). This change could be in a separate commit, since it's unrelated/unneeded to this change.
I just added it here because I ran into this bug during implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tryConfigurePipeline
raises several exceptions itself (by also invoking handleFailure
). I was just wondering if we should wrap all of them with this new logic for the sake of consistency (by changing tryConfigurePipeline
to simply throw
). But perhaps that just unnecessarily nests the more pertinent error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I see. We can combine them, yeah. Good idea.
} | ||
|
||
@Override | ||
public String toString() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we include the Channel
itself here? Primarily for remote address information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
if (isAcquireTimeoutException(originalCause)) { | ||
return new Throwable(getMessageForAcquireTimeoutException(), originalCause); | ||
} else if (isTooManyPendingAcquiresException(originalCause)) { | ||
return new Throwable(getMessageForTooManyAcquireOperationsError(), originalCause); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this PR specifically but I don't think we should instantiate a literal Throwable
here. Would be more conventional to use Exception
or RuntimeException
for the generic case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, not sure why we use throwable
validateResponseContentLength(channelContext); | ||
finalizeResponse(requestContext, channelContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If validateResponseContentLength
throws we do not call finalizeResponse
. Likewise for streaming. Do we still release the channel in that scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, exceptionCaught closes and releases the channel.
private static final Logger log = Logger.loggerFor(NettyUtils.class); | ||
|
||
private NettyUtils() { | ||
} | ||
|
||
public static Throwable decorateException(Channel channel, Throwable originalCause) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We almost have enough exception-related logic in this class to consider a NettyExceptionUtils
or similar.
import software.amazon.awssdk.utils.AttributeMap; | ||
import software.amazon.awssdk.utils.Logger; | ||
|
||
@RunWith(Parameterized.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Consider JUnit 5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we on junit 5 for sure now? I honestly need to learn it, if so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both 4 and 5 work fine, but IMO we should try to ensure newly created tests are on 5 so that we don't create more future migration work. JUnit 5 parameterization is also a lot cleaner IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated it to junit 5. I don't like not being able to use the test parameter in the before-each statement, but otherwise it seems similar to me.
return Arrays.asList(new TestCase(CloseTime.DURING_INIT, "The connection was closed during the request."), | ||
new TestCase(CloseTime.BEFORE_SSL_HANDSHAKE, "The connection was closed during the request."), | ||
new TestCase(CloseTime.DURING_SSL_HANDSHAKE, "The connection was closed during the request."), | ||
new TestCase(CloseTime.BEFORE_REQUEST_PAYLOAD, "The connection was closed during the request."), | ||
new TestCase(CloseTime.DURING_REQUEST_PAYLOAD, "The connection was closed during the request."), | ||
new TestCase(CloseTime.BEFORE_RESPONSE_HEADERS, "The connection was closed during the request."), | ||
new TestCase(CloseTime.BEFORE_RESPONSE_PAYLOAD, "Response had content-length"), | ||
new TestCase(CloseTime.DURING_RESPONSE_PAYLOAD, "Response had content-length")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome test cases. Can we come up with more distinctive wording for the first 6 test cases, or out of scope of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have an example what you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this (numbers are not aligned to test cases):
- The connection was closed while it was being initialized.
- The connection was closed during the TLS handshake process.
- The connection was closed before the request could be sent.
- The connection was closed while sending the request body.
- The connection was closed while waiting to receive a response.
- The connection was closed while receiving the response body.
This type of wording could greatly help debugging certain failure scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant can we change the user-facing wording used in the actual exception? I agree the test case enums are fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, we can talk offline. I tried super hard to do that originally (that's why these test cases are structured like that), and it just doesn't work like that. The client thinks it's waiting for a response when the service hasn't even finished the SSL handshake.
@Test | ||
public void closeTimeHasCorrectMessage() { | ||
server.closeTime = testCase.closeTime; | ||
assertThat(captureException()).hasMessageContaining(testCase.errorMessage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Consider assertThatThrownBy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want the assertions running against the Error throwables that could be raised by the logic in captureException.
|
||
private static class TestCase { | ||
private CloseTime closeTime; | ||
private String errorMessage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Maybe errorSubstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
be87573
to
d6754fe
Compare
public String toString() { | ||
return ToString.builder("ChannelDiagnostics") | ||
.add("channel", channel) | ||
.add("channelAge", Duration.between(channelCreationTime, Instant.now()).getSeconds() + " seconds") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to just use the default Duration#toString
format, even if it's not the prettiest. Flooring to seconds may lose a lot of granularity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
b116a36
to
8b63113
Compare
…vice actually returns. Also update error messaging in the event of service-side connection closes, and add testing to protect these error messages against regression.
8b63113
to
fe78581
Compare
Kudos, SonarCloud Quality Gate passed! |
…vice actually returns.
Also update error messaging in the event of service-side connection closes, and add testing to protect these error messages against regression.