Commit e1feaa6

Lower memory consumption by streaming out request data
Prior to this change, the `UrlConnectionHttpClient` implementation of `SdkHttpClient` would unintentionally cause **the entire `RequestBody`** to be loaded into memory before being sent, even if the body had been created with `software.amazon.awssdk.core.sync.RequestBody.fromInputStream()` and was therefore a stream of known content length. Even worse, as the buffering `ByteArrayOutputStream` received the request, it had to repeatedly grow its capacity by allocating new arrays and copying the data over, so peak memory usage would be 1.5x the size of the request data. For instance, streaming 1 GB of data to S3 with `putObject` would lead to a peak memory allocation of 1.5 GB.

This large allocation of memory can be avoided by instead *streaming* the request data as it is sent. `java.net.HttpURLConnection` (the core Java class used by `UrlConnectionHttpClient` to implement `SdkHttpClient`) supports streaming the request, but will only do so if `setFixedLengthStreamingMode()` or `setChunkedStreamingMode()` has been called - this is true even if the `Content-Length` header has already been set. You can see this in the implementation of `sun.net.www.protocol.http.HttpURLConnection`, where `streaming()` must be true to avoid **the whole request** being read into a `PosterOutputStream` (a simple extension of `ByteArrayOutputStream`) before being sent:

https://github.com/openjdk/jdk/blob/da75f3c4ad5bdf25167a3ed80e51f567ab3dbd01/src/java.base/share/classes/sun/net/www/protocol/http/HttpURLConnection.java#L1373-L1394

If `Content-Length` is known, we're free to call `setFixedLengthStreamingMode()` on the connection to activate streaming; the only question is where and when to do it!

Unfortunately `software.amazon.awssdk.http.urlconnection.UrlConnectionFactory`, which provides a way for end-users of the AWS SDK to customise the creation & configuration of the `HttpURLConnection`, doesn't have access to the request or its `Content-Length`, so it can't call `setFixedLengthStreamingMode()`. In any case, this behaviour probably _always_ makes sense, so there is little reason to leave it to users to customise it.

That leaves the `createAndConfigureConnection()` method as the best place to do this: the newly created `HttpURLConnection` is available there, and the `Content-Length` can be read if it's present. Consequently, I've updated `createAndConfigureConnection()` to call `setFixedLengthStreamingMode()` if the `Content-Length` is not null, which is similar behaviour to the `ApacheHttpClient` SDK client: #2848 (comment)

https://github.com/aws/aws-sdk-java-v2/blob/db0b45a0cd29603a1ea95b5e5f3d243755720309/http-clients/apache-client/src/main/java/software/amazon/awssdk/http/apache/internal/impl/ApacheHttpRequestFactory.java#L139-L160

At present, the only way to work around this issue without a change to the AWS SDK is to create a custom `SdkHttpClient` that wraps `UrlConnectionHttpClient` and overrides `prepareRequest()`. As the `connection` field is not visible in the `ExecutableHttpRequest` object (it's defined on the private `software.amazon.awssdk.http.urlconnection.UrlConnectionHttpClient.RequestCallable` class), it's necessary to use reflection to extract the `connection` field before operating on it. An example of this in Scala code can be seen here: guardian/ophan-geoip-db-refresher@e2fc370#diff-5780ade773d1bdf8dd5dea40283e6fec88bacdb30c20c5b39955da1faa8e3ade

Running the `ophan-geoip-db-refresher` code locally, the heap required is:

* 560 MB (`-Xmx560m`) - _without_ the workaround, request fully in memory
* 32 MB - with the workaround on, and so streaming enabled

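The streaming behaviour described above can be tried out with plain JDK classes, independent of the AWS SDK. The following is a minimal sketch, not the SDK's implementation: the class `FixedLengthStreamingSketch`, its helpers `zeros()`, `upload()` and `roundTrip()`, and the throwaway loopback server built on the JDK's `com.sun.net.httpserver.HttpServer` are all hypothetical names and choices made for illustration. It uploads a generated body with `setFixedLengthStreamingMode()` enabled and returns the number of bytes the server counted.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FixedLengthStreamingSketch {

    /** Hypothetical stand-in for a large body: yields `size` zero bytes without holding them all in memory. */
    static InputStream zeros(long size) {
        return new InputStream() {
            private long remaining = size;

            @Override
            public int read() {
                if (remaining <= 0) return -1;
                remaining--;
                return 0;
            }

            @Override
            public int read(byte[] b, int off, int len) {
                if (len == 0) return 0;
                if (remaining <= 0) return -1;
                int n = (int) Math.min(len, remaining);
                Arrays.fill(b, off, off + n, (byte) 0);
                remaining -= n;
                return n;
            }
        };
    }

    /** Sends `length` bytes from `body` via PUT and returns the byte count reported back by the server. */
    static long upload(URI target, InputStream body, long length) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) target.toURL().openConnection(Proxy.NO_PROXY);
        connection.setRequestMethod("PUT");
        connection.setDoOutput(true);
        // Must be called before connecting; it asks HttpURLConnection to stream
        // the body rather than buffer the whole request before sending it.
        connection.setFixedLengthStreamingMode(length);
        try (OutputStream out = connection.getOutputStream()) {
            body.transferTo(out);
        }
        try (InputStream in = connection.getInputStream()) {
            return Long.parseLong(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }

    /** Starts a loopback server on a free port, uploads `size` bytes to it, and returns what the server counted. */
    static long roundTrip(long size) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            long received = 0;
            byte[] buffer = new byte[8192];
            try (InputStream in = exchange.getRequestBody()) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    received += n;
                }
            }
            byte[] reply = Long.toString(received).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        try {
            URI target = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/upload");
            return upload(target, zeros(size), size);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Server received bytes: " + roundTrip(8L * 1024 * 1024));
    }
}
```

Running this with a small heap and a body larger than the heap is one way to observe the difference; removing the `setFixedLengthStreamingMode()` call should bring back the buffering behaviour, though the exact heap thresholds will depend on the JVM.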
1 parent ab51d8d commit e1feaa6

2 files changed: +10 -0 lines changed

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+{
+    "category": "URL Connection Http Client",
+    "contributor": "rtyley",
+    "type": "feature",
+    "description": "Lower memory consumption for HTTP requests by enabling fixed-length streaming mode."
+}

http-clients/url-connection-client/src/main/java/software/amazon/awssdk/http/urlconnection/UrlConnectionHttpClient.java

Lines changed: 4 additions & 0 deletions
@@ -15,6 +15,7 @@
 
 package software.amazon.awssdk.http.urlconnection;
 
+import static software.amazon.awssdk.http.Header.CONTENT_LENGTH;
 import static software.amazon.awssdk.http.HttpStatusFamily.CLIENT_ERROR;
 import static software.amazon.awssdk.http.HttpStatusFamily.SERVER_ERROR;
 import static software.amazon.awssdk.utils.FunctionalUtils.invokeSafely;
@@ -140,6 +141,9 @@ private HttpURLConnection createAndConfigureConnection(HttpExecuteRequest reques
         // See: https://github.com/aws/aws-sdk-java-v2/issues/975
         connection.setInstanceFollowRedirects(false);
 
+        request.httpRequest().firstMatchingHeader(CONTENT_LENGTH).map(Long::parseLong)
+               .ifPresent(connection::setFixedLengthStreamingMode);
+
         return connection;
     }
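
The effect of the two added lines can be checked without any network access. The sketch below is an illustration under stated assumptions rather than SDK code: `ContentLengthStreamingSketch`, `StubConnection`, `placeholderUrl()` and `applyContentLength()` are hypothetical names, and a plain `Optional<String>` stands in for the result of `firstMatchingHeader(CONTENT_LENGTH)`. The stub never connects; it only exposes the protected `fixedContentLengthLong` field of `java.net.HttpURLConnection`, which stays at -1 until fixed-length streaming mode is set.

```java
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Optional;

public class ContentLengthStreamingSketch {

    /** Hypothetical stub: never opens a socket, only reveals the configured fixed length. */
    static final class StubConnection extends HttpURLConnection {
        StubConnection() {
            super(placeholderUrl());
        }

        /** Reads the protected JDK field; it is -1 when fixed-length streaming mode is not set. */
        long fixedLength() {
            return fixedContentLengthLong;
        }

        @Override
        public void connect() {
            // Intentionally does nothing: this stub is never connected.
        }

        @Override
        public void disconnect() {
            // Nothing to release.
        }

        @Override
        public boolean usingProxy() {
            return false;
        }
    }

    /** Placeholder URL for the stub; no request is ever sent to it. */
    static URL placeholderUrl() {
        try {
            return URI.create("http://localhost/").toURL();
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Mirrors the shape of the change: parse the header value if present and
     * enable fixed-length streaming; leave the connection untouched otherwise.
     * The Optional<String> parameter is an assumption standing in for the SDK header lookup.
     */
    static void applyContentLength(Optional<String> contentLength, HttpURLConnection connection) {
        contentLength.map(Long::parseLong)
                     .ifPresent(connection::setFixedLengthStreamingMode);
    }

    public static void main(String[] args) {
        StubConnection withHeader = new StubConnection();
        applyContentLength(Optional.of("1073741824"), withHeader);
        System.out.println("With Content-Length: " + withHeader.fixedLength());

        StubConnection withoutHeader = new StubConnection();
        applyContentLength(Optional.empty(), withoutHeader);
        System.out.println("Without Content-Length: " + withoutHeader.fixedLength());
    }
}
```

Because the mapped value is a `Long`, the method reference resolves to the `setFixedLengthStreamingMode(long)` overload, so bodies larger than 2 GB are not truncated to an `int`.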