-
Notifications
You must be signed in to change notification settings - Fork 916
[TM DownloadDirectory Part2] Implement download directory in transfer manager #3010
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TM DownloadDirectory Part2] Implement download directory in transfer manager #3010
Conversation
9f07296
to
994a820
Compare
...va/software/amazon/awssdk/transfer/s3/S3TransferManagerDownloadDirectoryIntegrationTest.java
Outdated
Show resolved
Hide resolved
@@ -281,8 +281,11 @@ default DirectoryUpload uploadDirectory(Consumer<UploadDirectoryRequest.Builder> | |||
* bucket will be downloaded. | |||
* | |||
* <p> | |||
* The SDK will create the destination directory if it does not already exist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a note for what happens if (some) files already exist?
On that same note, should we provide an option to overwrite-existing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the transfer will fail if the corresponding file already exists. We can support overwrite-existing in the future if there's a ask for it. We'd need to make changes in FileAsyncTransformer
to expose that option
Will add a note
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be almost guaranteed that there will be user-demand for this feature on something like download directory. It allows users to treat it like "sync" and it would allow it to behave symmetrical to upload directory (where S3 will not reject overwrites by default).
To be honest, I wish the default behavior were to overwrite-if-exists. I know that's not easily changeable, but I wonder if we should reconsider the default for new features like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think we have to add overwrite-if-exists to support pause and resume before GA anyway. We can revisit the default behavior for download directory once we have that.
...ager/src/main/java/software/amazon/awssdk/transfer/s3/internal/DefaultS3TransferManager.java
Outdated
Show resolved
Hide resolved
...nager/src/main/java/software/amazon/awssdk/transfer/s3/internal/DownloadDirectoryHelper.java
Outdated
Show resolved
Hide resolved
...nager/src/main/java/software/amazon/awssdk/transfer/s3/internal/DownloadDirectoryHelper.java
Outdated
Show resolved
Hide resolved
.../src/main/java/software/amazon/awssdk/transfer/s3/internal/ListObjectsRecursivelyHelper.java
Outdated
Show resolved
Hide resolved
.../src/main/java/software/amazon/awssdk/transfer/s3/internal/ListObjectsRecursivelyHelper.java
Outdated
Show resolved
Hide resolved
...va/software/amazon/awssdk/transfer/s3/internal/DownloadDirectoryHelperParameterizedTest.java
Outdated
Show resolved
Hide resolved
.../src/main/java/software/amazon/awssdk/transfer/s3/internal/ListObjectsRecursivelyHelper.java
Outdated
Show resolved
Hide resolved
...va/software/amazon/awssdk/transfer/s3/internal/DownloadDirectoryHelperParameterizedTest.java
Outdated
Show resolved
Hide resolved
|
||
|
||
downloadDirectoryRequest.overrideConfiguration() | ||
.map(DownloadDirectoryOverrideConfiguration::downloadFileRequestTransformer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<Bit off topic but want to write down my thoughts so we can capture them somewhere>
Even though it's not an immediately planned feature/concern, I'm trying to think what it would look like to extend this download logic to simulate "sync" logic. E.g.,
A s3 object will require copying if the sizes of the two s3 objects differ, the last modified time of the source is newer than the last modified time of the destination, or the s3 object does not exist under the specified bucket and prefix destination.
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html#examples
I don't think the current downloadFileRequestTransformer
is extensible enough to support something like this. Should we consider something like a BiPredicate<S3Object, Path>
that allows users to declare if an individual download should be allowed to execute or not?
If we supported something like that, then in theory, implementing sync-download logic would be as simple as implementing the BiPredicate
to replicate the behavior quoted above. And users might be able to benefit from it for other similar use cases, beyond just sync.
It's slightly more complicated for the sync-upload case, because it probably requires performing an extra HEAD request per-object, but sync-download seems easier to achieve merely by extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, downloadFileRequestTransformer
works for the case where customers want to add the same configurations or parameters to all downloadFileRequest
s, but it doesn't support filtering. Something like execution interceptor might be more flexible where they can apply the transformer for certain requests and filter out some requests. I'll remove it for now and create a task for it
...nager/src/main/java/software/amazon/awssdk/transfer/s3/internal/DownloadDirectoryHelper.java
Outdated
Show resolved
Hide resolved
log.debug(() -> "s3Object key: " + s3Object.key()); | ||
futures.add(downloadSingleFile(downloadDirectoryRequest, | ||
failedFileDownloads, | ||
s3Object)); | ||
}).whenComplete((r, t) -> { | ||
if (t != null) { | ||
returnFuture.completeExceptionally(SdkClientException.create("Failed to call ListObjectsV2 ", t)); | ||
returnFuture.completeExceptionally(SdkClientException.create("Failed to call ListObjectsV2", t)); | ||
cleanUpDirectory(downloadDirectoryRequest.destinationDirectory()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably only call this if the SDK was responsible for creating the directory. If the directory already existed, I don't think we should delete it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To close the loop, discussed offline and agreed that we should move the logic to create directories to FileAsyncResponseTransformer and FileResponseTransformer. #3018
SonarCloud Quality Gate failed. |
* [TM DownloadDirectory Part1] Create POJO classes for download directory (#2993) * Create POJO classes for download directory * Address feedback * Address comments * A couple of minor refactoring on the S3TransferManager (#2997) * [TM DownloadDirectory Part2] Implement download directory in transfer manager (#3010) * Implement download directory in transfer manager * Add more tests and address comments * Remove create parent directories logic and add changelog entry * [TM DownloadDirectory Part3] Various updates on downloadDirectory (#3020) * Implement download directory in transfer manager * Add more tests and address comments * Remove create parent directories logic and add changelog entry * Various updates on downloadDirectory 1. limit the number of concurrent download file 2. create nonexistent parent directories 3. normalize key if prefix is not empty * By default, set delimiter to null to avoid potentially excessive listObjectsV2 calls * Move async buffering logic to SdkPublisher * Move AsyncBufferingSubscriber back to s3-tranfer-manager module and address feedback * remove unnecessary isDelivering.set(false)
…f26d07861 Pull request: release <- staging/a7ad223f-cc6b-4682-a3d9-babf26d07861
Modifications
Implement download directory in transfer manager
Implementation Notes:
ListObjectsRecursivelyHelper
### Pending Items:- Add tests forListObjectsRecursivelyHelper
- Add more JavadocsTesting
Added unit tests and integ tests
Screenshots (if appropriate)
Types of changes
Checklist
mvn install
succeedsscripts/new-change
script and following the instructions. Commit the new file created by the script in.changes/next-release
with your changes.License