feat: s3 transfer manager v2 #3079

Open
wants to merge 28 commits into
base: master

@yenfryherrerafeliz yenfryherrerafeliz commented Feb 6, 2025

Description of changes:

  • Transfer Manager V2 implementation.
  • This change includes:
    • Basic S3 Transfer Manager Client.
    • Multipart Download Functionality.
    • Progress Tracker interfaces, along with a default console progress tracker.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

This is an initial phase for the S3 transfer manager v2, which includes:
- Progress Tracker with a default Console Progress Bar.
- Dedicated Multipart Download Listener for listening to events specific to multipart downloads.
- Generic Transfer Listener that is used in either a multipart upload or a multipart download. The progress tracker depends on the Generic Transfer Listener and, when enabled, is supplied through the same parameter as the listener. This matters because a caller who needs to both listen to transfer-specific events and track progress must write a custom implementation that combines the two; otherwise only one of them can be used.
- Single Object Download
- Multipart Object Download

This initial implementation does not include test cases yet.
- Refactor setting a single argument in the console progress bar so that it works even when the argument does not exist yet.
- Add a dedicated parameter for where the progress is rendered, defaulting to STDOUT.
- Add test cases for ConsoleProgressBar.
- Add test cases for DefaultProgressTracker.
- Add test cases for ObjectProgressTracker.
- Add test cases for TransferListener.
- Add test cases for multipart download listener.
- Add a trait to the MultipartDownloader implementation to keep the main implementation cleaner.
- Add test cases for the multipart downloader, specifically testing the part-get and range-get multipart downloaders.

Member

@stobrien89 left a comment

Looking good on the first pass. There are some nits, like function braces needing newlines, new files needing trailing newlines, naming conventions, etc. I also had some questions about the design.

Refactor:
- Move opening braces onto a new line.
- Make requestArgs an optional argument.
- Remove unnecessary traits.
- Use traditional declarations.

Adds:
- Download directory feature.
Refactor:
- Add a message placeholder for progress status, for example in case of errors.

Adds:
- Upload feature, missing multipart functionality.
- Add upload directory feature
- Add a dedicated multipart upload implementation
- Add transfer progress to multipart upload
- Add upload directory with the required options.
- Create specific response models for upload, and upload directory.
- Add multipart upload test cases.
- Fix transfer listener completion evaluation.
Shorten namespace from `Aws\S3\Features\S3Transfer` to `Aws\S3\S3Transfer`.

Member

@stobrien89 left a comment

A few more comments. I think a few from the last round were left unaddressed as well. I'd do another check for function opening braces (needing to be moved to a new line) and new files that are missing a newline at the end. More test classes are needed as well, but I'm assuming those are on the way.

/**
 * @return Closure|ProgressBarFactory
 */
private function defaultProgressBarFactory(): Closure|ProgressBarFactory

Member

Does this ever return an instance of ProgressBarFactory?

- Implement progress tracker based on SEP spec.
- Add a default progress bar implementation.
- Add different progress tracker formats:
  - Plain progress format: [|progress_bar|] |percent|%
  - Transfer progress format: [|progress_bar|] |percent|% |transferred|/|tobe_transferred| |unit|
  - Colored progress format: |object_name|:\n\033|color_code|[|progress_bar|] |percent|% |transferred|/|tobe_transferred| |unit| |message|\033[0m
- Add a default single progress tracker implementation.
- Add a default multi progress tracker implementation for tracking directory transfers.
- Include unit tests only for the console progress bar.
- Fix current test cases for:
  - MultipartUploader
  - MultipartDownloader
  - ProgressTracker
- Remove progress bar color enum since the colors were moved into the specific format that requires them.
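
To illustrate how the placeholder-based formats listed above could be rendered, here is a minimal sketch. The helper names `renderProgressFormat` and `renderBar` are hypothetical and not taken from the PR; only the format string and its placeholder names come from the commit notes.

```php
<?php
// Hypothetical helper: fills |name| placeholders in a progress format string.
function renderProgressFormat(string $format, array $args): string
{
    return preg_replace_callback(
        '/\|([a-z_]+)\|/',
        // Unknown placeholders render as an empty string.
        fn (array $m): string => (string) ($args[$m[1]] ?? ''),
        $format
    );
}

// Hypothetical helper: draws a fixed-width bar for a 0-100 percentage.
function renderBar(int $percent, int $width = 20): string
{
    $filled = intdiv($percent * $width, 100);
    return str_repeat('#', $filled) . str_repeat('-', $width - $filled);
}

// The "transfer progress" format from the list above.
$transferFormat = '[|progress_bar|] |percent|% |transferred|/|tobe_transferred| |unit|';

$line = renderProgressFormat($transferFormat, [
    'progress_bar' => renderBar(50),
    'percent' => 50,
    'transferred' => 5,
    'tobe_transferred' => 10,
    'unit' => 'MB',
]);

echo $line, PHP_EOL; // [##########----------] 50% 5/10 MB
```

Keeping the format as a plain string with named placeholders is what makes it possible to add the colored variant without changing the rendering code.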
TransferListener must be tested through the implementations that extend and use this abstract class.
Add nullable type to listenerNotifier property in the MultipartUploader implementation.
- Tests for MultiProgressTracker
- Tests for SingleProgressTracker
- Tests for ProgressBarFormat
- Tests for TransferProgressSnapshot
- Tests for TransferListenerNotifier

Member

@stobrien89 left a comment

Looking better. Still needing some more unit tests, along with integ tests. Left some comments and nits on formatting. It seems each new file is missing a trailing newline, so I'd check those as well.

- Refactor code to address some styling related feedback.
- Add upload and uploadDirectory unit tests.
- Fix MultipartUpload tests by increasing the part size from 1024 to 10240000 bytes so that it falls within the allowed part size range of 5 MB to 5 GB.
- Rename tobe to to_be in the progress formatting.
- Add download tests
- Add download directory tests
- Minor naming refactor
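
The part size fix above can be sanity-checked with a small range validation. This is only a sketch: the constant and function names are hypothetical, and it assumes the limits are the binary values (5 MiB and 5 GiB) that S3 documents for multipart parts.

```php
<?php
// Assumed limits for a multipart upload part (binary units).
const MIN_PART_SIZE = 5 * 1024 * 1024;        // 5 MiB
const MAX_PART_SIZE = 5 * 1024 * 1024 * 1024; // 5 GiB

// Hypothetical helper: true when the part size is inside the allowed range.
function isValidPartSize(int $partSize): bool
{
    return $partSize >= MIN_PART_SIZE && $partSize <= MAX_PART_SIZE;
}

var_dump(isValidPartSize(1024));     // bool(false): the old test value
var_dump(isValidPartSize(10240000)); // bool(true): the new test value
```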

Member

@stobrien89 left a comment

Looking good. Just a few nits this time around. Still needing integ tests; will do another round of reviews once those are up.

- Add upload integ tests for:
 - Single uploads
 - Multipart uploads
 - Checksum in single uploads
 - Checksum in multipart uploads
- Add download integ tests for:
 - Single downloads
 - Multipart downloads
- Add integ tests for directory uploads
- Add integ tests for directory downloads

Member

@stobrien89 left a comment

Mostly nits, but the most important requests are: adding upload() and download() methods on the uploader and downloader classes that call promise() (similar to the old implementation), and testing the resolvesOutsideTargetDirectory logic.

I would do an audit of hard-coded values that can be moved to classes, line length (max 85 char) and all new files that do not end with a newline.

Member

Similarly, can we add an upload() method that calls the promise() method?
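
A minimal sketch of what such a wrapper might look like, assuming that "calls promise()" means blocking on the returned promise. `PromiseSketch` and `UploaderSketch` are hypothetical stand-ins written for this example; the real classes in the PR return a `PromiseInterface` and will differ.

```php
<?php
// Hypothetical stand-in for a promise; only wait() is modeled here.
final class PromiseSketch
{
    /** @var callable */
    private $work;

    public function __construct(callable $work)
    {
        $this->work = $work;
    }

    // Resolves the promise synchronously and returns its value.
    public function wait()
    {
        return ($this->work)();
    }
}

// Hypothetical uploader exposing both an async and a sync entry point.
final class UploaderSketch
{
    // Asynchronous entry point: returns a promise for the upload result.
    public function promise(): PromiseSketch
    {
        return new PromiseSketch(fn (): array => ['ETag' => 'placeholder']);
    }

    // Synchronous convenience wrapper: block until the promise settles.
    public function upload(): array
    {
        return $this->promise()->wait();
    }
}

$result = (new UploaderSketch())->upload();
print_r($result);
```

The wrapper adds no logic of its own, so callers who do not need concurrency get a one-line API while the promise-based path stays the single implementation.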

Member

What are your thoughts about abstracting some of the shared logic between this and the 🤐 class (when ready) into something like AbstractMultipartUploadManager?

- Move some fixed values out of the methods into consts.
- Address a line that exceeded 80 chars.
- Declare keys used across different implementations as consts.
- Fix keys declaration in TransferListener.php
- Make use of DIRECTORY_SEPARATOR const instead of hardcoding `/`
- Some implementations using TransferListener were missing the import statement.
- Add test cases for when a file being downloaded resolves outside the target directory (`resolvesOutsideTargetDirectory`).
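
For context, a check like `resolvesOutsideTargetDirectory` usually guards against object keys containing `..` segments that would escape the download directory. The sketch below is an assumption about how such a check could work, not the PR's implementation; `normalizePath` is a hypothetical helper, and the code assumes an absolute target directory with `/` separators.

```php
<?php
// Hypothetical helper: collapses "." and ".." segments lexically,
// without touching the filesystem (the target file may not exist yet).
function normalizePath(string $path): string
{
    $segments = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;
        }
        if ($segment === '..') {
            array_pop($segments); // step up one level, stopping at the root
            continue;
        }
        $segments[] = $segment;
    }
    return '/' . implode('/', $segments);
}

// Sketch of the check: true when the key would land outside $targetDir.
function resolvesOutsideTargetDirectory(string $targetDir, string $objectKey): bool
{
    $base = normalizePath($targetDir);
    $resolved = normalizePath($targetDir . '/' . $objectKey);
    return !str_starts_with($resolved, $base . '/');
}

var_dump(resolvesOutsideTargetDirectory('/tmp/downloads', 'reports/a.txt'));  // bool(false)
var_dump(resolvesOutsideTargetDirectory('/tmp/downloads', '../secrets.txt')); // bool(true)
```

Comparing against `$base . '/'` rather than `$base` avoids treating a sibling such as `/tmp/downloads-evil` as being inside `/tmp/downloads`.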

Member

@stobrien89 left a comment

Looking good. Addressed some questions you had and asked a few more. I'm still seeing quite a lot of new files that don't have the newline on the end, so just make sure that's done at some point

- Improve how the parts are created in multipart uploader so that it looks cleaner.
- Create single class tests for PartGetMultipartDownloader and RangeGetMultipartDownloader.
- Add tests for TransferListenerNotifier from MultipartUploader and MultipartDownloader implementations.
When a part download fails, we trigger downloadFailed so the failure is propagated to the listeners, and then we rethrow the exception. However, there is also a global exception handler that catches anything else that fails during a multipart download and propagates it to the listeners as well, which caused downloadFailed to be called twice. To prevent this, we check whether the current snapshot already has an error message.
- In MultipartUploader, when a part upload fails, the exception should be thrown, and it was not being thrown.
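
The double-notification guard described above can be sketched as follows. `DownloadSketch` and its members are hypothetical stand-ins for the downloader, its snapshot, and its listeners; only the idea of checking for an already recorded error comes from the commit note.

```php
<?php
// Hypothetical stand-in for the downloader, its snapshot, and its listeners.
final class DownloadSketch
{
    // Plays the role of the error message stored on the current snapshot.
    private ?string $failureReason = null;

    // Counts how many times listeners would have been notified.
    public int $failedNotifications = 0;

    private function downloadFailed(Throwable $e): void
    {
        // Guard: if a failure was already recorded, do not notify again.
        if ($this->failureReason !== null) {
            return;
        }
        $this->failureReason = $e->getMessage();
        $this->failedNotifications++;
    }

    private function downloadPart(int $partNumber): void
    {
        try {
            throw new RuntimeException("part {$partNumber} failed");
        } catch (Throwable $e) {
            $this->downloadFailed($e); // first notification
            throw $e;                  // rethrow to abort the transfer
        }
    }

    public function run(): void
    {
        try {
            $this->downloadPart(1);
        } catch (Throwable $e) {
            $this->downloadFailed($e); // global handler, suppressed by the guard
        }
    }
}

$download = new DownloadSketch();
$download->run();
echo $download->failedNotifications, PHP_EOL; // 1
```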

Member

@stobrien89 left a comment

Had a few comments about checksums (and their tests), download functionality, and a question about a class property. Still seeing some new files missing newlines at the end, so just make sure you go through those and add them where needed.

]
];
if ($this->containsChecksum($this->createMultipartArgs)) {
$completeMultipartUploadArgs['ChecksumType'] = 'FULL_OBJECT';

Member

We should have some assertions on the resulting x-amz-checksum-type header ensuring this value is provided. Also, I think we need to set this in the CreateMultipartUpload call as well.

Member

We should also ensure we set the checksum in the appropriate header (x-amz-checksum-<lowercase_algorithm_name>) when calling CompleteMultipartUpload. This might already be done by middleware, so something to check on. In any case, we'll need to make some assertions on that key and value as well.
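
As a rough illustration of the assertions being requested, the sketch below builds the algorithm-specific header name and checks a set of request headers. `checksumHeaderName` and the `$headers` array are hypothetical; in a real test the headers would be captured from the outgoing request, and the checksum value shown is only a placeholder.

```php
<?php
// Hypothetical helper: builds the x-amz-checksum-<lowercase_algorithm_name> header name.
function checksumHeaderName(string $algorithm): string
{
    return 'x-amz-checksum-' . strtolower($algorithm);
}

// Stand-in for headers captured from a CompleteMultipartUpload request.
$headers = [
    'x-amz-checksum-type' => 'FULL_OBJECT',
    checksumHeaderName('CRC32') => 'AAAAAA==', // placeholder, not a real checksum
];

// The kind of checks a test could make on the captured headers.
$hasType = ($headers['x-amz-checksum-type'] ?? null) === 'FULL_OBJECT';
$hasValue = isset($headers[checksumHeaderName('CRC32')]);

var_dump($hasType, $hasValue);
```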

private int $calculatedObjectSize;

/** @var array */
private array $deferFns = [];

Member

Is this property necessary?

Contributor Author

I use it to make sure that streams are closed when necessary and are not left dangling.
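
One way to read this is as a list of deferred cleanup callbacks that run once the transfer finishes. The sketch below shows that pattern under that assumption; `DeferredCleanupSketch` and its methods are hypothetical, and only the `$deferFns` property name comes from the PR.

```php
<?php
// Hypothetical holder for cleanup callbacks, modeled on the $deferFns property.
final class DeferredCleanupSketch
{
    /** @var callable[] */
    private array $deferFns = [];

    // Registers a cleanup callback to run later.
    public function defer(callable $fn): void
    {
        $this->deferFns[] = $fn;
    }

    // Runs callbacks in reverse registration order, like a stack of cleanups.
    public function runDeferred(): void
    {
        while ($fn = array_pop($this->deferFns)) {
            $fn();
        }
    }
}

$stream = fopen('php://memory', 'r+');
$cleanup = new DeferredCleanupSketch();
$cleanup->defer(static function () use ($stream): void {
    if (is_resource($stream)) {
        fclose($stream);
    }
});

try {
    fwrite($stream, 'part data');
} finally {
    // Runs on success and on failure, so the stream is never left open.
    $cleanup->runDeferred();
}

var_dump(is_resource($stream)); // bool(false)
```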

*
* @return PromiseInterface
*/
public function download(

Member

I believe we should have recursive download functionality as well

Contributor Author

Sorry, what do you mean by recursive download? This API method is intended for single file downloads.

Member

Just wanted to add to the comment I made in the other file about the checksum headers/values: we should have assertions here and also in the unit tests (where they apply).

2 participants