Support multi-part uploads #45

Merged: 3 commits, Jan 22, 2018
1 change: 1 addition & 0 deletions .gitignore
@@ -20,3 +20,4 @@ examples/tensorflow/distributed_mnist/data
 doc/_build
 **/.DS_Store
 venv/
+*~
Contributor: Can this make it into the commit message as well?

Author: Yeah, I accidentally added an Emacs temporary file on our fork. Do you mean you want a new commit in this PR with a message about that change?

Contributor: It's a good addition, and I'm happy for it to be part of this PR. If we leave it in, can the commit message just reference the addition? Something like:

"Upload files to S3 using multipart uploads.

Emacs temporary files covered in .gitignore"

Author: Yeah, I'm happy to add it. Any idea if amending my commit message and force pushing to our fork will do it?

Contributor: Yeah, that should be fine.

Author: OK, that worked.
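
For reference, a minimal sketch of the amend-and-force-push workflow discussed above (the branch name is a placeholder, and --force-with-lease is the safer variant of a plain force push):

```sh
# Reword the most recent commit; an editor opens with the current message.
git commit --amend

# Update the already-pushed branch on the fork ('multipart-uploads' is hypothetical).
git push --force-with-lease origin multipart-uploads
```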

3 changes: 1 addition & 2 deletions src/sagemaker/session.py
@@ -127,8 +127,7 @@ def upload_data(self, path, bucket=None, key_prefix='data'):
         s3 = self.boto_session.resource('s3')

         for local_path, s3_key in files:
-            with open(local_path, 'rb') as f:
-                s3.Object(bucket, s3_key).put(Body=f)
+            s3.Object(bucket, s3_key).upload_file(local_path)

         s3_uri = 's3://{}/{}'.format(bucket, key_prefix)
         # If a specific file was used as input (instead of a directory), we return the full S3 key
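
For context on the change above: `Object.put(Body=f)` issues a single PutObject request, while `Object.upload_file` goes through boto3's managed transfer layer, which splits files above a size threshold into concurrent multipart uploads. A minimal standalone sketch of the same call (the bucket name, key, and local path are placeholders; the TransferConfig values shown are simply boto3's documented defaults):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.resource('s3')

# Files larger than multipart_threshold are uploaded in multipart_chunksize
# parts; smaller files fall back to a single PutObject call.
config = TransferConfig(multipart_threshold=8 * 1024 * 1024,
                        multipart_chunksize=8 * 1024 * 1024)

# 'my-bucket', 'data/train.csv', and 'train.csv' are hypothetical names.
s3.Object('my-bucket', 'data/train.csv').upload_file('train.csv', Config=config)
```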
8 changes: 4 additions & 4 deletions tests/unit/test_upload_data.py
@@ -34,8 +34,8 @@ def sagemaker_session():
 def test_upload_data_absolute_dir(sagemaker_session):
     result_s3_uri = sagemaker_session.upload_data(UPLOAD_DATA_TESTS_FILES_DIR)

-    uploaded_files = [kwargs['Body'].name for name, args, kwargs in sagemaker_session.boto_session.mock_calls
-                      if name == 'resource().Object().put']
+    uploaded_files = [args[0] for name, args, kwargs in sagemaker_session.boto_session.mock_calls
+                      if name == 'resource().Object().upload_file']
     assert result_s3_uri == 's3://{}/data'.format(BUCKET_NAME)
     assert len(uploaded_files) == 4
     for file in uploaded_files:
@@ -45,8 +45,8 @@ def test_upload_data_absolute_dir(sagemaker_session):
 def test_upload_data_absolute_file(sagemaker_session):
     result_s3_uri = sagemaker_session.upload_data(UPLOAD_DATA_TESTS_SINGLE_FILE)

-    uploaded_files = [kwargs['Body'].name for name, args, kwargs in sagemaker_session.boto_session.mock_calls
-                      if name == 'resource().Object().put']
+    uploaded_files = [args[0] for name, args, kwargs in sagemaker_session.boto_session.mock_calls
+                      if name == 'resource().Object().upload_file']
     assert result_s3_uri == 's3://{}/data/{}'.format(BUCKET_NAME, SINGLE_FILE_NAME)
     assert len(uploaded_files) == 1
     assert os.path.exists(uploaded_files[0])
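
The updated assertions lean on how a MagicMock records chained calls: every entry in mock_calls unpacks into a (name, args, kwargs) triple, and the name string encodes the full attribute/call chain such as 'resource().Object().upload_file'. A small illustration of that pattern outside the test fixture (the bucket, key, and local path are made up):

```python
from unittest.mock import MagicMock

boto_session = MagicMock()

# Mimic what Session.upload_data does against the mocked boto session.
s3 = boto_session.resource('s3')
s3.Object('my-bucket', 'data/file.txt').upload_file('/tmp/file.txt')

# Each recorded call unpacks to (name, args, kwargs); the name reflects
# the chained access, so upload_file calls can be filtered by name.
uploaded = [args[0] for name, args, kwargs in boto_session.mock_calls
            if name == 'resource().Object().upload_file']

assert uploaded == ['/tmp/file.txt']
```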