fix: allow download_folder to download file even if bucket is more restricted #1295

Merged
merged 3 commits into from
Feb 18, 2020

laurenyu
Contributor

Issue #, if available:
#1283

Description of changes:
S3 doesn't provide a way of definitively determining whether a prefix is a folder, so originally we used ListObjects to try to gauge that. However, ListObjects requires extra permissions on the bucket, so downloading a public file from a private bucket fails. In this PR, I've changed the logic to try to download the file first; if that incurs a 404 because the prefix is a folder (no idea why that's how S3 chooses to implement this...), the code proceeds as if the prefix points to a folder.
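
The download-first, fall-back-on-404 flow described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `FakeBucket` and its `list_keys` method are hypothetical in-memory stand-ins for S3, the local `ClientError` class merely mimics the `response["Error"]["Code"]` shape of `botocore.exceptions.ClientError`, and the assumption is that a missing key surfaces as error code `"404"`.

```python
import os


class ClientError(Exception):
    """Stand-in (assumption) for botocore.exceptions.ClientError."""

    def __init__(self, code):
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


class FakeBucket:
    """Hypothetical in-memory bucket; mimics only the two calls used below."""

    def __init__(self, objects, allow_list=True):
        self.objects = objects        # maps key -> bytes
        self.allow_list = allow_list  # False models a bucket without list permission

    def download_file(self, key, dest):
        if key not in self.objects:
            raise ClientError("404")
        with open(dest, "wb") as f:
            f.write(self.objects[key])

    def list_keys(self, prefix):
        if not self.allow_list:
            raise ClientError("AccessDenied")
        return [k for k in self.objects if k.startswith(prefix)]


def download_folder(bucket, prefix, target):
    """Try the prefix as a single object first; on a 404, treat it as a folder."""
    prefix = prefix.lstrip("/")
    if prefix and not prefix.endswith("/"):
        try:
            dest = os.path.join(target, os.path.basename(prefix))
            bucket.download_file(prefix, dest)
            return [dest]
        except ClientError as e:
            # Anything other than "not found" is a real error; re-raise it.
            if e.response["Error"]["Code"] != "404":
                raise
            prefix += "/"

    # Assume the prefix points to a 'directory' and download everything under it.
    downloaded = []
    for key in bucket.list_keys(prefix):
        relative = key[len(prefix):]
        if not relative or relative.endswith("/"):
            continue  # skip the folder placeholder objects themselves
        dest = os.path.join(target, relative)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        bucket.download_file(key, dest)
        downloaded.append(dest)
    return downloaded
```

With this ordering, the listing call is reached only when the prefix is not an object, so a single readable file can be fetched even when listing the bucket is denied.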

Side note - I noticed that pylint was requiring me to put botocore in the wrong order for the imports. I'll fix that in a separate PR.

Testing done:
tried the example S3 path from #1283:

>>> import sagemaker
>>> from sagemaker import utils
>>> utils.download_folder('lausen-public', 'bert_sst.tar.gz', '/tmp', sagemaker.session.Session())
$ ls /tmp/bert_sst.tar.gz
/tmp/bert_sst.tar.gz

(also ran tox tests/unit)

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I used the commit message format described in CONTRIBUTING
  • I have used the regional endpoint when creating S3 and/or STS clients (if appropriate)
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@laurenyu laurenyu mentioned this pull request Feb 18, 2020
@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository


- # the prefix points to an s3 'directory' download the whole thing
+ # Assume the prefix points to an S3 'directory' and download the whole thing

Suggestion: Would it be cleaner to move this into a function and call it from inside the except block when the object is not a file?

knakad
knakad previously approved these changes Feb 18, 2020

@laurenyu laurenyu merged commit beb5a9a into aws:master Feb 18, 2020
@laurenyu laurenyu deleted the list-obj branch February 18, 2020 17:43
whittech1 pushed a commit to whittech1/sagemaker-python-sdk that referenced this pull request Nov 14, 2023
4 participants