Skip to content

Correctly handle the case where the destination folder does not exist in local mode #458

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Nov 15, 2018
Merged

Conversation

derekhh
Copy link
Contributor

@derekhh derekhh commented Nov 6, 2018

Issue #, if available:

Description of changes:

Right now local mode would break if the output_path is a file:// but doesn't exist yet. This PR uses distutils.dir_util.copy_tree to handle this.

I've verified via a local job that the change fixes the issue.

Previous stack trace:

tmp8gc4u4zv_algo-1-RQ5K9_1_ce427f3ec602 exited with code 0
Aborting on container exit...
Traceback (most recent call last):
  File "sagemaker_train.py", line 107, in <module>
    submit_sagemaker_job(args.image_name, args.role_name, args.config_path)
  File "sagemaker_train.py", line 96, in submit_sagemaker_job
    estimator.fit(inputs=inputs, wait=False, job_name=job_name)
  File "/home/ubuntu/.virtualenvs/sagemaker-sdk_py3/lib/python3.5/site-packages/sagemaker/estimator.py", line 209, in fit
    self.latest_training_job = _TrainingJob.start_new(self, inputs)
  File "/home/ubuntu/.virtualenvs/sagemaker-sdk_py3/lib/python3.5/site-packages/sagemaker/estimator.py", line 444, in start_new
    tags=estimator.tags)
  File "/home/ubuntu/.virtualenvs/sagemaker-sdk_py3/lib/python3.5/site-packages/sagemaker/session.py", line 277, in train
    self.sagemaker_client.create_training_job(**train_request)
  File "/home/ubuntu/.virtualenvs/sagemaker-sdk_py3/lib/python3.5/site-packages/sagemaker/local/local_session.py", line 74, in create_training_job
    training_job.start(InputDataConfig, OutputDataConfig, hyperparameters, TrainingJobName)
  File "/home/ubuntu/.virtualenvs/sagemaker-sdk_py3/lib/python3.5/site-packages/sagemaker/local/entities.py", line 70, in start
    self.model_artifacts = self.container.train(input_data_config, output_data_config, hyperparameters, job_name)
  File "/home/ubuntu/.virtualenvs/sagemaker-sdk_py3/lib/python3.5/site-packages/sagemaker/local/image.py", line 132, in train
    artifacts = self.retrieve_artifacts(compose_data, output_data_config, job_name)
  File "/home/ubuntu/.virtualenvs/sagemaker-sdk_py3/lib/python3.5/site-packages/sagemaker/local/image.py", line 238, in retrieve_artifacts
    self.sagemaker_session)
  File "/home/ubuntu/.virtualenvs/sagemaker-sdk_py3/lib/python3.5/site-packages/sagemaker/local/utils.py", line 55, in move_to_destination
    recursive_copy(source, parsed_uri.path)
  File "/home/ubuntu/.virtualenvs/sagemaker-sdk_py3/lib/python3.5/site-packages/sagemaker/local/utils.py", line 81, in recursive_copy
    shutil.copy(os.path.join(current_path, file), os.path.join(target_path, file))
  File "/home/ubuntu/.virtualenvs/sagemaker-sdk_py3/lib/python3.5/shutil.py", line 235, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/home/ubuntu/.virtualenvs/sagemaker-sdk_py3/lib/python3.5/shutil.py", line 115, in copyfile
    with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/model-output/./output.tar.gz'

With this change:

tmp28hhnmra_algo-1-JYEW7_1_759be53ea024 exited with code 0
Aborting on container exit...
WARNING:sagemaker.local.image:Failed to delete: /tmp/tmp28hhnmra/algo-1-JYEW7 Please remove it manually.
===== Job Complete =====

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have updated the changelog with a description of my changes (if appropriate)
  • I have updated any necessary documentation (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov-io
Copy link

codecov-io commented Nov 6, 2018

Codecov Report

Merging #458 into master will decrease coverage by 0.15%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #458      +/-   ##
==========================================
- Coverage   94.02%   93.87%   -0.16%     
==========================================
  Files          57       57              
  Lines        4269     4262       -7     
==========================================
- Hits         4014     4001      -13     
- Misses        255      261       +6
Impacted Files Coverage Δ
src/sagemaker/local/utils.py 96.15% <100%> (-0.82%) ⬇️
src/sagemaker/local/image.py 88.08% <0%> (-1.89%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 46b10fa...7c143f3. Read the comment docs.

@nadiaya nadiaya requested a review from iquintero November 6, 2018 07:30
@derekhh
Copy link
Contributor Author

derekhh commented Nov 8, 2018

Kindly ping @iquintero ? Thanks!

Copy link
Contributor

@iquintero iquintero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a changelog entry? And a unit test for this would be nice as well. Otherwise looks good.

@derekhh derekhh changed the title Create destination folder if it doesn't exist in local mode Correctly handle the case where the destination folder does not exist in local mode Nov 14, 2018
@derekhh
Copy link
Contributor Author

derekhh commented Nov 14, 2018

@iquintero - Thanks for the review.

Actually, I've made some updates and used distutils.dir_util.copy_tree instead of the original implementation in recursive_copy. I've also updated the CHANGELOG and the unit test. Can you help take another look? Thanks!

@derekhh
Copy link
Contributor Author

derekhh commented Nov 14, 2018

@iquintero - Thanks for the approval! Can you also help merge this PR? I don't think I can merge it. :p

apacker pushed a commit to apacker/sagemaker-python-sdk that referenced this pull request Nov 15, 2018
@yangaws yangaws merged commit e0944e0 into aws:master Nov 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants