doc: specify S3 source_dir needs to point to a tar file #1498

Merged 5 commits on May 15, 2020
Changes from 2 commits
13 changes: 7 additions & 6 deletions src/sagemaker/estimator.py
@@ -1478,12 +1478,13 @@ def __init__(
>>> |----- test.py

You can assign entry_point='src/train.py'.
source_dir (str): Path (absolute, relative, or an S3 URI) to a directory with
any other training source code dependencies aside from the entry
point file (default: None). Structure within this directory are
preserved when training on Amazon SageMaker. If 'git_config' is
provided, 'source_dir' should be a relative location to a
directory in the Git repo. .. admonition:: Example
source_dir (str): Path (absolute, relative, or an S3 URI points to a
tar.gz) to a directory with any other training source code
dependencies aside from the entry point file (default: None).
Structure within this directory are preserved when training on
Amazon SageMaker. If 'git_config' is provided, 'source_dir'
should be a relative location to a directory in the Git repo.
.. admonition:: Example
Contributor

I worry the way you've worded it might be a bit ambiguous, i.e. one might think even the absolute/relative paths should point to a tar.gz file. How about adding another sentence after "Structure within...Amazon SageMaker." that's like, "If source_dir is an S3 URI, it must point to a tar.gz file"?

Contributor Author

This is better!
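
The rule proposed in this thread can be sketched as a small check. This is only an illustration of the suggested wording, not the SDK's actual validation: `is_valid_source_dir` is a hypothetical helper, and the bucket and key names in the usage lines are made up.

```python
def is_valid_source_dir(source_dir: str) -> bool:
    """Hypothetical check for the rule suggested in the review.

    Local paths (absolute or relative) are accepted as-is, while an
    S3 URI is accepted only if it points to a tar.gz file.
    """
    if source_dir.lower().startswith("s3://"):
        # An S3 URI must name a tar.gz archive, not a prefix or "directory".
        return source_dir.endswith(".tar.gz")
    # Absolute and relative local paths carry no tar.gz requirement.
    return True


# Illustrative values only; the bucket and key names are assumptions.
print(is_valid_source_dir("s3://my-bucket/code/sourcedir.tar.gz"))
print(is_valid_source_dir("s3://my-bucket/code/"))
print(is_valid_source_dir("./src"))
```

Keeping the S3 case separate from the local-path case matches the reviewer's concern: the tar.gz requirement applies to S3 URIs only.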


With the following GitHub repo directory structure:

12 changes: 6 additions & 6 deletions src/sagemaker/model.py
@@ -659,12 +659,12 @@ def __init__(
>>> |----- test.py

You can assign entry_point='src/inference.py'.
source_dir (str): Path (absolute or relative) to a directory with
any other training source code dependencies aside from the entry
point file (default: None). Structure within this directory will
be preserved when training on SageMaker. If 'git_config' is
provided, 'source_dir' should be a relative location to a
directory in the Git repo. If the directory points to S3, no
source_dir (str): Path (absolute or relative) to a directory or S3 URI
points to a tar.gz with any other training source code dependencies
aside from the entry point file (default: None). Structure within
this directory will be preserved when training on SageMaker. If
'git_config' is provided, 'source_dir' should be a relative location
to a directory in the Git repo. If the directory points to S3, no
code will be uploaded and the S3 location will be used instead.
.. admonition:: Example

8 changes: 4 additions & 4 deletions src/sagemaker/mxnet/estimator.py
@@ -71,10 +71,10 @@ def __init__(
entry_point (str): Path (absolute or relative) to the Python source
file which should be executed as the entry point to training.
This should be compatible with either Python 2.7 or Python 3.5.
source_dir (str): Path (absolute or relative) to a directory with
any other training source code dependencies aside from the entry
point file (default: None). Structure within this directory are
preserved when training on Amazon SageMaker.
source_dir (str): Path (absolute or relative) to a directory or S3 URI
points to a tar.gz. with any other training source code dependencies
aside from the entry point file (default: None). Structure within
this directory are preserved when training on Amazon SageMaker.
hyperparameters (dict): Hyperparameters that will be used for
training (default: None). The hyperparameters are made
accessible as a dict[str, str] to the training code on
8 changes: 4 additions & 4 deletions src/sagemaker/pytorch/estimator.py
@@ -68,10 +68,10 @@ def __init__(
entry_point (str): Path (absolute or relative) to the Python source
file which should be executed as the entry point to training.
This should be compatible with either Python 2.7 or Python 3.5.
source_dir (str): Path (absolute or relative) to a directory with
any other training source code dependencies aside from the entry
point file (default: None). Structure within this directory are
preserved when training on Amazon SageMaker.
source_dir (str): Path (absolute or relative) to a directory or S3 URI
points to a tar.gz. with any other training source code dependencies
aside from the entry point file (default: None). Structure within
this directory are preserved when training on Amazon SageMaker.
hyperparameters (dict): Hyperparameters that will be used for
training (default: None). The hyperparameters are made
accessible as a dict[str, str] to the training code on
8 changes: 4 additions & 4 deletions src/sagemaker/rl/estimator.py
@@ -109,10 +109,10 @@ def __init__(
framework (sagemaker.rl.RLFramework): Framework (MXNet or
TensorFlow) you want to be used as a toolkit backed for
reinforcement learning training.
source_dir (str): Path (absolute or relative) to a directory with
any other training source code dependencies aside from the entry
point file (default: None). Structure within this directory is
preserved when training on Amazon SageMaker.
source_dir (str): Path (absolute or relative) to a directory or S3 URI
points to a tar.gz. with any other training source code dependencies
aside from the entry point file (default: None). Structure within
this directory are preserved when training on Amazon SageMaker.
hyperparameters (dict): Hyperparameters that will be used for
training (default: None). The hyperparameters are made
accessible as a dict[str, str] to the training code on
8 changes: 4 additions & 4 deletions src/sagemaker/sklearn/estimator.py
@@ -69,10 +69,10 @@ def __init__(
framework_version (str): Scikit-learn version you want to use for
executing your model training code. List of supported versions
https://github.com/aws/sagemaker-python-sdk#sklearn-sagemaker-estimators
source_dir (str): Path (absolute or relative) to a directory with
any other training source code dependencies aside from the entry
point file (default: None). Structure within this directory are
preserved when training on Amazon SageMaker.
source_dir (str): Path (absolute or relative) to a directory or S3 URI
points to a tar.gz. with any other training source code dependencies
aside from the entry point file (default: None). Structure within
this directory are preserved when training on Amazon SageMaker.
hyperparameters (dict): Hyperparameters that will be used for
training (default: None). The hyperparameters are made
accessible as a dict[str, str] to the training code on
7 changes: 4 additions & 3 deletions src/sagemaker/xgboost/estimator.py
@@ -75,9 +75,10 @@ def __init__(
framework_version (str): XGBoost version you want to use for executing your model
training code. List of supported versions
https://github.com/aws/sagemaker-python-sdk#xgboost-sagemaker-estimators
source_dir (str): Path (absolute or relative) to a directory with any other training
source code dependencies aside from the entry point file (default: None).
Structure within this directory are preserved when training on Amazon SageMaker.
source_dir (str): Path (absolute or relative) to a directory or S3 URI points to a
tar.gz. with any other training source code dependencies aside from the entry
point file (default: None). Structure within this directory are preserved when
training on Amazon SageMaker.
hyperparameters (dict): Hyperparameters that will be used for training (default: None).
The hyperparameters are made accessible as a dict[str, str] to the training code
on SageMaker. For convenience, this accepts other types for keys and values, but