feature: add git_config and git_clone, validate method #832
Merged
Commits
36 commits
10d27c5
add git_config and validate method
db8652c
Merge branch 'master' of github.com:aws/sagemaker-python-sdk into clo…
6b78ed4
modify the order of git_config, add tests
e59bb79
move validate_git_config, add integ test
7808faa
modify location _git_clone_code called
2783c4a
add documentation
db3b69f
Merge branch 'master' of github.com:aws/sagemaker-python-sdk into clo…
f397850
Update doc/overview.rst
GaryTu1020 5b8d684
Update doc/overview.rst
GaryTu1020 a9e2932
add more integ tests
241ac92
write unit tests for git_utils
a81859a
fix conflict on overview.rst
c39c344
delete a line
068a7b1
modify an assertion in test_with_mxnet
2b1622b
add assertion to some test functions
28a5c58
remove deploy part in test_git
0797060
change testing git repo
e2e5c20
change the testing repo
c6daa5d
correct an error message
e8bede0
pull master
e5bd806
stop patching private methods
c1bae10
modified overview.rst, add lock for tests
2af9b24
slight change to overview.rst
e15a22d
Merge branch 'master' into clone_from_github
chuyang-deng b102563
add a comment for lock
9ae910e
merge with remote branch
3383bfc
Merge branch 'master' into clone_from_github
GaryTu1020 9a7f4e1
Merge branch 'master' into clone_from_github
chuyang-deng d4bb0bb
merge with master
e6a01f0
merge with master
0c5e32b
merge aws master
b6e75d0
merge with master
3621bd4
merge with master
c7af978
merge with aws master
c2f7a43
merge with aws master
0790f41
Merge branch 'master' into clone_from_github
mvsusp
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
# Copyright 2017-2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"). You | ||
# may not use this file except in compliance with the License. A copy of | ||
# the License is located at | ||
# | ||
# http://aws.amazon.com/apache2.0/ | ||
# | ||
# or in the "license" file accompanying this file. This file is | ||
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF | ||
# ANY KIND, either express or implied. See the License for the specific | ||
# language governing permissions and limitations under the License. | ||
from __future__ import absolute_import | ||
|
||
import os | ||
import subprocess | ||
import tempfile | ||
|
||
|
||
def git_clone_repo(git_config, entry_point, source_dir=None, dependencies=None): | ||
"""Git clone repo containing the training code and serving code. This method also validates ``git_config``, | ||
and sets ``entry_point``, ``source_dir`` and ``dependencies`` to the right file or directory in the cloned repo. | ||
|
||
Args: | ||
git_config (dict[str, str]): Git configurations used for cloning files, including ``repo``, ``branch`` | ||
and ``commit``. ``branch`` and ``commit`` are optional. If ``branch`` is not specified, master branch | ||
will be used. If ``commit`` is not specified, the latest commit in the required branch will be used. | ||
entry_point (str): A relative location to the Python source file which should be executed as the entry point | ||
to training or model hosting in the Git repo. | ||
source_dir (str): A relative location to a directory with other training or model hosting source code | ||
dependencies aside from the entry point file in the Git repo (default: None). The structure within this | ||
directory is preserved when training on Amazon SageMaker. | ||
dependencies (list[str]): A list of relative locations to directories with any additional libraries that will | ||
be exported to the container in the Git repo (default: None). | ||
|
||
Raises: | ||
CalledProcessError: If 1. failed to clone git repo | ||
2. failed to checkout the required branch | ||
3. failed to checkout the required commit | ||
ValueError: If 1. entry point specified does not exist in the repo | ||
2. source dir specified does not exist in the repo | ||
|
||
Returns: | ||
dict: A dict that contains the updated values of entry_point, source_dir and dependencies | ||
""" | ||
_validate_git_config(git_config) | ||
repo_dir = tempfile.mkdtemp() | ||
subprocess.check_call(["git", "clone", git_config["repo"], repo_dir]) | ||
|
||
_checkout_branch_and_commit(git_config, repo_dir) | ||
|
||
ret = {"entry_point": entry_point, "source_dir": source_dir, "dependencies": dependencies} | ||
# check if the cloned repo contains entry point, source directory and dependencies | ||
if source_dir: | ||
if not os.path.isdir(os.path.join(repo_dir, source_dir)): | ||
raise ValueError("Source directory does not exist in the repo.") | ||
if not os.path.isfile(os.path.join(repo_dir, source_dir, entry_point)): | ||
raise ValueError("Entry point does not exist in the repo.") | ||
ret["source_dir"] = os.path.join(repo_dir, source_dir) | ||
else: | ||
if not os.path.isfile(os.path.join(repo_dir, entry_point)): | ||
raise ValueError("Entry point does not exist in the repo.") | ||
ret["entry_point"] = os.path.join(repo_dir, entry_point) | ||
|
||
ret["dependencies"] = [] | ||
for path in dependencies or []:  # dependencies defaults to None, which is not iterable | ||
if not os.path.exists(os.path.join(repo_dir, path)): | ||
raise ValueError("Dependency {} does not exist in the repo.".format(path)) | ||
ret["dependencies"].append(os.path.join(repo_dir, path)) | ||
return ret | ||
|
||
|
||
def _validate_git_config(git_config): | ||
"""Check if a git_config param is valid. | ||
|
||
Args: | ||
git_config (dict[str, str]): Git configurations used for cloning files, including ``repo``, ``branch`` | ||
and ``commit``. | ||
|
||
Raises: | ||
ValueError: If git_config has no key 'repo'. | ||
""" | ||
if "repo" not in git_config: | ||
raise ValueError("Please provide a repo for git_config.") | ||
|
||
|
||
def _checkout_branch_and_commit(git_config, repo_dir): | ||
"""Checkout the required branch and commit. | ||
|
||
Args: | ||
git_config (dict[str, str]): Git configurations used for cloning files, including ``repo``, ``branch`` | ||
and ``commit``. | ||
repo_dir (str): the directory where the repo is cloned | ||
|
||
Raises: | ||
CalledProcessError: If 1. failed to checkout the required branch | ||
2. failed to checkout the required commit | ||
""" | ||
if "branch" in git_config: | ||
subprocess.check_call(args=["git", "checkout", git_config["branch"]], cwd=str(repo_dir)) | ||
if "commit" in git_config: | ||
subprocess.check_call(args=["git", "checkout", git_config["commit"]], cwd=str(repo_dir)) |
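
The path checks in `git_clone_repo` above can be tried without cloning anything. The following is a minimal sketch, not the SDK's code: `resolve_repo_paths` and `make_fake_repo` are hypothetical names introduced here, and the directory layout (`mxnet/mnist.py`, `foo/bar.py`) is only borrowed from the integration test for illustration. It assumes that a missing `dependencies` argument should be treated as an empty list.

```python
import os
import tempfile


def resolve_repo_paths(repo_dir, entry_point, source_dir=None, dependencies=None):
    """Hypothetical helper mirroring the path checks in git_clone_repo."""
    resolved = {"entry_point": entry_point, "source_dir": source_dir, "dependencies": []}
    if source_dir:
        abs_source_dir = os.path.join(repo_dir, source_dir)
        if not os.path.isdir(abs_source_dir):
            raise ValueError("Source directory does not exist in the repo.")
        # With a source_dir, entry_point stays relative to that directory.
        if not os.path.isfile(os.path.join(abs_source_dir, entry_point)):
            raise ValueError("Entry point does not exist in the repo.")
        resolved["source_dir"] = abs_source_dir
    else:
        # Without a source_dir, entry_point becomes a path inside the clone.
        abs_entry_point = os.path.join(repo_dir, entry_point)
        if not os.path.isfile(abs_entry_point):
            raise ValueError("Entry point does not exist in the repo.")
        resolved["entry_point"] = abs_entry_point
    # Assumption: a missing dependencies argument means "no dependencies".
    for path in dependencies or []:
        abs_path = os.path.join(repo_dir, path)
        if not os.path.exists(abs_path):
            raise ValueError("Dependency {} does not exist in the repo.".format(path))
        resolved["dependencies"].append(abs_path)
    return resolved


def make_fake_repo():
    """Create a throwaway directory laid out like a cloned repo (illustrative only)."""
    repo_dir = tempfile.mkdtemp()
    os.makedirs(os.path.join(repo_dir, "mxnet"))
    os.makedirs(os.path.join(repo_dir, "foo"))
    for rel in ("mxnet/mnist.py", "foo/bar.py"):
        with open(os.path.join(repo_dir, *rel.split("/")), "w") as f:
            f.write("# placeholder\n")
    return repo_dir
```

Calling `resolve_repo_paths(make_fake_repo(), "mnist.py", source_dir="mxnet")` leaves `entry_point` relative and rewrites only `source_dir`, which matches the branch structure in the diff: the entry point is made absolute only when no `source_dir` is given.
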
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,100 @@ | ||
# Copyright 2017-2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"). You | ||
# may not use this file except in compliance with the License. A copy of | ||
# the License is located at | ||
# | ||
# http://aws.amazon.com/apache2.0/ | ||
# | ||
# or in the "license" file accompanying this file. This file is | ||
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF | ||
# ANY KIND, either express or implied. See the License for the specific | ||
# language governing permissions and limitations under the License. | ||
from __future__ import absolute_import | ||
|
||
import os | ||
|
||
import numpy | ||
import tempfile | ||
|
||
from tests.integ import lock | ||
from sagemaker.mxnet.estimator import MXNet | ||
from sagemaker.pytorch.estimator import PyTorch | ||
from tests.integ import DATA_DIR, PYTHON_VERSION | ||
|
||
GIT_REPO = "https://github.com/aws/sagemaker-python-sdk.git" | ||
BRANCH = "test-branch-git-config" | ||
COMMIT = "329bfcf884482002c05ff7f44f62599ebc9f445a" | ||
|
||
# endpoint tests all use the same port, so we use this lock to prevent concurrent execution | ||
LOCK_PATH = os.path.join(tempfile.gettempdir(), "sagemaker_test_git_lock") | ||
|
||
|
||
def test_git_support_with_pytorch(sagemaker_local_session): | ||
|
||
script_path = "mnist.py" | ||
data_path = os.path.join(DATA_DIR, "pytorch_mnist") | ||
git_config = {"repo": GIT_REPO, "branch": BRANCH, "commit": COMMIT} | ||
pytorch = PyTorch( | ||
entry_point=script_path, | ||
role="SageMakerRole", | ||
source_dir="pytorch", | ||
framework_version=PyTorch.LATEST_VERSION, | ||
py_version=PYTHON_VERSION, | ||
train_instance_count=1, | ||
train_instance_type="local", | ||
sagemaker_session=sagemaker_local_session, | ||
git_config=git_config, | ||
) | ||
|
||
pytorch.fit({"training": "file://" + os.path.join(data_path, "training")}) | ||
|
||
with lock.lock(LOCK_PATH): | ||
try: | ||
predictor = pytorch.deploy(initial_instance_count=1, instance_type="local") | ||
|
||
data = numpy.zeros(shape=(1, 1, 28, 28)).astype(numpy.float32) | ||
result = predictor.predict(data) | ||
assert result is not None | ||
finally: | ||
predictor.delete_endpoint() | ||
|
||
|
||
def test_git_support_with_mxnet(sagemaker_local_session, mxnet_full_version): | ||
script_path = "mnist.py" | ||
data_path = os.path.join(DATA_DIR, "mxnet_mnist") | ||
git_config = {"repo": GIT_REPO, "branch": BRANCH, "commit": COMMIT} | ||
dependencies = ["foo/bar.py"] | ||
mx = MXNet( | ||
entry_point=script_path, | ||
role="SageMakerRole", | ||
source_dir="mxnet", | ||
dependencies=dependencies, | ||
framework_version=MXNet.LATEST_VERSION, | ||
py_version=PYTHON_VERSION, | ||
train_instance_count=1, | ||
train_instance_type="local", | ||
sagemaker_session=sagemaker_local_session, | ||
git_config=git_config, | ||
) | ||
|
||
mx.fit( | ||
{ | ||
"train": "file://" + os.path.join(data_path, "train"), | ||
"test": "file://" + os.path.join(data_path, "test"), | ||
} | ||
) | ||
|
||
files = os.listdir(mx.source_dir) | ||
assert "some_file" in files | ||
assert "mnist.py" in files | ||
assert os.path.exists(mx.dependencies[0]) | ||
|
||
with lock.lock(LOCK_PATH): | ||
try: | ||
predictor = mx.deploy(initial_instance_count=1, instance_type="local") | ||
|
||
data = numpy.zeros(shape=(1, 1, 28, 28)) | ||
result = predictor.predict(data) | ||
assert result is not None | ||
finally: | ||
predictor.delete_endpoint() |
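
The integration tests above clone a real repository and then train and deploy in local mode. The checkout order in `_checkout_branch_and_commit` (branch first, then commit) can also be checked offline by mocking `subprocess.check_call`. This is a minimal sketch under stated assumptions, not the unit tests from this pull request: `checkout_branch_and_commit` is a stand-in copy of the private helper, `recorded_checkout_calls` and the `/tmp/fake-repo` path are hypothetical, and `unittest.mock` assumes Python 3.

```python
import subprocess
from unittest import mock


def checkout_branch_and_commit(git_config, repo_dir):
    """Stand-in with the same behavior as _checkout_branch_and_commit in the diff."""
    if "branch" in git_config:
        subprocess.check_call(args=["git", "checkout", git_config["branch"]], cwd=str(repo_dir))
    if "commit" in git_config:
        subprocess.check_call(args=["git", "checkout", git_config["commit"]], cwd=str(repo_dir))


def recorded_checkout_calls(git_config, repo_dir="/tmp/fake-repo"):
    """Run the checkout with subprocess.check_call mocked; return the recorded calls."""
    # No git process is started: the mock only records how it was called.
    with mock.patch("subprocess.check_call") as check_call:
        checkout_branch_and_commit(git_config, repo_dir)
    return check_call.call_args_list
```

Comparing the returned list against `mock.call(args=[...], cwd=...)` entries verifies both the arguments and the order of the two `git checkout` invocations, and an empty list confirms that nothing is checked out when neither `branch` nor `commit` is given.
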