-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feature: deal with credentials for Git support for GitHub #914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 4 commits
1d5ace6
03b1561
2c43e38
ed74c76
de62e72
d11fa17
e34d726
0a13c8a
b323c5b
ff0bc4b
7e00560
10f4afb
3a203c0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -185,16 +185,37 @@ Here is an example: | |
|
||
Git Support | ||
----------- | ||
If you have your training scripts in your GitHub repository, you can use them directly without the trouble to download | ||
them to local machine. Git support can be enabled simply by providing ``git_config`` parameter when initializing an | ||
estimator. If Git support is enabled, then ``entry_point``, ``source_dir`` and ``dependencies`` should all be relative | ||
paths in the Git repo. Note that if you decided to use Git support, then everything you need for ``entry_point``, | ||
``source_dir`` and ``dependencies`` should be in a single Git repo. | ||
If you have your training scripts in your GitHub (or GitHub-like) repository, you can use them directly without the | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
trouble to download them to local machine. Git support can be enabled simply by providing ``git_config`` parameter | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
when initializing an estimator. If Git support is enabled, then ``entry_point``, ``source_dir`` and ``dependencies`` | ||
should all be relative paths in the Git repo. Note that if you decided to use Git support, then everything you need | ||
for ``entry_point``, ``source_dir`` and ``dependencies`` should be in a single Git repo. | ||
|
||
Here are ways to specify ``git_config``: | ||
The ``git_config`` parameter includes arguments ``repo``, ``branch``, ``commit``, ``2FA_enabled``, ``username``, | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
``password`` and ``token``. Except for ``repo``, the other arguments are optional. ``repo`` specifies the Git repository | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
that you want to use. If ``branch`` is not provided, master branch will be used. If ``commit`` is not provided, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
the latest commit in the required branch will be used. | ||
|
||
``2FA_enabled``, ``username``, ``password`` and ``token`` are for authentication purpose. ``2FA_enabled`` should | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
be ``True`` or ``False``, provides the information whether two-factor authentication is enabled for the GitHub (or GitHub-like) account. | ||
If ``2FA_enabled`` is not provided, we consider 2FA as disabled. | ||
|
||
If ``repo`` is an ssh url, you should either have no passphrase for the ssh key pairs, or have the ssh-agent configured | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
so that you will not be prompted for ssh passphrase when you do 'git clone' command with ssh urls. For ssh urls, it | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
makes no difference whether the 2FA is enabled or disabled. | ||
|
||
If ``repo`` is an https url, 2FA matters. When 2FA is disabled, either ``token`` or ``username``+``password`` will be | ||
used for authentication if provided (``token`` prioritized). When 2FA is enabled, only token will be used for | ||
authentication if provided. If required authentication info is not provided, python SDK will try to use local | ||
credentials storage to authenticate. If that fails either, an error message will be thrown. | ||
|
||
Here are some ways to specify ``git_config``: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please ask @eslesar-aws to review the writing. |
||
|
||
.. code:: python | ||
|
||
# The following three examples do not provide Git credentials, so python SDK will try to use | ||
# local credential storage. | ||
|
||
# Specifies the git_config parameter | ||
git_config = {'repo': 'https://github.com/username/repo-with-training-scripts.git', | ||
'branch': 'branch1', | ||
|
@@ -209,6 +230,17 @@ Here are ways to specify ``git_config``: | |
# 'master' branch will be used. | ||
git_config = {'repo': 'https://github.com/username/repo-with-training-scripts.git'} | ||
|
||
# This example does not provide '2FA_enabled', so 2FA is treated as disabled by default. 'username' and | ||
# 'password' are provided for authentication | ||
git_config = {'repo': 'https://github.com/username/repo-with-training-scripts.git', | ||
'username': 'username', | ||
'password': 'passw0rd!'} | ||
|
||
# This example specifies that 2FA is enabled, and token is provided for authentication | ||
git_config = {'repo': 'https://github.com/username/repo-with-training-scripts.git', | ||
'2FA_enabled': True, | ||
'token': 'your-token'} | ||
|
||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
The following are some examples to define estimators with Git support: | ||
|
||
.. code:: python | ||
|
@@ -240,7 +272,6 @@ The following are some examples to define estimators with Git support: | |
train_instance_count=1, | ||
train_instance_type='ml.c4.xlarge') | ||
|
||
When Git support is enabled, users can still use local mode in the same way. | ||
mvsusp marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
Training Metrics | ||
---------------- | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,16 +16,27 @@ | |
import six | ||
import subprocess | ||
import tempfile | ||
import warnings | ||
from six.moves import urllib | ||
|
||
|
||
def git_clone_repo(git_config, entry_point, source_dir=None, dependencies=None): | ||
"""Git clone repo containing the training code and serving code. This method also validate ``git_config``, | ||
and set ``entry_point``, ``source_dir`` and ``dependencies`` to the right file or directory in the repo cloned. | ||
|
||
Args: | ||
git_config (dict[str, str]): Git configurations used for cloning files, including ``repo``, ``branch`` | ||
and ``commit``. ``branch`` and ``commit`` are optional. If ``branch`` is not specified, master branch | ||
will be used. If ``commit`` is not specified, the latest commit in the required branch will be used. | ||
git_config (dict[str, str]): Git configurations used for cloning files, including ``repo``, ``branch``, | ||
``commit``, ``2FA_enabled``, ``username``, ``password`` and ``token``. The fields are optional except | ||
``repo``. If ``branch`` is not specified, master branch will be used. If ``commit`` is not specified, | ||
the latest commit in the required branch will be used. ``2FA_enabled``, ``username``, ``password`` and | ||
``token`` are for authentication purpose. | ||
If ``2FA_enabled`` is not provided, we consider 2FA as disabled. For GitHub and GitHub-like repos, when | ||
ssh urls are provided, it does not make a difference whether 2FA is enabled or disabled; an ssh passphrase | ||
should be in local storage. When https urls are provided: if 2FA is disabled, then either token or | ||
username+password will be used for authentication if provided (token prioritized); if 2FA is enabled, | ||
only token will be used for authentication if provided. If required authentication info is not provided, | ||
python SDK will try to use local credentials storage to authenticate. If that fails either, an error message | ||
will be thrown. | ||
entry_point (str): A relative location to the Python source file which should be executed as the entry point | ||
to training or model hosting in the Git repo. | ||
source_dir (str): A relative location to a directory with other training or model hosting source code | ||
|
@@ -41,16 +52,14 @@ def git_clone_repo(git_config, entry_point, source_dir=None, dependencies=None): | |
ValueError: If 1. entry point specified does not exist in the repo | ||
2. source dir specified does not exist in the repo | ||
3. dependencies specified do not exist in the repo | ||
4. git_config is in bad format | ||
4. wrong format is provided for git_config | ||
|
||
Returns: | ||
dict: A dict that contains the updated values of entry_point, source_dir and dependencies | ||
dict: A dict that contains the updated values of entry_point, source_dir and dependencies. | ||
""" | ||
if entry_point is None: | ||
raise ValueError("Please provide an entry point.") | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
_validate_git_config(git_config) | ||
repo_dir = tempfile.mkdtemp() | ||
subprocess.check_call(["git", "clone", git_config["repo"], repo_dir]) | ||
_generate_and_run_clone_command(git_config, repo_dir) | ||
|
||
_checkout_branch_and_commit(git_config, repo_dir) | ||
|
||
|
@@ -72,44 +81,182 @@ def git_clone_repo(git_config, entry_point, source_dir=None, dependencies=None): | |
updated_paths["entry_point"] = os.path.join(repo_dir, entry_point) | ||
else: | ||
raise ValueError("Entry point does not exist in the repo.") | ||
|
||
updated_paths["dependencies"] = [] | ||
for path in dependencies: | ||
if os.path.exists(os.path.join(repo_dir, path)): | ||
updated_paths["dependencies"].append(os.path.join(repo_dir, path)) | ||
else: | ||
raise ValueError("Dependency {} does not exist in the repo.".format(path)) | ||
if dependencies is None: | ||
updated_paths["dependencies"] = None | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
else: | ||
updated_paths["dependencies"] = [] | ||
for path in dependencies: | ||
if os.path.exists(os.path.join(repo_dir, path)): | ||
updated_paths["dependencies"].append(os.path.join(repo_dir, path)) | ||
else: | ||
raise ValueError("Dependency {} does not exist in the repo.".format(path)) | ||
return updated_paths | ||
|
||
|
||
def _validate_git_config(git_config): | ||
"""check if a git_config param is valid | ||
if "repo" not in git_config: | ||
raise ValueError("Please provide a repo for git_config.") | ||
string_args = ["repo", "branch", "commit", "username", "password", "token"] | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
for key in string_args: | ||
if key in git_config and not isinstance(git_config[key], six.string_types): | ||
raise ValueError("'{}' must be a string.".format(key)) | ||
if "2FA_enabled" in git_config and not isinstance(git_config["2FA_enabled"], bool): | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
raise ValueError("'2FA_enabled' must be a bool value.") | ||
allowed_keys = ["repo", "branch", "commit", "2FA_enabled", "username", "password", "token"] | ||
for k in list(git_config): | ||
if k not in allowed_keys: | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
raise ValueError("Unexpected git_config argument(s) provided!") | ||
|
||
|
||
def _generate_and_run_clone_command(git_config, repo_dir): | ||
"""check if a git_config param is valid, if it is, create the command to git clone the repo, and run it. | ||
|
||
Args: | ||
git_config ((dict[str, str]): Git configurations used for cloning files, including ``repo``, ``branch`` | ||
and ``commit``. | ||
repo_dir (str): The local directory to clone the Git repo into. | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
Raises: | ||
ValueError: If: | ||
1. git_config has no key 'repo' | ||
2. git_config['repo'] is in the wrong format. | ||
CalledProcessError: If failed to clone git repo. | ||
""" | ||
if "repo" not in git_config: | ||
raise ValueError("Please provide a repo for git_config.") | ||
allowed_keys = ["repo", "branch", "commit"] | ||
for key in allowed_keys: | ||
if key in git_config and not isinstance(git_config[key], six.string_types): | ||
raise ValueError("'{}' should be a string".format(key)) | ||
for key in git_config: | ||
if key not in allowed_keys: | ||
raise ValueError("Unexpected argument(s) provided for git_config!") | ||
exists = { | ||
"2FA_enabled": "2FA_enabled" in git_config and git_config["2FA_enabled"] is True, | ||
"username": "username" in git_config, | ||
"password": "password" in git_config, | ||
"token": "token" in git_config, | ||
} | ||
_clone_command_for_github_like(git_config, repo_dir, exists) | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
|
||
def _clone_command_for_github_like(git_config, repo_dir, exists): | ||
"""check if a git_config param representing a GitHub (or like) repo is valid, if it is, create the command to | ||
git clone the repo, and run it. | ||
|
||
Args: | ||
git_config ((dict[str, str]): Git configurations used for cloning files, including ``repo``, ``branch`` | ||
and ``commit``. | ||
repo_dir (str): The local directory to clone the Git repo into. | ||
|
||
Raises: | ||
ValueError: If git_config['repo'] is in the wrong format. | ||
CalledProcessError: If failed to clone git repo. | ||
""" | ||
is_https = git_config["repo"].startswith("https://") | ||
is_ssh = git_config["repo"].startswith("git@") | ||
if not is_https and not is_ssh: | ||
raise ValueError("Invalid Git url provided.") | ||
if is_ssh: | ||
_clone_command_for_github_like_ssh(git_config, repo_dir, exists) | ||
elif exists["2FA_enabled"]: | ||
_clone_command_for_github_like_https_2fa_enabled(git_config, repo_dir, exists) | ||
else: | ||
_clone_command_for_github_like_https_2fa_disabled(git_config, repo_dir, exists) | ||
|
||
|
||
def _clone_command_for_github_like_ssh(git_config, repo_dir, exists): | ||
if exists["username"] or exists["password"] or exists["token"]: | ||
warnings.warn("Unnecessary credential argument(s) provided.") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you be more specific about which credentials are not used in the warning message? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "SSH cloning, authentication information in git config will be ignored." |
||
_run_clone_command(git_config["repo"], repo_dir) | ||
|
||
|
||
def _clone_command_for_github_like_https_2fa_disabled(git_config, repo_dir, exists): | ||
updated_url = git_config["repo"] | ||
if exists["token"]: | ||
if exists["username"] or exists["password"]: | ||
warnings.warn( | ||
"Using token for authentication, " | ||
"but unnecessary credential argument(s) provided." | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same as above. |
||
) | ||
updated_url = _insert_token_to_repo_url(url=git_config["repo"], token=git_config["token"]) | ||
elif exists["username"] and exists["password"]: | ||
updated_url = _insert_username_and_password_to_repo_url( | ||
url=git_config["repo"], username=git_config["username"], password=git_config["password"] | ||
) | ||
elif exists["username"] or exists["password"]: | ||
warnings.warn("Unnecessary credential argument(s) provided.") | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
_run_clone_command(updated_url, repo_dir) | ||
|
||
|
||
def _clone_command_for_github_like_https_2fa_enabled(git_config, repo_dir, exists): | ||
updated_url = git_config["repo"] | ||
if exists["token"]: | ||
if exists["username"] or exists["password"]: | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
warnings.warn( | ||
"Using token for authentication, " | ||
"but unnecessary credential argument(s) provided." | ||
) | ||
updated_url = _insert_token_to_repo_url(url=git_config["repo"], token=git_config["token"]) | ||
elif exists["username"] or exists["password"] or exists["token"]: | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
warnings.warn( | ||
"Unnecessary credential argument(s) provided." | ||
"Hint: since two factor authentication is enabled, you have to provide token." | ||
) | ||
_run_clone_command(updated_url, repo_dir) | ||
|
||
|
||
def _run_clone_command(repo_url, repo_dir): | ||
"""Run the 'git clone' command with the repo url and the directory to clone the repo into. | ||
|
||
Args: | ||
repo_url (str): Git repo url to be cloned. | ||
repo_dir: (str): Local path where the repo should be cloned into. | ||
|
||
Raises: | ||
CalledProcessError: If failed to clone git repo. | ||
""" | ||
my_env = os.environ.copy() | ||
if repo_url.startswith("https://"): | ||
my_env["GIT_TERMINAL_PROMPT"] = "0" | ||
elif repo_url.startswith("git@"): | ||
f = tempfile.NamedTemporaryFile() | ||
w = open(f.name, "w") | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
w.write("ssh -oBatchMode=yes $@") | ||
w.close() | ||
# 511 in decimal is same as 777 in octal | ||
os.chmod(f.name, 511) | ||
my_env["GIT_SSH"] = f.name | ||
subprocess.check_call(["git", "clone", repo_url, repo_dir], env=my_env) | ||
|
||
|
||
def _insert_token_to_repo_url(url, token): | ||
"""Insert the token to the Git repo url, to make a component of the git clone command. This method can | ||
only be called when repo_url is an https url. | ||
|
||
Args: | ||
url (str): Git repo url where the token should be inserted into. | ||
token (str): Token to be inserted. | ||
|
||
Returns: | ||
str: the component needed fot the git clone command. | ||
""" | ||
index = len("https://") | ||
return url[:index] + token + "@" + url[index:] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. tip: use replace url.replace('https://', 'https://' + token + '@') There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We need to handle the case which the token and There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Have dealt with that situation. Do we have the same concern for username and password? I am thinking maybe not because people usually won't first encode the password and then insert it into url. |
||
|
||
|
||
def _insert_username_and_password_to_repo_url(url, username, password): | ||
"""Insert the username and the password to the Git repo url, to make a component of the git clone command. | ||
This method can only be called when repo_url is an https url. | ||
|
||
Args: | ||
url (str): Git repo url where the token should be inserted into. | ||
username (str): Username to be inserted. | ||
password (str): Password to be inserted. | ||
|
||
Returns: | ||
str: the component needed fot the git clone command. | ||
GaryTu1020 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
""" | ||
password = urllib.parse.quote_plus(password) | ||
# urllib parses ' ' as '+', but what we need is '%20' here | ||
password = password.replace("+", "%20") | ||
index = len("https://") | ||
return url[:index] + username + ":" + password + "@" + url[index:] | ||
|
||
|
||
def _checkout_branch_and_commit(git_config, repo_dir): | ||
"""Checkout the required branch and commit. | ||
|
||
Args: | ||
git_config: (dict[str, str]): Git configurations used for cloning files, including ``repo``, ``branch`` | ||
git_config (dict[str, str]): Git configurations used for cloning files, including ``repo``, ``branch`` | ||
and ``commit``. | ||
repo_dir (str): the directory where the repo is cloned | ||
|
||
|
Uh oh!
There was an error while loading. Please reload this page.