Specify task of training/tuning job in Airflow transform/deploy operator #590


Merged — 6 commits merged into aws:master on Jan 10, 2019

Conversation

@yangaws (Contributor) commented Jan 8, 2019

Issue #, if available:

Description of changes:

Specify the task id and task type of the training/tuning job in the Airflow transform/endpoint operator, so that the API knows which training job to transform/deploy when multiple jobs are created in the DAG. To make this happen, additional changes are:

  1. Make the default names of jobs (training/tuning/transform/model/endpoint) random instead of a single shared global name. This also enables the retry feature.
  2. Remove the Airflow name-generation logic in utils. Use the timestamp name postfix as before.
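The random-naming idea in item 1 could be sketched as follows. This is a hypothetical illustration, not the merged implementation: `random_suffix_name` and its exact suffix format are made up here, chosen only to show how a per-call unique name (rather than one shared global name) makes retries safe.

```python
import random
import string
import time


def random_suffix_name(base_name, max_length=63):
    """Append a timestamp plus a short random suffix to base_name.

    Each call yields a distinct name, so a retried Airflow task creates a
    fresh job instead of colliding with the previous attempt's job name.
    """
    suffix = "%s-%s" % (
        time.strftime("%y%m%d-%H%M%S"),
        "".join(random.choice(string.ascii_lowercase + string.digits) for _ in range(4)),
    )
    # SageMaker job names have a length limit, so trim the base if needed.
    trimmed = base_name[: max_length - len(suffix) - 1]
    return "%s-%s" % (trimmed, suffix)
```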

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have updated the changelog with a description of my changes (if appropriate)
  • I have updated any necessary documentation (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

CHANGELOG.rst Outdated
@@ -5,6 +5,7 @@ CHANGELOG
1.17.0.dev
==========

* enhancement: Specify tasks from which training/tuning operator to transform/deploy in related operators
Contributor:

nit: and something to denote this is for Airflow/workflows

Contributor Author:

Updated! Also updated other entries of Airflow in CHANGELOG.

if estimator.uploaded_code is None:
return

postfix = os.path.join('/', 'source', 'sourcedir.tar.gz')
Contributor:

if we're actually concerned about using the correct separator for the OS, then you should use os.path.sep instead of '/'. however, if this is something where we actually just need slashes, then you can just use a format string here.

Contributor Author:

That line is removed since I used regex for the replace logic.

But it should be just '/' since it's in an S3 URI, so it should be OS-independent either way.

# s3://path/old_job/source/sourcedir.tar.gz will become s3://path/new_job/source/sourcedir.tar.gz
submit_uri = estimator.uploaded_code.s3_prefix
submit_uri = submit_uri[:len(submit_uri) - len(postfix)]
submit_uri = submit_uri[:submit_uri.rfind('/') + 1] + job_name + postfix
Contributor:

would this be potentially cleaner with a regex?

Contributor Author:

Good idea! Updated using regex
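The regex approach the thread settles on might look roughly like this. It is a sketch under assumptions: `update_submit_s3_uri` is a made-up name and the pattern in the merged code may differ; the behavior shown matches the comment in the snippet under review (swapping the old job-name path segment for the new one).

```python
import re


def update_submit_s3_uri(s3_uri, job_name):
    """Replace the job-name path segment in a code-upload S3 URI.

    s3://path/old_job/source/sourcedir.tar.gz -> s3://path/new_job/source/sourcedir.tar.gz
    The lookahead anchors the match to the segment directly before the
    fixed /source/sourcedir.tar.gz postfix, so only the job name changes.
    """
    return re.sub(r"[^/]+(?=/source/sourcedir\.tar\.gz$)", job_name, s3_uri)
```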

training_job = "{{ ti.xcom_pull(task_ids='%s')['Tuning']['BestTrainingJob']['TrainingJobName'] }}" % task_id
# need to strip the double quotes in json to get the string
job_name = "{{ ti.xcom_pull(task_ids='%s')['Tuning']['TrainingJobDefinition']['StaticHyperParameters']" \
           "['sagemaker_job_name'].replace('%s', '') }}" % (task_id, '"')
Contributor:

would it make more sense to use strip instead of replace?

Contributor Author:

Good idea! Updated
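A quick illustration of the strip-vs-replace suggestion, in plain Python outside the Jinja template: the value pulled from XCom is a JSON-encoded string, so it arrives wrapped in double quotes at both ends.

```python
# strip('"') removes quotes only at the ends of the string, which is exactly
# what is needed here; replace('"', '') would also delete any interior quotes.
value = '"my-job-name"'
print(value.strip('"'))        # my-job-name
print(value.replace('"', ''))  # my-job-name (same result here, but strip is more targeted)
```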

@@ -369,16 +421,22 @@ def model_config(instance_type, model, role=None, image=None):
return config


-def model_config_from_estimator(instance_type, estimator, role=None, image=None, model_server_workers=None,
-                                vpc_config_override=vpc_utils.VPC_CONFIG_DEFAULT):
+def model_config_from_estimator(instance_type, estimator, task_id, task_type, role=None, image=None, name=None,
Contributor:

is this a breaking change?
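To illustrate the concern with a hypothetical minimal example (not the real `model_config_from_estimator` signature): inserting new required positional parameters before existing optional ones silently re-binds old positional calls.

```python
# Before: role is the third positional parameter.
def old_config(instance_type, estimator, role=None):
    return (instance_type, estimator, role)


# After: task_id and task_type are inserted, shifting role to fifth place.
def new_config(instance_type, estimator, task_id, task_type, role=None):
    return (instance_type, estimator, task_id, task_type, role)


# A caller who previously passed the role positionally would now have
# that argument bound to task_id, so existing call sites must be updated.
```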

@yangaws (Contributor Author) left a comment:

Addressed most of the comments. Need to rethink the attach one and will provide updates tomorrow.


@codecov-io commented Jan 9, 2019

Codecov Report

Merging #590 into master will decrease coverage by 0.03%.
The diff coverage is 92.3%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #590      +/-   ##
==========================================
- Coverage   92.78%   92.75%   -0.04%     
==========================================
  Files          71       71              
  Lines        5367     5386      +19     
==========================================
+ Hits         4980     4996      +16     
- Misses        387      390       +3
Impacted Files Coverage Δ
src/sagemaker/utils.py 92.5% <ø> (-0.53%) ⬇️
src/sagemaker/workflow/airflow.py 91.82% <92.3%> (-0.4%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9fd7e2b...862979d.

@yangaws merged commit 63b852b into aws:master on Jan 10, 2019

3 participants