Add autopilot customer churn notebook #1092

Merged
merged 3 commits into aws:master on Mar 19, 2020

Conversation

juliodelgadoaws
Contributor

Similar to the XGBoost customer churn notebook, but using Autopilot
instead.

Issue #, if available:

Description of changes:

To support the Aurora ML blog post (https://aws.amazon.com/blogs/aws/new-for-amazon-aurora-use-machine-learning-directly-from-your-databases/), a Customer Churn Autopilot example is needed.

The blog currently references https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_applying_machine_learning/xgboost_customer_churn/xgboost_customer_churn.ipynb, but the endpoint produced by that notebook is not usable from Aurora because it does not include all of the generated pre- and post-processing code.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@review-notebook-app
Check out this pull request on ReviewNB.

You'll be able to see the Jupyter notebook diff and discuss changes. Powered by ReviewNB.

@pdasamzn
Contributor

I would restrict MaxCandidates to less than 20, with explicit instructions on how to increase it. https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLJobCompletionCriteria.html#sagemaker-Type-AutoMLJobCompletionCriteria-MaxCandidates

This would imply that we may not reach the 96% accuracy mentioned in a later section of the doc.
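
For readers following the thread, the cap discussed above can be sketched as a small helper. This is only an illustration, not the notebook's actual code: `build_job_config` is a hypothetical name, and the default of 20 simply mirrors the value the thread settles on. The `CompletionCriteria` and `MaxCandidates` keys come from the API reference linked in the comment.

```python
def build_job_config(max_candidates=20):
    """Return an AutoMLJobConfig dict that caps the number of candidates.

    Hypothetical helper for illustration; raise max_candidates to let
    Autopilot explore more candidates at the cost of a longer job.
    """
    if max_candidates < 1:
        raise ValueError("max_candidates must be at least 1")
    return {"CompletionCriteria": {"MaxCandidates": max_candidates}}


job_config = build_job_config()

# With AWS credentials configured, the dict would be passed along as
#   boto3.client("sagemaker").create_auto_ml_job(..., AutoMLJobConfig=job_config)
# The call is left as a comment so the sketch runs without an AWS account.
```

A lower cap shortens the job but explores fewer candidates, which is why the accuracy quoted in the text may need to be re-measured.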

@juliodelgadoaws
Contributor Author

You can specify that the 'Churn?' column contains two values and hence that this is a binary classification problem. However, we don't need to specify it, as Autopilot will be able to infer it.

Makes sense.
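
To make the two options concrete, here is a rough sketch of how the request could be assembled with and without an explicit problem type. It is not taken from the notebook: `build_automl_request` is a hypothetical helper, the bucket paths and role ARN are placeholders, and pairing `ProblemType` with an `AutoMLJobObjective` metric such as `F1` is an assumption about what the API expects when the type is given explicitly.

```python
def build_automl_request(job_name, input_s3_uri, output_s3_uri, role_arn,
                         target="Churn?", problem_type=None, metric_name=None):
    """Assemble keyword arguments for SageMaker's create_auto_ml_job call.

    Hypothetical helper. When problem_type is None the key is omitted,
    leaving Autopilot to infer the problem type from the target column.
    """
    request = {
        "AutoMLJobName": job_name,
        "InputDataConfig": [{
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "TargetAttributeName": target,
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "RoleArn": role_arn,
    }
    if problem_type is not None:
        request["ProblemType"] = problem_type
        if metric_name is not None:
            # Assumption: an explicit problem type is paired with an objective.
            request["AutoMLJobObjective"] = {"MetricName": metric_name}
    return request


# Placeholder values for illustration only.
common = dict(
    job_name="churn-autopilot-example",
    input_s3_uri="s3://example-bucket/churn/train",
    output_s3_uri="s3://example-bucket/churn/output",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
inferred = build_automl_request(**common)
explicit = build_automl_request(
    **common, problem_type="BinaryClassification", metric_name="F1"
)
```

Either dict could then be unpacked into `boto3.client("sagemaker").create_auto_ml_job(**request)`; omitting the problem type keeps the example shorter, which matches the choice made in this thread.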

@juliodelgadoaws
Contributor Author

juliodelgadoaws commented Mar 18, 2020

I would restrict MaxCandidates to less than 20, with explicit instructions on how to increase it. https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLJobCompletionCriteria.html#sagemaker-Type-AutoMLJobCompletionCriteria-MaxCandidates

This would imply that we may not reach the 96% accuracy mentioned in a later section of the doc.

I will modify the notebook and run it a couple of times to make sure the text gives a correct ballpark value for the accuracy.

So that the AutoML job completes within a few minutes, we
now limit the max candidates to 20. We've also added a note
regarding the nature of the task, BinaryClassification, and
how Autopilot is able to auto-detect it.
@pdasamzn
Contributor

lgtm!

Contributor

@pdasamzn pdasamzn left a comment

LGTM

@laurencer laurencer left a comment

LGTM

@laurencer laurencer merged commit bc8d9d5 into aws:master Mar 19, 2020
3 participants