Update xgboost_customer_churn.ipynb (#86)

JunLyu · winstonaws · commit da0c41a293b6 · 2017-11-27T15:35:25.000-08:00
diff --git a/introduction_to_applying_machine_learning/xgboost_customer_churn/xgboost_customer_churn.ipynb b/introduction_to_applying_machine_learning/xgboost_customer_churn/xgboost_customer_churn.ipynb
@@ -19,14 +19,14 @@
     "1. [Train](#Train)\n",
     "1. [Host](#Host)\n",
     "  1. [Evaluate](#Evaluate)\n",
-    "  1. [Cost of Errors](#Cost of Errors)\n",
+    "  1. [Relative cost of errors](#Relative-cost-of-errors)\n",
     "1. [Extensions](#Extensions)\n",
     "\n",
     "---\n",
     "\n",
     "## Background\n",
     "\n",
-    "_This notebook has been adapted from an [AWS blog post](#https://aws.amazon.com/blogs/ai/predicting-customer-churn-with-amazon-machine-learning/)_\n",
+    "_This notebook has been adapted from an [AWS blog post](https://aws.amazon.com/blogs/ai/predicting-customer-churn-with-amazon-machine-learning/)_\n",
     "\n",
     "Losing customers is costly for any business.  Identifying unhappy customers early on gives you a chance to offer them incentives to stay.  This notebook describes using machine learning (ML) for the automated identification of unhappy customers, also known as customer churn prediction. ML models rarely give perfect predictions though, so this notebook is also about how to incorporate the relative costs of prediction mistakes when determining the financial outcome of using ML.\n",
     "\n",
@@ -48,6 +48,7 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
+    "collapsed": true,
     "isConfigCell": true
    },
    "outputs": [],
@@ -73,7 +74,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "collapsed": true
+   },
    "outputs": [],
    "source": [
     "import pandas as pd\n",
@@ -179,7 +182,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "collapsed": true
+   },
    "outputs": [],
    "source": [
     "churn = churn.drop('Phone', axis=1)\n",
@@ -244,7 +249,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "collapsed": true
+   },
    "outputs": [],
    "source": [
     "churn = churn.drop(['Day Charge', 'Eve Charge', 'Night Charge', 'Intl Charge'], axis=1)"
@@ -266,7 +273,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "collapsed": true
+   },
    "outputs": [],
    "source": [
     "model_data = pd.get_dummies(churn)\n",
@@ -283,7 +292,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "collapsed": true
+   },
    "outputs": [],
    "source": [
     "train_data, validation_data, test_data = np.split(model_data.sample(frac=1, random_state=1729), [int(0.7 * len(model_data)), int(0.9 * len(model_data))])\n",
@@ -301,7 +312,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "collapsed": true
+   },
    "outputs": [],
    "source": [
     "boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train/train.csv')).upload_file('train.csv')\n",
@@ -321,7 +334,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "collapsed": true
+   },
    "outputs": [],
    "source": [
     "containers = {'us-west-2': '433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest',\n",
@@ -420,7 +435,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "collapsed": true
+   },
    "outputs": [],
    "source": [
     "xgb_predictor.content_type = 'text/csv'\n",
@@ -443,7 +460,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "collapsed": true
+   },
    "outputs": [],
    "source": [
     "def predict(data, rows=500):\n",
@@ -516,7 +535,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Se can see that changing the cutoff from 0.5 to 0.3 results in 1 more true positives, 3 more false positives, and 1 fewer false negatives.  The numbers are small overall here, but that's 6-10% of customers overall that are shifting because of a change to the cutoff.  Was this the right decision?  We may end up retaining 3 extra customers, but we also unnecessarily incentivized 5 more customers who would have stayed.  Determining optimal cutoffs is a key step in properly applying machine learning in a real-world setting.  Let's discuss this more broadly and then apply a specific, hypothetical solution for our current problem.\n",
+    "We can see that changing the cutoff from 0.5 to 0.3 results in 1 more true positives, 3 more false positives, and 1 fewer false negatives.  The numbers are small overall here, but that's 6-10% of customers overall that are shifting because of a change to the cutoff.  Was this the right decision?  We may end up retaining 3 extra customers, but we also unnecessarily incentivized 5 more customers who would have stayed.  Determining optimal cutoffs is a key step in properly applying machine learning in a real-world setting.  Let's discuss this more broadly and then apply a specific, hypothetical solution for our current problem.\n",
     "\n",
     "### Relative cost of errors\n",
     "\n",
@@ -616,7 +635,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Environment (conda_python3)",
+   "display_name": "conda_python3",
    "language": "python",
    "name": "conda_python3"
   },
@@ -630,7 +649,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.3"
+   "version": "3.6.2"
   },
   "notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.  Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
  },