
Commit da0c41a

JunLyu authored and winstonaws committed
Update xgboost_customer_churn.ipynb (#86)
1 parent af8a848 commit da0c41a

1 file changed (+33, -14 lines changed)

introduction_to_applying_machine_learning/xgboost_customer_churn/xgboost_customer_churn.ipynb

Lines changed: 33 additions & 14 deletions
Original file line number | Diff line number | Diff line change
@@ -19,14 +19,14 @@
1919
"1. [Train](#Train)\n",
2020
"1. [Host](#Host)\n",
2121
" 1. [Evaluate](#Evaluate)\n",
22-
" 1. [Cost of Errors](#Cost of Errors)\n",
22+
" 1. [Relative cost of errors](#Relative-cost-of-errors)\n",
2323
"1. [Extensions](#Extensions)\n",
2424
"\n",
2525
"---\n",
2626
"\n",
2727
"## Background\n",
2828
"\n",
29-
"_This notebook has been adapted from an [AWS blog post](#https://aws.amazon.com/blogs/ai/predicting-customer-churn-with-amazon-machine-learning/)_\n",
29+
"_This notebook has been adapted from an [AWS blog post](https://aws.amazon.com/blogs/ai/predicting-customer-churn-with-amazon-machine-learning/)_\n",
3030
"\n",
3131
"Losing customers is costly for any business. Identifying unhappy customers early on gives you a chance to offer them incentives to stay. This notebook describes using machine learning (ML) for the automated identification of unhappy customers, also known as customer churn prediction. ML models rarely give perfect predictions though, so this notebook is also about how to incorporate the relative costs of prediction mistakes when determining the financial outcome of using ML.\n",
3232
"\n",
@@ -48,6 +48,7 @@
4848
"cell_type": "code",
4949
"execution_count": null,
5050
"metadata": {
51+
"collapsed": true,
5152
"isConfigCell": true
5253
},
5354
"outputs": [],
@@ -73,7 +74,9 @@
7374
{
7475
"cell_type": "code",
7576
"execution_count": null,
76-
"metadata": {},
77+
"metadata": {
78+
"collapsed": true
79+
},
7780
"outputs": [],
7881
"source": [
7982
"import pandas as pd\n",
@@ -179,7 +182,9 @@
179182
{
180183
"cell_type": "code",
181184
"execution_count": null,
182-
"metadata": {},
185+
"metadata": {
186+
"collapsed": true
187+
},
183188
"outputs": [],
184189
"source": [
185190
"churn = churn.drop('Phone', axis=1)\n",
@@ -244,7 +249,9 @@
244249
{
245250
"cell_type": "code",
246251
"execution_count": null,
247-
"metadata": {},
252+
"metadata": {
253+
"collapsed": true
254+
},
248255
"outputs": [],
249256
"source": [
250257
"churn = churn.drop(['Day Charge', 'Eve Charge', 'Night Charge', 'Intl Charge'], axis=1)"
@@ -266,7 +273,9 @@
266273
{
267274
"cell_type": "code",
268275
"execution_count": null,
269-
"metadata": {},
276+
"metadata": {
277+
"collapsed": true
278+
},
270279
"outputs": [],
271280
"source": [
272281
"model_data = pd.get_dummies(churn)\n",
@@ -283,7 +292,9 @@
283292
{
284293
"cell_type": "code",
285294
"execution_count": null,
286-
"metadata": {},
295+
"metadata": {
296+
"collapsed": true
297+
},
287298
"outputs": [],
288299
"source": [
289300
"train_data, validation_data, test_data = np.split(model_data.sample(frac=1, random_state=1729), [int(0.7 * len(model_data)), int(0.9 * len(model_data))])\n",
@@ -301,7 +312,9 @@
301312
{
302313
"cell_type": "code",
303314
"execution_count": null,
304-
"metadata": {},
315+
"metadata": {
316+
"collapsed": true
317+
},
305318
"outputs": [],
306319
"source": [
307320
"boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train/train.csv')).upload_file('train.csv')\n",
@@ -321,7 +334,9 @@
321334
{
322335
"cell_type": "code",
323336
"execution_count": null,
324-
"metadata": {},
337+
"metadata": {
338+
"collapsed": true
339+
},
325340
"outputs": [],
326341
"source": [
327342
"containers = {'us-west-2': '433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest',\n",
@@ -420,7 +435,9 @@
420435
{
421436
"cell_type": "code",
422437
"execution_count": null,
423-
"metadata": {},
438+
"metadata": {
439+
"collapsed": true
440+
},
424441
"outputs": [],
425442
"source": [
426443
"xgb_predictor.content_type = 'text/csv'\n",
@@ -443,7 +460,9 @@
443460
{
444461
"cell_type": "code",
445462
"execution_count": null,
446-
"metadata": {},
463+
"metadata": {
464+
"collapsed": true
465+
},
447466
"outputs": [],
448467
"source": [
449468
"def predict(data, rows=500):\n",
@@ -516,7 +535,7 @@
516535
"cell_type": "markdown",
517536
"metadata": {},
518537
"source": [
519-
"Se can see that changing the cutoff from 0.5 to 0.3 results in 1 more true positives, 3 more false positives, and 1 fewer false negatives. The numbers are small overall here, but that's 6-10% of customers overall that are shifting because of a change to the cutoff. Was this the right decision? We may end up retaining 3 extra customers, but we also unnecessarily incentivized 5 more customers who would have stayed. Determining optimal cutoffs is a key step in properly applying machine learning in a real-world setting. Let's discuss this more broadly and then apply a specific, hypothetical solution for our current problem.\n",
538+
"We can see that changing the cutoff from 0.5 to 0.3 results in 1 more true positives, 3 more false positives, and 1 fewer false negatives. The numbers are small overall here, but that's 6-10% of customers overall that are shifting because of a change to the cutoff. Was this the right decision? We may end up retaining 3 extra customers, but we also unnecessarily incentivized 5 more customers who would have stayed. Determining optimal cutoffs is a key step in properly applying machine learning in a real-world setting. Let's discuss this more broadly and then apply a specific, hypothetical solution for our current problem.\n",
520539
"\n",
521540
"### Relative cost of errors\n",
522541
"\n",
@@ -616,7 +635,7 @@
616635
],
617636
"metadata": {
618637
"kernelspec": {
619-
"display_name": "Environment (conda_python3)",
638+
"display_name": "conda_python3",
620639
"language": "python",
621640
"name": "conda_python3"
622641
},
@@ -630,7 +649,7 @@
630649
"name": "python",
631650
"nbconvert_exporter": "python",
632651
"pygments_lexer": "ipython3",
633-
"version": "3.6.3"
652+
"version": "3.6.2"
634653
},
635654
"notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
636655
},
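
Note on the cutoff paragraph touched in the `@@ -516,7 +535,7 @@` hunk: the trade-off it describes (a lower cutoff yields more true positives but also more false positives) can be illustrated with a minimal sketch. The labels and scores below are made up for illustration and are not taken from the notebook, `confusion_counts` is a hypothetical helper, and the strict `>` comparison is an assumption about how the cutoff is applied.

```python
def confusion_counts(y_true, scores, cutoff):
    """Return (tp, fp, fn, tn) for binary labels and predicted probabilities.

    A row is predicted as churn (1) when its score is strictly above the cutoff.
    """
    tp = fp = fn = tn = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score > cutoff else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Made-up labels (1 = churned) and model scores, for illustration only.
y_true = [1, 0, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.1, 0.6, 0.2, 0.32, 0.8]

for cutoff in (0.5, 0.3):
    tp, fp, fn, tn = confusion_counts(y_true, scores, cutoff)
    print(f"cutoff={cutoff}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

On this toy data, lowering the cutoff from 0.5 to 0.3 turns one false negative into a true positive and two true negatives into false positives, which is the kind of shift the paragraph asks the reader to weigh against the relative cost of each error.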
