
Commit 142ed7b

committed
Updated: Changes based on tomfaulhaber@ comments
1 parent cb51269 commit 142ed7b

10 files changed: +27, -27 lines changed


kmeans_bring_your_own_model/kmeans_bring_your_own_model.ipynb

Lines changed: 4 additions & 4 deletions
@@ -25,7 +25,7 @@
 "---\n",
 "## Background\n",
 "\n",
-"Amazon SageMaker includes functionality to support a hosted notebook environment, distributed, serverless training, and real-time, autoscaling hosting. We think it works best when all three of these services are used together, but they can also be used independently. Some use cases may only require hosting. Maybe the model was trained prior to Amazon SageMaker existing, in a different service.\n",
+"Amazon SageMaker includes functionality to support a hosted notebook environment, distributed, managed training, and real-time, autoscaling hosting. We think it works best when all three of these services are used together, but they can also be used independently. Some use cases may only require hosting. Maybe the model was trained prior to Amazon SageMaker existing, in a different service.\n",
 "\n",
 "This notebook shows how to use a pre-existing model with an Amazon SageMaker Algorithm container to quickly create a hosted endpoint for that model.\n",
 "\n",
@@ -34,9 +34,9 @@
 "\n",
 "Let's start by specifying:\n",
 "\n",
-"* AWS region.\n",
-"* The IAM role arn used to give learning and hosting access to your data. See the documentation for how to specify these.\n",
-"* The S3 bucket that you want to use for training and model data."
+"- AWS region.\n",
+"- The IAM role arn used to give learning and hosting access to your data. See the documentation for how to create these. Note, if more than one role is required for notebook instances, training, and/or hosting, please replace the boto call with a the appropriate full IAM role arn string.\n",
+"- The S3 bucket that you want to use for training and model data."
 ]
},
{
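The setup list in the cell above (region, IAM role ARN, S3 bucket) usually boils down to one short cell like the sketch below. The bucket name and the use of `get_execution_role()` from the SageMaker Python SDK are illustrative assumptions, and, as the note in the diff says, the role lookup can always be replaced with a full IAM role ARN string.

```python
import boto3
import sagemaker

# AWS region the notebook is running in
region = boto3.Session().region_name

# IAM role ARN that grants SageMaker access to your data; works inside a
# SageMaker notebook instance. Otherwise, paste the full role ARN string here.
role = sagemaker.get_execution_role()

# S3 bucket and key prefix for training inputs and model artifacts (illustrative names)
bucket = 'my-sagemaker-example-bucket'
prefix = 'kmeans-byo-model'

print(region, role, 's3://{}/{}'.format(bucket, prefix))
```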

linear_time_series_forecast/linear_time_series_forecast.ipynb

Lines changed: 1 addition & 1 deletion
@@ -384,7 +384,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now let's kick off our training job in SageMaker's distributed, serverless training, using the parameters we just created. Because training is serverless, we don't have to wait for our job to finish to continue, but for this case, let's setup a while loop so we can monitor the status of our training."
+"Now let's kick off our training job in SageMaker's distributed, managed training, using the parameters we just created. Because training is managed (AWS handles spinning up and spinning down hardware), we don't have to wait for our job to finish to continue, but for this case, let's setup a while loop so we can monitor the status of our training."
 ]
},
{
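The cell text above promises a while loop that watches the job until it leaves the `InProgress` state. A minimal sketch of that pattern with boto3, assuming a training job was already created earlier in the notebook (the job name shown here is a placeholder):

```python
import time
import boto3

sm = boto3.client('sagemaker')
job_name = 'linear-time-series-forecast-example-job'  # placeholder name

# Poll the training job status until it finishes one way or another.
status = sm.describe_training_job(TrainingJobName=job_name)['TrainingJobStatus']
print(status)
while status == 'InProgress':
    time.sleep(60)
    status = sm.describe_training_job(TrainingJobName=job_name)['TrainingJobStatus']
    print(status)

print('Training job ended with status: ' + status)
```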

pca_kmeans_movie_clustering/pca_kmeans_movie_clustering.ipynb

Lines changed: 2 additions & 2 deletions
@@ -349,7 +349,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now let's kick off our training job in SageMaker's distributed, serverless training, using the parameters we just created. Because training is serverless, we don't have to wait for our job to finish to continue, but for this case, let's setup a while loop so we can monitor the status of our training."
+"Now let's kick off our training job in SageMaker's distributed, managed training, using the parameters we just created. Because training is managed (AWS handles spinning up and spinning down the hardware), we don't have to wait for our job to finish to continue, but for this case, let's setup a while loop so we can monitor the status of our training."
 ]
},
{
@@ -644,7 +644,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now invoke EASE for serverless training."
+"Now invoke Amazon SageMaker for managed training."
 ]
},
{
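The "invoke Amazon SageMaker for managed training" step is a single `create_training_job` call on the boto3 client. The request below is a hedged sketch in which every image URI, ARN, S3 path, and hyperparameter value is an illustrative placeholder rather than the notebook's actual configuration:

```python
import boto3

sm = boto3.client('sagemaker')

kmeans_training_params = {
    'TrainingJobName': 'pca-kmeans-movie-clustering-example',       # placeholder
    'AlgorithmSpecification': {
        'TrainingImage': '<kmeans-container-image-uri>',            # placeholder
        'TrainingInputMode': 'File'},
    'RoleArn': '<sagemaker-execution-role-arn>',                    # placeholder
    'HyperParameters': {'k': '5', 'feature_dim': '100'},            # placeholders
    'InputDataConfig': [{
        'ChannelName': 'train',
        'DataSource': {'S3DataSource': {
            'S3DataType': 'S3Prefix',
            'S3Uri': 's3://<bucket>/<prefix>/train/',
            'S3DataDistributionType': 'FullyReplicated'}}}],
    'OutputDataConfig': {'S3OutputPath': 's3://<bucket>/<prefix>/output/'},
    'ResourceConfig': {'InstanceCount': 1,
                       'InstanceType': 'ml.c4.xlarge',
                       'VolumeSizeInGB': 10},
    'StoppingCondition': {'MaxRuntimeInSeconds': 3600}}

# The call returns as soon as the job is accepted; SageMaker provisions the
# hardware, runs training, and tears the cluster down when it finishes.
sm.create_training_job(**kmeans_training_params)
```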

r_bring_your_own/mars.R

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ train <- function() {
 
 target <- training_params$target
 
-if (is.null(training_params$degree)) {
+if (!is.null(training_params$degree)) {
 degree <- as.numeric(training_params$degree)}
 else {
 degree <- 2}

r_bring_your_own/plumber.R

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ function() {
 list(status='200', code='200')}
 
 
-#' Echo the parameter that was sent in
+#' Parse input and return prediction from model
 #' @param req The http request sent
 #' @post /invocations
 function(req) {

r_bring_your_own/r_bring_your_own.ipynb

Lines changed: 11 additions & 11 deletions
@@ -116,10 +116,10 @@
 " # Read in hyperparameters\n",
 " training_params <- read_json(param_path)\n",
 "\n",
-" target <- training_params$target\n",
+" target <- training_params\$target\n",
 "\n",
-" if (is.null(training_params$degree)) {\n",
-" degree <- as.numeric(training_params$degree)}\n",
+" if (!is.null(training_params\$degree)) {\n",
+" degree <- as.numeric(training_params\$degree)}\n",
 " else {\n",
 " degree <- 2}\n",
 "\n",
@@ -139,11 +139,11 @@
 " \n",
 " # Generate outputs\n",
 " mars_model <- model[!(names(model) %in% c('x', 'residuals', 'fitted.values'))]\n",
-" attributes(mars_model)$class <- 'mars'\n",
+" attributes(mars_model)\$class <- 'mars'\n",
 " save(mars_model, factor_levels, file=paste(model_path, 'mars_model.RData', sep='/'))\n",
 " print(summary(mars_model))\n",
 "\n",
-" write.csv(model$fitted.values, paste(output_path, 'data/fitted_values.csv', sep='/'), row.names=FALSE)\n",
+" write.csv(model\$fitted.values, paste(output_path, 'data/fitted_values.csv', sep='/'), row.names=FALSE)\n",
 " write('success', file=paste(output_path, 'success', sep='/'))}\n",
 "```"
 ]
@@ -158,7 +158,7 @@
 "# Setup scoring function\n",
 "serve <- function() {\n",
 " app <- plumb(paste(prefix, 'plumber.R', sep='/'))\n",
-" app$run(host='0.0.0.0', port=8080)}\n",
+" app\$run(host='0.0.0.0', port=8080)}\n",
 "```"
 ]
},
@@ -183,7 +183,7 @@
 "metadata": {},
 "source": [
 "### Serve\n",
-"`plumber.R` uses the [plumber](https://www.rplumber.io/) package to create a light weight http server for processing requests in hosting. Note the specific syntax, and see the plumber help docs for additional detail on more specialized use cases."
+"`plumber.R` uses the [plumber](https://www.rplumber.io/) package to create a lightweight HTTP server for processing requests in hosting. Note the specific syntax, and see the plumber help docs for additional detail on more specialized use cases."
 ]
},
{
@@ -217,7 +217,7 @@
 " load(paste(model_path, 'mars_model.RData', sep='/'))\n",
 "\n",
 " # Read in data\n",
-" conn <- textConnection(gsub('\\\\\\\\n', '\\n', req$postBody))\n",
+" conn <- textConnection(gsub('\\\\\\\\n', '\\n', req\$postBody))\n",
 " data <- read.csv(conn)\n",
 " close(conn)\n",
 "\n",
@@ -292,7 +292,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"_Note: Although we could, we'll avoid doing any preliminary transformations on the data, instead choosing to do those transformations inside the container. This is not typicall the best practice for model efficiency, but provides some benefits in terms of flexibility._"
+"_Note: Although we could, we'll avoid doing any preliminary transformations on the data, instead choosing to do those transformations inside the container. This is not typically the best practice for model efficiency, but provides some benefits in terms of flexibility._"
 ]
},
{
@@ -369,7 +369,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now let's kick off our training job on EASE, using the parameters we just created. Because training is serverless, we don't have to wait for our job to finish to continue, but for this case, let's setup a waiter so we can monitor the status of our training."
+"Now let's kick off our training job on EASE, using the parameters we just created. Because training is managed (AWS takes care of spinning up and spinning down the hardware), we don't have to wait for our job to finish to continue, but for this case, let's setup a waiter so we can monitor the status of our training."
 ]
},
{
@@ -538,7 +538,7 @@
 "\n",
 "This notebook showcases a straightforward example to train and host an R algorithm in Amazon SageMaker. As mentioned previously, this notebook could also be written in R. We could even train the algorithm entirely within a notebook and then simply use the serving portion of the container to host our model.\n",
 "\n",
-"Other extensions could include setting up the R algorithm to train in parallel. Although R is not the easiest language to build distributed applications on top of, this is possible. In addition, running multiple versions of training simultaneously would allow for parallelized grid (or random) search for optimal hyperparamter settings. This would more fully realize the benefits of serverless training."
+"Other extensions could include setting up the R algorithm to train in parallel. Although R is not the easiest language to build distributed applications on top of, this is possible. In addition, running multiple versions of training simultaneously would allow for parallelized grid (or random) search for optimal hyperparamter settings. This would more fully realize the benefits of managed training."
 ]
}
],
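The parallel grid-search extension mentioned in the last cell above follows directly from managed training: each `create_training_job` call gets its own hardware, so submitting several jobs with different `degree` values runs them concurrently. A rough sketch of that idea, with the container image, role, target column, and S3 locations all as illustrative placeholders:

```python
import time
import boto3

sm = boto3.client('sagemaker')

def mars_job_request(degree):
    """Build an illustrative training-job request for one degree setting."""
    return {
        'TrainingJobName': 'r-byo-mars-degree-{}-{}'.format(degree, time.strftime('%H-%M-%S')),
        'HyperParameters': {'target': '<target-column>', 'degree': str(degree)},
        'AlgorithmSpecification': {
            'TrainingImage': '<account>.dkr.ecr.<region>.amazonaws.com/<r-byo-image>:latest',
            'TrainingInputMode': 'File'},
        'RoleArn': '<sagemaker-execution-role-arn>',
        'InputDataConfig': [{
            'ChannelName': 'train',
            'DataSource': {'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': 's3://<bucket>/r-byo/train/',
                'S3DataDistributionType': 'FullyReplicated'}}}],
        'OutputDataConfig': {'S3OutputPath': 's3://<bucket>/r-byo/output/'},
        'ResourceConfig': {'InstanceCount': 1,
                           'InstanceType': 'ml.m4.xlarge',
                           'VolumeSizeInGB': 10},
        'StoppingCondition': {'MaxRuntimeInSeconds': 3600}}

# Because each job is provisioned independently, these three run in parallel.
for degree in [1, 2, 3]:
    sm.create_training_job(**mars_job_request(degree))
```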

xgboost_customer_churn/xgboost_customer_churn.ipynb

Lines changed: 2 additions & 2 deletions
@@ -257,7 +257,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now that we've cleaned up our dataset, let's determine which algorithm to use. As mentioned above, there appear to be some variables where both high and low (but not intermediate) values are predictive of churn. In order to accommodate this in an algorithm like linear regression, we'd need to generate polynomial (or bucketed) terms. Instead, let's attempt to model this problem using gradient boosted trees. Amazon SageMaker provides an XGBoost container that we can use to train in a serverless, distributed setting, and then host as a real-time prediction endpoint. XGBoost uses gradient boosted trees which naturally account for non-linear relationships between features and the target variable, as well as accommodating complex interactions between features.\n",
+"Now that we've cleaned up our dataset, let's determine which algorithm to use. As mentioned above, there appear to be some variables where both high and low (but not intermediate) values are predictive of churn. In order to accommodate this in an algorithm like linear regression, we'd need to generate polynomial (or bucketed) terms. Instead, let's attempt to model this problem using gradient boosted trees. Amazon SageMaker provides an XGBoost container that we can use to train in a managed, distributed setting, and then host as a real-time prediction endpoint. XGBoost uses gradient boosted trees which naturally account for non-linear relationships between features and the target variable, as well as accommodating complex interactions between features.\n",
 "\n",
 "Amazon SageMaker XGBoost can train on data in either a CSV or LibSVM format. For this example, we'll stick with CSV. It should:\n",
 "- Have the predictor variable in the first column\n",
@@ -324,7 +324,7 @@
 "---\n",
 "## Train\n",
 "\n",
-"Moving onto training, we'll need to specify the following parameters to take advantage of Amazon SageMaker's serverless training:\n",
+"Moving onto training, we'll need to specify the following parameters to take advantage of Amazon SageMaker's managed training:\n",
 "1. The role for Amazon SageMaker to use\n",
 "1. Our training job name\n",
 "1. The `xgboost` algorithm EC2 Container Repository location\n",

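The three numbered items in the cell above become three values in the training request. A short sketch of what they typically look like; the role ARN and the account/registry path are placeholders, so look up the actual XGBoost container location for your region in the SageMaker documentation:

```python
import time
import boto3

# 1. The role for Amazon SageMaker to use (full ARN, or fetched from the environment)
role = '<sagemaker-execution-role-arn>'  # placeholder

# 2. Our training job name (must be unique per account and region)
job_name = 'xgboost-customer-churn-' + time.strftime('%Y-%m-%d-%H-%M-%S', time.gmtime())

# 3. The xgboost algorithm container location in ECR (region-specific placeholder)
region = boto3.Session().region_name
container = '<account-id>.dkr.ecr.{}.amazonaws.com/xgboost:latest'.format(region)

print(role, job_name, container)
```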
xgboost_direct_marketing/README.md

Lines changed: 1 addition & 1 deletion
@@ -4,4 +4,4 @@ This folder contains two notebooks:
 
 *xgboost_direct_marketing.ipynb:* is an introduction to machine learning for less technical users. The task is the same as the Amazon ML [tutorial](http://docs.aws.amazon.com/machine-learning/latest/dg/tutorial.html), but showcases the flexibility of running this analysis in a notebook environment.
 
-*xgboost_direct_marketing_sagemaker.ipynb:* is very similar, but utilizes Amazon Amazon SageMaker concepts beyond the hosted notebook environment, relying on serverless, distributed training and creating a hosted endpoint for realtime predictions.
+*xgboost_direct_marketing_sagemaker.ipynb:* is very similar, but utilizes Amazon Amazon SageMaker concepts beyond the hosted notebook environment, relying on managed, distributed training and creating a hosted endpoint for realtime predictions.

xgboost_direct_marketing/xgboost_direct_marketing.ipynb

Lines changed: 1 addition & 1 deletion
@@ -709,7 +709,7 @@
 "\n",
 "## Extensions\n",
 "\n",
-"This example was contained within the Notebook environment entirely. As data sizes grow, utilizing other Amazon SageMaker features such as distributed, serverless training and our hyperparameter optimization service makes more sense. In addition, if the model needs to be used to provide real-time, online predictions, Amazon SageMakers's auto-scaling hosting should be used. Please check out the other Amazon SageMaker direct marketing notebook for a more functionally detailed walkthrough of those features."
+"This example was contained within the Notebook environment entirely. As data sizes grow, utilizing other Amazon SageMaker features such as distributed, managed training and our hyperparameter optimization service makes more sense. In addition, if the model needs to be used to provide real-time, online predictions, Amazon SageMakers's auto-scaling hosting should be used. Please check out the other Amazon SageMaker direct marketing notebook for a more functionally detailed walkthrough of those features."
 ]
}
],

xgboost_direct_marketing/xgboost_direct_marketing_sagemaker.ipynb

Lines changed: 3 additions & 3 deletions
@@ -389,7 +389,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now we'll copy the file to S3 for Amazon SageMaker's serverless training to pickup."
+"Now we'll copy the file to S3 for Amazon SageMaker's managed training to pickup."
 ]
},
{
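"Copy the file to S3" in the cell above is just an upload to the location the training job will later read from. A minimal sketch with boto3, where the bucket, prefix, and local file name are placeholders:

```python
import boto3

bucket = 'my-sagemaker-example-bucket'   # placeholder bucket
prefix = 'xgboost-direct-marketing'      # placeholder key prefix

# Upload the local training CSV so the managed training job can read it from S3.
boto3.client('s3').upload_file('train.csv', bucket, '{}/train/train.csv'.format(prefix))
```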
@@ -415,7 +415,7 @@
 "\n",
 "There are several intricacies to understanding the algorithm, but at a high level, gradient boosted trees works by combining predictions from many simple models, each of which tries to address the weaknesses of the previous models. By doing this the collection of simple models can actually outperform large, complex models. Other Amazon SageMaker notebooks elaborate on gradient boosting trees further and how they differ from similar algorithms.\n",
 "\n",
-"`xgboost` is an extremely popular, open-source package for gradient boosted trees. It is computationally powerful, fully featured, and has been successfully used in many machine learning competitions. Let's start with a simple `xgboost` model, trained using Amazon SageMaker's serverless, distributed training framework.\n",
+"`xgboost` is an extremely popular, open-source package for gradient boosted trees. It is computationally powerful, fully featured, and has been successfully used in many machine learning competitions. Let's start with a simple `xgboost` model, trained using Amazon SageMaker's managed, distributed training framework.\n",
 "\n",
 "First we'll need to specify training parameters. This includes:\n",
 "1. The role to use\n",
@@ -723,7 +723,7 @@
 "\n",
 "## Extensions\n",
 "\n",
-"This example analyzed a relatively small dataset, but utilized Amazon SageMaker features such as distributed, serverless training and highly available, autoscaling model hosting, which could easily be applied to much larger problems. Please check out the other Amazon SageMaker direct marketing notebook for a more detailed walkthrough of improvements that could be made to the model (in particular tuning the model for better accuracy) and discussion of gradient boosting versus similar algorithms."
+"This example analyzed a relatively small dataset, but utilized Amazon SageMaker features such as distributed, managed training and highly available, autoscaling model hosting, which could easily be applied to much larger problems. Please check out the other Amazon SageMaker direct marketing notebook for a more detailed walkthrough of improvements that could be made to the model (in particular tuning the model for better accuracy) and discussion of gradient boosting versus similar algorithms."
 ]
}
],
