|
41 | 41 | "\n",
|
42 | 42 | "### Permissions and environment variables\n",
|
43 | 43 | "\n",
|
44 |
| - "Here we set up the linkage and authentication to AWS services. There are three parts to this:\n", |
| 44 | + "Here we set up the linkage and authentication to AWS services. There are two parts to this:\n", |
45 | 45 | "\n",
|
46 |
| - "1. The credentials and region for the account that's running training. Upload the credentials in the normal AWS credentials file format using the jupyter upload feature.\n", |
47 |
| - "2. The roles used to give learning and hosting access to your data. See the documentation for how to specify these.\n", |
48 |
| - "3. The S3 bucket that you want to use for training and model data.\n", |
49 |
| - "\n", |
50 |
| - "_Note:_ Credentials for hosted notebooks will be automated before the final release." |
| 46 | + "1. The role(s) used to give learning and hosting access to your data. See the documentation for how to specify these.\n", |
| 47 | + "1. The S3 bucket name and locations that you want to use for training and model data." |
51 | 48 | ]
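For context, a minimal sketch of how these two pieces are typically wired up with the SageMaker Python SDK follows; the role lookup, default-bucket helper, and prefix name are assumptions for illustration, not necessarily what this notebook uses.

```python
import sagemaker
from sagemaker import get_execution_role

# Resolve the IAM role attached to this notebook instance (assumed to grant
# SageMaker training and hosting access to your data).
role = get_execution_role()

# S3 bucket and key prefix for training data and model artifacts.
# The default session bucket and the prefix below are illustrative assumptions.
bucket = sagemaker.Session().default_bucket()
prefix = "sagemaker/kmeans-mnist-example"
```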
|
52 | 49 | },
|
53 | 50 | {
|
|
82 | 79 | "source": [
|
83 | 80 | "### Data ingestion\n",
|
84 | 81 | "\n",
|
85 |
| - "Next, we read the dataset from the existing repository into memory, for preprocessing prior to training. This processing could be done *in situ* by Amazon Athena, Apache Spark in Amazon EMR, Amazon Redshift, etc., assuming the dataset is present in the appropriate location. Then, the next step would be to transfer the data to S3 for use in training. For small datasets, such as this one, reading into memory isn't onerous, though it would be for larger datasets." |
| 82 | + "Next, we read the dataset from the existing repository into memory, for preprocessing prior to training. In this case we'll use the MNIST dataset, which contains 70K 28 x 28 pixel images of handwritten digits. For more details, please see [here](http://yann.lecun.com/exdb/mnist/).\n", |
| 83 | + "\n", |
| 84 | + "This processing could be done *in situ* by Amazon Athena, Apache Spark in Amazon EMR, Amazon Redshift, etc., assuming the dataset is present in the appropriate location. Then, the next step would be to transfer the data to S3 for use in training. For small datasets, such as this one, reading into memory isn't onerous, though it would be for larger datasets." |
86 | 85 | ]
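As a rough illustration, and assuming a gzipped, pickled copy of MNIST in the classic `mnist.pkl.gz` layout has already been downloaded locally (the file path is an assumption), the in-memory read might look like this sketch:

```python
import gzip
import pickle

# Load a local pickled copy of MNIST (path and layout are assumptions:
# the classic train/validation/test tuple of (images, labels) arrays).
with gzip.open("mnist.pkl.gz", "rb") as f:
    train_set, valid_set, test_set = pickle.load(f, encoding="latin1")

print(train_set[0].shape)  # (50000, 784) -- flattened 28 x 28 pixel images
```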
|
87 | 86 | },
|
88 | 87 | {
|
|
137 | 136 | "source": [
|
138 | 137 | "## Training the K-Means model\n",
|
139 | 138 | "\n",
|
140 |
| - "Once we have the data preprocessed and available in the correct format for training, the next step is to actually train the model using the data. Since this data is relatively small, it isn't meant to show off the performance of the kmeans training algorithm - we will visit that in another example.\n", |
| 139 | + "Once we have the data preprocessed and available in the correct format for training, the next step is to actually train the model using the data. Since this data is relatively small, it isn't meant to show off the performance of the k-means training algorithm. But Amazon SageMaker's k-means has been tested on, and scales well with, multi-terabyte datasets.\n", |
141 | 140 | "\n",
|
142 | 141 | "After setting training parameters, we kick off training and poll for status until training is completed, which, in this example, takes between 7 and 11 minutes."
|
143 | 142 | ]
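One way to express this step with the high-level SageMaker Python SDK estimator interface is sketched below, reusing the `train_set` array from the ingestion sketch above; the instance types, cluster count, and parameter names are assumptions (they also vary between SDK versions), not necessarily what this notebook runs.

```python
import sagemaker
from sagemaker import KMeans

role = sagemaker.get_execution_role()
bucket = sagemaker.Session().default_bucket()

# Hypothetical estimator configuration: k=10 clusters, one per digit class.
kmeans = KMeans(
    role=role,
    instance_count=2,
    instance_type="ml.c4.xlarge",
    output_path=f"s3://{bucket}/kmeans-mnist-example/output",
    k=10,
)

# record_set() converts the numpy array to the protobuf recordIO format and
# uploads it to S3; fit() launches the training job and polls until it finishes.
kmeans.fit(kmeans.record_set(train_set[0].astype("float32")))
```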
|
|
174 | 173 | "metadata": {},
|
175 | 174 | "source": [
|
176 | 175 | "## Set up hosting for the model\n",
|
177 |
| - "In order to set up hosting, we have to import the model from training to hosting. A common question would be, why wouldn't we automatically go from training to hosting? As we worked through examples of what customers were looking to do with hosting, we realized that the Amazon ML model of hosting was unlikely to be sufficient for all customers.\n", |
178 |
| - "\n", |
179 |
| - "As a result, we have introduced some flexibility with respect to model deployment, with the goal of additional model deployment targets after launch. In the short term, that introduces some complexity, but we are actively working on making that easier for customers, even before GA.\n", |
180 |
| - "\n", |
181 |
| - "### Import model into hosting\n", |
182 |
| - "Next, you register the model with hosting. This allows you the flexibility of importing models trained elsewhere, as well as the choice of not importing models if the target of model creation is AWS Lambda, AWS Greengrass, Amazon Redshift, Amazon Athena, or other deployment target." |
| 176 | + "Now, we can deploy the model we just trained behind a real-time hosted endpoint. This next step can take, on average, 7 to 11 minutes to complete." |
183 | 177 | ]
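With the SDK-style estimator from the training sketch above, deployment could be as simple as the call below; the instance type and count are illustrative assumptions.

```python
# Stand up a real-time hosted endpoint for the trained model.
# Instance type and count are assumptions for illustration.
kmeans_predictor = kmeans.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
)
```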
|
184 | 178 | },
|
185 | 179 | {
|
|
199 | 193 | "metadata": {},
|
200 | 194 | "source": [
|
201 | 195 | "## Validate the model for use\n",
|
202 |
| - "Finally, the customer can now validate the model for use. They can obtain the endpoint from the client library using the result from previous operations, and generate classifications from the trained model using that endpoint." |
| 196 | + "Finally, we'll validate the model for use. Let's generate a classification for a single observation from the trained model using the endpoint we just created." |
203 | 197 | ]
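Continuing the same sketch, scoring a single observation against the endpoint might look like the following; the record index and the result fields assume the standard k-means response format (closest cluster plus distance).

```python
# Classify one 784-pixel observation with the hosted endpoint.
result = kmeans_predictor.predict(train_set[0][30:31].astype("float32"))

# Each returned record carries the closest cluster and the distance to it.
print(result[0].label["closest_cluster"].float32_tensor.values)
print(result[0].label["distance_to_cluster"].float32_tensor.values)
```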
|
204 | 198 | },
|
205 | 199 | {
|
|
268 | 262 | "cell_type": "markdown",
|
269 | 263 | "metadata": {},
|
270 | 264 | "source": [
|
271 |
| - "### (Optional) Delete the Endpoint" |
| 265 | + "### (Optional) Delete the Endpoint\n", |
| 266 | + "If you're ready to be done with this notebook, make sure to run the cell below. This will remove the hosted endpoint you created and avoid charges from an instance that's left running." |
272 | 267 | ]
|
273 | 268 | },
|
274 | 269 | {
|
|
291 | 286 | "#import sagemaker\n",
|
292 | 287 | "#sagemaker.Session().delete_endpoint(kmeans_predictor.endpoint)"
|
293 | 288 | ]
|
294 |
| - }, |
295 |
| - { |
296 |
| - "cell_type": "code", |
297 |
| - "execution_count": null, |
298 |
| - "metadata": {}, |
299 |
| - "outputs": [], |
300 |
| - "source": [] |
301 | 289 | }
|
302 | 290 | ],
|
303 | 291 | "metadata": {
|
|