|
15 | 15 | "7. [More on SageMaker Spark](#More-on-SageMaker-Spark)\n",
|
16 | 16 | "\n",
|
17 | 17 | "## Introduction\n",
|
18 |
| - "This notebook will show how to classify handwritten digits through the SageMaker PySpark library. \n", |
| 18 | + "This notebook will show how to cluster handwritten digits through the SageMaker PySpark library. \n", |
19 | 19 | "\n",
|
20 | 20 | "We will manipulate data through Spark using a SparkSession, and then use the SageMaker Spark library to interact with SageMaker for training and inference. \n",
|
21 | 21 | "We will create a pipeline consisting of a first step to reduce the dimensionality using Spark MLLib PCA algorithm, followed by the final K-Means clustering step on SageMaker. \n",
|
22 | 22 | "\n",
|
23 | 23 | "You can visit SageMaker Spark's GitHub repository at https://github.com/aws/sagemaker-spark to learn more about SageMaker Spark.\n",
|
24 | 24 | "\n",
|
25 |
| - "This notebook was created and tested on an ml.m4.xlarge notebook instance." |
| 25 | + "This notebook was created and tested on an ml.m4.xlarge notebook instance.\n", |
| 26 | + "\n", |
| 27 | + "## Why use Spark MLLib algorithms? \n", |
| 28 | + "\n", |
| 29 | + "The use of Spark MLLib PCA in this notebook is meant to showcase how you can use different pre-processing steps, ranging from data transformers to algorithms, with tools such as Spark MLLib that are well suited for data pre-processing. You can then use SageMaker algorithms and features through the SageMaker-Spark SDK. In our case, PCA reduces the feature vector as a pre-processing step, and K-Means is responsible for clustering the data. " |
26 | 30 | ]
|
27 | 31 | },
|
28 | 32 | {
|
|
31 | 35 | "source": [
|
32 | 36 | "## Setup\n",
|
33 | 37 | "\n",
|
34 |
| - "First, we import the necessary modules and create the SparkSession and `SparkSession` with the SageMaker-Spark dependencies attached. " |
| 38 | + "First, we import the necessary modules and create the `SparkSession` with the SageMaker-Spark dependencies attached. " |
35 | 39 | ]
|
36 | 40 | },
|
37 | 41 | {
|
|
191 | 195 | "cell_type": "markdown",
|
192 | 196 | "metadata": {},
|
193 | 197 | "source": [
|
194 |
| - "Now that we've defined the `Pipeline`, we can call fit on the training data. " |
| 198 | + "Now that we've defined the `Pipeline`, we can call `fit` on the training data. Note that the code below takes several minutes to run and creates all the resources needed for this pipeline. " |
195 | 199 | ]
|
196 | 200 | },
|
197 | 201 | {
|
|
219 | 223 | "cell_type": "markdown",
|
220 | 224 | "metadata": {},
|
221 | 225 | "source": [
|
222 |
| - "## Inference" |
| 226 | + "## Inference\n", |
| 227 | + "\n", |
| 228 | + "Let's run our test data through the pipeline by calling `transform`. Note that the code below takes several minutes to run and creates the endpoints needed to serve this pipeline. " |
223 | 229 | ]
|
224 | 230 | },
|
225 | 231 | {
|
|
350 | 356 | "pygments_lexer": "ipython3",
|
351 | 357 | "version": "3.6.4"
|
352 | 358 | },
|
353 |
| - "notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License." |
| 359 | + "notice": "Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License." |
354 | 360 | },
|
355 | 361 | "nbformat": 4,
|
356 | 362 | "nbformat_minor": 2
|
|
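For readers of this diff who want a feel for the two-stage idea the new notebook text describes (PCA shrinks the feature vector, then K-Means clusters the reduced data), here is a minimal, library-free sketch in plain Python. It is not the SageMaker Spark or Spark MLLib API: `center`, `top_component`, `assign`, and `kmeans_1d` are hypothetical helper names, the toy `samples` are made up for illustration, and the PCA step keeps only a single component, approximated by power iteration.

```python
import random

def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def top_component(rows, iters=100):
    """Approximate the first principal direction by power iteration."""
    d = len(rows[0])
    v = [1.0] * d
    for _ in range(iters):
        # Apply X^T X to v without building the covariance matrix.
        proj = [sum(r[j] * v[j] for j in range(d)) for r in rows]
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def assign(values, centers):
    """Label each value with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: abs(x - centers[c]))
            for x in values]

def kmeans_1d(values, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on one-dimensional data."""
    centers = random.Random(seed).sample(values, k)
    for _ in range(iters):
        labels = assign(values, centers)
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return assign(values, centers)

# Toy data (illustrative only): two well-separated groups in 3-D.
samples = [
    [0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1],
    [10, 10, 10], [11, 10, 11], [10, 11, 10], [11, 11, 11],
]

# Stage 1: reduce each 3-D sample to one coordinate along the main direction.
centered = center(samples)
direction = top_component(centered)
reduced = [sum(a * b for a, b in zip(row, direction)) for row in centered]

# Stage 2: cluster the reduced values.
labels = kmeans_1d(reduced, k=2)
print(labels)
```

In the notebook itself these two stages are chained in a Spark `Pipeline`, with the clustering stage running on SageMaker rather than locally.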