|
25 | 25 | "# SageMakerPySpark MNIST Example\n",
|
26 | 26 | "\n",
|
27 | 27 | "1. [Introduction](#Introduction)\n",
|
28 | | - "2. [Data Inspection](#Data-Inspection)\n",
29 | | - "3. [Training the K-Means Model](#Training-the-K-Means-Model)\n",
30 | | - "4. [Validate the Model for use](#Validate-the-Model-for-use)\n",
31 | | - "5. [Bring your Own Algorithm](#Bring-your-Own-Algorithm)\n"
| 28 | + "2. [Loading the Data](#Loading-the-Data)\n",
| 29 | + "3. [Training and Hosting a Model](#Training-and-Hosting-a-Model)\n",
| 30 | + "4. [Inference](#Inference)\n",
| 31 | + "5. [More on SageMaker Spark](#More-on-SageMaker-Spark)\n"
32 | 32 | ]
|
33 | 33 | },
|
34 | 34 | {
|
|
50 | 50 | {
|
51 | 51 | "cell_type": "code",
|
52 | 52 | "execution_count": null,
|
53 | | - "metadata": {},
| 53 | + "metadata": {
| 54 | +  "collapsed": true
| 55 | + },
54 | 56 | "outputs": [],
|
55 | 57 | "source": [
|
56 | 58 | "from pyspark import SparkContext, SparkConf\n",
|
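The hunk above shows only the first import of this setup cell. For context, a minimal sketch of a typical SageMaker Spark session setup follows, assuming the sagemaker_pyspark package (which bundles the SageMaker Spark jars) is installed; the sketch uses a SparkSession, which the later DataFrame reads rely on:

import sagemaker_pyspark
from pyspark.sql import SparkSession

# Put the SageMaker Spark jars on the Spark classpath so the SageMaker
# estimators and models are visible to the JVM.
jars = ":".join(sagemaker_pyspark.classpath_jars())
spark = (SparkSession.builder
         .config("spark.driver.extraClassPath", jars)
         .getOrCreate())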
|
78 | 80 | {
|
79 | 81 | "cell_type": "code",
|
80 | 82 | "execution_count": null,
|
81 | | - "metadata": {},
| 83 | + "metadata": {
| 84 | +  "collapsed": true
| 85 | + },
82 | 86 | "outputs": [],
|
83 | 87 | "source": [
|
84 | 88 | "# replace this with your own region, such as us-east-1\n",
|
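This cell is truncated in the diff; a minimal sketch of the configuration it opens with (the variable name region is a hypothetical choice, reused by the loading sketch further down):

# Replace with your own region, such as us-east-1 (hypothetical variable
# name, referenced by the later data-loading sketch).
region = "us-east-1"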
|
96 | 100 | "cell_type": "markdown",
|
97 | 101 | "metadata": {},
|
98 | 102 | "source": [
|
99 | | - "## Data Inspection\n",
| 103 | + "## Loading the Data\n",
| 104 | + "\n",
100 | 105 | "In order to train and make inferences our input DataFrame must have a column of Doubles (named \"label\" by default) and a column of Vectors of Doubles (named \"features\" by default).\n",
|
101 | 106 | "\n",
|
102 | 107 | "Spark's LibSVM DataFrameReader loads a DataFrame already suitable for training and inference."
|
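A minimal sketch of such a read, assuming the public SageMaker sample-data bucket layout for MNIST (the S3 paths are assumptions; the region variable comes from the earlier sketch, and MNIST images have 28x28 = 784 features):

# Read MNIST in LibSVM format into DataFrames with "label" and
# "features" columns, ready for training and inference.
trainingData = (spark.read.format("libsvm")
                .option("numFeatures", "784")
                .load("s3a://sagemaker-sample-data-{}/spark/mnist/train/".format(region)))

testData = (spark.read.format("libsvm")
            .option("numFeatures", "784")
            .load("s3a://sagemaker-sample-data-{}/spark/mnist/test/".format(region)))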
|
105 | 110 | {
|
106 | 111 | "cell_type": "code",
|
107 | 112 | "execution_count": null,
|
108 | | - "metadata": {},
| 113 | + "metadata": {
| 114 | +  "collapsed": true
| 115 | + },
109 | 116 | "outputs": [],
|
110 | 117 | "source": [
|
111 | 118 | "trainingData.show()"
|
|
115 | 122 | "cell_type": "markdown",
|
116 | 123 | "metadata": {},
|
117 | 124 | "source": [
|
118 | | - "## Training the K-Means Model\n",
| 125 | + "## Training and Hosting a Model\n",
119 | 126 | "Now we create a KMeansSageMakerEstimator, which uses the KMeans Amazon SageMaker Algorithm to train on our input data, and uses the KMeans Amazon SageMaker model image to host our model.\n",
|
120 | 127 | "\n",
|
121 | 128 | "Calling fit() on this estimator will train our model on Amazon SageMaker, and then create an Amazon SageMaker Endpoint to host our model.\n",
|
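A sketch of that step, assuming the sagemaker_pyspark package provides KMeansSageMakerEstimator and IAMRole (the role ARN and instance types below are placeholders):

from sagemaker_pyspark import IAMRole
from sagemaker_pyspark.algorithms import KMeansSageMakerEstimator

# Train a k=10 model on 784-dimensional MNIST vectors; fit() runs a
# SageMaker training job and then deploys the model to an endpoint.
estimator = KMeansSageMakerEstimator(
    sagemakerRole=IAMRole("arn:aws:iam::123456789012:role/YourSageMakerRole"),  # placeholder ARN
    trainingInstanceType="ml.m4.xlarge",
    trainingInstanceCount=1,
    endpointInstanceType="ml.m4.xlarge",
    endpointInitialInstanceCount=1)
estimator.setK(10)
estimator.setFeatureDim(784)

model = estimator.fit(trainingData)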
|
128 | 135 | {
|
129 | 136 | "cell_type": "code",
|
130 | 137 | "execution_count": null,
|
131 | | - "metadata": {},
| 138 | + "metadata": {
| 139 | +  "collapsed": true
| 140 | + },
132 | 141 | "outputs": [],
|
133 | 142 | "source": [
|
134 | 143 | "import random\n",
|
|
154 | 163 | "cell_type": "markdown",
|
155 | 164 | "metadata": {},
|
156 | 165 | "source": [
|
157 | | - "## Validate the Model for use\n",
| 166 | + "## Inference\n",
| 167 | + "\n",
158 | 168 | "Now we transform our DataFrame.\n",
|
159 | 169 | "To do this, we serialize each row's \"features\" Vector of Doubles into a Protobuf format for inference against the Amazon SageMaker Endpoint. We deserialize the Protobuf responses back into our DataFrame:"
|
160 | 170 | ]
|
161 | 171 | },
|
162 | 172 | {
|
163 | 173 | "cell_type": "code",
|
164 | 174 | "execution_count": null,
|
165 | | - "metadata": {},
| 175 | + "metadata": {
| 176 | +  "collapsed": true
| 177 | + },
166 | 178 | "outputs": [],
|
167 | 179 | "source": [
|
168 | 180 | "transformedData = model.transform(testData)\n",
|
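This hunk is also truncated; presumably the cell goes on to inspect the result, along the lines of:

# The endpoint's responses come back as extra columns on the DataFrame
# (for k-means: the closest cluster and the distance to it).
transformedData.show()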
|
173 | 185 | {
|
174 | 186 | "cell_type": "code",
|
175 | 187 | "execution_count": null,
|
176 | | - "metadata": {},
| 188 | + "metadata": {
| 189 | +  "collapsed": true
| 190 | + },
177 | 191 | "outputs": [],
|
178 | 192 | "source": [
|
179 | 193 | "from pyspark.sql.types import DoubleType\n",
|
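Only the first import of this post-processing cell is shown. A hypothetical sketch of the kind of analysis it opens, assuming the k-means response column is named "closest_cluster" and that we want per-cluster counts of the test digits:

from pyspark.sql.types import DoubleType

# Cast the predicted cluster id to Double and count how many test
# digits fall into each cluster.
clusters = transformedData.withColumn(
    "closest_cluster", transformedData["closest_cluster"].cast(DoubleType()))
clusters.groupBy("closest_cluster").count().orderBy("closest_cluster").show()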
|
230 | 244 | "cell_type": "markdown",
|
231 | 245 | "metadata": {},
|
232 | 246 | "source": [
|
233 | | - "## Bring your Own Algorithm\n",
| 247 | + "## More on SageMaker Spark\n",
234 | 248 | "\n",
|
235 | 249 | "The SageMaker Spark Github repository has more about SageMaker Spark, including how to use SageMaker Spark with your own algorithms on Amazon SageMaker: https://github.com/aws/sagemaker-spark\n"
|
236 | 250 | ]
|
|