|
25 | 25 | "# SageMakerPySpark MNIST Example\n",
|
26 | 26 | "\n",
|
27 | 27 | "1. [Introduction](#Introduction)\n",
|
28 | | - "2. [Data Inspection](#Data-Inspection)\n",
29 | | - "3. [Training the K-Means Model](#Training-the-K-Means-Model)\n",
30 | | - "4. [Validate the Model for use](#Validate-the-Model-for-use)\n",
31 | | - "5. [Bring your Own Algorithm](#Bring-your-Own-Algorithm)\n"
| 28 | + "2. [Loading the Data](#Loading-the-Data)\n",
| 29 | + "3. [Training and Hosting a Model](#Training-and-Hosting-a-Model)\n",
| 30 | + "4. [Inference](#Inference)\n",
| 31 | + "5. [More on SageMaker Spark](#More-on-SageMaker-Spark)\n"
32 | 32 | ]
|
33 | 33 | },
|
34 | 34 | {
|
|
50 | 50 | {
|
51 | 51 | "cell_type": "code",
|
52 | 52 | "execution_count": null,
|
53 | | - "metadata": {},
| 53 | + "metadata": {
| 54 | +  "collapsed": true
| 55 | + },
54 | 56 | "outputs": [],
|
55 | 57 | "source": [
|
56 | 58 | "from pyspark import SparkContext, SparkConf\n",
|
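The hunk above shows only the first import of this setup cell. For context, a minimal sketch of a typical SageMaker Spark session setup follows, assuming the sagemaker_pyspark package (which bundles the SageMaker Spark jars) is installed; the sketch uses a SparkSession, which the later DataFrame reads rely on:

import sagemaker_pyspark
from pyspark.sql import SparkSession

# Put the SageMaker Spark jars on the Spark classpath so the SageMaker
# estimators and models are visible to the JVM.
jars = ":".join(sagemaker_pyspark.classpath_jars())
spark = (SparkSession.builder
         .config("spark.driver.extraClassPath", jars)
         .getOrCreate())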
|
78 | 80 | {
|
79 | 81 | "cell_type": "code",
|
80 | 82 | "execution_count": null,
|
81 | | - "metadata": {},
| 83 | + "metadata": {
| 84 | +  "collapsed": true
| 85 | + },
82 | 86 | "outputs": [],
|
83 | 87 | "source": [
|
84 | 88 | "# replace this with your own region, such as us-east-1\n",
|
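This cell is truncated in the diff; a minimal sketch of the configuration it opens with (the variable name region is a hypothetical choice, reused by the loading sketch further down):

# Replace with your own region, such as us-east-1 (hypothetical variable
# name, referenced by the later data-loading sketch).
region = "us-east-1"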
|
96 | 100 | "cell_type": "markdown",
|
97 | 101 | "metadata": {},
|
98 | 102 | "source": [
|
99 | | - "## Data Inspection\n",
| 103 | + "## Loading the Data\n",
| 104 | + "\n",
100 | 105 | "In order to train and make inferences our input DataFrame must have a column of Doubles (named \"label\" by default) and a column of Vectors of Doubles (named \"features\" by default).\n",
|
101 | 106 | "\n",
|
102 | 107 | "Spark's LibSVM DataFrameReader loads a DataFrame already suitable for training and inference."
|
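A minimal sketch of such a read, assuming the public SageMaker sample-data bucket layout for MNIST (the S3 paths are assumptions; the region variable comes from the earlier sketch, and MNIST images have 28x28 = 784 features):

# Read MNIST in LibSVM format into DataFrames with "label" and
# "features" columns, ready for training and inference.
trainingData = (spark.read.format("libsvm")
                .option("numFeatures", "784")
                .load("s3a://sagemaker-sample-data-{}/spark/mnist/train/".format(region)))

testData = (spark.read.format("libsvm")
            .option("numFeatures", "784")
            .load("s3a://sagemaker-sample-data-{}/spark/mnist/test/".format(region)))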
|
105 | 110 | {
|
106 | 111 | "cell_type": "code",
|
107 | 112 | "execution_count": null,
|
108 | | - "metadata": {},
| 113 | + "metadata": {
| 114 | +  "collapsed": true
| 115 | + },
109 | 116 | "outputs": [],
|
110 | 117 | "source": [
|
111 | 118 | "trainingData.show()"
|
|
115 | 122 | "cell_type": "markdown",
|
116 | 123 | "metadata": {},
|
117 | 124 | "source": [
|
118 | | - "## Training the K-Means Model\n",
| 125 | + "## Training and Hosting a Model\n",
119 | 126 | "Now we create a KMeansSageMakerEstimator, which uses the KMeans Amazon SageMaker Algorithm to train on our input data, and uses the KMeans Amazon SageMaker model image to host our model.\n",
|
120 | 127 | "\n",
|
121 | 128 | "Calling fit() on this estimator will train our model on Amazon SageMaker, and then create an Amazon SageMaker Endpoint to host our model.\n",
|
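A sketch of that step, assuming the sagemaker_pyspark package provides KMeansSageMakerEstimator and IAMRole (the role ARN and instance types below are placeholders):

from sagemaker_pyspark import IAMRole
from sagemaker_pyspark.algorithms import KMeansSageMakerEstimator

# Train a k=10 model on 784-dimensional MNIST vectors; fit() runs a
# SageMaker training job and then deploys the model to an endpoint.
estimator = KMeansSageMakerEstimator(
    sagemakerRole=IAMRole("arn:aws:iam::123456789012:role/YourSageMakerRole"),  # placeholder ARN
    trainingInstanceType="ml.m4.xlarge",
    trainingInstanceCount=1,
    endpointInstanceType="ml.m4.xlarge",
    endpointInitialInstanceCount=1)
estimator.setK(10)
estimator.setFeatureDim(784)

model = estimator.fit(trainingData)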
|
128 | 135 | {
|
129 | 136 | "cell_type": "code",
|
130 | 137 | "execution_count": null,
|
131 | | - "metadata": {},
| 138 | + "metadata": {
| 139 | +  "collapsed": true
| 140 | + },
132 | 141 | "outputs": [],
|
133 | 142 | "source": [
|
134 | 143 | "import random\n",
|
|
154 | 163 | "cell_type": "markdown",
|
155 | 164 | "metadata": {},
|
156 | 165 | "source": [
|
157 | | - "## Validate the Model for use\n",
| 166 | + "## Inference\n",
| 167 | + "\n",
158 | 168 | "Now we transform our DataFrame.\n",
|
159 | 169 | "To do this, we serialize each row's \"features\" Vector of Doubles into a Protobuf format for inference against the Amazon SageMaker Endpoint. We deserialize the Protobuf responses back into our DataFrame:"
|
160 | 170 | ]
|
161 | 171 | },
|
162 | 172 | {
|
163 | 173 | "cell_type": "code",
|
164 | 174 | "execution_count": null,
|
165 | | - "metadata": {},
| 175 | + "metadata": {
| 176 | +  "collapsed": true
| 177 | + },
166 | 178 | "outputs": [],
|
167 | 179 | "source": [
|
168 | 180 | "transformedData = model.transform(testData)\n",
|
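This hunk is also truncated; presumably the cell goes on to inspect the result, along the lines of:

# The endpoint's responses come back as extra columns on the DataFrame
# (for k-means: the closest cluster and the distance to it).
transformedData.show()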
|
173 | 185 | {
|
174 | 186 | "cell_type": "code",
|
175 | 187 | "execution_count": null,
|
176 | | - "metadata": {},
| 188 | + "metadata": {
| 189 | +  "collapsed": true
| 190 | + },
177 | 191 | "outputs": [],
|
178 | 192 | "source": [
|
179 | 193 | "from pyspark.sql.types import DoubleType\n",
|
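Only the first import of this post-processing cell is shown. A hypothetical sketch of the kind of analysis it opens, assuming the k-means response column is named "closest_cluster" and that we want per-cluster counts of the test digits:

from pyspark.sql.types import DoubleType

# Cast the predicted cluster id to Double and count how many test
# digits fall into each cluster.
clusters = transformedData.withColumn(
    "closest_cluster", transformedData["closest_cluster"].cast(DoubleType()))
clusters.groupBy("closest_cluster").count().orderBy("closest_cluster").show()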
|
230 | 244 | "cell_type": "markdown",
|
231 | 245 | "metadata": {},
|
232 | 246 | "source": [
|
233 | | - "## Bring your Own Algorithm\n",
| 247 | + "## More on SageMaker Spark\n",
234 | 248 | "\n",
|
235 | 249 | "The SageMaker Spark Github repository has more about SageMaker Spark, including how to use SageMaker Spark with your own algorithms on Amazon SageMaker: https://github.com/aws/sagemaker-spark\n"
|
236 | 250 | ]
|
|