|
1 | 1 | {
|
2 | 2 | "cells": [
|
3 | 3 | {
|
4 |  | - "attachments": {},
5 | 4 | "cell_type": "markdown",
|
6 | 5 | "id": "e2ac1559-3729-4cf3-acee-d4bb15c6f53d",
|
7 | 6 | "metadata": {
|
8 | 7 | "tags": []
|
9 | 8 | },
|
10 | 9 | "source": [
|
11 |  | - "# Feature Processor Sample Notebook"
| 10 | + "# Amazon SageMaker Feature Store: Feature Processor Introduction" |
12 | 11 | ]
|
13 | 12 | },
|
14 | 13 | {
|
15 |  | - "attachments": {},
16 | 14 | "cell_type": "markdown",
|
17 | 15 | "id": "bfd7d612",
|
18 | 16 | "metadata": {},
|
|
27 | 25 | ]
|
28 | 26 | },
|
29 | 27 | {
|
30 |  | - "attachments": {},
| 28 | + "cell_type": "markdown", |
| 29 | + "id": "c339cb18", |
| 30 | + "metadata": {}, |
| 31 | + "source": [ |
| 32 | + "This notebook demonstrates how to get started with Feature Processor using SageMaker python SDK, create feature groups, perform batch transformation and ingest processed input data to feature groups.\n", |
| 33 | + "\n", |
| 34 | + "We first demonstrate how to use `@feature-processor` decorator to run the job locally and then show how to use `@remote` decorator to execute large batch transform and ingestion on SageMaker training job remotely. Besides, the SDK provides APIs to create scheduled pipelines based on transformation code.\n", |
| 35 | + "\n", |
| 36 | + "If you would like to learn more about Feature Processor, see documentation [Feature Processing](https://docs.aws.amazon.com/sagemaker/latest/dg/feature-store-feature-processing.html) for more info and examples." |
| 37 | + ] |
| 38 | + }, |
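At a glance, the APIs exercised in the sections below come from two modules of the SageMaker Python SDK. A sketch of the relevant imports:

```python
# Core feature-processing entry points in the SageMaker Python SDK.
from sagemaker.feature_store.feature_processor import (
    feature_processor,   # decorator: load inputs, transform, and ingest to a feature group
    CSVDataSource,       # CSV-on-S3 input source for the decorator
    to_pipeline,         # promote a decorated function to a SageMaker Pipeline
    schedule,            # attach a recurring schedule to that pipeline
    execute,             # trigger a single pipeline execution
    describe,            # inspect a feature-processor pipeline
    list_pipelines,      # enumerate feature-processor pipelines
)
from sagemaker.remote_function import remote  # run the transform as a SageMaker training job
```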
| 39 | + { |
31 | 40 | "cell_type": "markdown",
|
32 | 41 | "id": "a8b4ba90-e512-46bf-bfa9-541213021e86",
|
33 | 42 | "metadata": {
|
34 | 43 | "tags": []
|
35 | 44 | },
|
36 | 45 | "source": [
|
37 |  | - "## Setup For Notebook\n",
38 |  | - "First we create a new kernel to execute this notebook.\n",
| 46 | + "## Setup For Notebook\n" |
| 47 | + ] |
| 48 | + }, |
| 49 | + { |
| 50 | + "cell_type": "markdown", |
| 51 | + "id": "e45c4dd7", |
| 52 | + "metadata": {}, |
| 53 | + "source": [ |
| 54 | + "### Setup Runtime Environment\n", |
39 | 55 | "\n",
|
| 56 | + "First we create a new kernel to execute this notebook.\n", |
40 | 57 | "1. Launch a new terminal in the current image (the '$_' icon at the top of this notebook).\n",
|
41 | 58 | "2. Execute the commands: \n",
|
42 | 59 | "```\n",
|
|
48 | 65 | "3. Return to this notebook and select the kernel with Image: 'Data Science' and Kernel: 'feature-processing-py-3.9'"
|
49 | 66 | ]
|
50 | 67 | },
|
| 68 | + { |
| 69 | + "cell_type": "markdown", |
| 70 | + "id": "a65db47d", |
| 71 | + "metadata": {}, |
| 72 | + "source": [ |
| 73 | + "Alternatively If you are running this notebook on SageMaker Studio, you can execute the following cell to install runtime dependencies." |
| 74 | + ] |
| 75 | + }, |
51 | 76 | {
|
52 | 77 | "cell_type": "code",
|
53 | 78 | "execution_count": null,
|
54 |  | - "id": "73131cc7-1680-4e31-b47a-58d6f9c9236d",
| 79 | + "id": "efbd6006", |
| 80 | + "metadata": { |
| 81 | + "tags": [] |
| 82 | + }, |
| 83 | + "outputs": [], |
| 84 | + "source": [ |
| 85 | + "%%capture\n", |
| 86 | + "\n", |
| 87 | + "!apt-get update\n", |
| 88 | + "!apt-get install openjdk-11-jdk -y\n", |
| 89 | + "%pip install ipykernel" |
| 90 | + ] |
| 91 | + }, |
| 92 | + { |
| 93 | + "cell_type": "code", |
| 94 | + "execution_count": null, |
| 95 | + "id": "7351b428", |
55 | 96 | "metadata": {
|
56 |  | - "scrolled": true,
57 | 97 | "tags": []
|
58 | 98 | },
|
59 | 99 | "outputs": [],
|
|
103 | 143 | " get_ipython().run_cell(cell)"
|
104 | 144 | ]
|
105 | 145 | },
|
| 146 | + { |
| 147 | + "cell_type": "markdown", |
| 148 | + "id": "a303d7bc", |
| 149 | + "metadata": {}, |
| 150 | + "source": [ |
| 151 | + "### Create Feature Groups" |
| 152 | + ] |
| 153 | + }, |
| 154 | + { |
| 155 | + "cell_type": "markdown", |
| 156 | + "id": "f57390a2", |
| 157 | + "metadata": {}, |
| 158 | + "source": [ |
| 159 | + "First we start by creating two feature groups. One feature group is used for storing raw car sales dataset which is located in `data/car_data.csv`. We create another feature group to store aggregated feature values after feature processing, for example average value of `mileage`, `price` and `msrp`." |
| 160 | + ] |
| 161 | + }, |
106 | 162 | {
|
107 | 163 | "cell_type": "code",
|
108 | 164 | "execution_count": null,
|
|
241 | 297 | ]
|
242 | 298 | },
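For orientation, creating a feature group with the SageMaker Python SDK follows this shape. A minimal sketch, assuming the feature group name `car-data`, a record identifier column `id`, an event-time column `ingest_time`, and a placeholder execution role:

```python
import pandas as pd
from sagemaker.session import Session
from sagemaker.feature_store.feature_group import FeatureGroup

session = Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder execution role

# Load the raw car sales data and derive feature definitions from the DataFrame.
# Note: object-dtype columns may need .astype("string") before this call.
car_df = pd.read_csv("data/car_data.csv")

raw_fg = FeatureGroup(name="car-data", sagemaker_session=session)  # assumed name
raw_fg.load_feature_definitions(data_frame=car_df)
raw_fg.create(
    s3_uri=f"s3://{session.default_bucket()}/car-data",  # offline store location
    record_identifier_name="id",            # assumed record identifier column
    event_time_feature_name="ingest_time",  # assumed event-time column
    role_arn=role,
    enable_online_store=False,
)
```

The aggregated feature group can be created the same way from the schema of the transformed output.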
|
243 | 299 | {
|
244 |  | - "attachments": {},
245 | 300 | "cell_type": "markdown",
|
246 | 301 | "id": "75d9c534-7b9d-40da-a99b-54aa8f927f8e",
|
247 | 302 | "metadata": {
|
|
252 | 307 | "\n",
|
253 | 308 | "The following example demonstrates how to use the @feature_processor decorator to load data from Amazon S3 to a SageMaker Feature Group. \n",
|
254 | 309 | "\n",
|
255 |  | - "A @feature_processor decorated function automatically loads data from the configured inputs, applies the feature processing code and ingests the transformed data to a feature group."
| 310 | + "A `@feature_processor` decorated function automatically loads data from the configured inputs, applies the feature processing code and ingests the transformed data to a feature group." |
256 | 311 | ]
|
257 | 312 | },
|
258 | 313 | {
|
|
317 | 372 | ]
|
318 | 373 | },
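A minimal sketch of the pattern, assuming placeholder values for the S3 input location, the output feature group ARN, the column names, and the `aggregate` function name:

```python
from sagemaker.feature_store.feature_processor import CSVDataSource, feature_processor

CAR_SALES_S3_URI = "s3://my-bucket/car-data/"  # placeholder input location
AGG_FEATURE_GROUP_ARN = (
    "arn:aws:sagemaker:us-west-2:111122223333:feature-group/car-data-aggregated"  # placeholder
)

@feature_processor(
    inputs=[CSVDataSource(CAR_SALES_S3_URI)],
    output=AGG_FEATURE_GROUP_ARN,
    target_stores=["OfflineStore"],
)
def aggregate(raw_df):
    # The function receives one Spark DataFrame per configured input; the
    # DataFrame it returns is ingested into the output feature group. Real
    # code must also keep the record identifier and event-time columns.
    return raw_df.groupBy("model").agg({"price": "avg", "mileage": "avg", "msrp": "avg"})

# Calling the decorated function runs load -> transform -> ingest locally.
aggregate()
```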
|
319 | 374 | {
|
320 |  | - "attachments": {},
321 | 375 | "cell_type": "markdown",
|
322 | 376 | "id": "23ef02b7-7b38-4c00-99fb-4caed9773321",
|
323 | 377 | "metadata": {},
|
|
326 | 380 | "\n",
|
327 | 381 | "The following example demonstrates how to run your feature processing code remotely.\n",
|
328 | 382 | "\n",
|
329 |  | - "This is useful if you are working with large data sets that require hardware more powerful than locally available. You can decorate your code with the @remote decorator to run your local Python code as a single or multi-node distributed SageMaker training job. For more information on running your code as a SageMaker training job, see [Run your local code as a SageMaker training job](https://docs.aws.amazon.com/sagemaker/latest/dg/train-remote-decorator.html)."
| 383 | + "This is useful if you are working with large data sets that require hardware more powerful than locally available. You can decorate your code with the `@remote` decorator to run your local Python code as a single or multi-node distributed SageMaker training job. For more information on running your code as a SageMaker training job, see [Run your local code as a SageMaker training job](https://docs.aws.amazon.com/sagemaker/latest/dg/train-remote-decorator.html)." |
330 | 384 | ]
|
331 | 385 | },
|
332 | 386 | {
|
333 | 387 | "cell_type": "code",
|
334 | 388 | "execution_count": null,
|
335 | 389 | "id": "d1f50d11",
|
336 |  | - "metadata": {},
| 390 | + "metadata": { |
| 391 | + "tags": [] |
| 392 | + }, |
337 | 393 | "outputs": [],
|
338 | 394 | "source": [
|
339 | 395 | "\"\"\"\n",
|
|
417 | 473 | ]
|
418 | 474 | },
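Stacked together, the two decorators take roughly this shape; the Spark configuration, instance settings, and data locations below are assumptions for illustration:

```python
from sagemaker.remote_function import remote
from sagemaker.remote_function.spark_config import SparkConfig
from sagemaker.feature_store.feature_processor import CSVDataSource, feature_processor

@remote(
    spark_config=SparkConfig(),     # run the job as a Spark application
    instance_type="ml.m5.2xlarge",  # assumed instance type
    instance_count=2,               # assumed cluster size
)
@feature_processor(
    inputs=[CSVDataSource("s3://my-bucket/car-data/")],  # placeholder input
    output="arn:aws:sagemaker:us-west-2:111122223333:feature-group/car-data-aggregated",  # placeholder
)
def aggregate(raw_df):
    # Same transform body as the local version; only the decorators change.
    return raw_df.groupBy("model").agg({"price": "avg", "mileage": "avg", "msrp": "avg"})

# Invoking the function now submits a SageMaker training job instead of running locally.
aggregate()
```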
|
419 | 475 | {
|
420 |  | - "attachments": {},
421 | 476 | "cell_type": "markdown",
|
422 | 477 | "id": "11e1a26a-35f1-4477-b71f-17c18c604ea7",
|
423 | 478 | "metadata": {},
|
424 | 479 | "source": [
|
425 | 480 | "## `to_pipeline and schedule`\n",
|
426 | 481 | "\n",
|
427 |  | - "The following example demonstrates how to operationalize your feature processor by promoting it to a SageMaker Pipeline and configuring a schedule to execute it on a regular basis. This example uses the aggregate function defined above."
| 482 | + "The following example demonstrates how to operationalize your feature processor by promoting it to a SageMaker Pipeline and configuring a schedule to execute it on a regular basis. This example uses the aggregate function defined above. Note that to create a pipeline, your function must be annotated with both the `@remote` and `@feature_processor` decorators."
428 | 483 | ]
|
429 | 484 | },
|
430 | 485 | {
|
|
468 | 523 | ")"
|
469 | 524 | ]
|
470 | 525 | },
|
| 526 | + { |
| 527 | + "cell_type": "markdown", |
| 528 | + "id": "83ef5ce7", |
| 529 | + "metadata": {}, |
| 530 | + "source": [ |
| 531 | + "In the following example, we will create and schedule the pipeline using `to_pipeline` and `schedule` method. If you want to test the job before scheduling, you can use `execute` to start only one execution.\n", |
| 532 | + "\n", |
| 533 | + "The SDK also provides two extra methods `describe` and `list_pipelines` for you to get insights about the pipeline info." |
| 534 | + ] |
| 535 | + }, |
471 | 536 | {
|
472 | 537 | "cell_type": "code",
|
473 | 538 | "execution_count": null,
|
|
551 | 616 | ]
|
552 | 617 | },
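Taken together, the pipeline lifecycle calls take roughly this shape; the pipeline name and schedule expression below are assumptions, and `aggregate` is the doubly-decorated function from the earlier sketch:

```python
from sagemaker.feature_store.feature_processor import (
    to_pipeline, schedule, execute, describe, list_pipelines,
)

# Promote the decorated aggregate function to a SageMaker Pipeline.
pipeline_arn = to_pipeline(pipeline_name="car-data-aggregation", step=aggregate)

# Optionally test with a single run before scheduling.
execution_arn = execute(pipeline_name="car-data-aggregation")

# Run on a recurring schedule (EventBridge Scheduler expression syntax).
schedule(pipeline_name="car-data-aggregation", schedule_expression="rate(24 hours)")

# Inspect pipeline metadata and enumerate feature-processor pipelines.
print(describe(pipeline_name="car-data-aggregation"))
print(list_pipelines())
```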
|
553 | 618 | {
|
554 |  | - "attachments": {},
555 | 619 | "cell_type": "markdown",
|
556 | 620 | "id": "be2d9751-e288-42db-b5fa-081939be66aa",
|
557 | 621 | "metadata": {},
|
|
564 | 628 | ]
|
565 | 629 | },
|
566 | 630 | {
|
567 |  | - "attachments": {},
568 | 631 | "cell_type": "markdown",
|
569 | 632 | "id": "0e9af135",
|
570 | 633 | "metadata": {},
|
|
603 | 666 | ]
|
604 | 667 | },
|
605 | 668 | {
|
606 |  | - "attachments": {},
607 | 669 | "cell_type": "markdown",
|
608 | 670 | "id": "6c1ebc50",
|
609 | 671 | "metadata": {},
|
|
645 | 707 | }
|
646 | 708 | ],
|
647 | 709 | "metadata": {
|
| 710 | + "instance_type": "ml.m5.2xlarge", |
648 | 711 | "kernelspec": {
|
649 |  | - "display_name": "Python 3",
| 712 | + "display_name": "Python 3 (TensorFlow 2.10.0 Python 3.9 CPU Optimized)", |
650 | 713 | "language": "python",
|
651 |  | - "name": "python3"
| 714 | + "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/tensorflow-2.10.1-cpu-py39-ubuntu20.04-sagemaker-v1.2" |
652 | 715 | },
|
653 | 716 | "language_info": {
|
654 | 717 | "codemirror_mode": {
|
|
660 | 723 | "name": "python",
|
661 | 724 | "nbconvert_exporter": "python",
|
662 | 725 | "pygments_lexer": "ipython3",
|
663 |  | - "version": "3.9.14"
| 726 | + "version": "3.9.16" |
664 | 727 | }
|
665 | 728 | },
|
666 | 729 | "nbformat": 4,
|