
Commit fd96940

committed
update the notebook to pass CI
1 parent 555c00b commit fd96940

File tree

2 files changed: 85 additions, 21 deletions


sagemaker-featurestore/feature_store_feature_processor.ipynb

Lines changed: 84 additions & 21 deletions
@@ -1,18 +1,16 @@
 {
 "cells": [
 {
-"attachments": {},
 "cell_type": "markdown",
 "id": "e2ac1559-3729-4cf3-acee-d4bb15c6f53d",
 "metadata": {
 "tags": []
 },
 "source": [
-"# Feature Processor Sample Notebook"
+"# Amazon SageMaker Feature Store: Feature Processor Introduction"
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "id": "bfd7d612",
 "metadata": {},
@@ -27,16 +25,35 @@
 ]
 },
 {
-"attachments": {},
+"cell_type": "markdown",
+"id": "c339cb18",
+"metadata": {},
+"source": [
+"This notebook demonstrates how to get started with Feature Processor using the SageMaker Python SDK: create feature groups, run batch transformations, and ingest the processed data into the feature groups.\n",
+"\n",
+"We first show how to use the `@feature_processor` decorator to run the job locally, then how to use the `@remote` decorator to execute a large batch transform and ingestion remotely as a SageMaker training job. The SDK also provides APIs to create scheduled pipelines from the transformation code.\n",
+"\n",
+"To learn more about Feature Processor, see [Feature Processing](https://docs.aws.amazon.com/sagemaker/latest/dg/feature-store-feature-processing.html) for more information and examples."
+]
+},
+{
 "cell_type": "markdown",
 "id": "a8b4ba90-e512-46bf-bfa9-541213021e86",
 "metadata": {
 "tags": []
 },
 "source": [
-"## Setup For Notebook\n",
-"First we create a new kernel to execute this notebook.\n",
+"## Setup For Notebook\n"
+]
+},
+{
+"cell_type": "markdown",
+"id": "e45c4dd7",
+"metadata": {},
+"source": [
+"### Setup Runtime Environment\n",
 "\n",
+"First we create a new kernel to execute this notebook.\n",
 "1. Launch a new terminal in the current image (the '$_' icon at the top of this notebook).\n",
 "2. Execute the commands: \n",
 "```\n",
@@ -48,12 +65,35 @@
 "3. Return to this notebook and select the kernel with Image: 'Data Science' and Kernel: 'feature-processing-py-3.9'"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "a65db47d",
+"metadata": {},
+"source": [
+"Alternatively, if you are running this notebook in SageMaker Studio, you can execute the following cell to install the runtime dependencies."
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "73131cc7-1680-4e31-b47a-58d6f9c9236d",
+"id": "efbd6006",
+"metadata": {
+"tags": []
+},
+"outputs": [],
+"source": [
+"%%capture\n",
+"\n",
+"!apt-get update\n",
+"!apt-get install openjdk-11-jdk -y\n",
+"%pip install ipykernel"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "7351b428",
 "metadata": {
-"scrolled": true,
 "tags": []
 },
 "outputs": [],
@@ -103,6 +143,22 @@
 " get_ipython().run_cell(cell)"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "a303d7bc",
+"metadata": {},
+"source": [
+"### Create Feature Groups"
+]
+},
+{
+"cell_type": "markdown",
+"id": "f57390a2",
+"metadata": {},
+"source": [
+"We start by creating two feature groups. The first stores the raw car sales dataset, located at `data/car_data.csv`. The second stores aggregated feature values produced by feature processing, for example the average `mileage`, `price`, and `msrp`."
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -241,7 +297,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "id": "75d9c534-7b9d-40da-a99b-54aa8f927f8e",
 "metadata": {
@@ -252,7 +307,7 @@
 "\n",
 "The following example demonstrates how to use the @feature_processor decorator to load data from Amazon S3 to a SageMaker Feature Group. \n",
 "\n",
-"A @feature_processor decorated function automatically loads data from the configured inputs, applies the feature processing code and ingests the transformed data to a feature group."
+"A `@feature_processor` decorated function automatically loads data from the configured inputs, applies the feature processing code and ingests the transformed data to a feature group."
 ]
 },
 {
@@ -317,7 +372,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "id": "23ef02b7-7b38-4c00-99fb-4caed9773321",
 "metadata": {},
@@ -326,14 +380,16 @@
 "\n",
 "The following example demonstrates how to run your feature processing code remotely.\n",
 "\n",
-"This is useful if you are working with large data sets that require hardware more powerful than locally available. You can decorate your code with the @remote decorator to run your local Python code as a single or multi-node distributed SageMaker training job. For more information on running your code as a SageMaker training job, see [Run your local code as a SageMaker training job](https://docs.aws.amazon.com/sagemaker/latest/dg/train-remote-decorator.html)."
+"This is useful if you are working with large data sets that require hardware more powerful than what is available locally. You can decorate your code with the `@remote` decorator to run your local Python code as a single-node or multi-node distributed SageMaker training job. For more information, see [Run your local code as a SageMaker training job](https://docs.aws.amazon.com/sagemaker/latest/dg/train-remote-decorator.html)."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "id": "d1f50d11",
-"metadata": {},
+"metadata": {
+"tags": []
+},
 "outputs": [],
 "source": [
 "\"\"\"\n",
@@ -417,14 +473,13 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "id": "11e1a26a-35f1-4477-b71f-17c18c604ea7",
 "metadata": {},
 "source": [
 "## `to_pipeline` and `schedule`\n",
 "\n",
-"The following example demonstrates how to operationalize your feature processor by promoting it to a SageMaker Pipeline and configuring a schedule to execute it on a regular basis. This example uses the aggregate function defined above."
+"The following example demonstrates how to operationalize your feature processor by promoting it to a SageMaker Pipeline and configuring a schedule to execute it on a regular basis. This example uses the aggregate function defined above. Note that to create a pipeline, the function must be decorated with both the `@remote` and `@feature_processor` decorators."
 ]
 },
 {
@@ -468,6 +523,16 @@
 ")"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "83ef5ce7",
+"metadata": {},
+"source": [
+"In the following example, we create and schedule the pipeline using the `to_pipeline` and `schedule` methods. If you want to test the job before scheduling it, you can use `execute` to start a single execution.\n",
+"\n",
+"The SDK also provides two additional methods, `describe` and `list_pipelines`, for inspecting pipeline details."
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -551,7 +616,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "id": "be2d9751-e288-42db-b5fa-081939be66aa",
 "metadata": {},
@@ -564,7 +628,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "id": "0e9af135",
 "metadata": {},
@@ -603,7 +666,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "id": "6c1ebc50",
 "metadata": {},
@@ -645,10 +707,11 @@
 }
 ],
 "metadata": {
+"instance_type": "ml.m5.2xlarge",
 "kernelspec": {
-"display_name": "Python 3",
+"display_name": "Python 3 (TensorFlow 2.10.0 Python 3.9 CPU Optimized)",
 "language": "python",
-"name": "python3"
+"name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/tensorflow-2.10.1-cpu-py39-ubuntu20.04-sagemaker-v1.2"
 },
 "language_info": {
 "codemirror_mode": {
@@ -660,7 +723,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.9.14"
+"version": "3.9.16"
 }
 },
 "nbformat": 4,

sagemaker-featurestore/index.rst

Lines changed: 1 addition & 0 deletions
@@ -11,4 +11,5 @@ Feature Store is a centralized store for features and associated metadata so fea
 feature_store_client_side_encryption
 feature_store_securely_store_images
 feature_store_classification_job_to_ground_truth
+feature_store_feature_processor
 sagemaker_featurestore_fraud_detection_python_sdk
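The `schedule` method mentioned in the notebook above accepts an EventBridge-style schedule expression such as `rate(1 day)` or a six-field `cron(...)` expression. As a rough illustration of that expression format, here is a minimal validator; the grammar below is deliberately simplified and is not the service's full specification.

```python
import re

# Simplified patterns for the three EventBridge schedule-expression forms.
RATE_RE = re.compile(r"^rate\(\d+ (minute|minutes|hour|hours|day|days)\)$")
CRON_RE = re.compile(r"^cron\(([^)]+)\)$")
AT_RE = re.compile(r"^at\(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\)$")


def is_valid_schedule(expr: str) -> bool:
    """Return True if expr looks like a rate(), cron(), or at() expression."""
    if RATE_RE.match(expr):
        return True
    m = CRON_RE.match(expr)
    if m:
        # EventBridge cron expressions carry six space-separated fields:
        # minutes, hours, day-of-month, month, day-of-week, year.
        return len(m.group(1).split()) == 6
    return bool(AT_RE.match(expr))


print(is_valid_schedule("rate(1 day)"))         # daily schedule
print(is_valid_schedule("cron(0 12 * * ? *)"))  # noon UTC every day
```

A quick pre-check like this can catch a malformed expression before calling `schedule`, which otherwise fails only at request time.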
