|
4 | 4 | "cell_type": "markdown",
|
5 | 5 | "metadata": {},
|
6 | 6 | "source": [
|
7 |
| - "# Object Detection using Managed Spot Training\n", |
| 7 | + "# Object detection using managed spot training\n", |
8 | 8 | "\n",
|
9 |
| - "The example here is almost the same as [Amazon SageMaker Object Detection using the RecordIO format](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/object_detection_recordio_format.ipynb).\n", |
| 9 | + "This notebook shows how to use [Amazon SageMaker managed spot training](https://docs.aws.amazon.com/sagemaker/latest/dg/model-managed-spot-training.html) to run training jobs at potentially lower cost. Managed spot training uses [Amazon EC2 Spot instances](https://aws.amazon.com/ec2/spot/) and manages the Spot interruptions on your behalf.\n", |
10 | 10 | "\n",
|
11 |
| - "This notebook tackles the exact same problem with the same solution, but it has been modified to be able to run using SageMaker Managed Spot infrastructure. SageMaker Managed Spot uses [EC2 Spot Instances](https://aws.amazon.com/ec2/spot/) to run Training at a lower cost.\n", |
12 |
| - "\n", |
13 |
| - "Please read the original notebook and try it out to gain an understanding of the ML use-case and how it is being solved. We will not delve into that here in this notebook.\n", |
| 11 | + "To highlight the differences between on-demand and Spot instances, this notebook is the same as [Amazon SageMaker Object Detection using the RecordIO format](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/object_detection_recordio_format.ipynb), but has been updated to use managed spot training. For a full description of the ML use case and how it is being solved, see the original notebook.\n", |
14 | 12 | "\n",
|
15 | 13 | "## Setup\n",
|
16 |
| - "Again, we won't go into detail explaining the code below, it has been lifted verbatim from [Amazon SageMaker Object Detection using the RecordIO format](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/object_detection_recordio_format.ipynb)." |
17 |
| - ] |
18 |
| - }, |
19 |
| - { |
20 |
| - "cell_type": "code", |
21 |
| - "execution_count": null, |
22 |
| - "metadata": {}, |
23 |
| - "outputs": [], |
24 |
| - "source": [ |
25 |
| - "!pip install -qU awscli boto3 sagemaker" |
| 14 | + "\n", |
| 15 | + "See [Amazon SageMaker Object Detection using the RecordIO format](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/object_detection_recordio_format.ipynb) for a description of the code.\n", |
| 16 | + "\n", |
| 17 | + "### Prerequisites\n", |
| 18 | + "\n", |
| 19 | + "This notebook has been tested with:\n", |
| 20 | + "* SageMaker Python SDK 1.72.1\n", |
| 21 | + "* Python 3.6\n", |
| 22 | + "* Kernel: conda_mxnet_p36" |
26 | 23 | ]
|
27 | 24 | },
|
28 | 25 | {
|
|
48 | 45 | "cell_type": "markdown",
|
49 | 46 | "metadata": {},
|
50 | 47 | "source": [
|
51 |
| - "### Download And Prepare Data\n", |
52 |
| - "Note: this notebook downloads and uses the Pascal VOC dateset, please be aware of the database usage rights:\n", |
53 |
| - "\"The VOC data includes images obtained from the \"flickr\" website. Use of these images must respect the corresponding terms of use: \n", |
54 |
| - "* \"flickr\" terms of use (https://www.flickr.com/help/terms)\"" |
| 48 | + "### Download and prepare data\n", |
| 49 | + "This notebook downloads and uses the Pascal VOC dataset, which has the following database usage rights:\n", |
| 50 | + "> The VOC data includes images obtained from the Flickr website. Use of these images must respect the corresponding terms of use: \n", |
| 51 | + "> * Flickr terms of use (https://www.flickr.com/help/terms)" |
55 | 52 | ]
|
56 | 53 | },
|
57 | 54 | {
|
|
81 | 78 | "cell_type": "markdown",
|
82 | 79 | "metadata": {},
|
83 | 80 | "source": [
|
84 |
| - "### Upload data to S3" |
| 81 | + "### Upload data to Amazon Simple Storage Service (Amazon S3)" |
85 | 82 | ]
|
86 | 83 | },
|
87 | 84 | {
|
|
105 | 102 | "cell_type": "markdown",
|
106 | 103 | "metadata": {},
|
107 | 104 | "source": [
|
108 |
| - "# Object Detection using Managed Spot Training\n", |
| 105 | + "## Managed spot training\n", |
109 | 106 | "\n",
|
110 |
| - "For Managed Spot Training using Object Detection we need to configure two things:\n", |
111 |
| - "1. Enable the `train_use_spot_instances` constructor arg - a simple self-explanatory boolean.\n", |
112 |
| - "2. Set the `train_max_wait` constructor arg - this is an int arg representing the amount of time you are willing to wait for Spot infrastructure to become available. Some instance types are harder to get at Spot prices and you may have to wait longer. You are not charged for time spent waiting for Spot infrastructure to become available, you're only charged for actual compute time spent once Spot instances have been successfully procured.\n", |
| 107 | + "Managed spot training is controlled by two arguments to the `sagemaker.estimator.Estimator` constructor:\n", |
113 | 108 | "\n",
|
114 |
| - "Feel free to toggle the `train_use_spot_instances` variable to see the effect of running the same job using regular (a.k.a. \"On Demand\") infrastructure.\n", |
| 109 | + "* `train_use_spot_instances`: Set to `True` to use Spot instances for training jobs.\n", |
| 110 | + "* `train_max_wait`: Represents the amount of time to wait for a Spot instance to become available. Be aware that some Spot instance types take longer to get. You are charged only for actual compute time spent once Spot instances have been acquired, and not for time spent waiting for Spot instances to become available.\n", |
115 | 111 | "\n",
|
116 |
| - "Note that `train_max_wait` can be set if and only if `train_use_spot_instances` is enabled and **must** be greater than or equal to `train_max_run`." |
| 112 | + "Note that `train_max_wait` can be set only if `train_use_spot_instances` is `True` and **must** be greater than or equal to `train_max_run`.\n", |
| 113 | + "\n", |
| 114 | + "Toggle `train_use_spot_instances` in the following code to see the effect of running the same job using on-demand instances." |
117 | 115 | ]
|
118 | 116 | },
|
119 | 117 | {
|
|
131 | 129 | "cell_type": "markdown",
|
132 | 130 | "metadata": {},
|
133 | 131 | "source": [
|
134 |
| - "## Training\n", |
135 |
| - "Now that we are done with all the setup that is needed, we are ready to train our object detector. To begin, let us create a ``sageMaker.estimator.Estimator`` object. This estimator will launch the training job." |
| 132 | + "### Training\n", |
| 133 | + "\n", |
| 134 | + "Train the object detector by creating a `sagemaker.estimator.Estimator` object and launching the training job." |
136 | 135 | ]
|
137 | 136 | },
|
138 | 137 | {
|
|
182 | 181 | },
|
183 | 182 | {
|
184 | 183 | "cell_type": "markdown",
|
185 |
| - "metadata": {}, |
| 184 | + "metadata": { |
| 185 | + "pycharm": { |
| 186 | + "name": "#%% md\n" |
| 187 | + } |
| 188 | + }, |
186 | 189 | "source": [
|
187 |
| - "# Savings\n", |
188 |
| - "Towards the end of the job you should see two lines of output printed:\n", |
| 190 | + "### Savings\n", |
| 191 | + "At the end of the job output, two lines are printed:\n", |
| 192 | + "\n", |
| 193 | + "* `Training seconds: X` : The actual compute time spent on the training job.\n", |
| 194 | + "* `Billable seconds: Y` : The time you will be billed for after Spot discounting is applied.\n", |
189 | 195 | "\n",
|
190 |
| - "- `Training seconds: X` : This is the actual compute-time your training job spent\n", |
191 |
| - "- `Billable seconds: Y` : This is the time you will be billed for after Spot discounting is applied.\n", |
| 196 | + "When `train_use_spot_instances` is `True`, you should see a notable difference between training and billable seconds. This shows the cost savings when managed spot training is used, and is summarized in the final output:\n", |
192 | 197 | "\n",
|
193 |
| - "If you enabled the `train_use_spot_instances` var then you should see a notable difference between `X` and `Y` signifying the cost savings you will get for having chosen Managed Spot Training. This should be reflected in an additional line:\n", |
194 |
| - "- `Managed Spot Training savings: (1-Y/X)*100 %`" |
| 198 | + "* `Managed Spot Training savings: (1-Y/X)*100 %`" |
195 | 199 | ]
|
196 | 200 | }
|
197 | 201 | ],
|
|
212 | 216 | "name": "python",
|
213 | 217 | "nbconvert_exporter": "python",
|
214 | 218 | "pygments_lexer": "ipython3",
|
215 |
| - "version": "3.6.5" |
| 219 | + "version": "3.6.10" |
216 | 220 | },
|
217 |
| - "notice": "Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License." |
| 221 | + "notice": "Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License." |
218 | 222 | },
|
219 | 223 | "nbformat": 4,
|
220 | 224 | "nbformat_minor": 4
|
|