Merge pull request aws#37 from awslabs/kmeans_regional_images

djarpin · web-flow · commit 730cff603371 · 2017-11-25T11:28:18.000-08:00
Kmeans regional images
diff --git a/sagemaker-python-sdk/1P_kmeans_highlevel/kmeans_mnist.ipynb b/sagemaker-python-sdk/1P_kmeans_highlevel/kmeans_mnist.ipynb
@@ -43,7 +43,7 @@
     "\n",
     "Here we set up the linkage and authentication to AWS services. There are three parts to this:\n",
     "\n",
-    "1. The credentials and region for the account that's running training. Upload the credentials in the normal AWS credentials file format using the jupyter upload feature. The region must always be `us-west-2` during the Beta program.\n",
+    "1. The credentials and region for the account that's running training. Upload the credentials in the normal AWS credentials file format using the jupyter upload feature.\n",
     "2. The roles used to give learning and hosting access to your data. See the documentation for how to specify these.\n",
     "3. The S3 bucket that you want to use for training and model data.\n",
     "\n",
@@ -68,8 +68,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "output_location = 's3://{}/kmeans_highlevel_example/output'.format(bucket)\n",
-    "data_location = 's3://{}/kmeans_highlevel_example/data'.format(bucket)\n",
+    "data_key = 'kmeans_example/data'\n",
+    "data_location = 's3://{}/{}'.format(bucket, data_key)\n",
+    "output_location = 's3://{}/kmeans_example/output'.format(bucket)\n",
     "\n",
     "print('training data will be uploaded to: {}'.format(data_location))\n",
     "print('training artifacts will be uploaded to: {}'.format(output_location))"
@@ -130,13 +131,6 @@
     "show_digit(train_set[0][30], 'This is a {}'.format(train_set[1][30]))"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Upload training data"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -292,9 +286,10 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "import sagemaker\n",
+    "# Uncomment and run to delete\n",
     "\n",
-    "sagemaker.Session().delete_endpoint(kmeans_predictor.endpoint)"
+    "#import sagemaker\n",
+    "#sagemaker.Session().delete_endpoint(kmeans_predictor.endpoint)"
    ]
   },
   {
diff --git a/sagemaker-python-sdk/1P_kmeans_lowlevel/kmeans_mnist_lowlevel.ipynb b/sagemaker-python-sdk/1P_kmeans_lowlevel/kmeans_mnist_lowlevel.ipynb
@@ -62,6 +62,20 @@
     "bucket='<bucket-name>'"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_key = 'kmeans_example/data'\n",
+    "data_location = 's3://{}/{}'.format(bucket, data_key)\n",
+    "output_location = 's3://{}/kmeans_example/output'.format(bucket)\n",
+    "\n",
+    "print('training data will be uploaded to: {}'.format(data_location))\n",
+    "print('training artifacts will be uploaded to: {}'.format(output_location))"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -121,7 +135,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Data conversion\n",
+    "### Data conversion and upload\n",
     "\n",
     "Since algorithms have particular input and output requirements, converting the dataset is also part of the process that a data scientist goes through prior to initiating training. In this particular case, the hosted implementation of k-means takes recordio-wrapped protobuf, where the data we have today is a pickle-ized numpy array on disk.\n",
     "\n",
@@ -140,27 +154,14 @@
     "%%time\n",
     "from sagemaker.amazon.common import write_numpy_to_dense_tensor\n",
     "import io\n",
+    "import boto3\n",
     "\n",
     "# Convert the training data into the format required by the SageMaker KMeans algorithm\n",
     "buf = io.BytesIO()\n",
     "write_numpy_to_dense_tensor(buf, train_set[0], train_set[1])\n",
-    "buf.seek(0)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "%%time\n",
-    "\n",
-    "import boto3\n",
+    "buf.seek(0)\n",
     "\n",
-    "key = 'kmeans_lowlevel_example/data'\n",
-    "boto3.resource('s3').Bucket(bucket).Object(key).upload_fileobj(buf)\n",
-    "s3_train_data = 's3://{}/{}'.format(bucket, key)\n",
-    "print('uploaded training data location: {}'.format(s3_train_data))"
+    "boto3.resource('s3').Bucket(bucket).Object(data_key).upload_fileobj(buf)"
    ]
   },
   {
@@ -187,15 +188,21 @@
     "job_name = 'kmeans-lowlevel-' + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n",
     "print(\"Training job\", job_name)\n",
     "\n",
+    "images = {'us-west-2': '174872318107.dkr.ecr.us-west-2.amazonaws.com/kmeans:latest',\n",
+    "          'us-east-1': '382416733822.dkr.ecr.us-east-1.amazonaws.com/kmeans:latest',\n",
+    "          'us-east-2': '404615174143.dkr.ecr.us-east-2.amazonaws.com/kmeans:latest',\n",
+    "          'eu-west-1': '438346466558.dkr.ecr.eu-west-1.amazonaws.com/kmeans:latest'}\n",
+    "image = images[boto3.Session().region_name]\n",
+    "\n",
     "create_training_params = \\\n",
     "{\n",
     "    \"AlgorithmSpecification\": {\n",
-    "        \"TrainingImage\": \"174872318107.dkr.ecr.us-west-2.amazonaws.com/kmeans:1\",\n",
+    "        \"TrainingImage\": image,\n",
     "        \"TrainingInputMode\": \"File\"\n",
     "    },\n",
     "    \"RoleArn\": role,\n",
     "    \"OutputDataConfig\": {\n",
-    "        \"S3OutputPath\": \"s3://{}/kmeans_lowlevel_example/output\".format(bucket)\n",
+    "        \"S3OutputPath\": output_location\n",
     "    },\n",
     "    \"ResourceConfig\": {\n",
     "        \"InstanceCount\": 2,\n",
@@ -218,7 +225,7 @@
     "            \"DataSource\": {\n",
     "                \"S3DataSource\": {\n",
     "                    \"S3DataType\": \"S3Prefix\",\n",
-    "                    \"S3Uri\": s3_train_data,\n",
+    "                    \"S3Uri\": data_location,\n",
     "                    \"S3DataDistributionType\": \"FullyReplicated\"\n",
     "                }\n",
     "            },\n",
@@ -278,7 +285,7 @@
     "model_data = info['ModelArtifacts']['S3ModelArtifacts']\n",
     "\n",
     "primary_container = {\n",
-    "    'Image': \"174872318107.dkr.ecr.us-west-2.amazonaws.com/kmeans:1\",\n",
+    "    'Image': image,\n",
     "    'ModelDataUrl': model_data\n",
     "}\n",
     "\n",
@@ -474,7 +481,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "sagemaker.delete_endpoint(EndpointName=endpoint_name)"
+    "# Uncomment and run to delete\n",
+    "\n",
+    "# sagemaker.delete_endpoint(EndpointName=endpoint_name)"
    ]
   },
   {
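
Review note on the regional image lookup in kmeans_mnist_lowlevel.ipynb: `images[boto3.Session().region_name]` raises a bare `KeyError` when the session's region is not one of the four listed, or when no region is configured and `region_name` is `None`. A minimal sketch of a friendlier lookup follows. The helper name `kmeans_image_for_region` and the constant `KMEANS_IMAGES` are hypothetical and not part of this PR; the image URIs are copied from the diff above.

```python
# Region-to-image mapping copied verbatim from the diff above.
KMEANS_IMAGES = {
    'us-west-2': '174872318107.dkr.ecr.us-west-2.amazonaws.com/kmeans:latest',
    'us-east-1': '382416733822.dkr.ecr.us-east-1.amazonaws.com/kmeans:latest',
    'us-east-2': '404615174143.dkr.ecr.us-east-2.amazonaws.com/kmeans:latest',
    'eu-west-1': '438346466558.dkr.ecr.eu-west-1.amazonaws.com/kmeans:latest',
}


def kmeans_image_for_region(region_name, images=KMEANS_IMAGES):
    """Hypothetical helper: return the image URI listed for a region.

    Raises ValueError naming the listed regions when the region is
    missing from the mapping (including region_name=None).
    """
    try:
        return images[region_name]
    except KeyError:
        supported = ', '.join(sorted(images))
        raise ValueError(
            'No k-means image listed for region {!r}; listed regions: {}'.format(
                region_name, supported)
        )
```

In the notebook this would replace the direct dictionary index, for example `image = kmeans_image_for_region(boto3.Session().region_name)`, so that the same `image` value still feeds both `TrainingImage` and the `primary_container` definition.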