
Commit a4b647b

Author: Jonathan Esterhazy
import project files

1 parent 5f5871a

35 files changed: +1180 -3 lines

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@

log.txt

NOTICE

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
-Sagemaker Tfs Container
+Sagemaker TensorFlow Serving Container
 Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.

README.md

Lines changed: 110 additions & 2 deletions
@@ -1,7 +1,115 @@
-## Sagemaker Tfs Container
-A TensorFlow Serving solution for use in SageMaker.

# SageMaker TensorFlow Serving Container

SageMaker TensorFlow Serving Container is an open source project that builds
Docker images for running TensorFlow Serving on
[Amazon SageMaker](https://aws.amazon.com/documentation/sagemaker/).

This documentation covers building and testing these Docker images.

For information about using TensorFlow Serving on SageMaker, see:
[Deploying to TensorFlow Serving Endpoints](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_tensorflow_serving.rst)
in the [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk) documentation.

For notebook examples, see: [Amazon SageMaker Examples](https://github.com/awslabs/amazon-sagemaker-examples).

## Table of Contents

1. [Getting Started](#getting-started)
2. [Building your image](#building-your-image)
3. [Running the tests](#running-the-tests)

## Getting Started

### Prerequisites

Make sure you have installed all of the following prerequisites on your
development machine:

- [Docker](https://www.docker.com/)
- [AWS CLI](https://aws.amazon.com/cli/)

For testing, you will also need:

- [Python 3.5+](https://www.python.org/)
- [pytest](https://docs.pytest.org/en/latest/)
- The Python [requests](http://docs.python-requests.org/en/master/) library

To test GPU images locally, you will also need:

- [nvidia-docker](https://github.com/NVIDIA/nvidia-docker)

**Note:** Some of the build and test scripts interact with resources in your AWS account. Be sure to
set your default AWS credentials and region using `aws configure` before using these scripts.
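
For example, to set up your credentials and then verify what the scripts will use (both commands are part of the AWS CLI):

```bash
# set credentials and a default region interactively
aws configure

# verify which credentials and region will be used
aws configure list
```
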
## Building your image

Amazon SageMaker uses Docker containers to run all training jobs and inference endpoints.

The Docker images are built from the Dockerfiles in
[docker/](https://github.com/aws/sagemaker-tensorflow-serving-container/tree/master/docker).

The Dockerfiles are grouped based on the version of TensorFlow Serving they support. Each supported
processor type (e.g. "cpu", "gpu") has a different Dockerfile in each group.
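
For example, a version group might be organized like this (illustrative only; the actual file names are in the `docker/` directory):

```bash
# hypothetical layout -- check the repository for the actual structure
ls docker/1.11/
# Dockerfile.cpu  Dockerfile.gpu
```
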
To build an image, run the `./scripts/build.sh` script:

```bash
./scripts/build.sh --version 1.11 --arch cpu
./scripts/build.sh --version 1.11 --arch gpu
```

If you are testing locally, building the image is enough. But if you want to use your updated image
in SageMaker, you need to publish it to an ECR repository in your account. The
`./scripts/publish.sh` script makes that easy:

```bash
./scripts/publish.sh --version 1.11 --arch cpu
./scripts/publish.sh --version 1.11 --arch gpu
```

**Note:** this will publish to ECR in your default region. Use the `--region` argument to
specify a different region.
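
For example (the region shown is illustrative):

```bash
# publish to a specific region instead of your AWS CLI default
./scripts/publish.sh --version 1.11 --arch cpu --region us-west-2
```
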
### Running your image in local Docker

You can also run your container locally in Docker to test different models and send
inference requests by hand. Standard `docker run` commands (or `nvidia-docker run` for
GPU images) will work for this, or you can use the provided `start.sh`
and `stop.sh` scripts:

```bash
./scripts/start.sh [--version x.xx] [--arch cpu|gpu|...]
./scripts/stop.sh [--version x.xx] [--arch cpu|gpu|...]
```
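
If you prefer a plain `docker run`, here is a minimal sketch; the image tag and host model directory are assumptions about your local build and test data, while `/opt/ml/model` and port 8080 are the container's defaults:

```bash
docker run --rm -p 8080:8080 \
    -v $PWD/test/resources/models:/opt/ml/model:ro \
    sagemaker-tensorflow-serving:1.11-cpu
```
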
When the container is running, you can send test requests to it using any HTTP client. Here's
an example using the `curl` command:

```bash
curl -X POST --data-binary @test/resources/inputs/test.json \
     -H 'Content-Type: application/json' \
     -H 'X-Amzn-SageMaker-Custom-Attributes: tfs-model-name=half_plus_three' \
     http://localhost:8080/invocations
```

Additional `curl` examples can be found in `./scripts/curl.sh`.
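
You can also check that the container is healthy using the `/ping` endpoint defined in the nginx configuration (shown later in this commit):

```bash
curl http://localhost:8080/ping
```
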
## Running the tests

The package includes some automated unit and integration tests. These tests use Docker to run
your image locally, and do not access resources in AWS. You can run them using `pytest`:

```bash
pytest ./test
```
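
Standard `pytest` selection options work here as well; for example, to run only tests whose names match a keyword (the keyword is illustrative):

```bash
pytest ./test -k 'gpu'
```
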
## Contributing

Please read [CONTRIBUTING.md](https://github.com/aws/sagemaker-tensorflow-serving-container/blob/master/CONTRIBUTING.md)
for details on our code of conduct, and the process for submitting pull requests to us.

## License

This library is licensed under the Apache 2.0 License.
container/sagemaker/nginx.conf.template

Lines changed: 55 additions & 0 deletions

@@ -0,0 +1,55 @@
load_module modules/ngx_http_js_module.so;

worker_processes auto;
daemon off;
pid /tmp/nginx.pid;
error_log /dev/stderr %NGINX_LOG_LEVEL%;

worker_rlimit_nofile 4096;

events {
    worker_connections 2048;
}

http {
    include /etc/nginx/mime.types;
    default_type application/json;
    access_log /dev/stdout combined;
    js_include tensorflow-serving.js;

    upstream tfs_upstream {
        server localhost:%TFS_REST_PORT%;
    }

    server {
        listen %NGINX_HTTP_PORT% deferred;
        client_max_body_size 0;
        client_body_buffer_size 100m;
        subrequest_output_buffer_size 100m;

        set $default_tfs_model %TFS_DEFAULT_MODEL_NAME%;

        location /tfs {
            rewrite ^/tfs/(.*) /$1 break;
            proxy_redirect off;
            proxy_pass_request_headers off;
            proxy_set_header Content-Type 'application/json';
            proxy_set_header Accept 'application/json';
            proxy_pass http://tfs_upstream;
        }

        location /ping {
            js_content ping;
        }

        location /invocations {
            js_content invocations;
        }

        location / {
            return 404 '{"error": "Not Found"}';
        }

        keepalive_timeout 3;
    }
}
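
The `%NAME%` placeholders in this template are filled in at container startup by the service manager in `serve.py` (below), which writes the rendered result to `/sagemaker/nginx.conf`. One way to inspect the rendered config in a running container (the container name is illustrative):

```bash
docker exec my-tfs-container cat /sagemaker/nginx.conf
```
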

container/sagemaker/serve

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@

#!/bin/bash

python3 /sagemaker/serve.py

container/sagemaker/serve.py

Lines changed: 178 additions & 0 deletions
@@ -0,0 +1,178 @@

# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You
# may not use this file except in compliance with the License. A copy of
# the License is located at
#
#     http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
# ANY KIND, either express or implied. See the License for the specific
# language governing permissions and limitations under the License.

import logging
import os
import re
import signal
import subprocess

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)


class ServiceManager(object):
    def __init__(self):
        self._state = 'initializing'
        self._nginx = None
        self._tfs = None
        self._nginx_http_port = os.environ.get('SAGEMAKER_BIND_TO_PORT', '8080')
        self._nginx_loglevel = os.environ.get('SAGEMAKER_TFS_NGINX_LOGLEVEL', 'error')

        self._tfs_default_model_name = os.environ.get('SAGEMAKER_TFS_DEFAULT_MODEL_NAME', None)

        if 'SAGEMAKER_SAFE_PORT_RANGE' in os.environ:
            port_range = os.environ['SAGEMAKER_SAFE_PORT_RANGE']
            parts = port_range.split('-')
            low = int(parts[0])
            hi = int(parts[1])
            if low + 1 > hi:
                raise ValueError('not enough ports available in SAGEMAKER_SAFE_PORT_RANGE ({})'
                                 .format(port_range))
            self._tfs_grpc_port = str(low)
            self._tfs_rest_port = str(low + 1)
        else:
            # just use the standard default ports
            self._tfs_grpc_port = '9000'
            self._tfs_rest_port = '8501'

    def _create_tfs_config(self):
        models = self._find_models()

        if not models:
            raise ValueError('no SavedModel bundles found!')

        if self._tfs_default_model_name is None:
            self._tfs_default_model_name = os.path.basename(models[0])
            log.info('using default model name: {}'.format(self._tfs_default_model_name))

        # config (may) include duplicate 'config' keys, so we can't just dump a dict
        config = 'model_config_list: {\n'
        for m in models:
            config += '  config: {\n'
            config += '    name: "{}",\n'.format(os.path.basename(m))
            config += '    base_path: "{}",\n'.format(m)
            config += '    model_platform: "tensorflow"\n'
            config += '  },\n'
        config += '}\n'

        log.info('tensorflow serving model config: \n%s\n', config)

        with open('/sagemaker/model-config.cfg', 'w') as f:
            f.write(config)

    def _find_models(self):
        # a model directory contains one or more numeric version
        # subdirectories, each holding a saved_model.pb file
        base_path = '/opt/ml/model'
        models = []
        for f in self._find_saved_model_files(base_path):
            parts = f.split('/')
            if len(parts) >= 6 and re.match(r'^\d+$', parts[-2]):
                model_path = '/'.join(parts[0:-2])
                if model_path not in models:
                    models.append(model_path)
        return models

    def _find_saved_model_files(self, path):
        for e in os.scandir(path):
            if e.is_dir():
                yield from self._find_saved_model_files(os.path.join(path, e.name))
            else:
                if e.name == 'saved_model.pb':
                    yield os.path.join(path, e.name)

    def _create_nginx_config(self):
        template = self._read_nginx_template()
        pattern = re.compile(r'%(\w+)%')
        template_values = {
            'TFS_REST_PORT': self._tfs_rest_port,
            'TFS_DEFAULT_MODEL_NAME': self._tfs_default_model_name,
            'NGINX_HTTP_PORT': self._nginx_http_port,
            'NGINX_LOG_LEVEL': self._nginx_loglevel
        }

        config = pattern.sub(lambda x: template_values[x.group(1)], template)
        log.info('nginx config: \n%s\n', config)

        with open('/sagemaker/nginx.conf', 'w') as f:
            f.write(config)

    def _read_nginx_template(self):
        with open('/sagemaker/nginx.conf.template', 'r') as f:
            template = f.read()
            if not template:
                raise ValueError('failed to read nginx.conf.template')

            return template

    def _start_tfs(self):
        tfs_config_path = '/sagemaker/model-config.cfg'
        cmd = "tensorflow_model_server --port={} --rest_api_port={} --model_config_file={}".format(
            self._tfs_grpc_port, self._tfs_rest_port, tfs_config_path)
        log.info('tensorflow serving command: {}'.format(cmd))
        p = subprocess.Popen(cmd.split())
        log.info('started tensorflow serving (pid: %d)', p.pid)
        self._tfs = p

    def _start_nginx(self):
        p = subprocess.Popen('/usr/sbin/nginx -c /sagemaker/nginx.conf'.split())
        log.info('started nginx (pid: %d)', p.pid)
        self._nginx = p

    def _stop(self, *args):
        self._state = 'stopping'
        log.info('stopping services')
        try:
            os.kill(self._nginx.pid, signal.SIGQUIT)
        except OSError:
            pass
        try:
            os.kill(self._tfs.pid, signal.SIGTERM)
        except OSError:
            pass

        self._state = 'stopped'
        log.info('stopped')

    def start(self):
        log.info('starting services')
        self._state = 'starting'
        signal.signal(signal.SIGTERM, self._stop)

        # TODO set env vars for ports etc
        self._create_tfs_config()
        self._create_nginx_config()

        self._start_tfs()
        self._start_nginx()
        self._state = 'started'

        # restart either child process if it exits unexpectedly; the SIGTERM
        # handler (self._stop) changes _state so the loop exits cleanly
        while True:
            pid, status = os.wait()

            if self._state != 'started':
                break

            if pid == self._nginx.pid:
                log.warning('unexpected nginx exit (status: {}). restarting.'.format(status))
                self._start_nginx()

            elif pid == self._tfs.pid:
                log.warning(
                    'unexpected tensorflow serving exit (status: {}). restarting.'.format(status))
                self._start_tfs()

        self._stop()


if __name__ == '__main__':
    ServiceManager().start()
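
All of `ServiceManager`'s configuration comes from environment variables, so its behavior can be adjusted when the container is started. A sketch of overriding the defaults (the image tag and host model directory are assumptions; the environment variable names come from the code above):

```bash
docker run --rm -p 9000:9000 \
    -e SAGEMAKER_BIND_TO_PORT=9000 \
    -e SAGEMAKER_TFS_NGINX_LOGLEVEL=info \
    -e SAGEMAKER_TFS_DEFAULT_MODEL_NAME=half_plus_three \
    -v $PWD/models:/opt/ml/model:ro \
    sagemaker-tensorflow-serving:1.11-cpu
```
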
