feature: support training inputs from EFS and FSx #991

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Aug 23, 2019
108 changes: 108 additions & 0 deletions doc/overview.rst
@@ -299,6 +299,114 @@ Here are some examples of creating estimators with Git support:
Git support can be used not only for training jobs, but also for hosting models. The usage is the same as the above,
and ``git_config`` should be provided when creating model objects, e.g. ``TensorFlowModel``, ``MXNetModel``, ``PyTorchModel``.

Use File Systems as Training Inputs
-----------------------------------
Amazon SageMaker supports using Amazon Elastic File System (EFS) and FSx for Lustre as data sources for training.
To use these data sources, create a file system (EFS/FSx) and mount it on an Amazon EC2 instance.
For more information about setting up EFS and FSx, see the following documentation:

- `Using File Systems in Amazon EFS <https://docs.aws.amazon.com/efs/latest/ug/using-fs.html>`__
- `Getting Started with Amazon FSx for Lustre <https://aws.amazon.com/fsx/lustre/getting-started/>`__

Training with these data sources uses either the ``FileSystemInput`` or ``FileSystemRecordSet`` class, each of which
encapsulates all of the arguments the service requires to use EFS or FSx for Lustre.
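
The examples below omit their imports; assuming the module layout added in this PR, these would be:

.. code:: python

from sagemaker.inputs import FileSystemInput
from sagemaker.amazon.amazon_estimator import FileSystemRecordSet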

Here are examples of how to use Amazon EFS as input for training:

.. code:: python

# This example shows how to use FileSystemInput class
# Configure an estimator with subnets and security groups from your VPC. The EFS volume must be in
# the same VPC as your Amazon EC2 instance
estimator = TensorFlow(entry_point='tensorflow_mnist/mnist.py',
role='SageMakerRole',
train_instance_count=1,
train_instance_type='ml.c4.xlarge',
subnets=['subnet-1', 'subnet-2'],
security_group_ids=['sg-1'])

file_system_input = FileSystemInput(file_system_id='fs-1',
file_system_type='EFS',
directory_path='tensorflow',
file_system_access_mode='ro')

# Start an Amazon SageMaker training job with EFS using the FileSystemInput class
estimator.fit(file_system_input)

.. code:: python

# This example shows how to use FileSystemRecordSet class
# Configure an estimator with subnets and security groups from your VPC. The EFS volume must be in
# the same VPC as your Amazon EC2 instance
kmeans = KMeans(role='SageMakerRole',
train_instance_count=1,
train_instance_type='ml.c4.xlarge',
k=10,
subnets=['subnet-1', 'subnet-2'],
security_group_ids=['sg-1'])

records = FileSystemRecordSet(file_system_id='fs-1',
file_system_type='EFS',
directory_path='kmeans',
num_records=784,
feature_dim=784)

# Start an Amazon SageMaker training job with EFS using the FileSystemRecordSet class
kmeans.fit(records)

Here are examples of how to use Amazon FSx for Lustre as input for training:

.. code:: python

# This example shows how to use FileSystemInput class
# Configure an estimator with subnets and security groups from your VPC. The FSx file system must be in
# the same VPC as your Amazon EC2 instance
estimator = TensorFlow(entry_point='tensorflow_mnist/mnist.py',
role='SageMakerRole',
train_instance_count=1,
train_instance_type='ml.c4.xlarge',
subnets=['subnet-1', 'subnet-2'],
security_group_ids=['sg-1'])


file_system_input = FileSystemInput(file_system_id='fs-2',
file_system_type='FSxLustre',
directory_path='tensorflow',
file_system_access_mode='ro')

# Start an Amazon SageMaker training job with FSx using the FileSystemInput class
estimator.fit(file_system_input)

.. code:: python

# This example shows how to use FileSystemRecordSet class
# Configure an estimator with subnets and security groups from your VPC. The FSx file system must be in
# the same VPC as your Amazon EC2 instance
kmeans = KMeans(role='SageMakerRole',
train_instance_count=1,
train_instance_type='ml.c4.xlarge',
k=10,
subnets=['subnet-1', 'subnet-2'],
security_group_ids=['sg-1'])

records = FileSystemRecordSet(file_system_id='fs-2',
file_system_type='FSxLustre',
directory_path='kmeans',
num_records=784,
feature_dim=784)

# Start an Amazon SageMaker training job with FSx using the FileSystemRecordSet class
kmeans.fit(records)

EFS and FSx data sources can also be used for hyperparameter tuning jobs; the usage is the same as for training.
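
For example, a minimal tuning sketch (the objective metric name and hyperparameter range here are illustrative
assumptions, reusing the ``estimator`` and ``file_system_input`` from the examples above):

.. code:: python

from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

tuner = HyperparameterTuner(estimator=estimator,
                            objective_metric_name='validation:accuracy',
                            hyperparameter_ranges={'learning_rate': ContinuousParameter(0.01, 0.2)},
                            max_jobs=4,
                            max_parallel_jobs=2)

# File system channels are passed to tuning jobs exactly as they are to training jobs
tuner.fit(file_system_input)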

A few important notes:

- Local mode is not supported when using EFS or FSx as data sources

- Pipe mode is not supported when using EFS as a data source

Training Metrics
----------------
The SageMaker Python SDK allows you to specify a name and a regular expression for metrics you want to track for training.
5 changes: 3 additions & 2 deletions setup.py
@@ -1,4 +1,4 @@
# Copyright 2017-2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# Copyright 2017-2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You
# may not use this file except in compliance with the License. A copy of
@@ -34,14 +34,15 @@ def read_version():

# Declare minimal set for installation
required_packages = [
"boto3>=1.9.169",
"boto3>=1.9.213",
"numpy>=1.9.0",
"protobuf>=3.1",
"scipy>=0.19.0",
"urllib3>=1.21, <1.25",
"protobuf3-to-dict>=0.1.5",
"docker-compose>=1.23.0",
"requests>=2.20.0, <2.21",
"fabric>=2.0",
]

# enum is introduced in Python 3.4. Installing enum back port
52 changes: 51 additions & 1 deletion src/sagemaker/amazon/amazon_estimator.py
@@ -1,4 +1,4 @@
# Copyright 2017-2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# Copyright 2017-2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You
# may not use this file except in compliance with the License. A copy of
@@ -23,6 +23,7 @@
from sagemaker.amazon.hyperparameter import Hyperparameter as hp # noqa
from sagemaker.amazon.common import write_numpy_to_dense_tensor
from sagemaker.estimator import EstimatorBase, _TrainingJob
from sagemaker.inputs import FileSystemInput
from sagemaker.model import NEO_IMAGE_ACCOUNT
from sagemaker.session import s3_input
from sagemaker.utils import sagemaker_timestamp, get_ecr_image_uri_prefix
@@ -281,6 +282,55 @@ def records_s3_input(self):
return s3_input(self.s3_data, distribution="ShardedByS3Key", s3_data_type=self.s3_data_type)


class FileSystemRecordSet(object):
"""Amazon SageMaker channel configuration for a file system data source
for Amazon algorithms.
"""

def __init__(
self,
file_system_id,
file_system_type,
directory_path,
num_records,
feature_dim,
file_system_access_mode="ro",
channel="train",
):
"""Initialize a ``FileSystemRecordSet`` object.

Args:
file_system_id (str): An Amazon file system ID starting with 'fs-'.
file_system_type (str): The type of file system used for the input.
Valid values: 'EFS', 'FSxLustre'.
directory_path (str): Relative path to the root directory (mount point) in
the file system. Reference:
https://docs.aws.amazon.com/efs/latest/ug/mounting-fs.html and
https://docs.aws.amazon.com/efs/latest/ug/wt1-test.html
num_records (int): The number of records in the set.
feature_dim (int): The dimensionality of "values" arrays in the Record features,
and label (if each Record is labeled).
file_system_access_mode (str): Permissions for read and write.
Valid values: 'ro' or 'rw'. Defaults to 'ro'.
channel (str): The SageMaker Training Job channel this RecordSet should be bound to.
"""

self.file_system_input = FileSystemInput(
file_system_id, file_system_type, directory_path, file_system_access_mode
)
self.feature_dim = feature_dim
self.num_records = num_records
self.channel = channel

def __repr__(self):
"""Return an unambiguous representation of this RecordSet"""
return str((FileSystemRecordSet, self.__dict__))

def data_channel(self):
"""Return a dictionary to represent the training data in a channel for use with ``fit()``"""
return {self.channel: self.file_system_input}
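
A brief usage sketch (hypothetical values): ``data_channel()`` maps the channel name to the underlying
``FileSystemInput``, which is what the Amazon algorithm estimators hand to the training job:

records = FileSystemRecordSet(file_system_id='fs-1',
                              file_system_type='EFS',
                              directory_path='kmeans',
                              num_records=784,
                              feature_dim=784)
records.data_channel()  # {'train': <FileSystemInput object>}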


def _build_shards(num_shards, array):
"""
Args:
18 changes: 9 additions & 9 deletions src/sagemaker/estimator.py
@@ -308,21 +308,21 @@ def fit(self, inputs=None, wait=True, logs=True, job_name=None):
about the training data. This can be one of three types:

* (str) the S3 location where training data is saved.

* (dict[str, str] or dict[str, sagemaker.session.s3_input]) If using multiple
channels for training data, you can specify a dict mapping channel names to
strings or :func:`~sagemaker.session.s3_input` objects.

* (sagemaker.session.s3_input) - channel configuration for S3 data sources that can
provide additional information as well as the path to the training dataset.
See :func:`sagemaker.session.s3_input` for full details.
wait (bool): Whether the call should wait until the job completes
(default: True).
logs (bool): Whether to show the logs produced by the job. Only
meaningful when wait is True (default: True).
job_name (str): Training job name. If not specified, the estimator
generates a default job name, based on the training image name
and current timestamp.
* (sagemaker.session.FileSystemInput) - channel configuration for
a file system data source that can provide additional information as well as
the path to the training dataset.

wait (bool): Whether the call should wait until the job completes (default: True).
logs (bool): Whether to show the logs produced by the job.
Only meaningful when wait is True (default: True).
job_name (str): Training job name. If not specified, the estimator generates
a default job name, based on the training image name and current timestamp.
"""
self._prepare_for_training(job_name=job_name)
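
A sketch of the accepted input types (bucket and file system IDs are hypothetical):

estimator.fit('s3://my-bucket/train')  # single S3 prefix

estimator.fit({'train': 's3://my-bucket/train',
               'test': s3_input('s3://my-bucket/test')})  # dict of channels

estimator.fit(FileSystemInput(file_system_id='fs-1',
                              file_system_type='EFS',
                              directory_path='/train',
                              file_system_access_mode='ro'))  # file system data source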

146 changes: 146 additions & 0 deletions src/sagemaker/inputs.py
@@ -0,0 +1,146 @@
# Copyright 2017-2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You
# may not use this file except in compliance with the License. A copy of
# the License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
# ANY KIND, either express or implied. See the License for the specific
# language governing permissions and limitations under the License.
"""Amazon SageMaker channel configurations for S3 data sources and file system data sources"""
from __future__ import absolute_import, print_function

FILE_SYSTEM_TYPES = ["FSxLustre", "EFS"]
FILE_SYSTEM_ACCESS_MODES = ["ro", "rw"]


class s3_input(object):
"""Amazon SageMaker channel configurations for S3 data sources.

Attributes:
config (dict[str, dict]): A SageMaker ``DataSource`` referencing
a SageMaker ``S3DataSource``.
"""

def __init__(
self,
s3_data,
distribution="FullyReplicated",
compression=None,
content_type=None,
record_wrapping=None,
s3_data_type="S3Prefix",
input_mode=None,
attribute_names=None,
shuffle_config=None,
):
"""Create a definition for input data used by an SageMaker training job.
See AWS documentation on the ``CreateTrainingJob`` API for more details on the parameters.

Args:
s3_data (str): Defines the location of s3 data to train on.
distribution (str): Valid values: 'FullyReplicated', 'ShardedByS3Key'
(default: 'FullyReplicated').
compression (str): Valid values: 'Gzip', None (default: None). This is used only in
Pipe input mode.
content_type (str): MIME type of the input data (default: None).
record_wrapping (str): Valid values: 'RecordIO' (default: None).
s3_data_type (str): Valid values: 'S3Prefix', 'ManifestFile', 'AugmentedManifestFile'.
If 'S3Prefix', ``s3_data`` defines a prefix of s3 objects to train on.
All objects with s3 keys beginning with ``s3_data`` will be used to train.
If 'ManifestFile' or 'AugmentedManifestFile', then ``s3_data`` defines a
single S3 manifest file or augmented manifest file (respectively),
listing the S3 data to train on. Both the ManifestFile and
AugmentedManifestFile formats are described in the SageMaker API documentation:
https://docs.aws.amazon.com/sagemaker/latest/dg/API_S3DataSource.html
input_mode (str): Optional override for this channel's input mode (default: None).
By default, channels will use the input mode defined on
``sagemaker.estimator.EstimatorBase.input_mode``, but they will ignore
that setting if this parameter is set.

* None - Amazon SageMaker will use the input mode specified in the ``Estimator``
* 'File' - Amazon SageMaker copies the training dataset from the S3 location to
a local directory.
* 'Pipe' - Amazon SageMaker streams data directly from S3 to the container via
a Unix-named pipe.

attribute_names (list[str]): A list of one or more attribute names to use that are
found in a specified AugmentedManifestFile.
shuffle_config (ShuffleConfig): If specified this configuration enables shuffling on
this channel. See the SageMaker API documentation for more info:
https://docs.aws.amazon.com/sagemaker/latest/dg/API_ShuffleConfig.html
"""

self.config = {
"DataSource": {
"S3DataSource": {
"S3DataDistributionType": distribution,
"S3DataType": s3_data_type,
"S3Uri": s3_data,
}
}
}

if compression is not None:
Contributor:
[Minor] Can these be written as one-liners with "or"?

Contributor Author:
not exactly sure what you're suggesting here. are you thinking self.config["CompressionType"] = compression or None?

Contributor:
That's what I was thinking. Am I missing something that would make that incorrect?

Contributor Author (@laurenyu, Aug 23, 2019):
boto3 may complain about parameters being set to None. The error will be something like "incorrect parameter type <NoneType>, should be one of: <str>"

self.config["CompressionType"] = compression
if content_type is not None:
self.config["ContentType"] = content_type
if record_wrapping is not None:
self.config["RecordWrapperType"] = record_wrapping
if input_mode is not None:
self.config["InputMode"] = input_mode
if attribute_names is not None:
self.config["DataSource"]["S3DataSource"]["AttributeNames"] = attribute_names
if shuffle_config is not None:
self.config["ShuffleConfig"] = {"Seed": shuffle_config.seed}


class FileSystemInput(object):
"""Amazon SageMaker channel configurations for file system data sources.

Attributes:
config (dict[str, dict]): A SageMaker File System ``DataSource``.
"""

def __init__(
self, file_system_id, file_system_type, directory_path, file_system_access_mode="ro"
):
"""Create a new file system input used by an SageMaker training job.

Args:
file_system_id (str): An Amazon file system ID starting with 'fs-'.
file_system_type (str): The type of file system used for the input.
Valid values: 'EFS', 'FSxLustre'.
directory_path (str): Relative path to the root directory (mount point) in
the file system.
Reference: https://docs.aws.amazon.com/efs/latest/ug/mounting-fs.html and
https://docs.aws.amazon.com/fsx/latest/LustreGuide/mount-fs-auto-mount-onreboot.html
file_system_access_mode (str): Permissions for read and write.
Valid values: 'ro' or 'rw'. Defaults to 'ro'.
"""

if file_system_type not in FILE_SYSTEM_TYPES:
raise ValueError(
"Unrecognized file system type: %s. Valid values: %s."
% (file_system_type, ", ".join(FILE_SYSTEM_TYPES))
)

if file_system_access_mode not in FILE_SYSTEM_ACCESS_MODES:
raise ValueError(
"Unrecognized file system access mode: %s. Valid values: %s."
% (file_system_access_mode, ", ".join(FILE_SYSTEM_ACCESS_MODES))
)

self.config = {
"DataSource": {
"FileSystemDataSource": {
"FileSystemId": file_system_id,
"FileSystemType": file_system_type,
"DirectoryPath": directory_path,
"FileSystemAccessMode": file_system_access_mode,
}
}
}
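
A short usage sketch (hypothetical file system ID) of the request structure this class builds and of its
fail-fast validation:

fs_input = FileSystemInput(file_system_id='fs-1',
                           file_system_type='EFS',
                           directory_path='/mnist',
                           file_system_access_mode='ro')
fs_input.config
# {'DataSource': {'FileSystemDataSource': {'FileSystemId': 'fs-1',
#                                          'FileSystemType': 'EFS',
#                                          'DirectoryPath': '/mnist',
#                                          'FileSystemAccessMode': 'ro'}}}

FileSystemInput(file_system_id='fs-1',
                file_system_type='NFS',  # not a supported type
                directory_path='/mnist')  # raises ValueError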