Commit cbf9b58

Merge remote-tracking branch 'upstream/master'
2 parents: 75b127c + 791bf0a

113 files changed, +17436 -1991 lines changed

.gitignore

Lines changed: 3 additions & 1 deletion

@@ -25,4 +25,6 @@ venv/
 *~
 .pytest_cache/
 *.swp
-.docker/
+.docker/
+env/
+.vscode/

CHANGELOG.md

Lines changed: 53 additions & 0 deletions

@@ -1,5 +1,58 @@
 # Changelog
 
+## v2.19.0 (2020-12-08)
+
+### Features
+
+* add tensorflow 1.15.4 and 2.3.1 as valid versions
+* add py36 as valid python version for pytorch 1.6.0
+* auto-select container version for p4d and smdistributed
+* add edge packaging job support
+* Add Clarify Processor, Model Bias, Explainability, and Quality Monitors support. (#494)
+* add model parallelism support
+* add data parallelism support (#454) (#511)
+* support creating and updating profiler in training job (#444) (#526)
+
+### Bug Fixes and Other Changes
+
+* bump boto3 and smdebug_rulesconfig versions for reinvent and enable data parallel integ tests
+* run UpdateTrainingJob tests only during allowed secondary status
+* Remove workarounds and apply fixes to Clarify and MM integ tests
+* add p4d to smdataparallel supported instances
+* Mount metadata directory when starting local mode docker container
+* add integ test for profiler
+* Re-enable model monitor integration tests.
+
+### Documentation Changes
+
+* add SageMaker distributed libraries documentation
+* update documentation for the new SageMaker Debugger APIs
+* minor updates to doc strings
+
+## v2.18.0 (2020-12-03)
+
+### Features
+
+* all de/serializers support content type
+* warn on 'Stopped' (non-Completed) jobs
+* all predictors support serializer/deserializer overrides
+
+### Bug Fixes and Other Changes
+
+* v2 upgrade tool should ignore cell starting with '%'
+* use iterrows to iterate pandas dataframe
+* check for distributions in TF estimator
+
+### Documentation Changes
+
+* Update link to Sagemaker PyTorch Docker Containers
+* create artifact restricted to SM context note
+
+### Testing and Release Infrastructure
+
+* remove flaky assertion in test_integ_history_server
+* adjust assertion of TensorFlow MNIST test
+
 ## v2.17.0 (2020-12-02)
 
 ### Features
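The v2.18.0 entry "all predictors support serializer/deserializer overrides" refers to the v2 Predictor constructor arguments. A minimal sketch, assuming an already-deployed endpoint; the endpoint name and payload are placeholders:

from sagemaker.deserializers import JSONDeserializer
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer

# Override the default serializer/deserializer when attaching to an endpoint;
# "my-endpoint" is a placeholder for an existing SageMaker endpoint.
predictor = Predictor(
    endpoint_name="my-endpoint",
    serializer=JSONSerializer(),      # request body sent as application/json
    deserializer=JSONDeserializer(),  # response body parsed from application/json
)
result = predictor.predict({"instances": [[1.0, 2.0, 3.0]]})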

VERSION

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-2.17.1.dev0
+2.19.1.dev0

doc/_static/theme_overrides.css

Lines changed: 10 additions & 0 deletions

@@ -0,0 +1,10 @@
+/* override table width restrictions */
+.wy-table-responsive table td, .wy-table-responsive table th {
+    white-space: normal;
+}
+
+.wy-table-responsive {
+    margin-bottom: 24px;
+    max-width: 100%;
+    overflow: visible;
+}
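How this stylesheet gets picked up is not shown in the diff; a minimal sketch of the usual Sphinx wiring, assuming the project registers it in doc/conf.py (the actual hookup used by this commit is an assumption):

# doc/conf.py (sketch) -- registering the override stylesheet with Sphinx.
# Only the CSS file itself appears in the diff above; this wiring is assumed.
html_static_path = ["_static"]

def setup(app):
    # add_css_file() is the standard Sphinx API for extra stylesheets.
    app.add_css_file("theme_overrides.css")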

doc/api/index.rst

Lines changed: 1 addition & 0 deletions

@@ -9,5 +9,6 @@ The SageMaker Python SDK consists of a variety classes for preparing data, train
 
    prep_data/feature_store
    training/index
+   training/distributed
   inference/index
    utility/index

doc/api/training/debugger.rst

Lines changed: 75 additions & 3 deletions

@@ -1,7 +1,79 @@
 Debugger
 --------
 
-.. automodule:: sagemaker.debugger
-    :members:
-    :undoc-members:
+Amazon SageMaker Debugger provides full visibility
+into training jobs of state-of-the-art machine learning models.
+This SageMaker Debugger module provides high-level methods
+to set up Debugger configurations to
+monitor, profile, and debug your training job.
+Configure the Debugger-specific parameters when constructing
+a SageMaker estimator to gain visibility and insights
+into your training job.
+
+.. currentmodule:: sagemaker.debugger
+
+.. autoclass:: get_rule_container_image_uri
+    :show-inheritance:
+
+.. autoclass:: get_default_profiler_rule
+    :show-inheritance:
+
+.. class:: sagemaker.debugger.rule_configs
+
+    A helper module to configure the SageMaker Debugger built-in rules with
+    the :class:`~sagemaker.debugger.Rule` classmethods and
+    the :class:`~sagemaker.debugger.ProfilerRule` classmethods.
+
+    For a full list of built-in rules, see
+    `List of Debugger Built-in Rules
+    <https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-built-in-rules.html>`_.
+
+    This module is imported from the Debugger client library for rule configuration.
+    For more information, see
+    `Amazon SageMaker Debugger RulesConfig
+    <https://github.com/awslabs/sagemaker-debugger-rulesconfig>`_.
+
+.. autoclass:: RuleBase
+    :show-inheritance:
+
+.. autoclass:: Rule
+    :show-inheritance:
+    :inherited-members:
+
+.. autoclass:: ProfilerRule
+    :show-inheritance:
+    :inherited-members:
+
+.. autoclass:: CollectionConfig
+    :show-inheritance:
+
+.. autoclass:: DebuggerHookConfig
     :show-inheritance:
+
+.. autoclass:: TensorBoardOutputConfig
+    :show-inheritance:
+
+.. autoclass:: ProfilerConfig
+    :show-inheritance:
+
+.. autoclass:: FrameworkProfile
+    :show-inheritance:
+
+.. autoclass:: DetailedProfilingConfig
+    :show-inheritance:
+
+.. autoclass:: DataloaderProfilingConfig
+    :show-inheritance:
+
+.. autoclass:: PythonProfilingConfig
+    :show-inheritance:
+
+.. autoclass:: PythonProfiler
+    :show-inheritance:
+
+.. autoclass:: cProfileTimer
+    :show-inheritance:
+
+.. automodule:: sagemaker.debugger.metrics_config
+    :members: StepRange, TimeRange
+    :undoc-members:
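To illustrate the "configure the Debugger-specific parameters when constructing a SageMaker estimator" guidance above, a minimal sketch using the classes documented in this file; the script name, role ARN, instance choice, and the vanishing_gradient rule are placeholders:

from sagemaker.debugger import FrameworkProfile, ProfilerConfig, Rule, rule_configs
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="train.py",                               # placeholder training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role ARN
    framework_version="2.3.1",
    py_version="py37",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    # A built-in rule configured through the rule_configs helper module.
    rules=[Rule.sagemaker(rule_configs.vanishing_gradient())],
    # Profiler settings documented above (ProfilerConfig / FrameworkProfile).
    profiler_config=ProfilerConfig(
        system_monitor_interval_millis=500,
        framework_profile_params=FrameworkProfile(),
    ),
)
estimator.fit("s3://my-bucket/training-data")  # placeholder S3 input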

doc/api/training/distributed.rst

Lines changed: 11 additions & 0 deletions

@@ -0,0 +1,11 @@
+Distributed Training APIs
+-------------------------
+SageMaker distributed training libraries offer both data parallel and model parallel training strategies.
+They combine software and hardware technologies to improve inter-GPU and inter-node communications.
+They extend SageMaker’s training capabilities with built-in options that require only small code changes to your training scripts.
+
+.. toctree::
+    :maxdepth: 3
+
+    smd_data_parallel
+    smd_model_parallel

doc/api/training/index.rst

Lines changed: 9 additions & 3 deletions

@@ -3,7 +3,13 @@ Training APIs
 #############
 
 .. toctree::
-    :maxdepth: 1
-    :glob:
+    :maxdepth: 4
 
-    *
+    analytics
+    automl
+    debugger
+    estimators
+    algorithm
+    tuner
+    parameter
+    processing

doc/api/training/processing.rst

Lines changed: 5 additions & 0 deletions

@@ -10,3 +10,8 @@ Processing
     :members:
     :undoc-members:
     :show-inheritance:
+
+.. automodule:: sagemaker.clarify
+    :members:
+    :undoc-members:
+    :show-inheritance:
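Since sagemaker.clarify is newly documented here (and new in v2.19.0 per the changelog), a minimal sketch of a pre-training bias job with it; the bucket, role ARN, and column names are placeholders:

from sagemaker import clarify

processor = clarify.SageMakerClarifyProcessor(
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role ARN
    instance_count=1,
    instance_type="ml.c5.xlarge",
)
data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/train.csv",   # placeholder dataset
    s3_output_path="s3://my-bucket/clarify-output",  # placeholder output location
    label="target",                                  # placeholder label column
    dataset_type="text/csv",
)
bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],  # positive outcome value(s)
    facet_name="age",               # placeholder sensitive attribute
)
# Runs a SageMaker Processing job that computes pre-training bias metrics.
processor.run_pre_training_bias(data_config, bias_config)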
doc/api/training/smd_data_parallel.rst

Lines changed: 64 additions & 0 deletions

@@ -0,0 +1,64 @@
+###################################
+Distributed data parallel
+###################################
+
+SageMaker distributed data parallel (SDP) extends SageMaker’s training
+capabilities on deep learning models with near-linear scaling efficiency,
+achieving fast time-to-train with minimal code changes.
+
+- SDP optimizes your training job for AWS network infrastructure and EC2 instance topology.
+- SDP takes advantage of gradient updates to communicate between nodes with a custom AllReduce algorithm.
+
+When training a model on a large amount of data, machine learning practitioners
+will often turn to distributed training to reduce the time to train.
+In some cases, where time is of the essence,
+the business requirement is to finish training as quickly as possible or at
+least within a constrained time period.
+Then, distributed training is scaled to use a cluster of multiple nodes,
+meaning not just multiple GPUs in a computing instance, but multiple instances
+with multiple GPUs. However, as the cluster size increases, performance can drop
+significantly. This drop is primarily caused by the communication
+overhead between nodes in a cluster.
+
+.. rubric:: Customize your training script
+
+To customize your own training script, you will need the following:
+
+- You must provide TensorFlow / PyTorch training scripts that are
+  adapted to use SDP.
+- Your input data must be in an S3 bucket or in FSx in the AWS region
+  that you will use to launch your training job. If you use the Jupyter
+  notebooks provided, create a SageMaker notebook instance in the same
+  region as the bucket that contains your input data. For more
+  information about storing your training data, refer to
+  the `SageMaker Python SDK data
+  inputs <https://sagemaker.readthedocs.io/en/stable/overview.html#use-file-systems-as-training-inputs>`__ documentation.
+
+Use the API guides for each framework to see
+examples of adapted training scripts, and use them as a reference to convert your own.
+Then, use one of the example notebooks as your template to launch a training job.
+You’ll need to swap your training script with the one that came with the
+notebook and modify any input functions as necessary.
+Once you have launched a training job, you can monitor it using CloudWatch.
+
+Then you can see how to deploy your trained model to an endpoint by
+following one of the example notebooks for deploying a model. Finally,
+you can follow an example notebook to test inference on your deployed
+model.
+
+.. toctree::
+    :maxdepth: 2
+
+    smd_data_parallel_pytorch
+    smd_data_parallel_tensorflow
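This page pairs with the "add data parallelism support" changelog entry above. A minimal launcher sketch, assuming a train.py already adapted to use SDP; the script name, bucket, and role ARN are placeholders:

from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",  # placeholder script, already adapted to use SDP
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role ARN
    framework_version="1.6.0",
    py_version="py36",       # py36 is valid for pytorch 1.6.0 per the changelog
    instance_count=2,
    # SDP targets large multi-GPU instances such as ml.p3.16xlarge or ml.p4d.24xlarge.
    instance_type="ml.p3.16xlarge",
    # Enables the smdistributed dataparallel option added in v2.19.0.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
estimator.fit("s3://my-bucket/training-data")  # input data in the same region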
