
Commit 09c9638

Restructure examples for TF, and removed example notebooks everywhere (except 1) (aws#72)
* Restructure examples for TF, and removed example notebooks everywhere
* Fix path in script
* Fix xgboost versions in doc
* Fix hook not being passed to train method, and opt var clash
* Move scripts out
* Run mirrored strategy and move scripts into main guards
* Remove unsupported distributed training scripts
  (And removed mirrored strategy script as it's failing even without hook)
* Add point about actions in overview
* Fix markdown table syntax error
* Add readme section
* Add table
* Add table
* Update README.md
* Update README.md
* headers and links
* Updated path to example
1 parent 18ffc2c commit 09c9638

39 files changed: +776 −11886 lines

README.md

Lines changed: 18 additions & 7 deletions

````diff
@@ -1,8 +1,9 @@
 # Amazon SageMaker Debugger

 - [Overview](#overview)
-- [Examples](#sagemaker-example)
+- [Examples](#examples)
 - [How It Works](#how-it-works)
+- [Docs](#docs)

 ## Overview
 Amazon SageMaker Debugger is an offering from AWS which helps you automate the debugging of machine learning training jobs.
@@ -15,6 +16,7 @@ It supports TensorFlow, PyTorch, MXNet, and XGBoost on Python 3.6+.
 - Real-time training job monitoring through Rules
 - Automated anomaly detection and state assertions
 - Interactive exploration of saved tensors
+- Actions on your training jobs based on the status of Rules
 - Distributed training support
 - TensorBoard support

@@ -51,6 +53,12 @@ sagemaker_simple_estimator = sm.tensorflow.TensorFlow(
 )

 sagemaker_simple_estimator.fit()
+tensors_path = sagemaker_simple_estimator.latest_job_debugger_artifacts_path()
+
+import smdebug as smd
+trial = smd.trials.create_trial(out_dir=tensors_path)
+print(f"Saved these tensors: {trial.tensor_names()}")
+print(f"Loss values during evaluation were {trial.tensor('CrossEntropyLoss:0').values(mode=smd.modes.EVAL)}")
 ```

 That's it! Amazon SageMaker will automatically monitor your training job for you with the Rules specified and create a CloudWatch
@@ -101,12 +109,15 @@ Amazon SageMaker Debugger can be used inside or outside of SageMaker. There are
 The reason for different setups is that SageMaker Zero-Script-Change (via Deep Learning Containers) uses custom framework forks of TensorFlow, PyTorch, MXNet, and XGBoost to save tensors automatically.
 These framework forks are not available in custom containers or non-SM environments, so you must modify your training script in these environments.

-See the [SageMaker page](docs/sagemaker.md) for details on SageMaker Zero-Code-Change and Bring-Your-Own-Container (BYOC) experience.\
-See the frameworks pages for details on modifying the training script:
-- [TensorFlow](docs/tensorflow.md)
-- [PyTorch](docs/pytorch.md)
-- [MXNet](docs/mxnet.md)
-- [XGBoost](docs/xgboost.md)
+## Docs
+
+| Section | Description |
+| --- | --- |
+| [SageMaker Training](docs/sagemaker.md) | Recommended starting point for SageMaker users: how to run SageMaker training jobs with SageMaker Debugger |
+| Frameworks <ul><li>[TensorFlow](docs/tensorflow.md)</li><li>[PyTorch](docs/pytorch.md)</li><li>[MXNet](docs/mxnet.md)</li><li>[XGBoost](docs/xgboost.md)</li></ul> | See the framework pages for details on what's supported and how to modify your training script if applicable |
+| [Programming Model for Analysis](docs/analysis.md) | Describes the programming model provided by our APIs, which lets you interactively explore saved tensors and write your own Rules to monitor your training jobs |
+| [APIs](docs/api.md) | Full description of our APIs |
+

 ## License
 This library is licensed under the Apache 2.0 License.
````
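The README changes above advertise real-time monitoring through Rules and actions based on Rule status. As a toy, stdlib-only sketch of the kind of assertion a built-in Rule such as `loss_not_decreasing` encodes (this is an illustration, not smdebug's implementation; the function name and thresholds here are hypothetical):

```python
# Toy sketch (NOT smdebug's implementation) of the check a Rule like
# `loss_not_decreasing` encodes: compare the mean loss over the most
# recent window against the window before it, and fire when no
# progress has been made.

def loss_not_decreasing(losses, window=3, min_decrease=0.0):
    """Return True (rule fires) when the mean loss over the last
    `window` steps has not decreased versus the previous `window`."""
    if len(losses) < 2 * window:
        return False  # not enough data to evaluate the rule yet
    prev = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return recent >= prev - min_decrease

# A steadily decreasing loss curve should not fire the rule...
assert not loss_not_decreasing([1.0, 0.8, 0.6, 0.5, 0.4, 0.3])
# ...while a flat curve should.
assert loss_not_decreasing([0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
```

In SageMaker, such a check runs as a separate Rule job against the saved tensors, and its status can trigger actions such as stopping the training job.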

docs/sagemaker.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -29,7 +29,7 @@ Here's a list of frameworks and versions which support this experience.
 | [TensorFlow](tensorflow.md) | 1.15 |
 | [MXNet](mxnet.md) | 1.6 |
 | [PyTorch](pytorch.md) | 1.3 |
-| [XGBoost](xgboost.md) | |
+| [XGBoost](xgboost.md) | >=0.90-2 [As Built-in algorithm](xgboost.md#use-xgboost-as-a-built-in-algorithm) |

 More details on which containers these deep learning frameworks ship in can be found here: [SageMaker Framework Containers](https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-containers-frameworks-deep-learning.html) and [AWS Deep Learning Containers](https://aws.amazon.com/machine-learning/containers/). You do not have to specify any training container image if you want to use them on SageMaker. You only need to specify the version above to use these containers.

@@ -43,7 +43,7 @@ This library `smdebug` itself supports versions other than the ones listed above
 | Keras (with TensorFlow backend) | 2.3 |
 | [MXNet](mxnet.md) | 1.4, 1.5, 1.6 |
 | [PyTorch](pytorch.md) | 1.2, 1.3 |
-| [XGBoost](xgboost.md) | |
+| [XGBoost](xgboost.md) | [As Framework](xgboost.md#use-xgboost-as-a-framework) |

 #### Setting up SageMaker Debugger with your script on your container

@@ -189,7 +189,7 @@ The Built-in Rules, or SageMaker Rules, are described in detail on [this page](h
 | Scope of Validity | Rules |
 |---|---|
 | Generic Deep Learning models (TensorFlow, Apache MXNet, and PyTorch) |<ul><li>[`dead_relu`](https://docs.aws.amazon.com/sagemaker/latest/dg/dead-relu.html)</li><li>[`exploding_tensor`](https://docs.aws.amazon.com/sagemaker/latest/dg/exploding-tensor.html)</li><li>[`poor_weight_initialization`](https://docs.aws.amazon.com/sagemaker/latest/dg/poor-weight-initialization.html)</li><li>[`saturated_activation`](https://docs.aws.amazon.com/sagemaker/latest/dg/saturated-activation.html)</li><li>[`vanishing_gradient`](https://docs.aws.amazon.com/sagemaker/latest/dg/vanishing-gradient.html)</li><li>[`weight_update_ratio`](https://docs.aws.amazon.com/sagemaker/latest/dg/weight-update-ratio.html)</li></ul> |
-| Generic Deep learning models (TensorFlow, MXNet, and PyTorch) and the XGBoost algorithm | <ul><li>[`all_zero`](https://docs.aws.amazon.com/sagemaker/latest/dg/all-zero.html)</li><li>[`class_imbalance`](https://docs.aws.amazon.com/sagemaker/latest/dg/class-imbalance.html)</li><li>[`confusion`](https://docs.aws.amazon.com/sagemaker/latest/dg/confusion.html)</li><li>[`loss_not_decreasing`](https://docs.aws.amazon.com/sagemaker/latest/dg/loss-not-decreasing.html)</li><li>[`overfit`](https://docs.aws.amazon.com/sagemaker/latest/dg/overfit.html)</li><li>[`overtraining`](https://docs.aws.amazon.com/sagemaker/latest/dg/overtraining.html)</li><li>[`similar_across_runs`](https://docs.aws.amazon.com/sagemaker/latest/dg/similar-across-runs.html)</li><li>[`tensor_variance`](https://docs.aws.amazon.com/sagemaker/latest/dg/tensor-variance.html)</li><li>[`unchanged_tensor`](https://docs.aws.amazon.com/sagemaker/latest/dg/unchanged-tensor.html)</li>/ul>|
+| Generic Deep learning models (TensorFlow, MXNet, and PyTorch) and the XGBoost algorithm | <ul><li>[`all_zero`](https://docs.aws.amazon.com/sagemaker/latest/dg/all-zero.html)</li><li>[`class_imbalance`](https://docs.aws.amazon.com/sagemaker/latest/dg/class-imbalance.html)</li><li>[`confusion`](https://docs.aws.amazon.com/sagemaker/latest/dg/confusion.html)</li><li>[`loss_not_decreasing`](https://docs.aws.amazon.com/sagemaker/latest/dg/loss-not-decreasing.html)</li><li>[`overfit`](https://docs.aws.amazon.com/sagemaker/latest/dg/overfit.html)</li><li>[`overtraining`](https://docs.aws.amazon.com/sagemaker/latest/dg/overtraining.html)</li><li>[`similar_across_runs`](https://docs.aws.amazon.com/sagemaker/latest/dg/similar-across-runs.html)</li><li>[`tensor_variance`](https://docs.aws.amazon.com/sagemaker/latest/dg/tensor-variance.html)</li><li>[`unchanged_tensor`](https://docs.aws.amazon.com/sagemaker/latest/dg/unchanged-tensor.html)</li></ul>|
 | Deep learning applications |<ul><li>[`check_input_images`](https://docs.aws.amazon.com/sagemaker/latest/dg/checkinput-mages.html)</li><li>[`nlp_sequence_ratio`](https://docs.aws.amazon.com/sagemaker/latest/dg/nlp-sequence-ratio.html)</li></ul> |
 | XGBoost algorithm | <ul><li>[`tree_depth`](https://docs.aws.amazon.com/sagemaker/latest/dg/tree-depth.html)</li></ul>|
````
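The "Setting up SageMaker Debugger with your script on your container" section referenced in this diff covers the BYOC path, where the training script itself registers a hook that persists tensors at configured steps. A toy, stdlib-only sketch of that save-every-N-steps behavior (the `ToyHook` class and its methods are hypothetical, not smdebug's API):

```python
# Toy stdlib-only sketch (NOT smdebug's API) of the save-every-N-steps
# behavior a debugger hook adds to a training script in the BYOC setup.
import json
import os
import tempfile

class ToyHook:
    def __init__(self, out_dir, save_interval=100):
        self.out_dir = out_dir
        self.save_interval = save_interval
        os.makedirs(out_dir, exist_ok=True)

    def save_tensors(self, step, tensors):
        """Persist the tensor dict only on steps matching the interval."""
        if step % self.save_interval != 0:
            return False
        path = os.path.join(self.out_dir, f"step_{step}.json")
        with open(path, "w") as f:
            json.dump(tensors, f)
        return True

out_dir = tempfile.mkdtemp()
hook = ToyHook(out_dir, save_interval=2)
saved = [hook.save_tensors(s, {"loss": 1.0 / (s + 1)}) for s in range(4)]
print(saved)  # [True, False, True, False] -- steps 0 and 2 hit the interval
```

The real hook additionally captures framework-specific collections (weights, gradients, losses) and writes them in a format the Rule jobs and the `smdebug` trial API can read back.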

docs/xgboost.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -10,9 +10,9 @@
 ### Use XGBoost as a built-in algorithm

 The XGBoost algorithm can be used 1) as a built-in algorithm, or 2) as a framework such as MXNet, PyTorch, or TensorFlow.
-If SageMaker XGBoost is used as a built-in algorithm in container verision `0.90-2` or later, Amazon SageMaker Debugger will be available by default (i.e., zero code change experience).
+If SageMaker XGBoost is used as a built-in algorithm in container version `0.90-2` or later, Amazon SageMaker Debugger will be available by default (i.e., zero code change experience).
 See the [XGBoost Algorithm AWS documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) for more information on how to use XGBoost as a built-in algorithm.
-See [Amazon SageMaker Debugger examples](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-debugger) for sample notebooks that demonstrate debugging and monitoring capabilities of Aamazon SageMaker Debugger.
+See [Amazon SageMaker Debugger examples](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-debugger) for sample notebooks that demonstrate debugging and monitoring capabilities of Amazon SageMaker Debugger.
 See the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/) for more information on how to configure Amazon SageMaker Debugger from the Python SDK.

 ### Use XGBoost as a framework
````
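Whichever mode XGBoost runs in, the saved tensors are explored through the trial interface shown in the README snippet (`trial.tensor(name).values(mode=...)`). A toy, stdlib-only sketch of that lookup shape (the `ToyTrial`/`ToyTensor` classes are hypothetical stand-ins, not smdebug's classes):

```python
# Toy sketch of the trial-style lookup used in the README snippet
# (`trial.tensor(name).values(mode=...)`). Hypothetical classes, NOT
# smdebug; they only illustrate the name -> mode -> {step: value} shape.
from collections import defaultdict

class ToyTensor:
    def __init__(self):
        self._values = defaultdict(dict)  # mode -> {step: value}

    def add(self, mode, step, value):
        self._values[mode][step] = value

    def values(self, mode):
        """Return this tensor's values for one mode, keyed by step."""
        steps = self._values[mode]
        return {s: steps[s] for s in sorted(steps)}

class ToyTrial:
    def __init__(self):
        self._tensors = defaultdict(ToyTensor)

    def tensor(self, name):
        return self._tensors[name]

    def tensor_names(self):
        return sorted(self._tensors)

trial = ToyTrial()
trial.tensor("CrossEntropyLoss:0").add("TRAIN", 0, 2.3)
trial.tensor("CrossEntropyLoss:0").add("EVAL", 0, 2.5)
trial.tensor("CrossEntropyLoss:0").add("EVAL", 100, 1.1)
print(trial.tensor_names())                               # ['CrossEntropyLoss:0']
print(trial.tensor("CrossEntropyLoss:0").values("EVAL"))  # {0: 2.5, 100: 1.1}
```

Separating values by mode is what lets the README example pull only evaluation-phase losses out of a run that also saved training-phase tensors.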

examples/mxnet/README.md

Lines changed: 2 additions & 0 deletions

````diff
@@ -0,0 +1,2 @@
+## Example Notebooks
+Please refer to the example notebooks in [Amazon SageMaker Examples repository](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-debugger)
````
