README.md
7 additions & 7 deletions
@@ -22,7 +22,7 @@ There are two ways to use it: Automatic mode and configurable mode.
 
 ## Example: Amazon SageMaker Zero-Code-Change
 This example uses a zero-script-change experience, where you can use your training script as-is.
-See the [example notebooks](https://link.com) for more details.
+See the [example notebooks](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-debugger) for more details.
 ```python
 import sagemaker
 from sagemaker.debugger import rule_configs, Rule, CollectionConfig
@@ -88,7 +88,7 @@ print(f"Loss values were {trial.tensor('CrossEntropyLoss:0')}")
 Amazon SageMaker Debugger uses a `hook` to store the values of tensors throughout the training process. Another process called a `rule` job
 simultaneously monitors and validates these outputs to ensure that training is progressing as expected.
 A rule might check for vanishing gradients, or exploding tensor values, or poor weight initialization.
-If a rule is triggered, it will raise a CloudWatch event and stop the training job, saving you time
+If a rule is triggered, it will raise a CloudWatch event, saving you time
 and money.
 
 Amazon SageMaker Debugger can be used inside or outside of SageMaker. There are three main use cases:
@@ -99,9 +99,9 @@ Amazon SageMaker Debugger can be used inside or outside of SageMaker. There are
 The reason for different setups is that SageMaker Zero-Script-Change (via Deep Learning Containers) uses custom framework forks of TensorFlow, PyTorch, MXNet, and XGBoost to save tensors automatically.
 These framework forks are not available in custom containers or non-SM environments, so you must modify your training script in these environments.
 
-See the [SageMaker page](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/sagemaker.md) for details on SageMaker Zero-Code-Change and BYOC experience.\
+See the [SageMaker page](docs/sagemaker.md) for details on SageMaker Zero-Code-Change and BYOC experience.\
 See the frameworks pages for details on modifying the training script:
docs/api.md
8 additions & 10 deletions
@@ -1,7 +1,6 @@
 
 # Common API
 These objects exist across all frameworks.
--[SageMaker Zero-Code-Change vs. Python API](#sagemaker)
 -[Creating a Hook](#creating-a-hook)
 -[Hook from SageMaker](#hook-from-sagemaker)
 -[Hook from Python](#hook-from-python)
@@ -14,8 +13,7 @@ These objects exist across all frameworks.
 
 The imports assume `import smdebug.{tensorflow,pytorch,mxnet,xgboost} as smd`.
 
-**Hook**: The main interface to use training. This object can be passed as a model hook/callback
-in Tensorflow and Keras. It keeps track of collections and writes output files at each step.
+**Hook**: The main class to pass as a callback object, or to create callback functions. It keeps track of collections and writes output files at each step.
 -`hook = smd.Hook(out_dir="/tmp/mnist_job")`
 
 **Mode**: One of "train", "eval", "predict", or "global". Helpful for segmenting data based on the phase
@@ -32,10 +30,10 @@ tensors to include/exclude.
 **ReductionConfig**: Allows you to save a reduction, such as 'mean' or 'l1 norm', instead of the full tensor.
-**Trial**: The main interface to use when analyzing a completed training job. Access collections and tensors. See [trials documentation](https://link.com).
+**Trial**: The main interface to use when analyzing a completed training job. Access collections and tensors. See [trials documentation](analysis.md).
-**Rule**: A condition that will trigger an exception and terminate the training job early, for example a vanishing gradient. See [rules documentation](https://link.com).
+**Rule**: A condition that will trigger an exception, for example a vanishing gradient. See [rules documentation](analysis.md).
 
 
 ---
@@ -44,7 +42,7 @@ tensors to include/exclude.
 
 ### Hook from SageMaker
 If you create a SageMaker job and specify the hook configuration in the SageMaker Estimator API
-as described in [AWS Docs](https://link.com),
+as described in [AWS Docs](https://docs.aws.amazon.com/sagemaker/latest/dg/train-model.html),
 the a JSON file will be automatically written. You can create a hook from this file by calling
 ```python
 hook = smd.{hook_class}.create_from_json_file()
@@ -53,10 +51,10 @@ with no arguments and then use the hook Python API in your script. `hook_class`
 
 ### Hook from Python
 See the framework-specific pages for more details.
docs/mxnet.md
1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # MXNet
 
-SageMaker Zero-Code-Change supported container: MXNet 1.6. See [AWS Docs](https://link.com) for more information.\
+SageMaker Zero-Code-Change supported container: MXNet 1.6. See [AWS Docs](https://docs.aws.amazon.com/sagemaker/latest/dg/train-model.html) for more information.\
 Python API supported versions: MXNet 1.4, 1.5, 1.6.
docs/pytorch.md
3 additions & 3 deletions
@@ -1,6 +1,6 @@
 # PyTorch
 
-SageMaker Zero-Code-Change supported containers: PyTorch 1.3. See [AWS Docs](https://link.com) for more information.\
+SageMaker Zero-Code-Change supported containers: PyTorch 1.3. See [AWS Docs](https://docs.aws.amazon.com/sagemaker/latest/dg/train-model.html) for more information.\
 Python API supported versions: 1.2, 1.3.
 
 ## Contents
@@ -71,8 +71,8 @@ for (inputs, labels) in trainloader:
 ```
 
 ## Full API
-See the [Common API](https://link.com) page for details about Collection, SaveConfig, and ReductionConfig.\
-See the [Analysis](https://link.com) page for details about analyzing a training job.
+See the [Common API](api.md) page for details about Collection, SaveConfig, and ReductionConfig.\
+See the [Analysis](analysis.md) page for details about analyzing a training job.
docs/sagemaker.md
5 additions & 4 deletions
@@ -13,10 +13,10 @@ These framework forks are not available in custom containers or non-SM environme
 
 This configuration is used for both ZCC and BYOC. The only difference is that with a custom container, you modify your training script as well. See the framework pages below for details on how to modify your training script.