Commit 4cfc85e

jarednielsen authored and rahul003 committed
Docs (aws#96)
* Fix yet another flaky test that doesn't do what it should
* beef up pt and mxnet
1 parent e566abb commit 4cfc85e

3 files changed: +79 -9 lines changed

docs/mxnet.md

Lines changed: 35 additions & 3 deletions
@@ -1,12 +1,44 @@
11
# MXNet
22

3-
SageMaker Zero-Code-Change supported container: MXNet 1.6. See [AWS Docs](https://docs.aws.amazon.com/sagemaker/latest/dg/train-model.html) for more information.\
4-
Python API supported versions: MXNet 1.4, 1.5, 1.6.
5-
63
## Contents
4+
- [Support](#support)
5+
- [How to Use](#how-to-use)
76
- [Example](#mxnet-example)
87
- [Full API](#full-api)
98

9+
---
10+
11+
## Support
12+
13+
### Versions
14+
- The Zero Script Change experience, which requires no modifications to your training script, is supported in the official [SageMaker Framework Container for MXNet 1.6](https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-containers-frameworks-deep-learning.html) and in the [AWS Deep Learning Container for MXNet 1.6](https://aws.amazon.com/machine-learning/containers/).
15+
16+
- The library itself supports MXNet 1.4, 1.5, and 1.6 when you use its Python API, which requires a few minimal changes to your training script.
17+
18+
---
19+
20+
## How to Use
21+
### Using Zero Script Change containers
22+
In this case, you don't need to do anything to get the hook running. We encourage you to configure the hook from the SageMaker Python SDK, so that you can run different jobs with different configurations without modifying your script. If you need access to the hook to configure something that cannot be set through the SageMaker SDK, you can retrieve the hook as follows.
23+
```
24+
import smdebug.mxnet as smd
25+
hook = smd.Hook.create_from_json_file()
26+
```
27+
Note that even in these containers you can also create the hook through smdebug's Python API, as described in the next section.
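
As a rough sketch of configuring the hook from the SageMaker Python SDK rather than from the training script, the helper below builds a hook configuration that saves the `weights` collection. It assumes the SDK's `sagemaker.debugger.DebuggerHookConfig` and `CollectionConfig` classes are available. The helper name `build_debugger_config`, the collection choice, and the save interval are illustrative assumptions, not part of this commit.

```python
def build_debugger_config(s3_output_path, save_interval=100):
    """Return a hook configuration that saves the 'weights' collection.

    Hypothetical helper: the name, collection and interval are illustrative.
    """
    # Imported lazily so this module loads even without the SageMaker SDK installed.
    from sagemaker.debugger import CollectionConfig, DebuggerHookConfig

    return DebuggerHookConfig(
        s3_output_path=s3_output_path,
        collection_configs=[
            CollectionConfig(
                name="weights",
                # Parameter values are passed as strings.
                parameters={"save_interval": str(save_interval)},
            )
        ],
    )
```

The returned object would typically be passed as the `debugger_hook_config` argument of a SageMaker Estimator, which leaves the training script itself untouched.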
28+
29+
### Bring your own container experience
30+
#### 1. Create a hook
31+
If using SageMaker, configure the hook in the SageMaker Python SDK using the Estimator class, then instantiate it in your script with
32+
`smd.Hook.create_from_json_file()`. Otherwise, call the hook class constructor, `smd.Hook()`.
33+
34+
#### 2. Register the model to the hook
35+
Call `hook.register_block(net)`.
36+
37+
#### 3. (Optional) Configure Collections, SaveConfig and ReductionConfig
38+
See the [Common API](api.md) page for details on how to do this.
39+
40+
---
41+
1042
## MXNet Example
1143
```python
1244
import smdebug.mxnet as smd

docs/pytorch.md

Lines changed: 41 additions & 3 deletions
@@ -1,13 +1,47 @@
11
# PyTorch
22

3-
SageMaker Zero-Code-Change supported containers: PyTorch 1.3. See [AWS Docs](https://docs.aws.amazon.com/sagemaker/latest/dg/train-model.html) for more information.\
4-
Python API supported versions: 1.2, 1.3.
5-
63
## Contents
4+
- [Support](#support)
5+
- [How to Use](#how-to-use)
76
- [Module Loss Example](#module-loss-example)
87
- [Functional Loss Example](#functional-loss-example)
98
- [Full API](#full-api)
109

10+
## Support
11+
12+
### Versions
13+
- The Zero Script Change experience, which requires no modifications to your training script, is supported in the official [SageMaker Framework Container for PyTorch 1.3](https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-containers-frameworks-deep-learning.html) and in the [AWS Deep Learning Container for PyTorch 1.3](https://aws.amazon.com/machine-learning/containers/).
14+
15+
- The library itself supports PyTorch 1.2 and 1.3 when you use its Python API, which requires a few changes to your training script.
16+
17+
---
18+
19+
## How to Use
20+
### Using Zero Script Change containers
21+
In this case, you don't need to do anything to get the hook running. We encourage you to configure the hook from the SageMaker Python SDK, so that you can run different jobs with different configurations without modifying your script. If you need access to the hook to configure something that cannot be set through the SageMaker SDK, you can retrieve the hook as follows.
22+
```
23+
import smdebug.pytorch as smd
24+
hook = smd.Hook.create_from_json_file()
25+
```
26+
Note that even in these containers you can also create the hook through smdebug's Python API, as described in the next section.
27+
28+
### Bring your own container experience
29+
#### 1. Create a hook
30+
If using SageMaker, configure the hook in the SageMaker Python SDK using the Estimator class, then instantiate it in your script with
31+
`smd.Hook.create_from_json_file()`. Otherwise, call the hook class constructor, `smd.Hook()`.
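
The choice described in this step can be sketched as a small helper that uses the JSON configuration when one is present and falls back to the constructor otherwise. The environment variable name and default path below are assumptions about where SageMaker writes the hook configuration, and `has_json_config` and `create_hook` are hypothetical names used only for illustration.

```python
import os

# Assumptions, not stated in this document: where the JSON hook config is looked up.
ASSUMED_ENV_VAR = "SMDEBUG_CONFIG_FILE_PATH"
ASSUMED_DEFAULT_PATH = "/opt/ml/input/config/debughookconfig.json"


def has_json_config(path=None):
    """Return True if a JSON hook configuration file exists at the given or assumed path."""
    if path is None:
        path = os.environ.get(ASSUMED_ENV_VAR, ASSUMED_DEFAULT_PATH)
    return os.path.isfile(path)


def create_hook(out_dir):
    """Create a hook from the JSON config if present, else from the constructor."""
    # Imported lazily so the path check above works without smdebug installed.
    import smdebug.pytorch as smd

    if has_json_config():
        return smd.Hook.create_from_json_file()
    return smd.Hook(out_dir=out_dir)
```

A script written this way can run unchanged both inside and outside SageMaker, at the cost of depending on the assumed config location.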
32+
33+
#### 2. Register the model to the hook
34+
Call `hook.register_module(net)`.
35+
36+
#### 3. Register your loss function to the hook
37+
If using a loss which is a subclass of `nn.Module`, call `hook.register_loss(loss_criterion)` once before starting training.\
38+
If using a loss function from `torch.nn.functional`, call `hook.record_tensor_value(loss)` after each training step.
39+
40+
#### 4. (Optional) Configure Collections, SaveConfig and ReductionConfig
41+
See the [Common API](api.md) page for details on how to do this.
42+
43+
---
44+
1145
## Module Loss Example
1246
```python
1347
import smdebug.pytorch as smd
@@ -38,6 +72,8 @@ for (inputs, labels) in trainloader:
3872
optimizer.step()
3973
```
4074

75+
---
76+
4177
## Functional Loss Example
4278
```python
4379
import smdebug.pytorch as smd
@@ -70,6 +106,8 @@ for (inputs, labels) in trainloader:
70106
optimizer.step()
71107
```
72108

109+
---
110+
73111
## Full API
74112
See the [Common API](api.md) page for details about Collection, SaveConfig, and ReductionConfig.\
75113
See the [Analysis](analysis.md) page for details about analyzing a training job.

tests/mxnet/test_training_end.py

Lines changed: 3 additions & 3 deletions
@@ -34,9 +34,9 @@ def test_end_local_training():
3434
@pytest.mark.slow # 0:04 to run
3535
def test_end_s3_training():
3636
run_id = str(uuid.uuid4())
37-
bucket = "smdebugcodebuildtest"
38-
key = "newlogsRunTest/" + run_id
39-
out_dir = bucket + "/" + key
37+
bucket = "smdebug-testing"
38+
key = f"outputs/{uuid.uuid4()}"
39+
out_dir = "s3://" + bucket + "/" + key
4040
assert has_training_ended(out_dir) == False
4141
subprocess.check_call(
4242
[
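
The changed lines above build a unique output location per run and add the `s3://` scheme that the old code omitted. A minimal sketch of that pattern, with a hypothetical helper name `make_s3_out_dir` that does not appear in the test file:

```python
import uuid


def make_s3_out_dir(bucket, prefix="outputs"):
    """Build a unique S3 URI so that concurrent test runs never share an output path."""
    key = f"{prefix}/{uuid.uuid4()}"
    # Include the scheme explicitly so the path is treated as S3 rather than local.
    return f"s3://{bucket}/{key}"


out_dir = make_s3_out_dir("smdebug-testing")
```

A fresh UUID per call isolates runs from each other, which is one plausible way to avoid the kind of flakiness the commit message mentions.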
