@@ -5,71 +5,71 @@ Tornasole MXNet provides the following constructs:
TornasoleHook is the entry point for Tornasole into your program.

```
-
+
class TornasoleHook
"""
- A class used to represent the hook which gets attached to the
- training process.
-
-
-
+ A class used to represent the hook which gets attached to the
+ training process.
+
+
+
Attributes
----------
out_dir : str
represents a path to which the outputs of tornasole will be written to.
This can be a local path or an S3 prefix of the form s3://bucket_name/prefix.
- Note that for Sagemaker, you always need to specify the out_dir as `/opt/ml/output/tensors`.
- In the future, we will make this the default in Sagemaker environments.
-
+ Note that for Sagemaker, you always need to specify the out_dir as `/opt/ml/output/tensors`.
+ In the future, we will make this the default in Sagemaker environments.
+
dry_run : bool
when dry_run is set to True, behavior is only described in the log file.
- The tensors are not actually saved.
-
+ The tensors are not actually saved.
+
worker: str
name of worker in a multi process training job
outputs and tensors are organized by this name during retrieval.
-
+
save_config: SaveConfig object or a dictionary from mode to SaveConfig objects
- SaveConfig allows you to customize when tensors are saved.
- Hook takes SaveConfig object which is applied as
+ SaveConfig allows you to customize when tensors are saved.
+ Hook takes SaveConfig object which is applied as
default for all included tensors.
- A collection can optionally have its own SaveConfig object
+ A collection can optionally have its own SaveConfig object
which overrides this for its tensors.
If you pass a dictionary from mode->SaveConfig, then that
SaveConfig is applied to tensors included for that mode.
- example: {modes.TRAIN: SaveConfig(save_interval=10),
+ example: {modes.TRAIN: SaveConfig(save_interval=10),
modes.EVAL:SaveConfig(save_interval=1)}
Refer to documentation for SaveConfig.
-
+
reduction_config: ReductionConfig object
- ReductionConfig allows you to save tensors as their reductions
- instead of saving full tensors.
+ ReductionConfig allows you to save tensors as their reductions
+ instead of saving full tensors.
If ReductionConfig is passed then the chosen reductions are applied
as default for all tensors included.
A collection can optionally have its own ReductionConfig object
- which overrides this for its tensors.
-
+ which overrides this for its tensors.
+
include_regex: list of str
takes as input the list of string representing regular expressions. Tensors whose names match
these regular expressions will be saved. These tensors will be available as part of the `default`
collection.
-
+
include_collections: list of str
takes as input the names of collections which should be saved.
by default, ['weights','gradients', 'bias', 'default'] are passed to include_collections.
-
+
save_all: bool
a shortcut for saving all tensors in the model.
tensors are all grouped into the `default` collection
-
+
def __init__(self,
out_dir,
dry_run=False,
worker='worker0',
reduction_config=None,
save_config=SaveConfig(save_interval=100),
include_regex=None,
- include_collections=['weights', 'gradients', 'bias', 'default'],
+ include_collections=['weights', 'gradients', 'bias', 'default'],
save_all=False,
):
```
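The precedence rules the docstring describes for `save_config` (a hook-level default, an optional per-mode dictionary, and a per-collection override) can be sketched in plain Python. This is only an illustration and does not use the Tornasole package: `Mode`, `SaveConfigSketch` and `effective_save_config` are hypothetical stand-ins, and the fallback used when a mode is missing from the dictionary is an assumption that the docstring does not state.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    # Hypothetical stand-in for the library's `modes`; only TRAIN and EVAL
    # appear in the docstring example.
    TRAIN = 1
    EVAL = 2


@dataclass(frozen=True)
class SaveConfigSketch:
    # Hypothetical stand-in for SaveConfig; 100 mirrors the default shown
    # in the __init__ signature.
    save_interval: int = 100


def effective_save_config(hook_config, mode, collection_config=None):
    """Pick the config that applies to a tensor in the given mode."""
    # A collection's own config overrides the hook-level default.
    if collection_config is not None:
        return collection_config
    # A dictionary maps each mode to its own config.
    if isinstance(hook_config, dict):
        # Assumption: a mode missing from the dictionary falls back to the
        # default config. The real library may behave differently.
        return hook_config.get(mode, SaveConfigSketch())
    return hook_config


hook_config = {
    Mode.TRAIN: SaveConfigSketch(save_interval=10),
    Mode.EVAL: SaveConfigSketch(save_interval=1),
}
print(effective_save_config(hook_config, Mode.TRAIN).save_interval)
print(effective_save_config(hook_config, Mode.EVAL).save_interval)
```

The two `print` calls resolve the same dictionary for two different modes, which is the case the docstring example `{modes.TRAIN: ..., modes.EVAL: ...}` covers.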
@@ -96,10 +96,10 @@ The _reduction\_config_ is optional. If not specified, the reductions are not ap

### Collection

- Collection object helps group tensors for easier handling of tensors being saved.
- A collection has its own list of tensors, include/exclude regex patterns, reduction config and save config.
- This allows setting of different save and reduction configs for different tensors.
- These collections are then also available during analysis with `tornasole_rules`.
+ Collection object helps group tensors for easier handling of tensors being saved.
+ A collection has its own list of tensors, include/exclude regex patterns, reduction config and save config.
+ This allows setting of different save and reduction configs for different tensors.
+ These collections are then also available during analysis with `tornasole_rules`.

#### Creating or accessing a collection

@@ -131,75 +131,75 @@ The following methods can be called on a collection object.
| ``` coll.set_reduction_config() ``` | Sets reduction config for the collection |

### SaveConfig
- SaveConfig class allows you to customize the frequency of saving tensors.
- The hook takes a SaveConfig object which is applied as
- default to all tensors included.
- A collection can also have its own SaveConfig object which is applied
+ SaveConfig class allows you to customize the frequency of saving tensors.
+ The hook takes a SaveConfig object which is applied as
+ default to all tensors included.
+ A collection can also have its own SaveConfig object which is applied
to the tensors belonging to that collection.

- SaveConfig also allows you to save tensors when certain tensors become nan.
+ SaveConfig also allows you to save tensors when certain tensors become nan.
This list of tensors to watch for is taken as a list of strings representing names of tensors.

```
-
+
class SaveConfig:
-
+
Attributes
----------
-
+
save_interval: int
- allows you to save every n steps by passing n to save_interval
-
+ allows you to save every n steps by passing n to save_interval
+
skip_num_steps: int
allows you to avoid saving for the first n steps of the job.
it defaults to 0, i.e. don't skip any steps in the beginning.
-
+
save_steps: list of int
save at all the steps given in this list.
if this is given, it ignores the save_interval.
-
+
when_nan: list of str representing name of tensor
saves the tensors to which this saveConfig is attached
whenever any of the tensors in this list become nan or infinite.
This means that if your save_interval is set to 10, and 'loss' is in when_nan
your tensors will be saved whenever save_interval is multiple of 10 as well as
whenever loss becomes nan or infinite.
- ```
+ ```

The default value of _save\_interval_ is 100. The TornasoleHook that uses a default SaveConfig object will store the tensors every 100th step.
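To make the interaction between the four SaveConfig attributes concrete, the following is a minimal sketch of a step-level decision function. It is not Tornasole's implementation: `should_save` and its `scalars` argument are hypothetical, each watched tensor is modelled as a single float, and the assumptions that `skip_num_steps` applies only to the interval rule and that step 0 counts as a multiple of the interval are not confirmed by the text above.

```python
import math


def should_save(step, save_interval=100, skip_num_steps=0,
                save_steps=None, when_nan=(), scalars=None):
    """Return True if tensors would be saved at `step` under this sketch."""
    scalars = scalars or {}

    # when_nan: save whenever a watched tensor is nan or infinite,
    # regardless of the interval.
    for name in when_nan:
        value = scalars.get(name)
        if value is not None and not math.isfinite(value):
            return True

    # save_steps: if given, the interval is ignored.
    if save_steps is not None:
        return step in save_steps

    # skip_num_steps: do not save during the first n steps.
    # Assumption: this applies to the interval rule only.
    if step < skip_num_steps:
        return False

    # save_interval: save every n-th step.
    return step % save_interval == 0


# 'loss' is nan at step 15, so the step is saved even though 15 is not a
# multiple of the interval.
print(should_save(15, save_interval=10, when_nan=["loss"],
                  scalars={"loss": float("nan")}))
```

With `save_interval=10` and `'loss'` in `when_nan`, this sketch saves at every step that is a multiple of 10 and additionally at any step where the loss is nan or infinite.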


- ### ReductionConfig
+ ### ReductionConfig
ReductionConfig allows the saving of certain reductions of tensors instead
of saving the full tensor. The motivation here is to reduce the amount of data
saved, and increase the speed in cases where you don't need the full
tensor. The reduction operations which are computed in the training process
- and then saved.
- During analysis, these are available as reductions of the original tensor.
+ and then saved.
+ During analysis, these are available as reductions of the original tensor.
Please note that using reduction config means that you will not have
the full tensor available during analysis, so this can restrict what you can do with the tensor saved.
- The hook takes a ReductionConfig object which is applied as default to all tensors included.
- A collection can also have its own ReductionConfig object which is applied
+ The hook takes a ReductionConfig object which is applied as default to all tensors included.
+ A collection can also have its own ReductionConfig object which is applied
to the tensors belonging to that collection.

```
-
+
Attributes
----------
-
+
reductions: list of str
takes list of names of reductions to be computed.
should be one of 'min', 'max', 'median', 'mean', 'std', 'sum', 'prod'
-
+
abs_reductions: list of str
takes list of names of reductions to be computed after converting the tensor
to abs(tensor) i.e. reductions are applied on the absolute values of tensor.
should be one of 'min', 'max', 'median', 'mean', 'std', 'sum', 'prod'
-
+
norms: list of str
takes names of norms to be computed of the tensor.
should be one of 'l1', 'l2'
-
+
abs_norms: list of str
takes names of norms to be computed of the tensor after taking absolute value
should be one of 'l1', 'l2'
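As a rough illustration of what the four ReductionConfig attributes ask for, the sketch below computes the named reductions and norms over a flat list of numbers using only the Python standard library. It does not reflect how Tornasole computes or names its outputs: `reduce_tensor`, the output keys such as `abs_max` and `l2_norm`, and the use of the population standard deviation for `'std'` are all assumptions made for this example.

```python
import math
import statistics

# Reduction names taken from the attribute list above.
REDUCTIONS = {
    "min": min,
    "max": max,
    "median": statistics.median,
    "mean": statistics.fmean,
    "std": statistics.pstdev,  # assumption: population standard deviation
    "sum": math.fsum,
    "prod": math.prod,
}

# Norm names taken from the attribute list above.
NORMS = {
    "l1": lambda values: math.fsum(abs(v) for v in values),
    "l2": lambda values: math.sqrt(math.fsum(v * v for v in values)),
}


def reduce_tensor(values, reductions=(), abs_reductions=(),
                  norms=(), abs_norms=()):
    """Compute the requested reductions of a flattened tensor."""
    flat = [float(v) for v in values]
    abs_flat = [abs(v) for v in flat]
    result = {}
    for name in reductions:
        result[name] = REDUCTIONS[name](flat)
    for name in abs_reductions:
        # Applied to abs(tensor), as described for abs_reductions.
        result["abs_" + name] = REDUCTIONS[name](abs_flat)
    for name in norms:
        result[name + "_norm"] = NORMS[name](flat)
    for name in abs_norms:
        result["abs_" + name + "_norm"] = NORMS[name](abs_flat)
    return result


print(reduce_tensor([-3, 4], reductions=["min", "mean"],
                    abs_reductions=["max"], norms=["l2"]))
```

Only the small dictionary of reductions is kept, not the values themselves, which is the trade-off the section describes: less data is saved, but the full tensor is not available during analysis.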