
Commit cee43fb

Implement common hooks in core (aws#212)
* Implement common hooks in core
* combine abstract and static methods
* rename method
* fix variable name
* fix test
* fix test
* Fix loss decreasing test
* remove data
* Reduce length of tests, also removed unnecessary s3 test
* Fix test
* Delete test.py
* Address review by Edward
* Call super constructor immediately
* Reduce logging in CI and fix tests
* Fix bug and reduce verbosity in CI
* Trying out CI with loss test disabled
* Update hook.py
* Disable unreliable test
* use new collection file name
* fix args for method
* create coll by default
* fix pytorch tests
* Add xgboost tests
* remove exception catching
* remove extra import
* Fix all names of collections to use CollectionKeys, and create collection in a safe way for when json parser creates collections
* Cleanup
* Rename input, and add xgboost to CI evn
* fix bug of tuple name of dict
* Add couple of docstrings
* Address renamed cleanup method
* Rename bias to biases
* cleanup removal of dir code
* Remove save manager
* docs for loss not decreasing
* rename reduction method
* pass out dir to test
* fix test
* use coll manager to add tensors
* change a loop and some imports
1 parent d669c9e commit cee43fb


51 files changed: +861 / -999 lines

docs/mxnet/README.md

Lines changed: 5 additions & 5 deletions
@@ -269,7 +269,7 @@ tm.get_collection("ReluActivation").set_reduction_config(ReductionConfig(reducti
 tm.get_collection("flatten").include(["flatten*"])
 tm.get_collection("flatten").set_save_config(SaveConfig(save_steps=[4,5,6]))
 tm.get_collection("flatten").set_reduction_config(ReductionConfig(norms=["l1"], abs_norms=["l2"]))
-hook = TornasoleHook(out_dir=out_dir, include_collections=['weights', 'bias','gradients',
+hook = TornasoleHook(out_dir=out_dir, include_collections=['weights', 'biases','gradients',
 'default', 'ReluActivation', 'flatten'])
 ```

@@ -280,7 +280,7 @@ Refer [API](api.md) for a list of the reductions available as well as examples.

 There are different ways to save tensors when using Tornasole.
 Tornasole provides easy ways to save certain standard tensors by way of default collections (a Collection represents a group of tensors).
-Examples of such collections are 'weights', 'gradients', 'bias' and 'default'.
+Examples of such collections are 'weights', 'gradients', 'biases' and 'default'.
 Besides the tensors in above default collections, you can save tensors by name or regex patterns on those names.
 Users can also specify a certain block in the model to save the inputs and outputs of that block.
 This section will take you through these ways in more detail.

@@ -289,7 +289,7 @@ This section will take you through these ways in more detail.
 The TornasoleHook API supports _include\_regex_ parameter. The users can specify a regex pattern with this pattern. The TornasoleHook will store the tensors that match with the specified regex pattern. With this approach, users can store the tensors without explicitly creating a Collection object. The specified regex pattern will be associated with 'default' Collection and the SaveConfig object that is associated with the 'default' collection.

 #### Default Collections
-Currently, the tornasole\_mxnet hook creates Collection objects for 'weights', 'gradients', 'bias' and 'default'. These collections contain the regex pattern that match with tensors of type weights, gradient and bias. The regex pattern for the 'default' collection is set when user specifies _include\_regex_ with TornasoleHook or sets the _SaveAll=True_. These collections use the SaveConfig parameter provided with the TornasoleHook initialization. The TornasoleHook will store the related tensors, if user does not specify any special collection with _include\_collections_ parameter. If user specifies a collection with _include\_collections_ the above default collections will not be in effect.
+Currently, the tornasole\_mxnet hook creates Collection objects for 'weights', 'gradients', 'biases' and 'default'. These collections contain the regex pattern that match with tensors of type weights, gradient and bias. The regex pattern for the 'default' collection is set when user specifies _include\_regex_ with TornasoleHook or sets the _SaveAll=True_. These collections use the SaveConfig parameter provided with the TornasoleHook initialization. The TornasoleHook will store the related tensors, if user does not specify any special collection with _include\_collections_ parameter. If user specifies a collection with _include\_collections_ the above default collections will not be in effect.

 #### Custom Collections
 You can also create any other customized collection yourself.

@@ -424,7 +424,7 @@ def create_tornasole_hook(output_s3_uri, block):
 # In order to log the inputs and output of a model, we will create a collection as follows:
 tm.get_collection('TopBlock').add_block_tensors(block, inputs=True, outputs=True)
 # Create a hook that logs weights, biases, gradients and inputs outputs of model while training.
-hook = TornasoleHook(out_dir=output_s3_uri, save_config=save_config, include_collections=['weights', 'gradients', 'bias','TopBlock'])
+hook = TornasoleHook(out_dir=output_s3_uri, save_config=save_config, include_collections=['weights', 'gradients', 'biases','TopBlock'])
 return hook
 ```

@@ -503,7 +503,7 @@ def create_tornasole_hook(output_s3_uri, block):
 tm.get_collection(block.name).add_block_tensors(block, inputs=True, outputs=True)
 # Create a hook that logs weights, biases, gradients and inputs outputs of model while training.
 hook = TornasoleHook(out_dir=output_s3_uri, save_config=save_config, include_collections=[
-'weights', 'gradients', 'bias', block.name])
+'weights', 'gradients', 'biases', block.name])
 return hook
 ```

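For reference, the renamed collections in this README assemble into roughly the following hook setup. This is a minimal sketch based on the snippets above: the import path, output directory, and save steps are assumptions for illustration, not part of the committed docs.

```python
import tornasole.mxnet as tm
from tornasole.mxnet import TornasoleHook, SaveConfig, ReductionConfig  # assumed import path

# Placeholder output location; the docs allow a local path or an S3 URI.
out_dir = './tornasole_outputs/mxnet_example'

# Customize the 'flatten' collection as in the README above.
tm.get_collection("flatten").include(["flatten*"])
tm.get_collection("flatten").set_save_config(SaveConfig(save_steps=[4, 5, 6]))
tm.get_collection("flatten").set_reduction_config(ReductionConfig(norms=["l1"], abs_norms=["l2"]))

# Note the renamed 'biases' collection (previously 'bias').
hook = TornasoleHook(out_dir=out_dir,
                     include_collections=['weights', 'biases', 'gradients',
                                          'default', 'ReluActivation', 'flatten'])
```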
docs/mxnet/api.md

Lines changed: 2 additions & 2 deletions
@@ -56,7 +56,7 @@ TornasoleHook is the entry point for Tornasole into your program.

 include_collections: list of str
 takes as input the names of collections which should be saved.
-by default, ['weights','gradients', 'bias', 'default'] are passed to include_collections.
+by default, ['weights','gradients', 'biases', 'default'] are passed to include_collections.

 save_all: bool
 a shortcut for saving all tensors in the model.

@@ -69,7 +69,7 @@ TornasoleHook is the entry point for Tornasole into your program.
 reduction_config=None,
 save_config=SaveConfig(save_interval=100),
 include_regex=None,
-include_collections=['weights', 'gradients', 'bias', 'default'],
+include_collections=['weights', 'gradients', 'biases', 'default'],
 save_all=False,
 ):
 ```

docs/pytorch/README.md

Lines changed: 1 addition & 1 deletion
@@ -401,7 +401,7 @@ def create_tornasole_hook(output_dir, module):

 # Create a hook that logs weights, biases, gradients and inputs outputs of model while training.
 hook = TornasoleHook(out_dir=output_dir, save_config=SaveConfig(save_steps=[i * 10 for i in range(5)]),
-include_collections=['weights', 'gradients', 'bias','l_mod'])
+include_collections=['weights', 'gradients', 'biases','l_mod'])
 ```

 Here is how to register the above hook.

docs/pytorch/api.md

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ A class used to represent the hook which gets attached to the
 reduction_config=None,
 save_config=default_save_config(),
 include_regex=None,
-include_collections=['weights', 'bias', 'gradients', 'default'],
+include_collections=['weights', 'biases', 'gradients', 'default'],
 save_all=False):
 ```
 ### Collection

docs/rules/FirstPartyRules.md

Lines changed: 5 additions & 1 deletion
@@ -73,7 +73,7 @@ For this rule, users must specify either the `collection_names` or `tensor_regex

 ```
 from tornasole.rules.generic import AllZero
-collections = ['weights', 'bias']
+collections = ['weights', 'biases']
 tensor_regex = ['input*']
 allzero = AllZero(base_trial=trial_obj, collection_names=collections, tensor_regex=tensor_regex)
 ```

@@ -133,6 +133,10 @@ for step 21 is compared with the loss for step 9. The next step where loss is ch
 since 10 steps after 21 is 31, and at 31 and 32 loss is not being saved.
 - `diff_percent`: float (default is 0.0) (between 0.0 and 100.0)
 The minimum difference in percentage that loss should be lower by. By default, the rule just checks if loss is going down. If you want to specify a stricter check that loss is going down fast enough, you might want to pass diff_percent.
+- `mode`: string
+The name of tornasole mode to query tensor values for rule checking.
+If this is not passed, the rule checks for eval mode, then training mode and then global mode in this order.
+

 ```
 from tornasole.rules.generic import LossNotDecreasing

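To illustrate the new `mode` parameter documented above, here is a hedged sketch of constructing the rule. Only `diff_percent` and `mode` come from the parameter list in this section; the `create_trial` import, the collection name, the threshold value, and the exact mode string are illustrative assumptions.

```python
from tornasole.rules.generic import LossNotDecreasing
from tornasole.trials import create_trial  # assumed helper for opening a trial from its output path

# Placeholder path to a run's tornasole output directory.
trial_obj = create_trial('./tornasole_outputs/mxnet_example')

lnd = LossNotDecreasing(base_trial=trial_obj,
                        collection_names=['losses'],  # assumed collection holding the loss tensor
                        diff_percent=5.0,             # require the loss to drop by at least 5%
                        mode='EVAL')                  # query EVAL-mode values; if omitted, the rule falls
                                                      # back to eval, then train, then global mode
```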
examples/mxnet/scripts/mnist_gluon_all_zero_demo.py

Lines changed: 1 addition & 1 deletion
@@ -113,7 +113,7 @@ def create_tornasole_hook(output_s3_uri):

 # Create a hook that logs weights, biases and gradients while training the model.
 hook = TornasoleHook(out_dir=output_s3_uri, save_config=save_config,
-include_collections=['ReluActivation','weights', 'bias','gradients'])
+include_collections=['ReluActivation','weights', 'biases','gradients'])
 return hook

examples/mxnet/scripts/mnist_gluon_basic_hook_demo.py

Lines changed: 1 addition & 1 deletion
@@ -111,7 +111,7 @@ def create_tornasole_hook(output_s3_uri):

 # Create a hook that logs weights, biases and gradients while training the model.
 hook = TornasoleHook(out_dir=output_s3_uri, save_config=save_config, include_collections=['weights', 'gradients',
-'bias'])
+'biases'])
 return hook

examples/mxnet/scripts/mnist_gluon_block_input_output_demo.py

Lines changed: 1 addition & 1 deletion
@@ -119,7 +119,7 @@ def create_tornasole_hook(output_s3_uri, block):

 # Create a hook that logs weights, biases, gradients and inputs outputs of model while training.
 hook = TornasoleHook(out_dir=output_s3_uri, save_config=save_config, include_collections=[
-'weights', 'gradients', 'bias', block.name])
+'weights', 'gradients', 'biases', block.name])
 return hook

examples/mxnet/scripts/mnist_gluon_model_input_output_demo.py

Lines changed: 1 addition & 1 deletion
@@ -106,7 +106,7 @@ def create_tornasole_hook(output_s3_uri, block):
 tm.get_collection('TopBlock').add_block_tensors(block, inputs=True, outputs=True)

 # Create a hook that logs weights, biases, gradients and inputs outputs of model while training.
-hook = TornasoleHook(out_dir=output_s3_uri, save_config=save_config, include_collections=['weights', 'gradients', 'bias','TopBlock'])
+hook = TornasoleHook(out_dir=output_s3_uri, save_config=save_config, include_collections=['weights', 'gradients', 'biases','TopBlock'])
 return hook

examples/mxnet/scripts/mnist_gluon_vg_demo.py

Lines changed: 1 addition & 1 deletion
@@ -114,7 +114,7 @@ def create_tornasole_hook(output_uri, tornasole_frequency):

 # Create a hook that logs weights, biases and gradients while training the model.
 hook = TornasoleHook(out_dir=output_uri, save_config=save_config, include_collections=['weights', 'gradients',
-'bias'])
+'biases'])
 return hook

examples/mxnet/scripts/mnist_mxnet.py

Lines changed: 1 addition & 1 deletion
@@ -103,7 +103,7 @@ def create_tornasole_hook(output_uri):
 # Create a hook that logs weights, biases and gradients while training the model.
 hook = TornasoleHook(out_dir=output_uri,
 save_config=save_config,
-include_collections=['weights', 'gradients', 'bias'])
+include_collections=['weights', 'gradients', 'biases'])
 return hook

examples/pytorch/scripts/pytorch_hook_demos.py

Lines changed: 1 addition & 1 deletion
@@ -91,7 +91,7 @@ def create_tornasole_hook(output_dir, module=None, hook_type='saveall'):

 # Create a hook that logs weights, biases, gradients and inputs/outputs of model every 5 steps from steps 0-100 while training.
 hook = TornasoleHook(out_dir=output_dir, save_config=SaveConfig(save_steps=[i * 5 for i in range(20)]),
-include_collections=['weights', 'gradients', 'bias','l_mod'])
+include_collections=['weights', 'gradients', 'biases','l_mod'])
 elif hook_type == 'weights-bias-gradients':
 save_config = SaveConfig(save_steps=[i * 5 for i in range(20)])
 # Create a hook that logs ONLY weights, biases, and gradients every 5 steps (from steps 0-100) while training the model.

examples/pytorch/scripts/simple.py

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ def create_tornasole_hook(output_dir, module=None, hook_type='saveall',
 hook = TornasoleHook(out_dir=output_dir,
 save_config=SaveConfig(save_steps=save_steps),
 include_collections=['weights', 'gradients',
-'bias', 'l_mod'])
+'biases', 'l_mod'])
 elif hook_type == 'weights-bias-gradients':
 save_config = SaveConfig(save_steps=save_steps)
 # Create a hook that logs ONLY weights, biases, and gradients

examples/tensorflow/scripts/mnist.py

Lines changed: 6 additions & 1 deletion
@@ -17,6 +17,9 @@
 parser.add_argument('--num_steps', type=int,
 help="Number of steps to train for. If this" \
 "is passed, it overrides num_epochs")
+parser.add_argument('--num_eval_steps', type=int,
+help="Number of steps to evaluate for. If this" \
+"is passed, it doesnt evaluate over the full eval set")
 parser.add_argument('--model_dir', type=str, default='/tmp/mnist_model')
 args = parser.parse_args()

@@ -134,4 +137,6 @@ def cnn_model_fn(features, labels, mode):
 hooks=[hook])

 hook.set_mode(ts.modes.EVAL)
-mnist_classifier.evaluate(input_fn=eval_input_fn, hooks=[hook])
+mnist_classifier.evaluate(input_fn=eval_input_fn,
+steps=args.num_eval_steps,
+hooks=[hook])

sagemaker-docs/DeveloperGuide_MXNet.md

Lines changed: 3 additions & 3 deletions
@@ -191,7 +191,7 @@ tm.get_collection("ReluActivation").set_reduction_config(ReductionConfig(reducti
 tm.get_collection("flatten").include(["flatten*"])
 tm.get_collection("flatten").set_save_config(SaveConfig(save_steps=[4,5,6]))
 tm.get_collection("flatten").set_reduction_config(ReductionConfig(norms=["l1"], abs_norms=["l2"]))
-hook = TornasoleHook(out_dir=out_dir, include_collections=['weights', 'bias','gradients',
+hook = TornasoleHook(out_dir=out_dir, include_collections=['weights', 'biases','gradients',
 'default', 'ReluActivation', 'flatten'])
 ```

@@ -202,7 +202,7 @@ Refer [API](api.md) for a list of the reductions available as well as examples.

 There are different ways to save tensors when using Tornasole.
 Tornasole provides easy ways to save certain standard tensors by way of default collections (a Collection represents a group of tensors).
-Examples of such collections are 'weights', 'gradients', 'bias' and 'default'.
+Examples of such collections are 'weights', 'gradients', 'biases' and 'default'.
 Besides the tensors in above default collections, you can save tensors by name or regex patterns on those names.
 Users can also specify a certain block in the model to save the inputs and outputs of that block.
 This section will take you through these ways in more detail.

@@ -211,7 +211,7 @@ This section will take you through these ways in more detail.
 The TornasoleHook API supports _include\_regex_ parameter. The users can specify a regex pattern with this pattern. The TornasoleHook will store the tensors that match with the specified regex pattern. With this approach, users can store the tensors without explicitly creating a Collection object. The specified regex pattern will be associated with 'default' Collection and the SaveConfig object that is associated with the 'default' collection.

 #### Default Collections
-Currently, the tornasole\_mxnet hook creates Collection objects for 'weights', 'gradients', 'bias' and 'default'. These collections contain the regex pattern that match with tensors of type weights, gradient and bias. The regex pattern for the 'default' collection is set when user specifies _include\_regex_ with TornasoleHook or sets the _SaveAll=True_. These collections use the SaveConfig parameter provided with the TornasoleHook initialization. The TornasoleHook will store the related tensors, if user does not specify any special collection with _include\_collections_ parameter. If user specifies a collection with _include\_collections_ the above default collections will not be in effect.
+Currently, the tornasole\_mxnet hook creates Collection objects for 'weights', 'gradients', 'biases' and 'default'. These collections contain the regex pattern that match with tensors of type weights, gradient and bias. The regex pattern for the 'default' collection is set when user specifies _include\_regex_ with TornasoleHook or sets the _SaveAll=True_. These collections use the SaveConfig parameter provided with the TornasoleHook initialization. The TornasoleHook will store the related tensors, if user does not specify any special collection with _include\_collections_ parameter. If user specifies a collection with _include\_collections_ the above default collections will not be in effect.

 #### Custom Collections
 You can also create any other customized collection yourself.

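As a concrete illustration of the _include\_regex_ behaviour described in this guide, a minimal sketch follows; the pattern, output path, and save interval are placeholders, and passing the regex as a list is an assumption based on the API reference above.

```python
from tornasole.mxnet import TornasoleHook, SaveConfig  # assumed import path, as in the examples above

# Tensors whose names match the pattern are attached to the 'default' collection
# and saved according to that collection's SaveConfig; no Collection object is
# created by hand.
hook = TornasoleHook(out_dir='./tornasole_outputs/regex_example',  # placeholder path
                     include_regex=['relu.*'],                     # hypothetical pattern
                     save_config=SaveConfig(save_interval=100))
```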
sagemaker-docs/DeveloperGuide_PyTorch.md

Lines changed: 1 addition & 1 deletion
@@ -290,7 +290,7 @@ def create_tornasole_hook(output_dir, module):

 # Create a hook that logs weights, biases, gradients and inputs outputs of model while training.
 hook = TornasoleHook(out_dir=output_dir, save_config=SaveConfig(save_steps=[i * 10 for i in range(5)]),
-include_collections=['weights', 'gradients', 'bias','l_mod'])
+include_collections=['weights', 'gradients', 'biases','l_mod'])
 ```

 Here is how to register the above hook.

sagemaker-docs/FirstPartyRules.md

Lines changed: 3 additions & 0 deletions
@@ -195,6 +195,9 @@ for step 21 is compared with the loss for step 9. The next step where loss is ch
 since 10 steps after 21 is 31, and at 31 and 32 loss is not being saved.
 - `diff_percent`: float (default is 0.0) (between 0.0 and 100.0)
 The minimum difference in percentage that loss should be lower by. By default, the rule just checks if loss is going down. If you want to specify a stricter check that loss is going down fast enough, you might want to pass diff_percent.
+- `mode`: string
+The name of tornasole mode to query tensor values for rule checking.
+If this is not passed, the rule checks for eval mode, then training mode and then global mode in this order.

 ```python
 rules_specification = [

tests/analysis/config.yaml

Lines changed: 3 additions & 3 deletions
@@ -89,9 +89,9 @@
 - tensorflow
 - *Disable
 - [*tf_mnist,
---lr 0.001 --tornasole_train_frequency 10 --random_seed True,
+--lr 0.001 --tornasole_train_frequency 10 --random_seed True --num_steps 100,
 *invoker,
---rule_name lossnotdecreasing --flag True --end_step 1000
+--rule_name lossnotdecreasing --flag True --end_step 100
 ]
 -
 - loss_not_decreasing/tf/false

@@ -100,7 +100,7 @@
 - [*simple,
 --lr 0.05 --scale 1 --steps 1009 --tornasole_frequency 13 --random_seed True,
 *invoker,
---rule_name lossnotdecreasing --flag False --num_steps 100 --min_difference 12
+--rule_name lossnotdecreasing --flag False --end_step 50 --diff_percent 50
 ]

 # test cases for mxnet

tests/core/test_collections.py

Lines changed: 13 additions & 10 deletions
@@ -3,9 +3,9 @@
 COLLECTIONS_FILE_NAME
 from tornasole.core.reduction_config import ReductionConfig
 from tornasole.core.save_config import SaveConfig, SaveConfigMode
-from tornasole.core.save_manager import SaveManager
 from tornasole.core.modes import ModeKeys
-
+from tornasole.pytorch.hook import TornasoleHook
+import datetime

 def test_export_load():
 # with none as save config

@@ -71,12 +71,15 @@ def test_collection_defaults_to_hook_config():
 cm.create_collection('foo')
 cm.get('foo').set_save_config({ModeKeys.EVAL: SaveConfigMode(save_interval=20)})

-sm = SaveManager(
-collection_manager=cm,
-include_collections_names=['foo'],
-default_reduction_config=ReductionConfig(),
-default_save_config={ModeKeys.TRAIN: SaveConfigMode(save_interval=10)},
-)
+
+hook = TornasoleHook(
+out_dir='/tmp/test_collections/' + str(datetime.datetime.now()),
+save_config={ModeKeys.TRAIN: SaveConfigMode(save_interval=10)},
+include_collections=['foo'],
+reduction_config=ReductionConfig(save_raw_tensor=True))
+hook.collection_manager = cm
 assert cm.get('foo').save_config.mode_save_configs[ModeKeys.TRAIN] is None
-sm.prepare()
-assert cm.get('foo').save_config.mode_save_configs[ModeKeys.TRAIN].save_interval == 10
+assert cm.get('foo').reduction_config is None
+hook._prepare_collections()
+assert cm.get('foo').save_config.mode_save_configs[ModeKeys.TRAIN].save_interval == 10
+assert cm.get('foo').reduction_config.save_raw_tensor is True

tests/mxnet/mnist_gluon_model.py

Lines changed: 1 addition & 1 deletion
@@ -120,4 +120,4 @@ def run_mnist_gluon_model(hook=None, hybridize=False, set_modes=False, register_
 valid_acc/len(valid_data), time.time()-tic))

 # for tests we have to call cleanup ourselves as destructor won't be called now
-# hook.cleanup()
+# hook._cleanup()

tests/mxnet/test_hook_all_zero.py

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ def test_hook_all_zero(hook=None, out_dir=None):
 run_id = 'trial_' + datetime.now().strftime('%Y%m%d-%H%M%S%f')
 out_dir = './newlogsRunTest/' + run_id
 print("Registering the hook with out_dir {0}".format(out_dir))
-hook = t_hook(out_dir=out_dir, save_config=save_config, include_collections=['ReluActivation','weights', 'bias','gradients'])
+hook = t_hook(out_dir=out_dir, save_config=save_config, include_collections=['ReluActivation','weights', 'biases','gradients'])
 run_mnist_gluon_model(hook=hook, num_steps_train=10, num_steps_eval=10, make_input_zero=True)
