# Common API
These objects exist across all frameworks.
- [SageMaker Zero-Code-Change vs. Python API](#sagemaker-zero-code-change-vs-python-api)
- [Creating a Hook](#creating-a-hook)
  - [Hook from SageMaker](#hook-from-sagemaker)
  - [Hook from Python](#hook-from-python)
- [Modes](#modes)
- [Collection](#collection)
- [SaveConfig](#saveconfig)
- [ReductionConfig](#reductionconfig)

---
## SageMaker Zero-Code-Change vs. Python API

There are two ways to use sagemaker-debugger: SageMaker Zero-Code-Change or the Python API.

SageMaker Zero-Code-Change uses a custom framework fork to automatically instantiate the hook, register tensors, and create collections.
All you need to do is decide which built-in rules to use. Further documentation is available on [AWS Docs](https://link.com).
```python
import sagemaker
from sagemaker.debugger import rule_configs, Rule, CollectionConfig, DebuggerHookConfig, TensorBoardOutputConfig

hook_config = DebuggerHookConfig(
    s3_output_path=args.s3_path,                  # user-supplied S3 location for saved tensors
    container_local_output_path=args.local_path,  # user-supplied path inside the container
    hook_parameters={
        "save_steps": "0,20,40,60,80"
    },
    collection_configs=[
        CollectionConfig(name="weights"),
        CollectionConfig(name="biases"),
    ],
)

rule = Rule.sagemaker(
    rule_configs.exploding_tensor(),
    rule_parameters={
        "tensor_regex": ".*"
    },
    collections_to_save=[
        CollectionConfig(name="weights"),
        CollectionConfig(name="losses"),
    ],
)

sagemaker_simple_estimator = sagemaker.tensorflow.TensorFlow(
    entry_point="script.py",
    role=sagemaker.get_execution_role(),
    framework_version="1.15",
    py_version="py3",
    rules=[rule],
    debugger_hook_config=hook_config,
)

sagemaker_simple_estimator.fit()
```

The Python API requires more configuration but is also more flexible. You must write your own custom rules
instead of using SageMaker's built-in rules, but you can use it with a custom container in SageMaker or in your own
environment. It is described further below.

---

## Creating a Hook

### Hook from SageMaker
If you create a SageMaker job and specify the hook configuration in the SageMaker Estimator API
as described in [AWS Docs](https://link.com),
a JSON file will be automatically written. You can create a hook from this file by calling
```python
hook = smd.{hook_class}.create_from_json_file()
```
with no arguments and then use the hook Python API in your script. `hook_class` will be `Hook` for PyTorch, MXNet, and XGBoost. It will be one of `KerasHook`, `SessionHook`, or `EstimatorHook` for TensorFlow.
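
For example, a minimal sketch for a PyTorch entry-point script; here `model` is a placeholder for your `torch.nn.Module`, and `register_module` is the PyTorch hook's registration call:
```python
import smdebug.pytorch as smd

# Build the hook from the JSON configuration SageMaker wrote for this job
hook = smd.Hook.create_from_json_file()

# Register the model so its tensors (weights, biases, ...) are captured
hook.register_module(model)
```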

### Hook from Python
See the framework-specific pages for more details.
* [TensorFlow](https://link.com)
* [PyTorch](https://link.com)
* [MXNet](https://link.com)
* [XGBoost](https://link.com)

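As a rough sketch of the common construction pattern (PyTorch shown; the output path and collection names are illustrative, and the exact constructor parameters are documented on the framework pages — `SaveConfig` is described below):
```python
import smdebug.pytorch as smd

hook = smd.Hook(
    out_dir="/opt/ml/output/tensors",               # where tensors are written
    save_config=smd.SaveConfig(save_interval=100),  # see the SaveConfig section
    include_collections=["weights", "gradients"],   # see the Collection section
)
```
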
---

## Modes
Modes signify which part of training you're in, similar to Keras modes. `GLOBAL` mode is used as
the default. Choose from
```python
smd.modes.TRAIN
smd.modes.EVAL
smd.modes.PREDICT
smd.modes.GLOBAL
```
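
You set the mode on the hook from within your training loop so that saved tensors are tagged with the phase they came from. A minimal sketch, assuming a hook created as above (`train_loader`, `eval_loader`, and the step functions are placeholders for your own code):
```python
# Steps recorded after this call are tagged and counted as TRAIN steps
hook.set_mode(smd.modes.TRAIN)
for batch in train_loader:
    train_step(batch)

# Switch to EVAL so evaluation steps are counted separately
hook.set_mode(smd.modes.EVAL)
for batch in eval_loader:
    eval_step(batch)
```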

---

## Collection

The Collection object groups tensors such as "losses", "weights", "biases", or "gradients".
A collection has its own list of tensors, include/exclude regex patterns, reduction config, and save config,
so you can set different save and reduction behavior for different sets of tensors.
These collections are also available during analysis.

You can choose which of these built-in collections (or your own custom collections) to save via the hook's `include_collections` parameter. By default, only a few collections are saved.

| Framework | include_collections (default) |
|---|---|
| `TensorFlow` | METRICS, LOSSES, SEARCHABLE_SCALARS |
| `PyTorch` | LOSSES, SCALARS |
| `MXNet` | LOSSES, SCALARS |
| `XGBoost` | METRICS |

Each framework has pre-defined settings for certain collections. For example, TensorFlow's KerasHook
automatically places weights into the `smd.CollectionKeys.WEIGHTS` collection, and PyTorch uses the regex
`"^(?!gradient).*weight"` to automatically place tensors in the weights collection.

| CollectionKey | Frameworks | Description |
|---|---|---|
| `ALL` | all | Saves all tensors. |
| `DEFAULT` | all | ??? |
| `WEIGHTS` | TensorFlow, PyTorch, MXNet | Matches all weights tensors. |
| `BIASES` | TensorFlow, PyTorch, MXNet | Matches all biases tensors. |
| `GRADIENTS` | TensorFlow, PyTorch, MXNet | Matches all gradients tensors. In TensorFlow, outside the zero-code-change Deep Learning Containers, you must use `hook.wrap_optimizer()`. |
| `LOSSES` | TensorFlow, PyTorch, MXNet | Matches all loss tensors. |
| `SCALARS` | TensorFlow, PyTorch, MXNet | Matches all scalar tensors, such as loss or accuracy. |
| `METRICS` | TensorFlow, XGBoost | ??? |
| `INPUTS` | TensorFlow | Matches all inputs to a layer (outputs of the previous layer). |
| `OUTPUTS` | TensorFlow | Matches all outputs of a layer (inputs of the following layer). |
| `SEARCHABLE_SCALARS` | TensorFlow | Scalars that will go to SageMaker Metrics. |
| `OPTIMIZER_VARIABLES` | TensorFlow | Matches all optimizer variables. |
| `HYPERPARAMETERS` | XGBoost | ... |
| `PREDICTIONS` | XGBoost | ... |
| `LABELS` | XGBoost | ... |
| `FEATURE_IMPORTANCE` | XGBoost | ... |
| `AVERAGE_SHAP` | XGBoost | ... |
| `FULL_SHAP` | XGBoost | ... |
| `TREES` | XGBoost | ... |

```python
coll = smd.Collection(
    name,
    include_regex = None,
    tensor_names = None,
    reduction_config = None,
    save_config = None,
    save_histogram = True,
)
```
`name` (str): Used to identify the collection.\
`include_regex` (list[str]): The regexes to match tensor names for the collection.\
`tensor_names` (list[str]): A list of tensor names to include.\
`reduction_config` (ReductionConfig object): Which reductions to store in the collection.\
`save_config` (SaveConfig object): Settings for how often to save the collection.\
`save_histogram` (bool): Whether to save histogram data for the collection. Only used if TensorBoard support is enabled. Not computed for scalar collections such as losses.

### Accessing a Collection

| Function | Behavior |
|---|---|
| `hook.get_collection(collection_name)` | Returns the collection with the given name. Creates the collection with default settings if it doesn't already exist. |
| `hook.get_collections()` | Returns all collections as a dictionary keyed by collection name. |
| `hook.add_to_collection(collection_name, args)` | Equivalent to calling `coll.add(args)` on the collection named `collection_name`. |

### Properties of a Collection
| Property | Description |
|---|---|
| `tensor_names` | Get or set the list of tensor names as strings. |
| `include_regex` | Get or set the list of regexes to include. |
| `reduction_config` | Get or set the ReductionConfig object. |
| `save_config` | Get or set the SaveConfig object. |

### Methods on a Collection

| Method | Behavior |
|---|---|
| `coll.include(regex)` | Takes a regex string, or a list of regex strings, matching tensors to include in the collection. |
| `coll.add(tensor)` | **(TensorFlow only)** Takes an instance, list, or set of tf.Tensor/tf.Variable/tf.MirroredVariable/tf.Operation to add to the collection. |
| `coll.add_keras_layer(layer, inputs=False, outputs=True)` | **(tf.keras only)** Takes an instance of a tf.keras layer and logs input/output tensors for that layer. By default, only outputs are saved. |
| `coll.add_module_tensors(module, inputs=False, outputs=True)` | **(PyTorch only)** Takes an instance of a PyTorch module and logs input/output tensors for that module. By default, only outputs are saved. |
| `coll.add_block_tensors(block, inputs=False, outputs=True)` | **(MXNet only)** Takes an instance of a Gluon block and logs input/output tensors for that block. By default, only outputs are saved. |

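Putting these together, a brief sketch that creates a collection on an existing hook and fills it by regex (the collection name and pattern are illustrative):
```python
# Creates the collection with default settings if it doesn't exist yet
relu_coll = hook.get_collection("relu_activations")

# Save every tensor whose name matches this regex
relu_coll.include("relu")
```
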
---

## SaveConfig
The SaveConfig class customizes the frequency of saving tensors.
The hook takes a SaveConfig object, which is applied as the default to all tensors included.
A collection can also have its own SaveConfig object, which is applied to that collection's tensors.

SaveConfig also allows you to save tensors when certain tensors become NaN.
The tensors to watch are specified as a list of strings representing tensor names.

```python
save_config = smd.SaveConfig(
    mode_save_configs = None,
    save_interval = 100,
    start_step = 0,
    end_step = None,
    save_steps = None,
)
```
`mode_save_configs` (dict): Used for advanced cases; see details below.\
`save_interval` (int): How often, in steps, to save tensors. Defaults to 100.\
`start_step` (int): The step at which to start saving tensors.\
`end_step` (int): The step at which to stop saving tensors, exclusive.\
`save_steps` (list[int]): Specific steps at which to save tensors. These are unioned with the steps implied by the other parameters.

For example,

`SaveConfig()` will save at steps [0, 100, ...].\
`SaveConfig(save_interval=1)` will save at steps [0, 1, ...].\
`SaveConfig(save_interval=100, end_step=200)` will save at steps [0, 100].\
`SaveConfig(save_interval=100, end_step=201)` will save at steps [0, 100, 200].\
`SaveConfig(save_interval=100, start_step=150)` will save at steps [200, 300, ...].\
`SaveConfig(save_steps=[3, 7])` will save at steps [3, 7].

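A SaveConfig can also be attached per collection via the `save_config` property from the Collection section, overriding the hook's default; a one-line sketch (collection name illustrative):
```python
# Override the hook-wide default: save weights every 10 steps
hook.get_collection("weights").save_config = smd.SaveConfig(save_interval=10)
```
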
There is also a more advanced use case, in which you specify a different SaveConfig for each mode.
It is best understood through an example:
```python
SaveConfig(mode_save_configs={
    smd.modes.TRAIN: smd.SaveConfigMode(save_interval=1),
    smd.modes.EVAL: smd.SaveConfigMode(save_interval=2),
    smd.modes.PREDICT: smd.SaveConfigMode(save_interval=3),
    smd.modes.GLOBAL: smd.SaveConfigMode(save_interval=4),
})
```
Essentially, you create a dictionary mapping modes to SaveConfigMode objects. A SaveConfigMode object
takes the same four parameters (save_interval, start_step, end_step, save_steps) as the main object.
Any mode not specified falls back to the default configuration; if a mode is provided but not all
parameters are specified, the default values are used for the unspecified ones.

---

## ReductionConfig
ReductionConfig allows the saving of certain reductions of tensors instead
of the full tensor. The motivation is to reduce the amount of data
saved and to increase speed in cases where you don't need the full
tensor. The reduction operations are computed during training
and then saved.

During analysis, these are available as reductions of the original tensor.
Note that using a reduction config means the full tensor will not be
available during analysis, which can restrict what you can do with the saved data.
The hook takes a ReductionConfig object, which is applied as the default to all tensors included.
A collection can also have its own ReductionConfig object, which is applied
to the tensors belonging to that collection.

```python
reduction_config = smd.ReductionConfig(
    reductions = None,
    abs_reductions = None,
    norms = None,
    abs_norms = None,
    save_raw_tensor = False,
)
```
`reductions` (list[str]): Takes names of reductions, chosen from "min", "max", "median", "mean", "std", "variance", "sum", "prod".\
`abs_reductions` (list[str]): Same as reductions, except each reduction is computed on the absolute value of the tensor.\
`norms` (list[str]): Takes names of norms to compute, chosen from "l1", "l2".\
`abs_norms` (list[str]): Same as norms, except each norm is computed on the absolute value of the tensor.\
`save_raw_tensor` (bool): Saves the full tensor directly, in addition to the requested reductions.

For example,

`ReductionConfig(reductions=['std', 'variance'], abs_reductions=['mean'], norms=['l1'])`

will save the standard deviation and variance, the mean of the absolute value, and the l1 norm.
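
As with SaveConfig, a ReductionConfig can be set per collection via the `reduction_config` property from the Collection section. A short sketch on an existing hook (collection name illustrative):
```python
# Store only the l2 norm of gradient tensors instead of the full tensors
hook.get_collection("gradients").reduction_config = smd.ReductionConfig(norms=["l2"])
```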