Commit 5a8fbc0

vandanavk authored and jarednielsen committed
Env var description (aws#55)
* Env var description
* Remove event file retry limit
* Add USE_SMDEBUG
* Edit to tensorboard env var
* Clarify about smdebug config
* Fix formatting
1 parent 245784b commit 5a8fbc0

1 file changed, +105 -0 lines changed

docs/api.md

Lines changed: 105 additions & 0 deletions
@@ -8,6 +8,7 @@ These objects exist across all frameworks.
- [Collection](#collection)
- [SaveConfig](#saveconfig)
- [ReductionConfig](#reductionconfig)
- [Environment Variables](#environment-variables)

## Glossary

@@ -244,3 +245,107 @@ For example,
`ReductionConfig(reductions=['std', 'variance'], abs_reductions=['mean'], norms=['l1'])`

will return the standard deviation and variance, the mean of the absolute value, and the l1 norm.


---

## Environment Variables

#### `USE_SMDEBUG`:

Setting this variable to 0 turns off the hook that is created by default. This can be used
if the user doesn't want to use SageMaker Debugger.

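As a minimal sketch, the variable can be set from Python before the training code runs. This assumes the variable is read at the point where the default hook would otherwise be created, so it is set at the very top of the script. Exporting it in the shell that launches training should have the same effect.

```python
import os

# Assumption: USE_SMDEBUG is read when the default hook would be created,
# so set it before importing or running the training code.
os.environ["USE_SMDEBUG"] = "0"

# Child processes launched from this one inherit the setting as well.
use_smdebug = os.environ["USE_SMDEBUG"]
print(use_smdebug)
```
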
#### `SMDEBUG_CONFIG_FILE_PATH`:

Contains the path to the JSON file that describes the smdebug hook.

At a minimum, the JSON config should contain the path where smdebug should output tensors.
Example:

`{ "LocalPath": "/my/smdebug_hook/path" }`

In a SageMaker environment, this variable is set to point to a pre-defined location containing a valid JSON file.
In a non-SageMaker environment, SageMaker Debugger is not used if this environment variable is not set and
a hook is not created manually.

Sample JSON from which a hook can be created:
```json
{
  "LocalPath": "/my/smdebug_hook/path",
  "HookParameters": {
    "save_all": false,
    "include_regex": "regex1,regex2",
    "save_interval": "100",
    "save_steps": "1,2,3,4",
    "start_step": "1",
    "end_step": "1000000",
    "reductions": "min,max,mean"
  },
  "CollectionConfigurations": [
    {
      "CollectionName": "collection_obj_name1",
      "CollectionParameters": {
        "include_regex": "regexe5*",
        "save_interval": 100,
        "save_steps": "1,2,3",
        "start_step": 1,
        "reductions": "min"
      }
    }
  ]
}
```

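Outside SageMaker, the config file has to be written and the variable set by hand. The sketch below shows one way to do that using only the standard library. The directory and file names are hypothetical, and it assumes the variable must be set before the hook is created.

```python
import json
import os
import tempfile

# Hypothetical output directory for this illustration.
out_dir = os.path.join(tempfile.mkdtemp(), "smdebug_out")

# Minimal config: the required LocalPath plus one optional hook
# parameter taken from the sample JSON.
config = {
    "LocalPath": out_dir,
    "HookParameters": {"save_interval": "100"},
}

# Hypothetical file name; any readable path works.
config_path = os.path.join(tempfile.mkdtemp(), "smdebug_config.json")
with open(config_path, "w") as f:
    json.dump(config, f, indent=2)

# Assumption: the variable is read when the hook is created,
# so set it before the training code runs.
os.environ["SMDEBUG_CONFIG_FILE_PATH"] = config_path

# Sanity check: the file parses as JSON and has the required key.
with open(os.environ["SMDEBUG_CONFIG_FILE_PATH"]) as f:
    loaded = json.load(f)
assert "LocalPath" in loaded
```

Writing the file with `json.dump` rather than by hand avoids syntax slips such as trailing commas, which strict JSON parsers reject.
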
#### `TENSORBOARD_CONFIG_FILE_PATH`:

Contains the path to the JSON file that specifies where TensorBoard artifacts need to
be placed.

Sample JSON file:

`{ "LocalPath": "/my/tensorboard/path" }`

In a SageMaker environment, this JSON file must be present for any TensorBoard artifact to be logged.
By default, this variable is set to point to a pre-defined location in SageMaker.

`tensorboard_dir` can also be passed while creating the hook using the API (see [Creating a hook](#hook-from-python)) or
in the JSON file specified in `SMDEBUG_CONFIG_FILE_PATH`. For this, `export_tensorboard` should be set to True.
This option to set `tensorboard_dir` is available in both SageMaker and non-SageMaker environments.


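To illustrate how such a variable resolves to a directory, here is a small sketch that reads `LocalPath` from the JSON file the variable points to. The helper `local_path_from_env` is hypothetical and not part of smdebug. Since the `CHECKPOINT_CONFIG_FILE_PATH` sample has the same shape, the same helper would apply there.

```python
import json
import os
import tempfile
from typing import Optional


def local_path_from_env(env_var: str) -> Optional[str]:
    """Return LocalPath from the JSON file named by env_var, or None.

    Hypothetical helper for illustration; not an smdebug API.
    """
    config_file = os.environ.get(env_var)
    if not config_file or not os.path.isfile(config_file):
        return None
    with open(config_file) as f:
        return json.load(f).get("LocalPath")


# Usage with a temporary file standing in for the real config.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"LocalPath": "/my/tensorboard/path"}, f)

os.environ["TENSORBOARD_CONFIG_FILE_PATH"] = f.name
tb_path = local_path_from_env("TENSORBOARD_CONFIG_FILE_PATH")
print(tb_path)
```
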
#### `CHECKPOINT_CONFIG_FILE_PATH`:

Contains the path to the JSON file that specifies where training checkpoints need to
be placed. This is used in the context of spot training.

Sample JSON file:

`{ "LocalPath": "/my/checkpoint/path" }`

In a SageMaker environment, this JSON file must be present for checkpoints to be saved.
By default, this variable is set to point to a pre-defined location in SageMaker.


#### `SAGEMAKER_METRICS_DIRECTORY`:

Contains the path to the directory where metrics will be recorded for consumption by SageMaker Metrics.
This is relevant only in a SageMaker environment, where this variable points to a pre-defined location.


#### `TRAINING_END_DELAY_REFRESH`:

During analysis, a [trial](analysis.md) is created to query for tensors from a specified directory. This
directory contains collections, events, and index files. This environment variable
specifies how many seconds to wait before refreshing the index files to check if training has ended
and the tensor is available. The default value is 1.


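The wait-then-refresh behavior described here can be pictured with a generic polling loop. This is only a sketch of the idea, not smdebug's implementation: `refresh_delay_seconds` and `wait_until` are hypothetical helpers, and parsing the value as a float is an assumption made for the example.

```python
import os
import time
from typing import Callable


def refresh_delay_seconds(default: float = 1.0) -> float:
    """Read TRAINING_END_DELAY_REFRESH, falling back to the documented default of 1."""
    return float(os.environ.get("TRAINING_END_DELAY_REFRESH", default))


def wait_until(check: Callable[[], bool], max_refreshes: int) -> bool:
    """Call check() up to max_refreshes times, sleeping between attempts."""
    delay = refresh_delay_seconds()
    for _ in range(max_refreshes):
        if check():
            return True
        time.sleep(delay)
    return False


# Usage: a stand-in check that succeeds on the third refresh.
os.environ["TRAINING_END_DELAY_REFRESH"] = "0.01"
attempts = []


def fake_training_ended() -> bool:
    attempts.append(1)
    return len(attempts) >= 3


found = wait_until(fake_training_ended, max_refreshes=5)
```

A larger value means fewer refreshes of the index files at the cost of noticing the end of training later.
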
#### `INCOMPLETE_STEP_WAIT_WINDOW`:

During analysis, a [trial](analysis.md) is created to query for tensors from a specified directory. This
directory contains collections, events, and index files. A trial checks to see if a step
specified in the smdebug hook has been completed. This environment variable
specifies the maximum number of incomplete steps that the trial will wait for before marking
half of them as complete. Default: 1000
