@@ -8,6 +8,7 @@ These objects exist across all frameworks.
- [Collection](#collection)
- [SaveConfig](#saveconfig)
- [ReductionConfig](#reductionconfig)
+ - [Environment Variables](#environment-variables)

## Glossary

@@ -244,3 +245,107 @@ For example,
`ReductionConfig(reductions=['std', 'variance'], abs_reductions=['mean'], norms=['l1'])`

will return the standard deviation and variance, the mean of the absolute value, and the l1 norm.
+
+
+ ---
+
+ ## Environment Variables
+
+ #### `USE_SMDEBUG`:
+
+ Setting this variable to 0 turns off the hook that is created by default. This can be used
+ if the user does not want to use SageMaker Debugger.
+
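A minimal sketch of opting out from inside a training script follows. It assumes the variable only has to be present in the process environment before the default hook would be created. `default_hook_allowed` is a hypothetical helper that mirrors the rule stated above (the value `0` turns the default hook off). It is not part of smdebug's API, and it does not cover whether smdebug accepts other spellings such as `false`.

```python
import os
from typing import Mapping


def default_hook_allowed(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper mirroring the documented rule: USE_SMDEBUG=0 turns
    off the hook that would otherwise be created by default. Any other value,
    or no value, leaves the default behaviour unchanged."""
    return env.get("USE_SMDEBUG", "").strip() != "0"


# Opt out for the current process, e.g. at the very top of the training script,
# before any code runs that could create the default hook.
os.environ["USE_SMDEBUG"] = "0"
print(default_hook_allowed())  # False
```

Exporting `USE_SMDEBUG=0` in the shell that launches the script has the same effect on the child process.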
+ #### `SMDEBUG_CONFIG_FILE_PATH`:
+
+ Contains the path to the JSON file that describes the smdebug hook.
+
+ At a minimum, the JSON config should contain the path where smdebug should output tensors.
+ Example:
+
+ `{ "LocalPath": "/my/smdebug_hook/path" }`
+
+ In a SageMaker environment, this variable is set to point to a predefined location containing a valid JSON file.
+ In a non-SageMaker environment, SageMaker Debugger is not used unless this environment variable is set
+ or a hook is created manually.
+
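Outside SageMaker, one way to opt in is to write the minimal config yourself and set the variable before the hook would be created. The sketch below does that from Python and assumes a temporary directory is an acceptable place for the config file. `write_hook_config` and the file name `smdebug_hook_config.json` are hypothetical. Setting `os.environ` affects the current process and any child processes it starts afterwards.

```python
import json
import os
import tempfile
from pathlib import Path


def write_hook_config(out_dir: str, config_dir: str) -> Path:
    """Write the minimal hook config {"LocalPath": out_dir} and point
    SMDEBUG_CONFIG_FILE_PATH at it (hypothetical helper)."""
    config_path = Path(config_dir) / "smdebug_hook_config.json"
    config_path.write_text(json.dumps({"LocalPath": out_dir}))
    os.environ["SMDEBUG_CONFIG_FILE_PATH"] = str(config_path)
    return config_path


path = write_hook_config("/my/smdebug_hook/path", tempfile.mkdtemp())
print(path.read_text())  # {"LocalPath": "/my/smdebug_hook/path"}
```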
+ Sample JSON from which a hook can be created:
+ ```json
+ {
+   "LocalPath": "/my/smdebug_hook/path",
+   "HookParameters": {
+     "save_all": false,
+     "include_regex": "regex1,regex2",
+     "save_interval": "100",
+     "save_steps": "1,2,3,4",
+     "start_step": "1",
+     "end_step": "1000000",
+     "reductions": "min,max,mean"
+   },
+   "CollectionConfigurations": [
+     {
+       "CollectionName": "collection_obj_name1",
+       "CollectionParameters": {
+         "include_regex": "regexe5*",
+         "save_interval": 100,
+         "save_steps": "1,2,3",
+         "start_step": 1,
+         "reductions": "min"
+       }
+     }
+   ]
+ }
+ ```
+
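The sample separates hook-level settings (`HookParameters`) from per-collection settings (`CollectionConfigurations`). Here is a minimal sketch that assumes only the key names shown above. The hypothetical helper `load_hook_config` parses such a file with Python's standard `json` module and returns the pieces grouped by scope. It does not interpret the values: for instance, it leaves comma-separated strings like `"min,max,mean"` as they are. It is not how smdebug itself reads the file.

```python
import json


def load_hook_config(text: str) -> dict:
    """Split an smdebug hook config into its output path, hook-level
    parameters, and per-collection parameters (hypothetical helper).

    json.loads raises json.JSONDecodeError (a ValueError) on malformed input,
    such as a trailing comma, so this also works as a quick pre-launch check.
    """
    config = json.loads(text)
    if "LocalPath" not in config:
        # The documentation above requires at least the output path.
        raise ValueError("hook config must contain LocalPath")
    collections = {
        entry["CollectionName"]: entry.get("CollectionParameters", {})
        for entry in config.get("CollectionConfigurations", [])
    }
    return {
        "out_dir": config["LocalPath"],
        "hook": config.get("HookParameters", {}),
        "collections": collections,
    }


# A trimmed copy of the sample above.
sample = """
{
  "LocalPath": "/my/smdebug_hook/path",
  "HookParameters": {"save_interval": "100", "reductions": "min,max,mean"},
  "CollectionConfigurations": [
    {"CollectionName": "collection_obj_name1",
     "CollectionParameters": {"save_interval": 100, "reductions": "min"}}
  ]
}
"""
parts = load_hook_config(sample)
print(parts["out_dir"])                                            # /my/smdebug_hook/path
print(parts["collections"]["collection_obj_name1"]["reductions"])  # min
```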
+ #### `TENSORBOARD_CONFIG_FILE_PATH`:
+
+ Contains the path to the JSON file that specifies where TensorBoard artifacts need to
+ be placed.
+
+ Sample JSON file:
+
+ `{ "LocalPath": "/my/tensorboard/path" }`
+
+ In a SageMaker environment, this JSON file must be present for any TensorBoard artifacts to be logged.
+ By default, this variable points to a predefined location in SageMaker.
+
+ `tensorboard_dir` can also be passed when [creating a hook](#hook-from-python) through the API, or in the JSON file
+ specified by `SMDEBUG_CONFIG_FILE_PATH`. In both cases, `export_tensorboard` should be set to `True`.
+ This way of setting `tensorboard_dir` is available in both SageMaker and non-SageMaker environments.
+
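The following sketch illustrates the JSON route. The hypothetical helper below adds the two settings to a hook config. It assumes that `export_tensorboard` and `tensorboard_dir` go under `HookParameters`, alongside the other hook-level options in the sample config. Check the hook documentation for the exact keys before relying on this.

```python
import json


def enable_tensorboard(config: dict, tensorboard_dir: str) -> dict:
    """Return a copy of a hook config with TensorBoard export turned on.

    Assumption: both keys belong under HookParameters.
    """
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    params = updated.setdefault("HookParameters", {})
    params["export_tensorboard"] = True  # the text above says this should be True
    params["tensorboard_dir"] = tensorboard_dir
    return updated


base = {"LocalPath": "/my/smdebug_hook/path"}
print(json.dumps(enable_tensorboard(base, "/my/tensorboard/path"), indent=2))
```

The copy leaves the original dictionary untouched, so the same base config can be reused for runs with and without TensorBoard export.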
+
+ #### `CHECKPOINT_CONFIG_FILE_PATH`:
+
+ Contains the path to the JSON file that specifies where training checkpoints need to
+ be placed. This is used in the context of spot training.
+
+ Sample JSON file:
+
+ `{ "LocalPath": "/my/checkpoint/path" }`
+
+ In a SageMaker environment, this JSON file must be present for checkpoints to be saved.
+ By default, this variable points to a predefined location in SageMaker.
+
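A training script that supports spot training has to find out where to write its checkpoints. The sketch below reads `LocalPath` from the file named by `CHECKPOINT_CONFIG_FILE_PATH` and skips saving when the variable is unset. The helpers `checkpoint_dir` and `save_checkpoint`, and the `step-<N>.ckpt` file naming, are hypothetical. Real frameworks write their own checkpoint formats.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def checkpoint_dir() -> Optional[Path]:
    """Return LocalPath from the JSON file named by CHECKPOINT_CONFIG_FILE_PATH,
    or None when the variable is unset."""
    config_path = os.environ.get("CHECKPOINT_CONFIG_FILE_PATH")
    if not config_path:
        return None
    with open(config_path) as f:
        return Path(json.load(f)["LocalPath"])


def save_checkpoint(step: int, payload: bytes) -> Optional[Path]:
    """Write payload to <LocalPath>/step-<step>.ckpt (hypothetical naming).
    Skips saving and returns None when no checkpoint location is configured."""
    directory = checkpoint_dir()
    if directory is None:
        return None
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"step-{step}.ckpt"
    target.write_bytes(payload)
    return target


# Local stand-in for the JSON file that SageMaker would provide.
work = Path(tempfile.mkdtemp())
(work / "checkpoint.json").write_text(json.dumps({"LocalPath": str(work / "ckpt")}))
os.environ["CHECKPOINT_CONFIG_FILE_PATH"] = str(work / "checkpoint.json")
print(save_checkpoint(100, b"model state").name)  # step-100.ckpt
```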
+
+ #### `SAGEMAKER_METRICS_DIRECTORY`:
+
+ Contains the path to the directory where metrics are recorded for consumption by SageMaker Metrics.
+ This is relevant only in a SageMaker environment, where this variable points to a predefined location.
+
+
+ #### `TRAINING_END_DELAY_REFRESH`:
+
+ During analysis, a [trial](analysis.md) is created to query for tensors from a specified directory. This
+ directory contains collections, events, and index files. This environment variable
+ specifies how many seconds to wait before refreshing the index files to check whether training has ended
+ and the tensor is available. By default, this value is set to 1.
+
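The refresh behaviour can be pictured as a simple polling loop. This is a sketch, not smdebug's trial implementation. `training_ended` is a hypothetical callback standing in for re-reading the index files. `max_refreshes` is a hypothetical cap added so the example terminates. The `sleep` parameter is injectable only so that the loop can be exercised without real waiting.

```python
import os
import time
from typing import Callable


def wait_until_training_ended(
    training_ended: Callable[[], bool],
    max_refreshes: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Check training_ended(), waiting TRAINING_END_DELAY_REFRESH seconds
    (default 1) before each refresh. Returns True once training has ended,
    or False after max_refreshes unsuccessful checks."""
    delay = float(os.environ.get("TRAINING_END_DELAY_REFRESH", "1"))
    for _ in range(max_refreshes):
        if training_ended():
            return True
        sleep(delay)  # wait before the next refresh of the index files
    return False


checks = iter([False, False, True])
delays = []
os.environ["TRAINING_END_DELAY_REFRESH"] = "0.5"
print(wait_until_training_ended(lambda: next(checks), sleep=delays.append))  # True
print(delays)  # [0.5, 0.5]
```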
+
+ #### `INCOMPLETE_STEP_WAIT_WINDOW`:
+
+ During analysis, a [trial](analysis.md) is created to query for tensors from a specified directory. This
+ directory contains collections, events, and index files. A trial checks whether a step
+ specified in the smdebug hook has been completed. This environment variable
+ specifies the maximum number of incomplete steps that the trial waits for before marking
+ half of them as complete. Default: 1000.
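The text does not say which half of the incomplete steps is marked complete. Nor does it say whether the limit triggers on reaching or on exceeding the window. The sketch below assumes the oldest half (lowest step numbers) is marked once the count exceeds the window. `wait_window` and `apply_wait_window` are hypothetical illustrations of that assumed policy, not smdebug's code.

```python
import os
from typing import List, Optional, Tuple


def wait_window() -> int:
    """Read INCOMPLETE_STEP_WAIT_WINDOW, falling back to the documented default of 1000."""
    return int(os.environ.get("INCOMPLETE_STEP_WAIT_WINDOW", "1000"))


def apply_wait_window(
    incomplete_steps: List[int], window: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    """Return (marked_complete, still_waiting) under the assumed policy:
    while at most `window` steps are incomplete, keep waiting on all of them;
    beyond that, mark the oldest half as complete."""
    limit = wait_window() if window is None else window
    pending = sorted(incomplete_steps)
    if len(pending) <= limit:
        return [], pending
    half = len(pending) // 2
    return pending[:half], pending[half:]


print(apply_wait_window([5, 1, 4, 2, 3], window=4))  # ([1, 2], [3, 4, 5])
print(apply_wait_window([1, 2, 3], window=4))        # ([], [1, 2, 3])
```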