|
81 | 81 | "\n",
|
82 | 82 | "### Instantiate and initialize tornasole hook\n",
|
83 | 83 | "\n",
|
84 |
| - "**NOTE: In order to enable Tornasole functionality while running the script in SageMaker, the hook must be initialized with 'out_dir = /opt/ml/output/tensors'.**\n", |
85 |
| - "\n", |
86 | 84 | "```\n",
|
87 | 85 | " # Create SaveConfig that instructs engine to log graph tensors every 10 steps.\n",
|
88 | 86 | " save_config = SaveConfig(save_interval=10)\n",
|
89 | 87 | " # Create a hook that logs tensors of weights, biases and gradients while training the model.\n",
|
90 |
| - " tornasole_path = '/opt/ml/output/tensors'\n", |
91 |
| - " hook = TornasoleHook(out_dir=output_s3_uri, save_config=save_config)\n", |
| 88 | + " hook = TornasoleHook(save_config=save_config)\n", |
92 | 89 | "```\n",
|
93 | 90 | "\n",
|
94 | 91 | "### Register Tornasole hook to the model before starting of the training.\n",
|
|
105 | 102 | "```\n",
|
106 | 103 | "\n",
|
107 | 104 | "#### Set the mode\n",
|
108 |
| - "Set the mode you are running the job in. This helps you group steps by mode, \n", |
109 |
| - "for easier analysis. \n", |
110 |
| - "If you do not specify this, it saves steps under a `default` mode.\n", |
111 |
| - "\n", |
| 105 | + "Tornasole has the concept of modes (TRAIN, EVAL, PREDICT) to separate out different modes of the jobs.\n", |
| 106 | + "Set the mode you are running in your job. Every time the mode changes in your job, please set the current mode. This helps you group steps by mode, for easier analysis. Setting the mode is optional but recommended. If you do not specify this, Tornasole saves all steps under a `GLOBAL` mode. \n", |
112 | 107 | "```\n",
|
113 | 108 | "hook.set_mode(ts.modes.TRAIN)\n",
|
114 | 109 | "```\n",
|
115 | 110 | "\n",
|
116 |
| - "Refer [DeveloperGuide_MXNet.md](../DeveloperGuide_MXNet.md) for more details on the APIs Tornasole provides to help you save tensors.\n", |
| 111 | + "Refer [DeveloperGuide_MXNet.md](../../DeveloperGuide_MXNet.md) for more details on the APIs Tornasole provides to help you save tensors.\n", |
117 | 112 | "\n",
|
118 | 113 | "\n",
|
119 | 114 | "## SageMaker with Tornasole\n",
|
|
250 | 245 | "metadata": {},
|
251 | 246 | "outputs": [],
|
252 | 247 | "source": [
|
253 |
| - "import boto3\n", |
254 | 248 | "import sagemaker\n",
|
255 | 249 | "from sagemaker.mxnet import MXNet\n",
|
256 | 250 | "\n",
|
|
270 | 264 | "\n",
|
271 | 265 | "The 'entry_point_script' points to the MXNet training script that has the TornasoleHook integrated.\n",
|
272 | 266 | "\n",
|
273 |
| - "The 'hyperparameters' are the parameters that will be passed to the training script. Please note that the **tornasole_path** parameter is set to be **/opt/ml/output/tensors**. This is **mandatory** when running the training script with SageMaker and Tornasole.\n", |
| 267 | + "The 'hyperparameters' are the parameters that will be passed to the training script.\n", |
274 | 268 | "\n"
|
275 | 269 | ]
|
276 | 270 | },
|
|
281 | 275 | "outputs": [],
|
282 | 276 | "source": [
|
283 | 277 | "entry_point_script = '../scripts/mnist_gluon_basic_hook_demo.py'\n",
|
284 |
| - "hyperparameters = {'tornasole_path' : '/opt/ml/output/tensors', 'random_seed' : True, 'num_steps': 6}" |
| 278 | + "hyperparameters = {'random_seed' : True, 'num_steps': 6}" |
285 | 279 | ]
|
286 | 280 | },
|
287 | 281 | {
|
|
377 | 371 | "outputs": [],
|
378 | 372 | "source": [
|
379 | 373 | "entry_point_script = '../scripts/mnist_gluon_vg_demo.py'\n",
|
380 |
| - "bad_hyperparameters = {'tornasole_path' : '/opt/ml/output/tensors', 'random_seed' : True, 'num_steps': 33, 'tornasole_frequency' : 30}" |
| 374 | + "bad_hyperparameters = {'random_seed' : True, 'num_steps': 33, 'tornasole_frequency' : 30}" |
381 | 375 | ]
|
382 | 376 | },
|
383 | 377 | {
|
|