Commit e0b9ddf

TF notebook (aws#163)
* Changing link of latest binaries for 0.3 (aws#122)
* change link to binary and introduce latest
* make container scripts working again
* remove -U
* fix path to ts binary in docker
* log when single process is to stdout
* uploaded sagemaker docs update analysis docs remove sagemaker docs update TF doc add sagemaker docs update api docs change link for rules binary add files from s3 bucket
* refactor positions
* minor changes
* fix links in old examples
* fix paths in integration tests
* Update test_training_end.py
* Update test_training_end.py
* Update integration_testing_rules.py
* bring back examples section in analysis readme
* create sagemaker-notebooks directory
* fix links
* updated notebook for tf
* fix name of rule
* Delete README.md
* remove rules scripts
* Update tensorflow-simple.ipynb
* Update tensorflow-simple.ipynb
* add sagemaker args
* add model dir to resnet
* remove action style args in script and reindent
* update resnet example
* make num epochs take priority over num_batches
* change name of tf notebook
* Add updated sagemaker tf notebook
* change scripts to include all scripts in tf examples
* change names of estimators
* update files
1 parent a251a1e commit e0b9ddf

File tree

12 files changed: +1272 -1073 lines


bin/sagemaker-containers/tag_as_latest.sh

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
 #!/usr/bin/env bash

+set -ex
+
 if [ -z "$1" ]; then echo "Pass the tag which should be made the latest tag" && exit 1; fi

 for region in us-east-1 us-east-2 us-west-1 us-west-2 ap-south-1 ap-northeast-2 ap-southeast-1 ap-southeast-2 ap-northeast-1 ca-central-1 eu-central-1 eu-west-1 eu-west-2 eu-west-3 eu-north-1 sa-east-1

bin/upload_for_sagemaker.sh

Lines changed: 3 additions & 2 deletions
@@ -6,6 +6,7 @@ export AWS_PROFILE=removethissoitdoesntcrash
 # API DOCS
 aws s3 cp docs/mxnet/api.md s3://tornasole-external-preview-use1/frameworks/mxnet/
 aws s3 cp docs/tensorflow/api.md s3://tornasole-external-preview-use1/frameworks/tensorflow/
+aws s3 cp docs/tensorflow/examples/sm_resnet50.md s3://tornasole-external-preview-use1/frameworks/tensorflow/
 aws s3 cp docs/pytorch/api.md s3://tornasole-external-preview-use1/frameworks/pytorch/

 # DEV GUIDES
@@ -16,11 +17,11 @@ aws s3 cp sagemaker-docs/DeveloperGuide_Rules.md s3://tornasole-external-preview

 # MXNET EXAMPLES
 aws s3 sync examples/mxnet/sagemaker-notebooks s3://tornasole-external-preview-use1/frameworks/mxnet/examples/notebooks
-aws s3 cp examples/mxnet/scripts/mnist_mxnet.py s3://tornasole-external-preview-use1/frameworks/mxnet/examples/scripts
+aws s3 cp examples/mxnet/scripts/mnist_mxnet.py s3://tornasole-external-preview-use1/frameworks/mxnet/examples/scripts/

 # TF EXAMPLES
 aws s3 sync examples/tensorflow/sagemaker-notebooks s3://tornasole-external-preview-use1/frameworks/tensorflow/examples/notebooks
-aws s3 cp examples/tensorflow/scripts/simple.py s3://tornasole-external-preview-use1/frameworks/tensorflow/examples/scripts
+aws s3 sync examples/tensorflow/scripts s3://tornasole-external-preview-use1/frameworks/tensorflow/examples/scripts

 # PYTORCH EXAMPLES
 #aws s3 sync examples/pytorch s3://tornasole-external-preview-use1/frameworks/pytorch/examples

docs/tensorflow/examples/mnist.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ Below we call out the changes for Tornasole in the above script and describe the

 **Importing TornasoleTF**
 ```
-import tornasole_tf as ts
+import tornasole.tensorflow as ts
 ```
 **Saving gradients**

docs/tensorflow/examples/resnet50.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ Below we call out the changes for Tornasole in the above script and describe the

 **Importing TornasoleTF**
 ```
-import tornasole_tf as ts
+import tornasole.tensorflow as ts
 ```
 **Saving weights**
 ```
docs/tensorflow/examples/simple.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ Below we call out the changes for Tornasole in the above script and describe the

 **Importing TornasoleTF**
 ```
-import tornasole_tf as ts
+import tornasole.tensorflow as ts
 ```
 **Saving all tensors**
 ```
Lines changed: 175 additions & 0 deletions
@@ -0,0 +1,175 @@
# ResNet50 ImageNet Example
We provide an example script `train_imagenet_resnet_hvd.py`, which is a Tornasole-enabled TensorFlow training script for ResNet50/ImageNet.
**Please note that this script needs a GPU**.
It uses the Estimator interface of TensorFlow.
Here we show different scenarios of how to use Tornasole to
save different tensors during training for analysis.
Below are listed the changes we made to integrate these different
behaviors of Tornasole, as well as example commands for you to try.

## Integrating Tornasole
Below we call out the changes for Tornasole in the above script and describe them.

**Importing TornasoleTF**
```
import tornasole.tensorflow as ts
```
**Saving weights**
```
include_collections.append('weights')
```
**Saving gradients**

We need to wrap our optimizer with TornasoleOptimizer, and use this optimizer to minimize the loss.
This also lets us access the gradients during analysis without having to identify which of the saved tensors are the gradients.
```
opt = TornasoleOptimizer(opt)

include_collections.append('gradients')
ts.TornasoleHook(..., include_collections=include_collections, ...)
```
**Saving relu activations by variable**
```
x = tf.nn.relu(x + shortcut)
ts.add_to_collection('relu_activations', x)
...
include_collections.append('relu_activations')
ts.TornasoleHook(..., include_collections=include_collections, ...)
```
**Saving relu activations as reductions**
```
x = tf.nn.relu(x + shortcut)
ts.add_to_collection('relu_activations', x)
...
rnc = ts.ReductionConfig(reductions=reductions, abs_reductions=abs_reductions)
...
ts.TornasoleHook(..., reduction_config=rnc, ...)
```
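A reduction config saves summary statistics of a tensor instead of its full value. As a rough sketch of what the reductions named above compute, here is a plain-Python illustration (ours, not Tornasole's implementation):

```python
def reduce_tensor(values,
                  reductions=('min', 'max', 'mean', 'variance'),
                  abs_reductions=('mean', 'variance')):
    """Compute the named reductions over a flat list of activation values."""
    def stats(xs):
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        return {'min': min(xs), 'max': max(xs), 'mean': mean, 'variance': var}

    full = stats(values)                        # reductions over the raw values
    absolute = stats([abs(x) for x in values])  # reductions over absolute values
    out = {name: full[name] for name in reductions}
    out.update({'abs_' + name: absolute[name] for name in abs_reductions})
    return out

print(reduce_tensor([-1.0, 0.0, 1.0]))
```

Saving a handful of scalars per tensor keeps the saved data small while still supporting analysis that only needs statistics.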
**Saving by regex**
```
ts.get_collection('default').include(FLAGS.tornasole_include)
include_collections.append('default')
ts.TornasoleHook(..., include_collections=include_collections, ...)
```
**Setting save interval**
```
ts.TornasoleHook(..., save_config=ts.SaveConfig(save_interval=FLAGS.tornasole_step_interval), ...)
```
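The save interval controls which global steps the hook writes tensors for. A minimal sketch of that gating, assuming a simple modulo check (our illustration, not Tornasole's actual logic):

```python
def should_save(step, save_interval):
    """Return True on steps whose tensors would be written."""
    return step % save_interval == 0

# With save_interval=100, steps 0, 100, 200, 300 fall within range(350).
print([s for s in range(350) if should_save(s, 100)])  # [0, 100, 200, 300]
```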
**Setting the right mode**

You will see in the code that the appropriate mode has been set before the train or evaluate function calls.
For example, the line:
```
hook.set_mode(ts.modes.TRAIN)
```

**Adding the hook**
```
training_hooks = []
...
training_hooks.append(hook)
classifier.train(
    input_fn=lambda: make_dataset(...),
    max_steps=nstep,
    hooks=training_hooks)
```

## Running the example
Here we provide example hyperparameter dictionaries to run this script in different scenarios from within SageMaker. You can replace the resnet_hyperparams dictionary in the notebook we provided with the following hyperparameter dictionaries to run jobs in these scenarios.

### Run with synthetic or real data
By default the following commands run with synthetic data. If you have ImageNet data prepared in tfrecord format,
you can pass the path to it with the parameter `data_dir`.

### Saving weights and gradients with Tornasole
```
hyperparams = {
    'enable_tornasole': True,
    'tornasole_save_weights': True,
    'tornasole_save_gradients': True,
    'tornasole_step_interval': 100
}
```
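In SageMaker script mode, entries of the hyperparameters dictionary reach the training script as command-line arguments. The helper below is a hypothetical sketch of that translation for illustration only; SageMaker performs it for you:

```python
def to_cli_args(hyperparams):
    """Flatten a hyperparameter dict into script-mode style CLI arguments."""
    args = []
    for key, value in hyperparams.items():
        args += ['--' + key, str(value)]
    return args

hyperparams = {
    'enable_tornasole': True,
    'tornasole_save_weights': True,
    'tornasole_step_interval': 100,
}
print(to_cli_args(hyperparams))
# ['--enable_tornasole', 'True', '--tornasole_save_weights', 'True', '--tornasole_step_interval', '100']
```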

### Simulating gradients which 'vanish'
We simulate the scenario of gradients becoming very small (vanishing) by initializing weights with a small constant.

```
hyperparams = {
    'enable_tornasole': True,
    'tornasole_save_weights': True,
    'tornasole_save_gradients': True,
    'tornasole_step_interval': 100,
    'constant_initializer': 0.01
}
```
#### Rule: VanishingGradient
To monitor this condition for the first 10000 training steps, you can set up a VanishingGradient rule as follows:

```
rule_specifications = [
    {
        "RuleName": "VanishingGradient",
        "InstanceType": "ml.c5.4xlarge",
        "RuntimeConfigurations": {
            "end-step": "10000"
        }
    }
]
```
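The rule specification dictionaries here and below share the same shape, so a small helper can cut the repetition. `make_rule_spec` is our hypothetical convenience, not part of the Tornasole API:

```python
def make_rule_spec(rule_name, instance_type="ml.c5.4xlarge", **runtime_configs):
    """Build one rule specification dict; runtime configurations are optional."""
    spec = {"RuleName": rule_name, "InstanceType": instance_type}
    if runtime_configs:
        # Keys use dashes and values are strings, matching the examples here.
        spec["RuntimeConfigurations"] = {
            key.replace('_', '-'): str(value)
            for key, value in runtime_configs.items()
        }
    return spec

rule_specifications = [make_rule_spec("VanishingGradient", end_step=10000)]
```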
#### Saving activations of RELU layers in full
```
hyperparams = {
    'enable_tornasole': True,
    'tornasole_save_relu_activations': True,
    'tornasole_step_interval': 200,
}
```
#### Saving activations of RELU layers as reductions
```
hyperparams = {
    'enable_tornasole': True,
    'tornasole_save_relu_activations': True,
    'tornasole_step_interval': 200,
    'tornasole_relu_reductions': 'min,max,mean,variance',
    'tornasole_relu_reductions_abs': 'mean,variance',
}
```
#### Saving weights every step
If you want to compute and track the ratio of weights to updates,
you can do that by saving weights every step as follows:
```
hyperparams = {
    'enable_tornasole': True,
    'tornasole_save_weights': True,
    'tornasole_step_interval': 1
}
```
##### Rule: WeightUpdateRatio
To monitor the weights and updates during training, you can set up a WeightUpdateRatio rule as follows:

```
rule_specifications = [
    {
        "RuleName": "WeightUpdateRatio",
        "InstanceType": "ml.c5.4xlarge"
    }
]
```

##### Rule: UnchangedTensor
You can also invoke this rule to monitor whether tensors stop changing from step to step. Here we pass '.*' as the tensor_regex to monitor all tensors.
```
rule_specifications = [
    {
        "RuleName": "UnchangedTensor",
        "InstanceType": "ml.c5.4xlarge",
        "RuntimeConfigurations": {
            "tensor_regex": ".*"
        }
    }
]
```
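Conceptually, a rule like UnchangedTensor compares a tensor's saved values across consecutive steps. A minimal sketch of that check (our illustration, not the rule's actual implementation):

```python
def unchanged(values, steps=3):
    """True if the last `steps` saved values of a tensor are all identical."""
    if len(values) < steps:
        return False
    tail = values[-steps:]
    return all(v == tail[0] for v in tail)

print(unchanged([0.5, 0.5, 0.5, 0.5]))  # True: the tensor has stopped changing
print(unchanged([0.5, 0.4, 0.3, 0.2]))  # False: still updating
```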
