|
124 | 124 | "cell_type": "markdown",
|
125 | 125 | "metadata": {},
|
126 | 126 | "source": [
|
127 |
| - "The script here is and adaptation of the [TensorFlow MNIST example](https://github.com/tensorflow/models/tree/master/official/mnist). It provides a ```model_fn(features, labels, mode)```, which is used for training, evaluation and inference. \n", |
| 127 | + "The script here is an adaptation of the [TensorFlow MNIST example](https://github.com/tensorflow/models/tree/r1.12.0/official/mnist). It provides a ```model_fn(features, labels, mode)```, which is used for training, evaluation and inference. \n", |
128 | 128 | "\n",
|
129 | 129 | "### A regular ```model_fn```\n",
|
130 | 130 | "\n",
|
131 | 131 | "A regular **```model_fn```** follows the pattern:\n",
|
132 |
| - "1. [defines a neural network](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L96)\n", |
133 |
| - "- [applies the ```features``` in the neural network](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L178)\n", |
134 |
| - "- [if the ```mode``` is ```PREDICT```, returns the output from the neural network](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L186)\n", |
135 |
| - "- [calculates the loss function comparing the output with the ```labels```](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L188)\n", |
136 |
| - "- [creates an optimizer and minimizes the loss function to improve the neural network](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L193)\n", |
137 |
| - "- [returns the output, optimizer and loss function](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L205)\n", |
| 132 | + "1. [defines a neural network](https://github.com/tensorflow/models/blob/r1.12.0/official/mnist/mnist.py#L103)\n", |
| 133 | + "- [applies the ```features``` in the neural network](https://github.com/tensorflow/models/blob/r1.12.0/official/mnist/mnist.py#L104)\n", |
| 134 | + "- [if the ```mode``` is ```PREDICT```, returns the output from the neural network](https://github.com/tensorflow/models/blob/r1.12.0/official/mnist/mnist.py#L108)\n", |
| 135 | + "- [calculates the loss function comparing the output with the ```labels```](https://github.com/tensorflow/models/blob/r1.12.0/official/mnist/mnist.py#L142)\n", |
| 136 | + "- [creates an optimizer and minimizes the loss function to improve the neural network](https://github.com/tensorflow/models/blob/r1.12.0/official/mnist/mnist.py#L121)\n", |
| 137 | + "- [returns the output, optimizer and loss function](https://github.com/tensorflow/models/blob/r1.12.0/official/mnist/mnist.py#L136-L139)\n", |
138 | 138 | "\n",
|
139 | 139 | "### Writing a ```model_fn``` for distributed training\n",
|
140 | 140 | "In distributed training, the same neural network is sent to multiple training instances. Each instance predicts on a batch of the dataset, calculates the loss, and runs the optimizer to minimize it. One full pass through this process is called a **training step**.\n",
|
141 | 141 | "\n",
|
142 | 142 | "#### Synchronizing training steps\n",
|
143 |
| - "A [global step](https://www.tensorflow.org/api_docs/python/tf/train/global_step) is a global variable shared between the instances. It necessary for distributed training, so the optimizer will keep track of the number of **training steps** between runs: \n", |
| 143 | + "A [global step](https://github.com/tensorflow/docs/blob/r1.12/site/en/api_docs/python/tf/train/global_step.md) is a global variable shared between the instances. It is necessary for distributed training so that the optimizer can keep track of the number of **training steps** between runs: \n", |
144 | 144 | "\n",
|
145 | 145 | "```python\n",
|
146 | 146 | "train_op = optimizer.minimize(loss, tf.train.get_or_create_global_step())\n",
|
|
176 | 176 | "metadata": {},
|
177 | 177 | "outputs": [],
|
178 | 178 | "source": [
|
179 |
| - "estimator = TensorFlow(entry_point='mnist.py',\n", |
180 |
| - " role=role,\n", |
181 |
| - " framework_version='1.12.0',\n", |
182 |
| - " training_steps=1000, \n", |
183 |
| - " evaluation_steps=100,\n", |
184 |
| - " train_instance_count=1,\n", |
185 |
| - " train_instance_type='ml.m4.xlarge',\n", |
186 |
| - " base_job_name='DEMO-hpo-tensorflow')" |
| 179 | + "estimator = TensorFlow(py_version='py3',\n", |
| 180 | + " entry_point='mnist.py',\n", |
| 181 | + " role=role,\n", |
| 182 | + " framework_version='1.12.0',\n", |
| 183 | + " training_steps=1000,\n", |
| 184 | + " evaluation_steps=100,\n", |
| 185 | + " instance_count=1,\n", |
| 186 | + " instance_type='ml.m4.xlarge',\n", |
| 187 | + " base_job_name='DEMO-hpo-tensorflow')" |
187 | 188 | ]
|
188 | 189 | },
|
189 | 190 | {
|
|
268 | 269 | "metadata": {},
|
269 | 270 | "outputs": [],
|
270 | 271 | "source": [
|
271 |
| - "tuner.fit(inputs)" |
| 272 | + "tuner.fit(inputs, wait=False)" |
272 | 273 | ]
|
273 | 274 | },
|
274 | 275 | {
|
|
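The estimator hunk above renames two keyword arguments: `train_instance_count` becomes `instance_count`, and `train_instance_type` becomes `instance_type`. For notebooks that build the estimator arguments as a dictionary, the same rename could be applied programmatically. The sketch below is only an illustration under stated assumptions: `LEGACY_TO_CURRENT` and `rename_legacy_kwargs` are hypothetical names, not part of any SDK, and the mapping covers only the two renames visible in this diff.

```python
# Hypothetical helper, not part of the SageMaker SDK.
# Assumption: only the two renames shown in the diff above are covered.
LEGACY_TO_CURRENT = {
    "train_instance_count": "instance_count",
    "train_instance_type": "instance_type",
}


def rename_legacy_kwargs(kwargs):
    """Return a copy of kwargs with legacy argument names replaced."""
    renamed = {}
    for name, value in kwargs.items():
        new_name = LEGACY_TO_CURRENT.get(name, name)
        if new_name in renamed:
            # The caller passed both the legacy and the current spelling.
            raise ValueError(f"duplicate argument after renaming: {new_name!r}")
        renamed[new_name] = value
    return renamed


# Arguments as they appear on the removed ("-") side of the diff.
old_kwargs = {
    "entry_point": "mnist.py",
    "framework_version": "1.12.0",
    "train_instance_count": 1,
    "train_instance_type": "ml.m4.xlarge",
    "base_job_name": "DEMO-hpo-tensorflow",
}

new_kwargs = rename_legacy_kwargs(old_kwargs)
print(sorted(new_kwargs))
```

Arguments that are not in the mapping pass through unchanged, and the input dictionary is not modified. The helper deliberately raises when both spellings of the same argument are present, since silently picking one could hide a configuration mistake.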