Commit 2e41b5b

Update Training-SL.md
1 parent a71be84 commit 2e41b5b

1 file changed

+14
-7
lines changed

Documents/Training-SL.md

Lines changed: 14 additions & 7 deletions
@@ -4,16 +4,16 @@ This algorithm is basically trying to train the neural network to remember what
44

55
The example scene `UnityTensorflow/Examples/Pong/PongSL` shows how to use supervised learning to train the neural network from how you are playing the game yourself.
66

7-
## Overall Steps
7+
## Overall Steps to Setup
88
1. Create an environment using the ML-Agent API. See the [instructions from Unity](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Create-New.md)
99
3. Change the BrainType of your brain to `InternalTrainable` in the inspector.
1010
2. Create a Trainer
1111
1. Attach a `TrainerMimic.cs` to any GameObject.
12-
2. Create a `TrainerParamsMimic` scriptable object with proper parameters in your project and assign it to the Params field in `TrainerMimic.cs`.
12+
2. Create a `TrainerParamsMimic` scriptable object with proper parameters in your project (in the project window, select `Create/ml-agent/ppo/TrainerParamsMimic`), and assign it to the Params field in `TrainerMimic.cs`.
1313
3. Assign the Trainer to the `Trainer` field of your Brain.
1414
3. Create a Model
1515
1. Attach a `SupervisedLearningModel.cs` to any GameObject.
16-
2. Create a `SupervisedLearningNetworkSimple` scriptable object in your project and assign it to the Network field in `SupervisedLearningModel.cs`.
16+
2. Create a `SupervisedLearningNetworkSimple` scriptable object in your project (in the project window, select `Create/ml-agent/ppo/SupervisedLearningNetworkSimple`), and assign it to the Network field in `SupervisedLearningModel.cs`.
1717
3. Assign the created Model to the `modelRef` field in `TrainerMimic.cs`.
1818

1919
4. Create a Decision
@@ -30,22 +30,27 @@ The example scene `UnityTensorflow/Examples/Pong/PongSL` shows how to use superv
3030
* `isTraining`: Toggle this to switch between training and inference mode. Note that if `isTraining` is false when the game starts, the training part of the PPO model will not be initialized and you won't be able to train it in this run. Also,
3131
* `parameters`: You need to assign this field with a TrainerParamsMimic scriptable object.
3232
* `continueFromCheckpoint`: If true, when the game starts, the trainer will try to load the saved checkpoint file to resume previous training.
33-
* `checkpointPath`: the path of the checkpoint, including the file name.
33+
* `checkpointPath`: The path of the checkpoint directory.
34+
* `checkpointFileName`: The name of the checkpoint file.
3435
* `steps`: Just to show you the current step of the training.
35-
* 'isCollectingData': If the training is collecting training data from Agents with Decision.
36-
* `dataBufferCount`: Current collected data count.
36+
* `isCollectingData`: Whether the trainer is collecting training data from Agents with Decision.
37+
* `trainingDataSaveFileName`: The name of the training data file. The collected training data is saved to and loaded from this file.
38+
* `dataBufferCount`: Shows the current collected data count.
3739

3840
#### TrainerParamsMimic
3941
* `learningRate`: Learning rate used to train the neural network.
4042
* `maxTotalSteps`: Max steps the trainer will be training.
4143
* `saveModelInterval`: The trained model is saved once every this many steps.
44+
* `logInterval`: How many training steps between each log output.
4245
* `batchSize`: Mini batch size when training.
4346
* `numIterationPerTrain`: How many batches to train for each step (fixed update).
4447
* `requiredDataBeforeTraining`: How much collected data is needed before the trainer starts to train the neural network.
4548
* `maxBufferSize`: Max buffer size of collected data. If the data buffer count exceeds this number, old data will be overwritten. Set this to 0 to remove the limit.
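
As a rough illustration of how `batchSize`, `numIterationPerTrain`, `requiredDataBeforeTraining` and `maxBufferSize` could interact, here is a minimal Python sketch. It is not the project's C# implementation: the `DataBuffer` class, the `train_step` function and the `train_on_batch` callback are hypothetical names made up for this example, and it assumes the oldest samples are the ones overwritten and that batches are sampled at random.

```python
import random
from collections import deque


class DataBuffer:
    """Toy data buffer (hypothetical); max_size == 0 means no limit."""

    def __init__(self, max_size=0):
        # A deque with maxlen drops its oldest item when a new one is
        # appended to a full buffer; maxlen=None makes it unbounded.
        self._items = deque(maxlen=max_size if max_size > 0 else None)

    def add(self, sample):
        self._items.append(sample)

    def __len__(self):
        return len(self._items)

    def sample(self, batch_size, rng=random):
        # Sample without replacement, capped at what is available.
        k = min(batch_size, len(self._items))
        return rng.sample(list(self._items), k)


def train_step(buffer, batch_size, num_iteration_per_train,
               required_data_before_training, train_on_batch):
    """One fixed-update worth of training; returns the batches trained."""
    # Assumption: no training happens until enough data is collected.
    if len(buffer) < required_data_before_training:
        return 0
    for _ in range(num_iteration_per_train):
        train_on_batch(buffer.sample(batch_size))
    return num_iteration_per_train
```

With `max_size=3`, adding five samples leaves only the three most recent in the buffer, and `train_step` does nothing until the buffer holds at least `required_data_before_training` samples.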
4649

4750
#### SupervisedLearningModel.cs
4851
* `checkpointToLoad`: If you assign a model's saved checkpoint file to it, this will be loaded when model is initialized, regardless of the trainer's loading. Might be used when you are not using a trainer.
52+
* `modelName`: The name of the model. It is used as the name scope when building the neural network. Can be empty by default.
53+
* `weightSaveMode`: This decides the names of the neural network's weights when saving to checkpoints as a serialized dictionary. Usually there is no need to change this.
4954
* `Network`: You need to assign this field with a scriptable object that implements RLNetworkPPO.cs.
5055
* `optimizer`: The optimizer to use for this model when training. You can also set its parameters here.
5156

@@ -66,7 +71,7 @@ You can also use a [conditional GAN](https://arxiv.org/abs/1411.1784) model inst
6671

6772
Note that currently the GAN network we made does not support visual observation.
6873

69-
#### Steps
74+
#### Steps to Setup
7075
Mostly the same steps as for regular [supervised learning](Overall Steps) above, but change step 3 to create a GAN model, and change the `TrainerParamsMimic` in step 2-2 to `TrainerParamsGAN` instead.
7176

7277
- Create a GAN model:
@@ -76,6 +81,8 @@ Most the same steps as using regular [supervised learning](Overall Steps) as bef
7681

7782
#### GANModel.cs
7883
* `checkpointToLoad`: If you assign a model's saved checkpoint file to it, this will be loaded when model is initialized, regardless of the trainer's loading. Might be used when you are not using a trainer.
84+
* `modelName`: The name of the model. It is used as the name scope when building the neural network. Can be empty by default.
85+
* `weightSaveMode`: This decides the names of the neural network's weights when saving to checkpoints as a serialized dictionary. Usually there is no need to change this.
7986
* `Network`: You need to assign this field with a scriptable object that implements RLNetworkPPO.cs.
8087
* `generatorL2LossWeight`: L2 loss weight of the generator. Usually 0 is fine.
8188
* `outputShape`: Output shape of GAN. For ML-Agent, you can keep it unmodified, and the trainer will set it for you.
