Documents/Training-SL.md
14 additions & 7 deletions
Original file line number
Diff line number
Diff line change
@@ -4,16 +4,16 @@ This algorithm is basically trying to train the neural network to remember what
4
4
5
5
The example scene `UnityTensorflow/Examples/Pong/PongSL` shows how to use supervised learning to train the neural network from how you are playing the game yourself.
6
6
7
-
## Overall Steps
7
+
## Overall Steps to Setup
8
8
1. Create an environment using the ML-Agents API. See the [instructions from Unity](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Create-New.md)
9
9
3. Change the BrainType of your brain to `InternalTrainable` in inspector.
10
10
2. Create a Trainer
11
11
1. Attach a `TrainerMimic.cs` to any GameObject.
12
-
2. Create a `TrainerParamsMimic` scriptable object with proper parameters in your project and assign it to the Params field in `TrainerMimic.cs`.
12
+
2. Create a `TrainerParamsMimic` scriptable object with proper parameters in your project (in the Project window, select `Create/ml-agent/ppo/TrainerParamsMimic`), and assign it to the Params field in `TrainerMimic.cs`.
13
13
3. Assign the Trainer to the `Trainer` field of your Brain.
14
14
3. Create a Model
15
15
1. Attach a `SupervisedLearningModel.cs` to any GameObject.
16
-
2. Create a `SupervisedLearningNetworkSimple` scriptable object in your project and assign it to the Network field in `SupervisedLearningModel.cs`.
16
+
2. Create a `SupervisedLearningNetworkSimple` scriptable object in your project (in the Project window, select `Create/ml-agent/ppo/SupervisedLearningNetworkSimple`), and assign it to the Network field in `SupervisedLearningModel.cs`.
17
17
3. Assign the created Model to the `modelRef` field in `TrainerMimic.cs`.
18
18
19
19
4. Create a Decision
@@ -30,22 +30,27 @@ The example scene `UnityTensorflow/Examples/Pong/PongSL` shows how to use superv
30
30
*`isTraining`: Toggle this to switch between training and inference mode. Note that if `isTraining` is false when the game starts, the training part of the PPO model will not be initialized and you won't be able to train it in this run. Also,
31
31
*`parameters`: You need to assign this field with a TrainerParamsMimic scriptable object.
32
32
*`continueFromCheckpoint`: If true, when the game starts, the trainer will try to load the saved checkpoint file to resume previous training.
33
-
*`checkpointPath`: the path of the checkpoint, including the file name.
33
+
*`checkpointPath`: The path of the checkpoint directory.
34
+
*`checkpointFileName`: The name of the checkpoint file.
34
35
*`steps`: Just to show you the current step of the training.
35
-
* 'isCollectingData': If the training is collecting training data from Agents with Decision.
36
-
*`dataBufferCount`: Current collected data count.
36
+
*`isCollectingData`: Whether the trainer is collecting training data from Agents with a Decision.
37
+
*`trainingDataSaveFileName`: The name of the training data file. The collected training data is saved/loaded from here.
38
+
*`dataBufferCount`: Shows the current collected data count.
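
A minimal sketch of how the checkpoint fields above could fit together, written in Python for illustration only. The function `resolve_checkpoint` is hypothetical and not part of the project; it assumes that `checkpointPath` and `checkpointFileName` are joined into one file path, and that a missing file means training starts fresh.

```python
import os


def resolve_checkpoint(checkpoint_path, checkpoint_file_name, continue_from_checkpoint):
    """Return the checkpoint file to load, or None to start from scratch.

    Hypothetical helper: assumes the directory and file name are simply joined.
    """
    full_path = os.path.join(checkpoint_path, checkpoint_file_name)
    # Only resume when asked to and when a saved checkpoint actually exists.
    if continue_from_checkpoint and os.path.isfile(full_path):
        return full_path
    return None
```

With `continueFromCheckpoint` off, the sketch never returns a path, even when a saved file is present.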
37
39
38
40
#### TrainerParamsMimic
39
41
*`learningRate`: Learning rate used to train the neural network.
40
42
*`maxTotalSteps`: Max steps the trainer will be training.
41
43
*`saveModelInterval`: The trained model is saved once every this many steps.
44
+
*`logInterval`: How many training steps between each log output.
42
45
*`batchSize`: Mini batch size when training.
43
46
*`numIterationPerTrain`: How many batches to train for each step (fixed update).
44
47
*`requiredDataBeforeTraining`: How much collected data is needed before it starts to train the neural network.
45
48
*`maxBufferSize`: Max buffer size of collected data. If the data buffer count exceeds this number, old data will be overwritten. Set this to 0 to remove the limit.
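
The interaction between `maxBufferSize`, `requiredDataBeforeTraining`, `batchSize` and `numIterationPerTrain` can be sketched roughly as below. This is a Python illustration under stated assumptions, not the trainer's actual code: `DemoBuffer`, `train_step` and `train_on_batch` are hypothetical names, and uniform random sampling of mini batches is assumed.

```python
import random


class DemoBuffer:
    """Hypothetical buffer of (observation, action) pairs collected from play."""

    def __init__(self, max_buffer_size):
        # 0 means "no limit", mirroring the maxBufferSize description.
        self.max_buffer_size = max_buffer_size
        self.items = []
        self.next_index = 0  # slot to overwrite once the buffer is full

    def add(self, observation, action):
        if self.max_buffer_size == 0 or len(self.items) < self.max_buffer_size:
            self.items.append((observation, action))
        else:
            # Buffer is full: overwrite the oldest entry.
            self.items[self.next_index] = (observation, action)
            self.next_index = (self.next_index + 1) % self.max_buffer_size

    def __len__(self):
        return len(self.items)

    def sample(self, batch_size):
        # Assumption: mini batches are drawn uniformly at random.
        return random.sample(self.items, min(batch_size, len(self.items)))


def train_step(buffer, params, train_on_batch):
    """Run one fixed-update step; return how many batches were trained."""
    if len(buffer) < params["requiredDataBeforeTraining"]:
        return 0  # not enough demonstrations collected yet
    for _ in range(params["numIterationPerTrain"]):
        train_on_batch(buffer.sample(params["batchSize"]))
    return params["numIterationPerTrain"]
```

In this sketch the buffer keeps only the most recent `maxBufferSize` samples, and nothing is trained until enough data has been collected.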
46
49
47
50
#### SupervisedLearningModel.cs
48
51
*`checkpointToLoad`: If you assign a model's saved checkpoint file to it, this will be loaded when model is initialized, regardless of the trainer's loading. Might be used when you are not using a trainer.
52
+
*`modelName`: The name of the model. It is used for the name scope when building the neural network. Can be empty by default.
53
+
*`weightSaveMode`: This decides the names of the neural network's weights when saving them to checkpoints as a serialized dictionary. Usually there is no need to change this.
49
54
*`Network`: You need to assign this field with a scriptable object that implements RLNetworkPPO.cs.
50
55
*`optimizer`: The optimizer to use for this model when training. You can also set its parameters here.
51
56
@@ -66,7 +71,7 @@ You can also use a [conditional GAN](https://arxiv.org/abs/1411.1784) model inst
66
71
67
72
Note that currently the GAN network we made does not support visual observation.
68
73
69
-
#### Steps
74
+
#### Steps to Setup
70
75
Mostly the same steps as for regular [supervised learning](Overall Steps) above, but change step 3 to create a GAN model, and change the `TrainerParamsMimic` in step 2-2 to `TrainerParamsGAN` instead.
71
76
72
77
- Create a GAN model:
@@ -76,6 +81,8 @@ Most the same steps as using regular [supervised learning](Overall Steps) as bef
76
81
77
82
#### GANModel.cs
78
83
*`checkpointToLoad`: If you assign a model's saved checkpoint file to it, this will be loaded when model is initialized, regardless of the trainer's loading. Might be used when you are not using a trainer.
84
+
*`modelName`: The name of the model. It is used for the name scope when building the neural network. Can be empty by default.
85
+
*`weightSaveMode`: This decides the names of the neural network's weights when saving them to checkpoints as a serialized dictionary. Usually there is no need to change this.
79
86
*`Network`: You need to assign this field with a scriptable object that implements RLNetworkPPO.cs.
80
87
*`generatorL2LossWeight`: L2 loss weight of the generator. Usually 0 is fine.
81
88
*`outputShape`: Output shape of GAN. For ML-Agent, you can keep it unmodified, and the trainer will set it for you.