Commit 43936ee

Update Training-PPO.md

1 parent a512431 commit 43936ee
File tree

1 file changed: +4 −4 lines changed

Documents/Training-PPO.md

Lines changed: 4 additions & 4 deletions
@@ -23,14 +23,14 @@ The example [Getting Started with the 3D Balance Ball Environment](Getting-Start
## Explanation of fields in the inspector
We use similar parameters as in Unity ML-Agents. If something is confusing, see their [document](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-PPO.md) for more details.

- ### TrainerPPO.cs
+ #### TrainerPPO.cs
* `isTraining`: Toggle this to switch between training and inference mode. Note that if `isTraining` is false when the game starts, the training part of the PPO model will not be initialized and you won't be able to train it in this run.
* `parameters`: You need to assign this field with a TrainerParamsPPO scriptable object.
* `continueFromCheckpoint`: If true, when the game starts, the trainer will try to load the saved checkpoint file to resume previous training.
* `checkpointPath`: The path of the checkpoint, including the file name.

* `steps`: Shows the current step of the training.
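As a rough illustration of the fields above, the snippet below sketches how a TrainerPPO component might be configured from a setup script. This is a hypothetical sketch only: the field names come from the list above, but the exact types, namespaces, and the example checkpoint path are assumptions, not the project's confirmed API.

```csharp
// Hypothetical sketch: configuring a TrainerPPO component in code.
// Field names follow the documentation above; the path value is invented.
var trainer = gameObject.AddComponent<TrainerPPO>();
trainer.isTraining = true;                  // training mode, not inference
trainer.parameters = myTrainerParamsPPO;    // a TrainerParamsPPO scriptable object
trainer.continueFromCheckpoint = true;      // resume from a saved checkpoint if present
trainer.checkpointPath = "Checkpoints/model.bytes"; // hypothetical example path
```

In practice these fields are usually set in the Unity inspector rather than in code; the sketch just makes the field meanings concrete.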

- ### TrainerParamsPPO
+ #### TrainerParamsPPO
* `learningRate`: Learning rate used to train the neural network.
* `maxTotalSteps`: The maximum number of steps the trainer will train for.

* `saveModelInterval`: The trained model will be saved every this many steps.
@@ -45,12 +45,12 @@ We use similar parameters as in Unity ML-Agents. If something is confusing, read
* `numEpochPerTrain`: For each training update, the data in the buffer will be reused this many times.

* `useHeuristicChance`: See [Training with Heuristics](#training-with-heuristics).
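The TrainerParamsPPO fields above can be pictured as a scriptable-object asset. The sketch below is hypothetical: the field names come from this document, but the numeric values are arbitrary examples, not recommended settings.

```csharp
// Hypothetical sketch: creating a TrainerParamsPPO asset and filling the
// fields described above. All values are illustrative examples only.
var ppoParams = ScriptableObject.CreateInstance<TrainerParamsPPO>();
ppoParams.learningRate = 3e-4f;       // learning rate for the neural network
ppoParams.maxTotalSteps = 1000000;    // stop training after this many steps
ppoParams.saveModelInterval = 10000;  // save the model every 10k steps
ppoParams.numEpochPerTrain = 3;       // reuse buffered data this many times per update
ppoParams.useHeuristicChance = 0f;    // see Training with Heuristics
```

Such an asset would normally be created via the Unity editor's asset menu and assigned to the trainer's `parameters` field.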

- ### RLModelPPO.cs
+ #### RLModelPPO.cs
* `checkpointToLoad`: If you assign a model's saved checkpoint file to it, this will be loaded when the model is initialized, regardless of the trainer's loading. This might be used when you are not using a trainer.

* `Network`: You need to assign this field with a scriptable object that implements RLNetworkPPO.cs.

* `optimizer`: The type of optimizer to use for this model when training. You can also set its parameters here.

- ### RLNetworkSimpleAC
+ #### RLNetworkSimpleAC
This is a simple implementation of RLNetworkAC that you can create and plug in as a neural network definition for any RLModelPPO. PPO uses an actor/critic structure (see the PPO algorithm).

- `actorHiddenLayers`/`criticHiddenLayers`: Hidden layers of the network. The array size is the number of hidden layers. Each element has four parameters that define that layer. These have no default values, so you have to fill them in.

- size: Size of this hidden layer.
