Commit db08f04

Update ExamplesList.md
1 parent b999098 commit db08f04


Documents/ExamplesList.md

Lines changed: 58 additions & 55 deletions
This is just a copy of the Unity ML-Agents' [3DBall environment](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Examples.md#3dball-3d-balance-ball), with modifications.

* Scenes:
  - 3DBall: Basic PPO example, used by the [Getting Started with Balance Ball](Getting-Started-with-Balance-Ball.md) tutorial.
  - 3DBallNE: Basic Neural Evolution example (see the sketch below).
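
For context, neuroevolution trains the policy by searching over network weights instead of backpropagating a loss. The sketch below is a generic, minimal version of that loop and is not this project's implementation; the fitness function, population size, and other names are illustrative stand-ins for rolling out the policy in the 3DBall scene.

```python
import numpy as np

# Toy stand-in for "run one episode in the environment and return total reward".
# In the real 3DBallNE scene this would roll out the policy in Unity.
def evaluate(theta: np.ndarray) -> float:
    target = np.linspace(-1.0, 1.0, theta.size)   # pretend-optimal weights
    return -float(np.sum((theta - target) ** 2))  # higher is better

POP_SIZE, N_PARAMS, N_ELITE, SIGMA, GENERATIONS = 32, 16, 8, 0.1, 50

rng = np.random.default_rng(0)
population = rng.normal(0.0, 1.0, size=(POP_SIZE, N_PARAMS))

for gen in range(GENERATIONS):
    fitness = np.array([evaluate(p) for p in population])
    elite = population[np.argsort(fitness)[-N_ELITE:]]               # keep the best individuals
    parents = elite[rng.integers(0, N_ELITE, size=POP_SIZE)]         # resample parents from the elite
    population = parents + rng.normal(0.0, SIGMA, population.shape)  # mutate to form the next generation
    population[0] = elite[-1]                                        # elitism: keep the best unchanged

print("best fitness:", evaluate(population[0]))
```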

## BananaCollectors

<p align="center">
<img src="Images/ExampleList/BananaCollectors.png"
alt="BananaCollectors"
width="600" border="10" />
</p>

This is just a copy of the Unity ML-Agents' [Banana Collectors](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Examples.md#banana-collector), with modifications for the in-editor training tutorial.

* Scenes:
  - Banana: Basic PPO example. It is also an example of using discrete action branching (see the sketch below).
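
Discrete action branching means the policy outputs one choice per branch (for example movement and laser as separate branches) instead of a single flattened action. The snippet below is only a rough illustration of sampling such a branched action; the branch sizes and names are assumptions, not the actual Banana scene setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical branches: forward/back (3 options), turn (3 options), laser on/off (2 options).
BRANCH_SIZES = [3, 3, 2]

def sample_branched_action(logits_per_branch):
    """Sample one discrete choice per branch from independent softmax distributions."""
    action = []
    for logits in logits_per_branch:
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        action.append(int(rng.choice(len(probs), p=probs)))
    return action  # e.g. [1, 0, 1] -> one index per branch

# Stand-in for a policy network's output: one logit vector per branch.
logits = [rng.normal(size=n) for n in BRANCH_SIZES]
print(sample_branched_action(logits))
```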

## GAN2DPlane

<p align="center">
<img src="Images/ExampleList/GAN2DPlane.png"
alt="GAN2DPlane"
width="600" border="10" />
</p>

A simple demo of how to use a GAN directly.

Click StartTraining to generate training data and start training. Click UseGAN to generate new data from the GAN (shown in blue).
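
As a rough idea of what "using a GAN directly" involves, here is a minimal generator/discriminator training loop for 2D points written with PyTorch. It is not the code behind this scene (which runs inside Unity); the data distribution, network sizes, and step counts are arbitrary placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# "Real" training data: 2D points on a line segment (a stand-in for the
# points the StartTraining button would generate in the GAN2DPlane scene).
def real_batch(n=128):
    x = torch.rand(n, 1) * 2 - 1
    return torch.cat([x, 0.5 * x + 0.1], dim=1)   # points on y = 0.5x + 0.1

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))   # noise -> 2D point
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # 2D point -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = real_batch()
    fake = G(torch.randn(real.size(0), 8))

    # Discriminator: push real points toward label 1 and generated points toward label 0.
    d_loss = bce(D(real), torch.ones(real.size(0), 1)) + \
             bce(D(fake.detach()), torch.zeros(real.size(0), 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: try to fool the discriminator (generated points labeled 1).
    g_loss = bce(D(fake), torch.ones(real.size(0), 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

# "UseGAN": sample new points from the trained generator (the blue points in the scene).
print(G(torch.randn(5, 8)).detach())
```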

## GridWorld

<p align="center">
<img src="Images/ExampleList/GridWorld.png"
alt="GridWorld"
width="600" border="10" />
</p>

This is just a copy of the Unity ML-Agents' [GridWorld](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Examples.md#gridworld), with modifications for the in-editor training tutorial.

* Scenes:
  - GridWorld: Basic PPO example. It uses visual observation and a discrete action space with [masking](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Design-Agents.md#masking-discrete-actions); a minimal masking sketch follows below.
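
Action masking usually works by forcing the probability of invalid actions to (near) zero before sampling. The helper below is a hand-rolled illustration of that idea, not ML-Agents' actual masking code; the action set and the blocked action are made up for the example.

```python
import numpy as np

def masked_action_probs(logits: np.ndarray, allowed: np.ndarray) -> np.ndarray:
    """Softmax over logits with disallowed actions forced to (near) zero probability."""
    masked = np.where(allowed, logits, -1e9)   # huge negative logit ~= impossible action
    exp = np.exp(masked - masked.max())
    return exp / exp.sum()

# Hypothetical GridWorld-style actions: up, down, left, right.
logits  = np.array([1.2, 0.3, -0.5, 0.8])
allowed = np.array([False, True, True, True])   # e.g. a wall blocks "up"
probs = masked_action_probs(logits, allowed)
print(probs)   # probability of "up" is ~0
```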

## Maze

<p align="center">
<img src="Images/ExampleList/Maze.png"
alt="Maze"
width="600" border="10" />
</p>

A game where the agent (yellow) has to reach the destination (green) while avoiding the obstacles (red).

* Discrete action space: Up, Down, Left, Right.
* Algorithm: PPO.
* Scenes:
  - MazePPO: Vector Observation with the size of the map (36 by default). Each color has a different value (see the encoding sketch below).
  - MazePPOVisual: Visual Observation of size 32x32, colored.
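
The MazePPO observation can be pictured as the flattened grid, one value per cell with a different value per color. The snippet below is only an illustrative encoding under that assumption; the actual cell values and map layout in the scene may differ.

```python
import numpy as np

# Hypothetical 6x6 maze layout; the real scene builds this from the Unity scene.
EMPTY, AGENT, GOAL, OBSTACLE = 0.0, 1.0, 2.0, 3.0

grid = np.zeros((6, 6))
grid[5, 0] = AGENT      # yellow
grid[0, 5] = GOAL       # green
grid[2, 2] = OBSTACLE   # red
grid[3, 4] = OBSTACLE

# Flatten to a length-36 vector observation, one value per cell.
observation = grid.flatten()
print(observation.shape)   # (36,)
```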

## Pole

<p align="center">
<img src="Images/ExampleList/Pole.png"
alt="Pole"
width="600" border="10" />
</p>

A 2D physics-based game where the agent needs to apply a torque to the pole to keep it upright.

* Continuous action space: torque.
* Algorithm: PPO.
* Scenes:
  - Pole: Vector Observation of size 2: angular velocity and current angle (see the toy dynamics sketch below).
  - PoleVisual: Visual Observation. The game is modified so that the graphics show the angular velocity.
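
To make the Pole setup concrete, here is a toy, hand-written dynamics step in the same spirit: the continuous torque action fights gravity, and the observation is the (angular velocity, angle) pair. The constants and reward are invented for illustration; the real scene uses Unity's physics.

```python
import math

# Illustrative constants; the real scene uses Unity's 2D physics.
DT, INERTIA, GRAVITY_GAIN, MAX_TORQUE = 0.02, 1.0, 9.8, 2.0

def step(angle, angular_velocity, torque):
    """One physics step: the agent's torque fights gravity pulling the pole over."""
    torque = max(-MAX_TORQUE, min(MAX_TORQUE, torque))          # continuous action
    angular_accel = (GRAVITY_GAIN * math.sin(angle) + torque) / INERTIA
    angular_velocity += angular_accel * DT
    angle += angular_velocity * DT
    observation = (angular_velocity, angle)                     # vector observation of size 2
    reward = math.cos(angle)                                    # +1 when upright, -1 when hanging down
    return observation, reward

obs, r = step(angle=0.1, angular_velocity=0.0, torque=-0.5)
print(obs, r)
```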

## Pong

<p align="center">
<img src="Images/ExampleList/Pong.png"
alt="Pong"
width="600" border="10" />
</p>

Classic Pong game. Two agents play against each other.

* Observation: Vector of size 6: the y positions of the agent itself and the opponent, plus the position and velocity of the ball. The observations are transformed so that each agent sees itself as the agent on the left (see the mirroring sketch below).
* Scenes:
  - PongRL:
    - Uses the PPO algorithm.
    - Discrete action space: up, stay, down.
  - PongRLWithSLInit:
    - Uses the PPO algorithm, but the PPO model is initialized with weights trained by supervised learning. This might make training reach a good result faster.
    - Discrete action space: up, stay, down.
  - PongSL:
    - Uses supervised learning. The left agent is controlled manually to collect data; once enough data is collected, supervised learning starts to train the brain.
    - Discrete action space: up, stay, down.
  - PongSLGAN:
    - Uses supervised learning, but the learning model is a GAN instead of a regular one.
    - Continuous action space: vertical velocity.
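
The observation mirroring mentioned above can be illustrated as follows: for the right-side agent, flip the ball's horizontal position and velocity so the state looks as if that agent were playing on the left. The field width, value ranges, and vector layout below are assumptions for the sketch, not the scene's exact encoding.

```python
import numpy as np

FIELD_WIDTH = 1.0   # assumed normalized field width; illustrative only

def observation_for(agent_y, opponent_y, ball_x, ball_y, ball_vx, ball_vy, is_right_side):
    """Build the size-6 observation so that every agent 'sees' itself as the left player."""
    if is_right_side:
        # Mirror the x axis: flip the ball's x position and x velocity.
        ball_x = FIELD_WIDTH - ball_x
        ball_vx = -ball_vx
    return np.array([agent_y, opponent_y, ball_x, ball_y, ball_vx, ball_vy], dtype=np.float32)

left  = observation_for(0.3, -0.2, 0.7, 0.1, 0.5, -0.1, is_right_side=False)
right = observation_for(-0.2, 0.3, 0.7, 0.1, 0.5, -0.1, is_right_side=True)
print(left)
print(right)   # same rally, mirrored so the right agent also "plays from the left"
```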

