This is just a copy of the Unity ML-Agents' [Banana Collectors](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Examples.md#banana-collector), with modifications for the in-editor training tutorial.

* Scenes:
    - Banana: Basic PPO example. It is also an example of using discrete action branching (see the sketch below).
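
With discrete action branching, the policy picks one action per branch each step instead of one action from a single flat list. Below is a minimal sketch of how a branched action vector might be consumed, assuming the classic ML-Agents `Agent` API of this era; the class name and branch meanings are illustrative, not this repo's actual code.

```csharp
using UnityEngine;
using MLAgents; // ML-Agents namespace of this era (assumption)

public class BananaAgentSketch : Agent
{
    // With branching, vectorAction holds one value per branch,
    // e.g. branch 0 = forward motion, branch 1 = rotation.
    public override void AgentAction(float[] vectorAction, string textAction)
    {
        int move = Mathf.FloorToInt(vectorAction[0]); // branch 0: 0 = stay, 1 = forward
        int turn = Mathf.FloorToInt(vectorAction[1]); // branch 1: 0 = none, 1 = left, 2 = right
        // ...apply the movement/rotation and assign rewards here...
    }
}
```
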
## GAN2DPlane
<p align="center">
<img src="Images/ExampleList/GAN2DPlane.png"
alt="GAN2DPlane"
width="600" border="10" />
</p>

A simple demo of how to use a GAN directly.

Click StartTraining to generate training data and start training.

Click UseGAN to generate data from the GAN (blue).
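
For reference, a GAN trains a generator G (here mapping noise z to 2D points) against a discriminator D that tries to tell generated points from the training data; the standard minimax objective is:

```latex
\min_G \max_D \; \mathbb{E}_{x \sim p_\mathrm{data}}[\log D(x)]
             + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]
```
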
## GridWorld
<p align="center">
<img src="Images/ExampleList/GridWorld.png"
alt="GridWorld"
width="600" border="10" />
</p>

This is just a copy of the Unity ML-Agents' [GridWorld](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Examples.md#gridworld), with modifications for the in-editor training tutorial.

* Scenes:
    - GridWorld: Basic PPO example. It uses visual observations and a discrete action space with [masking](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Design-Agents.md#masking-discrete-actions) (see the sketch below).
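
Action masking lets the agent mark discrete actions as invalid for the current step, so the policy never samples them. A minimal sketch, assuming the classic ML-Agents `SetActionMask` API; the action indices and grid bookkeeping are illustrative, not this repo's actual code.

```csharp
using MLAgents; // ML-Agents namespace of this era (assumption)

public class GridAgentSketch : Agent
{
    const int Up = 1, Down = 2; // illustrative action indices
    public int row, maxRow;     // illustrative grid position bookkeeping

    public override void CollectObservations()
    {
        // Forbid moves that would step off the grid this decision step.
        if (row >= maxRow) SetActionMask(Up);
        if (row <= 0) SetActionMask(Down);
    }
}
```
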
## Maze
<p align="center">
<img src="Images/ExampleList/Maze.png"
alt="Maze"
width="600" border="10" />
</p>

A game where the agent (yellow) has to reach the destination (green) while avoiding the obstacles (red).

* Discrete action space: Up, Down, Left, Right.
* Algorithm: PPO.
* Scenes:
    - MazePPO: Vector Observation with the size of the map (36 by default). Each color has a different value (see the sketch below).
    - MazePPOVisual: Visual Observation of size 32x32, colored.
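
The MazePPO observation is simply the grid flattened into a vector, one entry per cell with a distinct value per color. A minimal sketch, assuming the classic `CollectObservations`/`AddVectorObs` API; the cell encoding is illustrative, not this repo's actual code.

```csharp
using MLAgents; // ML-Agents namespace of this era (assumption)

public class MazeAgentSketch : Agent
{
    // One value per cell, e.g. 0 = empty, 1 = agent, 2 = goal, 3 = obstacle.
    public int[] cells = new int[36]; // 6x6 map by default

    public override void CollectObservations()
    {
        foreach (int cell in cells)
            AddVectorObs(cell); // vector observation of size 36
    }
}
```
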
## Pole
<p align="center">
<img src="Images/ExampleList/Pole.png"
alt="Pole"
width="600" border="10" />
</p>

A 2D physics-based game where the agent must apply torque to the pole to keep it upright.

* Continuous action space: torque.
* Algorithm: PPO.
* Scenes:
    - Pole: Vector Observation of size 2: angular velocity and current angle (see the sketch below).
    - PoleVisual: Visual Observation. The game is modified so that the graphics show the angular velocity.
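
A minimal sketch of the size-2 observation and the single continuous torque action, assuming the classic ML-Agents API and Unity 2D physics; the field names are illustrative, not this repo's actual code.

```csharp
using UnityEngine;
using MLAgents; // ML-Agents namespace of this era (assumption)

public class PoleAgentSketch : Agent
{
    public Rigidbody2D pole;      // the pole's rigidbody (illustrative)
    public float maxTorque = 10f; // scales the raw action (illustrative)

    public override void CollectObservations()
    {
        AddVectorObs(pole.angularVelocity); // angular velocity
        AddVectorObs(pole.rotation);        // current angle in degrees
    }

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        // One continuous action: the torque applied to the pole.
        pole.AddTorque(vectorAction[0] * maxTorque);
    }
}
```
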
## Pong
<p align="center">
<img src="Images/ExampleList/Pong.png"
alt="Pong"
width="600" border="10" />
</p>

Classic Pong game. Two agents play against each other.

* Observation: Vector of size 6: the y positions of the agent itself and of the opponent, plus the position and velocity of the ball. The observations are transformed so that each agent sees itself as the agent on the left (see the sketch after the scene list).
* Scenes:
    - PongRL:
        - Uses the PPO algorithm.
        - Discrete action space: up, stay, down.
    - PongRLWithSLInit:
        - Uses the PPO algorithm, but the PPO model is initialized with weights trained by supervised learning, which may help training reach a good result faster.
        - Discrete action space: up, stay, down.
    - PongSL:
        - Uses supervised learning. The left agent is controlled manually to collect data; once enough data is collected, supervised learning starts to train the brain.
        - Discrete action space: up, stay, down.
    - PongSLGAN:
        - Uses supervised learning, but the learning model is a GAN instead of a regular one.
        - Continuous action space: vertical velocity.
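
The left/right symmetry works by mirroring the x components of the right agent's observations, so both policies always see a "playing on the left" frame of reference. A minimal sketch, assuming the classic ML-Agents API; the field names are illustrative, not this repo's actual code.

```csharp
using UnityEngine;
using MLAgents; // ML-Agents namespace of this era (assumption)

public class PongAgentSketch : Agent
{
    public bool isLeft;                        // which side this paddle plays on
    public Transform myPaddle, opponentPaddle; // illustrative scene references
    public Rigidbody2D ball;

    public override void CollectObservations()
    {
        float sign = isLeft ? 1f : -1f;          // mirror x for the right agent
        AddVectorObs(myPaddle.position.y);       // own paddle y
        AddVectorObs(opponentPaddle.position.y); // opponent paddle y
        AddVectorObs(ball.position.x * sign);    // ball position, x mirrored
        AddVectorObs(ball.position.y);
        AddVectorObs(ball.velocity.x * sign);    // ball velocity, x mirrored
        AddVectorObs(ball.velocity.y);           // six observations in total
    }
}
```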