[refactor] Structure configuration files into classes #3936
Conversation
@@ -0,0 +1,110 @@
import attr
How long do you think we'll need to keep this script around?
Hopefully only for 1-2 releases. Or perhaps it should be released with Trainer 1.0 and removed after that.
docs/Training-Configuration-File.md
Outdated
| `hyperparameters -> buffer_size` | (default = `10240` for PPO and `50000` for SAC) Number of experiences to collect before updating the policy model. Corresponds to how many experiences should be collected before we do any learning or updating of the model. **This should be multiple times larger than `batch_size`**. Typically a larger `buffer_size` corresponds to more stable training updates. In SAC, the max size of the experience buffer - on the order of thousands of times longer than your episodes, so that SAC can learn from old as well as new experiences. <br><br>Typical range: PPO: `2048` - `409600`; SAC: `50000` - `1000000` |
| `hyperparameters -> learning_rate` | (default = `3e-4`) Initial learning rate for gradient descent. Corresponds to the strength of each gradient descent update step. This should typically be decreased if training is unstable, and the reward does not consistently increase. <br><br>Typical range: `1e-5` - `1e-3` |
| `hyperparameters -> learning_rate_schedule` | (default = `linear` for PPO and `constant` for SAC) Determines how learning rate changes over time. For PPO, we recommend decaying learning rate until max_steps so learning converges more stably. However, for some cases (e.g. training for an unknown amount of time) this feature can be disabled. For SAC, we recommend holding learning rate constant so that the agent can continue to learn until its Q function converges naturally. <br><br>`linear` decays the learning_rate linearly, reaching 0 at max_steps, while `constant` keeps the learning rate constant for the entire training run. |
| `max_steps` | (default = `500000`) Total number of experience points that must be collected from the simulation before ending the training process. <br><br>Typical range: `5e5` - `1e7` |
'experience points'
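For context, the PPO defaults documented in the table above map onto a raw trainer configuration of roughly this shape; this is an illustrative sketch only (the variable name is invented, and the nesting simply follows the `hyperparameters ->` prefixes):

# Illustrative only: PPO defaults from the table above, as the raw dict shape
# a trainer configuration loads into; SAC defaults differ where noted.
ppo_behavior_config = {
    "hyperparameters": {
        "buffer_size": 10240,                # 50000 for SAC
        "learning_rate": 3e-4,
        "learning_rate_schedule": "linear",  # "constant" for SAC
    },
    "max_steps": 500000,
}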
@@ -26,9 +20,11 @@ def __init__(
:param encoding_size: The size of the hidden encoding layer for the ICM
param docstring :(
Looks good to my eye -- nice refactor 👍
self.trainer_settings["sequence_length"]
> self.trainer_settings["batch_size"]
and self.trainer_settings["use_recurrent"]
I think this can be:
self.trainer_settings.sequence_length
> self.trainer_settings.batch_size
and self.trainer_settings.use_recurrent
Ah, this method isn't actually used anymore; the check is done in settings.py in _check_batch_size_seq_length. I've removed them from PPO and SAC.
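A minimal sketch of how such a cross-field check can be expressed on an attrs-based settings class, assuming field names that mirror the dictionary keys in the quoted snippet (the real TrainerSettings in settings.py is larger and may organize these fields differently):

import attr

@attr.s(auto_attribs=True)
class TrainerSettingsSketch:
    # Hypothetical, simplified stand-in for the real settings class.
    batch_size: int = 1024
    sequence_length: int = 64
    use_recurrent: bool = False

    def __attrs_post_init__(self) -> None:
        # The check formerly duplicated in the PPO/SAC optimizers: a recurrent
        # policy cannot train with sequence_length greater than batch_size.
        if self.use_recurrent and self.sequence_length > self.batch_size:
            raise ValueError(
                "sequence_length must not exceed batch_size when use_recurrent is true."
            )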
vis_encode_type = policy_network_settings.vis_encode_type
self.tau = hyperparameters.tau
self.burn_in_ratio = 0.0
Looks like this used to be a trainer param but is now just being set to 0.0? Any reason why?
This wasn't configurable before (also set to 0 as a constant at the top of the file). Currently the burn-in feature doesn't actually work properly with values greater than 0. Will fix or remove in a PR that just touches that.
self.trainer_settings["sequence_length"]
> self.trainer_settings["batch_size"]
and self.trainer_settings["use_recurrent"]
Suggested change:
self.trainer_settings.sequence_length
> self.trainer_settings.batch_size
and self.trainer_settings.use_recurrent
Ah, this method isn't actually used anymore; the check is done in settings.py in _check_batch_size_seq_length. I've removed them from PPO and SAC.
)
init_path: Optional[str] = None
output_path: str = "default"
# TODO: Remove parser default and remove from CLI |
Address before merging?
Done!
Proposed change(s)
This PR implements the changes described in this design doc. In particular, it replaces the configuration dictionary (called trainer_params in many files) with objects based on the attrs library. These are more flexible than NamedTuples and support smart defaults, validators, and typing, and they can be structured/unstructured from dicts using the cattrs library.

This PR also assigns reasonable defaults to every hyperparameter, and mlagents-learn now allows training without any configuration file specified (defaulting to PPO with the extrinsic reward signal). In the future, it will allow for defaults that are set relative to other defaults (as done in SelfPlaySettings).

The core of the changes is in settings.py.
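As a rough illustration of those mechanics (the class and field names below are invented for the example and are not the actual classes in settings.py):

import attr
import cattr

@attr.s(auto_attribs=True)
class HyperparamSettingsSketch:
    # Hypothetical, simplified settings class showing attrs defaults and typing.
    learning_rate: float = 3.0e-4
    buffer_size: int = 10240

# Structure a raw dict (e.g. loaded from a YAML config) into a typed object;
# keys missing from the dict fall back to the attrs defaults.
settings = cattr.structure({"learning_rate": 1.0e-4}, HyperparamSettingsSketch)
assert settings.buffer_size == 10240

# Unstructure back to a plain dict, e.g. for writing the resolved config to disk.
as_dict = cattr.unstructure(settings)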
Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)
Attrs: https://www.attrs.org/en/stable/
Cattrs: https://cattrs.readthedocs.io/en/latest/
This PR does not yet refactor how the settings are passed down from Trainer -> Policy -> Optimizer. The other two objects still receive the entire TrainerSettings object.

TODO: Improve testing for error handling in settings.py; update docs.

NOTE: The appearance of output_path as "default" in the configs is a bit odd, as output_path is not settable through the configuration (it's determined by the run-id). I plan to remove output_path in a follow-up PR.

Types of change(s)
Checklist
Other comments