
[refactor] Structure configuration files into classes #3936


Merged (59 commits) on May 26, 2020

Conversation

Contributor

@ervteng ervteng commented May 7, 2020

Proposed change(s)

This PR implements the changes described in this design doc. In particular, it replaces the configuration dictionary (called trainer_params in many files) with objects based on the attrs library. These are more flexible than NamedTuples: they support smart defaults, validators, and typing, and they can be structured from and unstructured to dicts using the cattrs library.

This PR also assigns reasonable defaults to every hyperparameter, and mlagents-learn now allows training without any configuration file specified (defaulting to PPO with the extrinsic reward signal). In the future, it will allow defaults that are set relative to other defaults (as is already done in SelfPlaySettings).

The core of the changes is in settings.py.
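
A rough sketch of the pattern (illustrative only: the class names, fields, and defaults below are invented for this example and are not the actual contents of settings.py):

```python
import attr
import cattr


@attr.s(auto_attribs=True)
class ExampleHyperparamSettings:
    # Keys missing from the config dict fall back to these defaults.
    batch_size: int = 1024
    learning_rate: float = 3.0e-4


@attr.s(auto_attribs=True)
class ExampleTrainerSettings:
    hyperparameters: ExampleHyperparamSettings = attr.Factory(ExampleHyperparamSettings)
    max_steps: int = 500000
    # A default defined relative to another field, in the same spirit as SelfPlaySettings.
    summary_freq: int = attr.ib()

    @summary_freq.default
    def _summary_freq_default(self):
        return self.max_steps // 10


# Structure a partially specified config dict into a typed settings object...
raw = {"hyperparameters": {"learning_rate": 1.0e-4}, "max_steps": 200000}
settings = cattr.structure(raw, ExampleTrainerSettings)
# ...and unstructure it back into a plain dict (e.g. to write out the effective config).
as_dict = cattr.unstructure(settings)
```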

Useful links (GitHub issues, JIRA tickets, ML-Agents forum threads, etc.)

Attrs:
https://www.attrs.org/en/stable/

Cattrs:
https://cattrs.readthedocs.io/en/latest/

This PR does not yet refactor how the settings are passed down from Trainer -> Policy -> Optimizer; the latter two objects still receive the entire TrainerSettings object.
TODO: Improve testing for error handling in settings.py; update docs.

NOTE: The appearance of output_path as default in the configs is a bit odd, as output_path is not settable through the configuration (it's determined by the run-id). I plan to remove output_path in a follow-up PR.

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

@ervteng ervteng requested review from harperj, awjuliani and andrewcoh May 7, 2020 22:38
@ervteng ervteng marked this pull request as ready for review May 15, 2020 02:08
@@ -0,0 +1,110 @@
import attr
Contributor

How long do you think we'll need to keep this script around?

Contributor Author

Hopefully only for 1-2 releases. Or perhaps it should be released with Trainer 1.0 and removed after that.

| `hyperparameters -> buffer_size` | (default = `10240` for PPO and `50000` for SAC) Number of experiences to collect before updating the policy model. Corresponds to how many experiences should be collected before we do any learning or updating of the model. **This should be multiple times larger than `batch_size`**. Typically a larger `buffer_size` corresponds to more stable training updates. In SAC, the max size of the experience buffer - on the order of thousands of times longer than your episodes, so that SAC can learn from old as well as new experiences. <br><br>Typical range: PPO: `2048` - `409600`; SAC: `50000` - `1000000` |
| `hyperparameters -> learning_rate` | (default = `3e-4`) Initial learning rate for gradient descent. Corresponds to the strength of each gradient descent update step. This should typically be decreased if training is unstable, and the reward does not consistently increase. <br><br>Typical range: `1e-5` - `1e-3` |
| `hyperparameters -> learning_rate_schedule` | (default = `linear` for PPO and `constant` for SAC) Determines how learning rate changes over time. For PPO, we recommend decaying learning rate until max_steps so learning converges more stably. However, for some cases (e.g. training for an unknown amount of time) this feature can be disabled. For SAC, we recommend holding learning rate constant so that the agent can continue to learn until its Q function converges naturally. <br><br>`linear` decays the learning_rate linearly, reaching 0 at max_steps, while `constant` keeps the learning rate constant for the entire training run. |
| `max_steps` | (default = `500000`) Total number of experience points that must be collected from the simulation before ending the training process. <br><br>Typical range: `5e5` - `1e7` |
Contributor

'experience points'
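
For reference, the `linear` learning_rate_schedule described in the table above behaves roughly like the sketch below (my own illustration, not the trainer's actual implementation; the function name is made up):

```python
def linear_lr(initial_lr: float, step: int, max_steps: int) -> float:
    # Decays linearly from initial_lr at step 0 down to 0 at max_steps;
    # a constant schedule would simply return initial_lr.
    return initial_lr * max(0.0, 1.0 - step / max_steps)
```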

@@ -26,9 +20,11 @@ def __init__(
:param encoding_size: The size of the hidden encoding layer for the ICM
Contributor

param docstring :(

Contributor

@harperj harperj left a comment

Looks good to my eye -- nice refactor 👍

Comment on lines 83 to 85
self.trainer_settings["sequence_length"]
> self.trainer_settings["batch_size"]
and self.trainer_settings["use_recurrent"]
Contributor

I think this can be:

Suggested change
- self.trainer_settings["sequence_length"]
- > self.trainer_settings["batch_size"]
- and self.trainer_settings["use_recurrent"]
+ self.trainer_settings.sequence_length
+ > self.trainer_settings.batch_size
+ and self.trainer_settings.use_recurrent

Contributor Author

Ah, this method isn't actually used anymore; the check is done in settings.py, in _check_batch_size_seq_length. I've removed these methods from PPO and SAC.
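
For context, a check of this kind can live on the settings objects themselves as an attrs validator. A minimal sketch under my own assumptions (the field names and the exact behaviour of _check_batch_size_seq_length in settings.py may differ):

```python
import attr


def _check_batch_size_seq_length(instance, attribute, value):
    # Reject recurrent configurations whose sequence length exceeds the batch size.
    if instance.use_recurrent and instance.sequence_length > instance.batch_size:
        raise ValueError(
            "batch_size must be at least as large as sequence_length when use_recurrent is True."
        )


@attr.s(auto_attribs=True)
class ExampleNetworkSettings:
    use_recurrent: bool = False
    sequence_length: int = 64
    # Validators run during __init__, after the fields above are set.
    batch_size: int = attr.ib(default=1024, validator=_check_batch_size_seq_length)
```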

vis_encode_type = policy_network_settings.vis_encode_type

self.tau = hyperparameters.tau
self.burn_in_ratio = 0.0
Contributor

Looks like this used to be a trainer param but is now just being set to 0.0? Any reason why?

Contributor Author

This wasn't configurable before (also set to 0 as a constant at the top of the file). Currently the burn-in feature doesn't actually work properly with values greater than 0. Will fix or remove in a PR that just touches that.

Comment on lines 85 to 87
self.trainer_settings["sequence_length"]
> self.trainer_settings["batch_size"]
and self.trainer_settings["use_recurrent"]
Contributor

Suggested change
- self.trainer_settings["sequence_length"]
- > self.trainer_settings["batch_size"]
- and self.trainer_settings["use_recurrent"]
+ self.trainer_settings.sequence_length
+ > self.trainer_settings.batch_size
+ and self.trainer_settings.use_recurrent

Contributor Author

Ah, this method isn't actually used anymore; the check is done in settings.py, in _check_batch_size_seq_length. I've removed these methods from PPO and SAC.

)
init_path: Optional[str] = None
output_path: str = "default"
# TODO: Remove parser default and remove from CLI
Contributor

Address before merging?

Contributor Author

Done!

@ervteng ervteng merged commit 721b869 into master May 26, 2020
@delete-merged-branch delete-merged-branch bot deleted the develop-attrs branch May 26, 2020 22:48
@ervteng ervteng mentioned this pull request May 28, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 27, 2021