Refactor of Curriculum and parameter sampling #4160

Conversation
    parameter.
    """

    lessons: List[Lesson]
I think this generally should match the YAML structure/keys, otherwise it won't be able to be dumped. So in this case this class should contain curriculum: List[Lesson].
I create these using the structure method, not the default converter. This code works as intended.
I did not name it curriculum because it can sometimes contain a single lesson (not a curriculum when there is a single value).
IMO it's kind of defeating the purpose of the attrs classes if they don't match the YAML. In the other classes, doing an unstructure of the entire RunOptions class creates a usable YAML file. Hence I think the environment_parameters dict should contain as entries either this object (which contains curriculum) or a Lesson object if it only has a single lesson.
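For illustration, a minimal, self-contained sketch of the round-trip described here (toy class and field names, not the actual ml-agents settings): when the attrs field name matches the YAML key, cattr.unstructure produces a dict that dumps straight back to a usable config.

import attr
import cattr
import yaml

@attr.s(auto_attribs=True)
class Lesson:
    value: float

@attr.s(auto_attribs=True)
class EnvironmentParameterSettings:
    curriculum: list  # named "curriculum" so it matches the YAML key

settings = EnvironmentParameterSettings(curriculum=[Lesson(1.5), Lesson(4.0)])
print(yaml.dump(cattr.unstructure(settings)))
# curriculum:
# - value: 1.5
# - value: 4.0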
I think I see what you are saying, and I think it is related to your other comment. I am trying to make the EnvironmentParameterSetting agnostic of how the parameters were written in the YAML. This allows me to have much simpler logic in the update_lesson method, since I know I always have a list of lessons, always with a completion criteria and a sampler. But in the YAML, there are multiple ways to specify a sampler:
small_wall_height:
  curriculum:
    - Lesson0:
        value: 1.5

small_wall_height: 1.5

other_wall_height:
  sampler_type: uniform
  sampler_parameters:
    min_value: 1.50
    max_value: 1.51
But all must result in a list of lessons. I could have the settings mirror the YAML, but then I would need to write a converter from the settings to the list of Lessons.
Do you think there is a way to both mirror the YAML and convert it to a list of lessons?
Or do you think I should not have a list of lessons to begin with and have my logic be case by case (curriculum / sampler / constant)?
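One possible answer, sketched below under stated assumptions (hypothetical names, simplified Lesson; the real PR may organize this differently): register a custom cattr structure hook so the YAML keeps all three shapes while the settings object always ends up holding a list of lessons.

from typing import Any, List
import attr
import cattr

@attr.s(auto_attribs=True)
class Lesson:
    value: Any                      # a constant or a sampler configuration
    completion_criteria: Any = None

@attr.s(auto_attribs=True)
class EnvironmentParameterSettings:
    lessons: List[Lesson]

def structure_parameter(raw: Any, _: type) -> EnvironmentParameterSettings:
    if isinstance(raw, (int, float)):                    # small_wall_height: 1.5
        return EnvironmentParameterSettings([Lesson(float(raw))])
    if isinstance(raw, dict) and "sampler_type" in raw:  # sampler form
        return EnvironmentParameterSettings([Lesson(raw)])
    if isinstance(raw, dict) and "curriculum" in raw:    # explicit curriculum
        return EnvironmentParameterSettings(
            [Lesson(body["value"]) for lesson in raw["curriculum"]
             for body in lesson.values()]
        )
    raise ValueError(f"Unrecognized environment parameter: {raw}")

cattr.register_structure_hook(EnvironmentParameterSettings, structure_parameter)

print(cattr.structure(1.5, EnvironmentParameterSettings))
# EnvironmentParameterSettings(lessons=[Lesson(value=1.5, completion_criteria=None)])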
- var min = m_ResetParams.GetWithDefault("big_wall_min_height", 8);
- var max = m_ResetParams.GetWithDefault("big_wall_max_height", 8);
- var height = min + Random.value * (max - min);
+ var height = m_ResetParams.GetWithDefault("big_wall_height", 8);
On WallJump, the sampling now happens using the sampler feature
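For reference, a hypothetical config snippet for the sampler-based replacement (key name taken from the new C# line above; the value range is illustrative, not the actual WallJump config):

big_wall_height:
  sampler_type: uniform
  sampler_parameters:
    min_value: 8.0
    max_value: 12.0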
@@ -119,6 +121,7 @@ def _reward_signal_steps_per_update_default(self):
        return self.steps_per_update


# INTRINSIC REWARD SIGNALS #############################################################
This was an attempt at organizing the sections of this file better. I can remove it if it does not deliver.
EnvironmentParameterSettings._check_lesson_chain(
    d_final[environment_parameter].curriculum, environment_parameter
)
else:
So, everything under environment_parameters gets converted into a Lesson. My concern with this is that it sets the precedent that every new env parameter controller needs to go through the curriculum control flow. It's not obvious how this might extend to things like an adaptive sampling scheme that doesn't have predefined completion criteria, or something like Ervin and Scott's task parameterizations.
I actually think this is OK. A new sampler type (like adaptive, or active learning, etc.) would be set as a Value in the 1st lesson. Running it through curriculum would allow us, for instance, to do multiple steps (e.g. Lesson 0 is a constant value, Lesson 1 is a regular Sampler, Lesson 2 is an adaptive sampler).
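A hypothetical config illustrating that multi-stage idea (completion criteria are omitted for brevity, and adaptive is not an existing sampler_type, just a stand-in for a future one):

wall_height:
  curriculum:
    - Lesson0:
        value: 1.5                # constant value
    - Lesson1:
        value:
          sampler_type: uniform   # regular sampler
          sampler_parameters:
            min_value: 1.5
            max_value: 4.0
    - Lesson2:
        value:
          sampler_type: adaptive  # hypothetical future sampler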
So we are basically saying that all future environment parameter controllers will be controlled as curriculums?
We can always modify environment parameters as needed, but sampling was easy to fold into curriculum. I think for environment parameters, whatever method we find for generating the parameters, it should be possible to fold it into a curriculum, even if it has only one lesson.
My feeling is that (1) this unnecessarily incurs the cost of being a curriculum (i.e. the extra function calls in the EnvParMan) when it doesn't need to be, and (2) environment parameter controllers may use update rules that are not as rigid as curriculum update thresholds, or that depend on some metric other than progress/reward.
(1) The cost of being a curriculum buys us a much simpler interface for the EnvironmentParameterManager. I agree this is a (very small) added computation cost, but reusing code is important IMO.
(2) Nothing prevents us from making the completion criteria evolve in the future to fit a new use case or to create adaptive samplers. We can also add new fields and methods to the EnvironmentParameterManager later if we need to, but for this PR, formulating the samplers as one-lesson curricula seemed like an opportunity for code simplification (since curriculum now supports sampling).
We can try to edit the design doc with an alternative to this implementation?
Would the EnvironmentParameterSetting have an optional list of lessons and an optional sampler instead of a list of lessons?
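For concreteness, a sketch of what that alternative could look like (hypothetical field names; not what this PR implements):

from typing import Any, List, Optional
import attr

@attr.s(auto_attribs=True)
class Lesson:
    value: Any

@attr.s(auto_attribs=True)
class EnvironmentParameterSetting:
    curriculum: Optional[List[Lesson]] = None
    sampler: Optional[Any] = None
    constant: Optional[float] = None

    def current_value(self) -> Any:
        # The consumer has to branch case by case -- exactly the
        # complexity the single list-of-lessons design avoids.
        if self.curriculum is not None:
            return self.curriculum[0].value
        if self.sampler is not None:
            return self.sampler
        return self.constant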
ml-agents/mlagents/trainers/learn.py
    run_seed: int,
    restore: bool = False,
) -> Optional[EnvironmentParameterManager]:
    if config is None:
Can we return an empty EnvironmentParameterManager in this case? I think it would simplify some of the None checks in TrainerController.
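A minimal sketch of that null-object idea, assuming a manager built from no settings behaves as a no-op (method names are illustrative, not the actual EnvironmentParameterManager API):

from typing import Dict, Optional

class EnvironmentParameterManager:
    def __init__(self, settings: Optional[Dict] = None):
        self.settings = settings or {}  # empty config -> empty manager

    def update_lessons(self, trainer_steps: Dict, reward_buffers: Dict) -> bool:
        # Nothing to update when no parameters are configured, so callers
        # can always call this without a None check.
        if not self.settings:
            return False
        return self._do_update(trainer_steps, reward_buffers)

    def _do_update(self, trainer_steps: Dict, reward_buffers: Dict) -> bool:
        ...  # real lesson-increment logic would live here
        return False

manager = EnvironmentParameterManager(None)  # config is None -> no-op manager
assert manager.update_lessons({}, {}) is False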
def reset_env_if_ready(self, env: EnvManager) -> None:
-    if self.meta_curriculum:
+    if self.param_manager:
For example, this code could get simpler if we knew param_manager was non-None.
):
    behavior_to_consider = lesson.completion_criteria.behavior
    if behavior_to_consider in trainer_steps:
        must_increment, new_smoothing = CompletionCriteriaSettings.need_increment(
Why make this static? lesson.completion_criteria.need_increment(...) feels more natural.
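A small sketch of the instance-method version being suggested, so the call site reads lesson.completion_criteria.need_increment(...) (the threshold/smoothing math here is illustrative, not the PR's actual formula):

import attr

@attr.s(auto_attribs=True)
class CompletionCriteriaSettings:
    threshold: float = 0.9

    def need_increment(self, measure: float, smoothing: float):
        # As an instance method it reads its own threshold, instead of the
        # static version receiving the settings object as an argument.
        new_smoothing = 0.25 * measure + 0.75 * smoothing
        return new_smoothing >= self.threshold, new_smoothing

criteria = CompletionCriteriaSettings()
must_increment, new_smoothing = criteria.need_increment(0.95, 0.90)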
Small optional feedback, looks good otherwise.
We discussed some future ideas for this feature in an offline Slack thread, but nothing that should block this.
Proposed change(s)
So far:
TODO:
Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)
Refactor of Curriculum and parameter sampling as described in this document