[refactor] Move output files to a common results/ folder #3829

ervteng · 2020-04-23T01:59:25Z

Proposed change(s)

This PR changes the artifact output directory to this structure:

results/ 
    {run_id}/
        configuration.yaml - the configuration (after CLI overrides) used for this run.
        {behavior_name}/
            Tensorboard files
            snapshot.ckpt (and .pb) - model checkpoints and intermediate output files
        {behavior_name}.csv - CSV log
        {behavior_name}.nn - Barracuda Output
    run_logs/
        timers.json - timers file
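The per-run part of the layout above can be expressed as a small path helper. This is a minimal sketch under assumptions: `artifact_paths` is a hypothetical name, not an ml-agents function. It covers only the files listed under {run_id}/ and leaves run_logs/ out, because that folder's placement is still under discussion in this thread.

```python
from pathlib import Path
from typing import Dict


def artifact_paths(results_dir: str, run_id: str, behavior_name: str) -> Dict[str, Path]:
    """Return where one behavior's artifacts would live for a given run.

    Hypothetical helper mirroring the results/{run_id}/ layout proposed in
    this PR; it is not part of the ml-agents API.
    """
    run_dir = Path(results_dir) / run_id
    return {
        # Configuration (after CLI overrides) used for this run.
        "configuration": run_dir / "configuration.yaml",
        # Folder holding TensorBoard files and model checkpoints.
        "model_dir": run_dir / behavior_name,
        # CSV log and Barracuda model sit directly in the run folder.
        "csv_log": run_dir / f"{behavior_name}.csv",
        "nn_model": run_dir / f"{behavior_name}.nn",
    }
```

For example, `artifact_paths("results", "ppo_run", "3DBall")["nn_model"]` points at results/ppo_run/3DBall.nn.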

In addition, we now write the RunOptions object out as a YAML file, configuration.yaml. Combined with PR #3815, this file can be passed back to mlagents-learn to reproduce the same run.
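One of the commits below is titled "Use try-catch instead of requiring YAML 5.1". The commit title doesn't say what was wrapped. This sketch assumes it was the `sort_keys` keyword, which PyYAML added to `yaml.dump` in 5.1. Older versions raise `TypeError` for that keyword, so the sketch falls back to the default (sorted) output. `write_configuration` and its injectable `dump` parameter are hypothetical, added so the fallback can be exercised without PyYAML installed.

```python
from typing import Any, Callable, IO, Optional


def write_configuration(
    run_options: dict, stream: IO[str], dump: Optional[Callable[..., Any]] = None
) -> None:
    """Serialize run options as YAML, keeping key order when the library allows it."""
    if dump is None:
        import yaml  # PyYAML; imported lazily so a dumper can be injected instead

        dump = yaml.dump
    try:
        # PyYAML >= 5.1 accepts sort_keys; False preserves the original key order.
        dump(run_options, stream, sort_keys=False)
    except TypeError:
        # Older PyYAML rejects the keyword before writing anything,
        # so retry with its default behavior (keys sorted alphabetically).
        dump(run_options, stream)
```

Catching `TypeError` avoids a hard version pin, at the cost of also retrying if the dump fails for an unrelated type error.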

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

https://docs.google.com/document/d/1uyk5JVNevfWy2DmqVHy19lcDzyQCNppPxkPOgJcQ7z4/edit#heading=h.5cr9b5kpfi98

Types of change(s)

Checklist

Added tests that prove my fix is effective or that my feature works
Updated the changelog (if applicable)
Updated the documentation (if applicable)
Updated the migration guide (if applicable)

Other comments


ervteng · 2020-04-23T03:42:21Z

@anupambhatnagar: added you to the review for visibility, as this PR will require a corresponding change to the CI pipeline.

harperj · 2020-04-23T19:34:37Z

@ervteng you're right that this will break the daily CI runs. I think there are 3 places in the CI we'll need to update:

results reporter
script for updating NN files
the runner script's barracuda inference logic

One of us could take a stab if that would be useful.

Unrelatedly, I think the structure you listed in the PR description is a little off: all files should be nested under the run-id folder. I'd also suggest getting rid of the run_logs folder and moving those files into the run-id folder itself.
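For the CI scripts mentioned above (results reporter, NN-update script), the main change is where to look for files. Here is a minimal sketch, assuming the results/{run_id}/{behavior_name}.csv and .nn layout from the PR description. `find_run_artifacts` is a hypothetical helper, not code from the CI repository.

```python
from pathlib import Path
from typing import List


def find_run_artifacts(results_dir: str, suffix: str) -> List[Path]:
    """List files with the given suffix that sit directly inside a run folder.

    Assumes the layout results/{run_id}/{behavior_name}{suffix}; files nested
    deeper (e.g. inside the {behavior_name}/ checkpoint folder) are not matched.
    """
    return sorted(Path(results_dir).glob(f"*/*{suffix}"))
```

A reporter would call `find_run_artifacts("results", ".csv")`, and the NN-update script would call `find_run_artifacts("results", ".nn")`.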

ervteng · 2020-04-23T21:46:21Z

@harperj I've actually already made the CI changes in this PR: https://github.com/Unity-Technologies/ml-agents-cloud/pull/105 but it definitely needs a look from one of you to make sure I didn't miss anything. It seemed to work when I tested it.

I think dropping the run-log (summaries) folder from the design doc could work, as there will only be two files in it (timers.json and, eventually, Player.log). A separate run-log folder only makes sense when running many environments: each will have its own Player.log (Player-1.log, Player-2.log, etc.), which would clutter the {run_id} directory.
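Per-environment logs like these can be redirected with the Unity Player's standard `-logFile <path>` command-line argument, which is what commit f73f0d3 (#3877) uses. This is a minimal sketch: `player_args`, the `worker_id` parameter, and the caller-chosen `log_dir` are illustrative assumptions. The Player-{n}.log naming follows the convention mentioned above.

```python
import os
from typing import List


def player_args(env_path: str, log_dir: str, worker_id: int) -> List[str]:
    """Build a Unity Player command line that writes its log into log_dir.

    Each worker gets its own Player-{worker_id}.log, so parallel environments
    don't overwrite each other's logs (or the default Player.log location).
    """
    log_path = os.path.join(log_dir, f"Player-{worker_id}.log")
    return [env_path, "-logFile", log_path]
```

Keeping `log_dir` as a parameter leaves the choice between {run_id}/ and {run_id}/run_logs/ to the caller.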

ml-agents/mlagents/trainers/trainer_controller.py

ml-agents/mlagents/trainers/learn.py

ml-agents/mlagents/trainers/trainer/trainer.py

ml-agents/mlagents/trainers/trainer_util.py


Ervin Teng added 6 commits April 22, 2020 17:23

Move artifacts to results/ directory

e9d34cc

Fix tests

6867d38

Fix more tests

187a284

Move location of timers json as well

1a7bb81

Write timers to timer.json

8a71997

Write out run_options

e099894

ervteng requested review from andrewcoh and harperj April 23, 2020 01:59

Ervin Teng added 7 commits April 22, 2020 20:30

Upgrade YAML library

3de64da

fix incorrect configuration.yaml location Improve tests

Update docs

44aa3b5

Update Yamato

d5836be

Update .gitignore

c20e603

Add comments to gitignore

6ab3ac8

Update tensorboard docs

e2ee198

Update changelog and migrating

c6b019d

ervteng marked this pull request as ready for review April 23, 2020 03:40

ervteng requested a review from anupam-142857 April 23, 2020 03:40

Ervin Teng added 2 commits April 22, 2020 21:18

Use try-catch instead of requiring YAML 5.1

bf843cc

Fix issue with output directory check

0b7146e

harperj reviewed Apr 24, 2020


Ervin Teng and others added 5 commits April 27, 2020 18:37

Clean up some pathing

48ee399

Clean up comment

2fc8b06

Clean up pathing in trainer_util

7a8f8fc

Add Player.log to results folder (#3877)

f73f0d3

Since the default Player.log path would be overwritten on subsequent runs, we should keep the Unity Player logs in the results folder for a training run. This change uses the -logFile CLI option to the Unity Player to set the path.

Merge branch 'master' into develop-results-folder

98e86cf

harperj approved these changes Apr 29, 2020


Merge branch 'master' into develop-results-folder

dac1e9a

ervteng merged commit 3fee8b9 into master Apr 29, 2020

delete-merged-branch bot deleted the develop-results-folder branch April 29, 2020 22:27

github-actions bot locked as resolved and limited conversation to collaborators May 14, 2021
