[versioning] Save ML-Agents version in checkpoints and check on load #4035

ervteng · 2020-05-28T18:31:49Z

Proposed change(s)

This PR saves the semantic version of the trainer package (major, minor, and patch) as three variables in the TF graph. The version is also exported into the .nn file. This lets us check whether a checkpoint is being loaded from the same version of ML-Agents, and (in the future) check which version of ML-Agents created a particular NN file.

Note that this is different from the existing version number in the NN, which is checked by C#. That number corresponds to the input and output tensors. When the Trainer code is upgraded, an NN file can remain compatible with C# but no longer be loadable into Python for training (e.g. if the network architecture changes).

Currently, we throw a warning if the versions don't match when a user tries to load a checkpoint.
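
For illustration, the load-time check described above could look roughly like the following. This is only a sketch, not the code in this PR: `check_checkpoint_version` and the `(major, minor, patch)` tuple layout are hypothetical names chosen for the example, and reading the three saved values out of the TF graph is left out entirely.

```python
import logging
from typing import Tuple

logger = logging.getLogger(__name__)

# (major, minor, patch), mirroring the three values saved with the checkpoint.
Version = Tuple[int, int, int]


def check_checkpoint_version(saved: Version, current: Version) -> bool:
    """Hypothetical helper: warn (but do not fail) on a version mismatch.

    Returns True when the versions match and False otherwise, so that the
    caller decides whether to continue loading the checkpoint.
    """
    if saved != current:
        logger.warning(
            "The checkpoint was saved with ML-Agents version %d.%d.%d, but the "
            "installed version is %d.%d.%d. Resuming training may not work.",
            *saved,
            *current
        )
        return False
    return True
```

Returning a bool instead of raising keeps the behavior in line with the description: a mismatch produces a warning, and loading still proceeds.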

Types of change(s)

Checklist

Added tests that prove my fix is effective or that my feature works
Updated the changelog (if applicable)
Updated the documentation (if applicable)
Updated the migration guide (if applicable)

chriselion · 2020-05-28T20:36:30Z

ml-agents/mlagents/trainers/policy/tf_policy.py



 logger = get_logger(__name__)


+# This is the version number of the inputs and outputs of the model, and
+# determines compatibility with inference in Barracuda.
+API_VERSION_NUMBER = 2


MODEL_FORMAT_VERSION_NUMBER? "API" doesn't really feel right here.

chriselion · 2020-05-28T20:41:39Z

ml-agents/mlagents/trainers/policy/tf_policy.py

+        :param version_string: The semantic-versioned version string (X.Y.Z).
+        :return: A Tuple containing (major_ver, minor_ver, patch_ver).
+        """
+        split_ver = version_string.split(".")[0:3]  # Remove dev tag


I think you can use distutils.version.LooseVersion to simplify this:

>>> from distutils.version import LooseVersion
>>> v = LooseVersion("1.2.3.dev4")
>>> v
LooseVersion ('1.2.3.dev4')
>>> v.version
[1, 2, 3, 'dev', 4]
>>> v.version[0:3]
[1, 2, 3]

Oh, that is very handy! Updated to use the LooseVersion class.
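
One caveat for anyone reading this later: `distutils` was deprecated in Python 3.10 and removed from the standard library in Python 3.12, so `LooseVersion` is not available there. A stdlib-only sketch of the same conversion is below. The function name `version_string_to_tuple` is hypothetical and this is not the code that was merged; it only assumes the version string starts with `X.Y.Z`.

```python
import re
from typing import Tuple


def version_string_to_tuple(version_string: str) -> Tuple[int, int, int]:
    """Hypothetical helper: parse the leading X.Y.Z of a version string.

    Anything after the patch number (for example a ".dev4" tag) is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Not an X.Y.Z version string: {version_string!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch
```

With this sketch, `version_string_to_tuple("1.2.3.dev4")` gives `(1, 2, 3)`, matching the `v.version[0:3]` result in the session above, and a string without three numeric components raises `ValueError` rather than silently returning a shorter tuple.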

Ervin Teng added 3 commits May 26, 2020 18:45

Save and check ML-Agents version in TF checkpoint

626d4ca

More testable and test for version check

b3703af

Add back in the model API version for C# check

ae27820

ervteng requested review from chriselion and andrewcoh May 28, 2020 18:31

Update changelog

49e9f30

chriselion reviewed May 28, 2020


chriselion approved these changes May 28, 2020


Ervin Teng added 8 commits May 28, 2020 13:56

Rename constant and simplify convert string

fb4ce95

Adjust SAC recurrent test

7de667b

Fix some more simple RL tests

5f833be

Fix 2d PPO and recurrent SAC

fabf9dc

More simple RL tweaks

06fd8ba

Change order of varaibles

4acc75a

Make tests a bit easier

791b148

Widen recurrent SAC test

2e7b0fe

ervteng merged commit 2b7b6e8 into master May 30, 2020

delete-merged-branch bot deleted the develop-checktfver branch May 30, 2020 00:55

github-actions bot locked as resolved and limited conversation to collaborators May 30, 2021
