couple of improvements to MPO #919

HenriDeh · 2023-07-04T13:34:02Z

This PR simply:

- slightly modifies the math of `vec_to_tril` to create more stable covariance matrices;
- makes the upper bound of the dual adaptive, so that it can be higher when rewards (and thus Q-values) have higher magnitudes.
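
For readers unfamiliar with the first point, here is a minimal sketch of the general idea: mapping an unconstrained vector to a lower-triangular Cholesky factor with a strictly positive diagonal. The thread does not show the actual diff, so this is not the PR's implementation. The name `vec_to_tril_sketch`, the use of softplus, and the `eps_diag` floor are all assumptions made for illustration.

```julia
using LinearAlgebra

# Numerically safe softplus: log(1 + exp(x)) without overflow for large |x|.
softplus(x) = max(x, zero(x)) + log1p(exp(-abs(x)))

# Hypothetical helper (NOT the package's `vec_to_tril`): fill an n×n
# lower-triangular matrix column by column from n(n+1)/2 unconstrained values.
# Diagonal entries pass through softplus plus a small floor `eps_diag`
# (an assumed value), so they stay strictly positive and L * L' is a valid
# covariance matrix.
function vec_to_tril_sketch(v::AbstractVector{T}, n::Integer; eps_diag = T(1e-4)) where {T<:AbstractFloat}
    length(v) == n * (n + 1) ÷ 2 || throw(ArgumentError("expected n(n+1)/2 values"))
    L = zeros(T, n, n)
    k = 1
    for j in 1:n, i in j:n          # walk the lower triangle, column by column
        L[i, j] = i == j ? softplus(v[k]) + eps_diag : v[k]
        k += 1
    end
    return L
end

v = [-2.0, 0.5, -1.0, 0.0, 2.0, 1.0]   # raw values, e.g. network outputs
L = vec_to_tril_sketch(v, 3)
Sigma = L * L'                          # covariance built from the factor
```

Bounding the diagonal away from zero is one common way to keep such a covariance well conditioned. Whether the PR uses this exact transform is not stated in the thread.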

codecov · 2023-07-04T13:35:55Z

Codecov Report

Merging #919 (e5f102f) into main (b54a0b0) will increase coverage by 0.00%.
The diff coverage is 93.33%.

@@           Coverage Diff           @@
##             main     #919   +/-   ##
=======================================
  Coverage   24.27%   24.28%           
=======================================
  Files         221      221           
  Lines        7739     7741    +2     
=======================================
+ Hits         1879     1880    +1     
- Misses       5860     5861    +1

Impacted Files	Coverage Δ
...c/ReinforcementLearningCore/test/utils/networks.jl	`46.42% <ø> (ø)`
...tLearningZoo/src/algorithms/policy_gradient/mpo.jl	`0.00% <0.00%> (ø)`
...rc/ReinforcementLearningCore/src/utils/networks.jl	`75.39% <100.00%> (+0.39%)`	⬆️

... and 1 file with indirect coverage changes

HenriDeh · 2023-07-07T09:31:33Z

@jeremiahpslewis can you validate, please?

src/ReinforcementLearningCore/src/utils/networks.jl

src/ReinforcementLearningZoo/src/algorithms/policy_gradient/mpo.jl

jeremiahpslewis · 2023-07-08T17:03:39Z

@HenriDeh I looked at this PR and did my best to review the parts I felt I could contribute to. As for computational correctness, I don't really have a clue about these functions, but I would suggest adding unit tests for them, if only so that when future changes to the code lead to different results, we can investigate and learn whether we've broken something or whether it was broken all along.

jeremiahpslewis

see above

HenriDeh · 2023-07-12T08:50:45Z

All tests are passing on my machine. The failing test in e004bef seems to be a bug: a timer reported 4x10^233, which is quite impossible. I couldn't reproduce it on my machine, and it is unrelated to the changes in this PR. Hopefully it does not happen again in the CI job.

jeremiahpslewis

Looks awesome!

couple of improvements

9ebbbfa

HenriDeh requested a review from jeremiahpslewis July 4, 2023 13:34

Merge branch 'main' into mpo-imp

67bcfc4

HenriDeh enabled auto-merge (squash) July 4, 2023 15:29

jeremiahpslewis reviewed Jul 8, 2023

src/ReinforcementLearningCore/src/utils/networks.jl (outdated, resolved)

jeremiahpslewis reviewed Jul 8, 2023

src/ReinforcementLearningCore/src/utils/networks.jl (resolved)

jeremiahpslewis reviewed Jul 8, 2023

src/ReinforcementLearningZoo/src/algorithms/policy_gradient/mpo.jl (resolved)

jeremiahpslewis requested changes Jul 8, 2023

HenriDeh added 2 commits July 11, 2023 17:29

decouple vec_to_tril functions

e004bef

add tests

e5f102f

HenriDeh requested a review from jeremiahpslewis July 12, 2023 08:52

jeremiahpslewis approved these changes Jul 12, 2023

HenriDeh merged commit 3182026 into main Jul 12, 2023

HenriDeh deleted the mpo-imp branch July 12, 2023 09:38
