Skip to content

couple of improvements to MPO #919

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 12, 2023
Merged

couple of improvements to MPO #919

merged 4 commits into from
Jul 12, 2023

Conversation

HenriDeh
Copy link
Member

@HenriDeh HenriDeh commented Jul 4, 2023

This simply

  • modifies slightly the math of vec_to_tril to create more stable covariance matrices
  • make the upperbound of the dual adaptive to let it be higher when rewards (and thus Q-values) have higher magnitudes.

@HenriDeh HenriDeh requested a review from jeremiahpslewis July 4, 2023 13:34
@codecov
Copy link

codecov bot commented Jul 4, 2023

Codecov Report

Merging #919 (e5f102f) into main (b54a0b0) will increase coverage by 0.00%.
The diff coverage is 93.33%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #919   +/-   ##
=======================================
  Coverage   24.27%   24.28%           
=======================================
  Files         221      221           
  Lines        7739     7741    +2     
=======================================
+ Hits         1879     1880    +1     
- Misses       5860     5861    +1     
Impacted Files Coverage Δ
...c/ReinforcementLearningCore/test/utils/networks.jl 46.42% <ø> (ø)
...tLearningZoo/src/algorithms/policy_gradient/mpo.jl 0.00% <0.00%> (ø)
...rc/ReinforcementLearningCore/src/utils/networks.jl 75.39% <100.00%> (+0.39%) ⬆️

... and 1 file with indirect coverage changes

@HenriDeh HenriDeh enabled auto-merge (squash) July 4, 2023 15:29
@HenriDeh
Copy link
Member Author

HenriDeh commented Jul 7, 2023

@jeremiahpslewis can you validate please ?

@jeremiahpslewis
Copy link
Member

@HenriDeh I looked at this PR and did my best to review it based on things I felt I can contribute. In terms of computational correctness, I don't really have a clue for these functions, but I would suggest that you add unit tests for the functions (if only so that if future changes to the code lead to different results, we can investigate and learn whether we've broken something or it was broken all along...

Copy link
Member

@jeremiahpslewis jeremiahpslewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

@HenriDeh
Copy link
Member Author

All tests are passing on my machine. The failing test in e004bef seems to be a bug (a timer timed 4x10^233), quite impossible. I couldn't reproduce on my machine and it is unrelated to the changes of the PR. Hopefully this does not happen again in the CI job.

@HenriDeh HenriDeh requested a review from jeremiahpslewis July 12, 2023 08:52
Copy link
Member

@jeremiahpslewis jeremiahpslewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome!

@HenriDeh HenriDeh merged commit 3182026 into main Jul 12, 2023
@HenriDeh HenriDeh deleted the mpo-imp branch July 12, 2023 09:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants