Sumtree sampling #60
Looks good! Can you add a test for this?
Tests have been added. Feel free to change the tolerances or the number of iterations if they take too long. The first test checks that a leaf with priority zero is never sampled; the second checks that the distribution of the samples matches the expected PDF. The second test requires many samples, so I've added some multithreading to speed it up. Both tests are run with 100 different RNG seeds.
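A rough sketch of the two checks described above, written in Python for illustration rather than taken from the package's Julia test suite. `make_sampler` is a hypothetical inverse-CDF stand-in for the sum tree, and the draw count, seeds, and tolerance are assumed values; the multithreading is left out for brevity.

```python
import bisect
import itertools
import random


def make_sampler(priorities):
    """Hypothetical stand-in for the sum tree: inverse-CDF sampling over prefix sums."""
    cumulative = list(itertools.accumulate(priorities))
    total = cumulative[-1]

    def sample(rng):
        u = rng.random() * total
        # bisect_right gives the first index whose running sum exceeds u.
        # A zero-priority entry adds nothing to the running sum, so it can
        # never be that index; min() clamps the rare case where rounding
        # makes u equal the total.
        i = bisect.bisect_right(cumulative, u)
        return min(i, len(priorities) - 1)

    return sample


def check_sampler(priorities, n_draws=20_000, seeds=range(10), tol=0.02):
    """Run both checks for every seed and return the largest frequency error seen."""
    total = sum(priorities)
    expected = [p / total for p in priorities]
    sample = make_sampler(priorities)
    worst = 0.0
    for seed in seeds:
        rng = random.Random(seed)
        counts = [0] * len(priorities)
        for _ in range(n_draws):
            counts[sample(rng)] += 1
        # Check 1: a zero-priority entry is never drawn.
        for i, p in enumerate(priorities):
            if p == 0:
                assert counts[i] == 0, f"seed {seed}: zero-priority index {i} sampled"
        # Check 2: empirical frequencies match p_i / sum(p) within tol.
        for c, e in zip(counts, expected):
            err = abs(c / n_draws - e)
            assert err <= tol, f"seed {seed}: frequency off by {err:.4f}"
            worst = max(worst, err)
    return worst


priorities = [0.0, 1.0, 0.0, 2.0, 5.0, 0.0, 2.0]
print(f"max frequency error: {check_sampler(priorities):.4f}")
```

As in the PR, the distribution check is the expensive one: the tolerance has to sit several standard deviations (roughly `sqrt(p * (1 - p) / n_draws)`) above the expected noise, so tightening it means drawing many more samples.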
Sorry for the confusion with the tests. It should be good to go now.
Thanks for adding the test! One more question and a couple of minor details.
Codecov Report
@@            Coverage Diff             @@
##             main      #60      +/-   ##
==========================================
+ Coverage   73.21%   73.54%   +0.32%
==========================================
  Files           15       15
  Lines          743      756      +13
==========================================
+ Hits           544      556      +12
- Misses         199      200       +1
LGTM
@CasBex, I'll let you merge in case you want to make a last-minute change.
@HenriDeh, I don't have permission to merge. Could you merge it?
For all the lofty talk about numerical rounding errors in #59, they are unavoidable even with the improved method. This fix simply checks whether the sampled leaf happens to have zero priority and, if so, walks backwards over the leaves until it finds a leaf with nonzero priority. If the backward walk finds nothing, it performs a forward walk instead.
This has been tested against the JuliaRL_PrioritizedDQN_CartPole experiment in ReinforcementLearningExperiments.jl with 30 different seeds.
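A minimal Python sketch of the zero-priority fallback described above, assuming an array-backed sum tree whose leaves live at indices `[size, 2 * size)` and whose capacity is padded to a power of two. The class and method names (`SumTree`, `update`, `sample`) are hypothetical and are not the package's Julia API.

```python
import random


class SumTree:
    """Hypothetical array-backed sum tree; tree[1] is the root, leaves start at size."""

    def __init__(self, capacity):
        # Pad to a power of two so every internal node has exactly two children.
        size = 1
        while size < capacity:
            size *= 2
        self.size = size
        self.capacity = capacity
        self.tree = [0.0] * (2 * size)

    def update(self, index, priority):
        # Set the leaf, then recompute every ancestor sum up to the root.
        pos = index + self.size
        self.tree[pos] = priority
        pos //= 2
        while pos >= 1:
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def total(self):
        return self.tree[1]

    def priority(self, index):
        return self.tree[index + self.size]

    def _descend(self, u):
        # Top-down search: go left if u falls within the left subtree's mass,
        # otherwise subtract that mass and go right.
        pos = 1
        while pos < self.size:
            left = 2 * pos
            if u < self.tree[left]:
                pos = left
            else:
                u -= self.tree[left]
                pos = left + 1
        return pos - self.size

    def sample(self, rng):
        index = self._descend(rng.random() * self.total())
        if self.priority(index) > 0:
            return index
        # Rounding (or stale internal sums) can land the descent on a
        # zero-priority leaf: walk backwards to the nearest nonzero leaf...
        for j in range(index - 1, -1, -1):
            if self.priority(j) > 0:
                return j
        # ...and if nothing is found, walk forwards instead.
        for j in range(index + 1, self.capacity):
            if self.priority(j) > 0:
                return j
        raise ValueError("all priorities are zero")


tree = SumTree(4)
for i, p in enumerate([1.0, 0.0, 3.0, 0.0]):
    tree.update(i, p)
rng = random.Random(42)
draws = [tree.sample(rng) for _ in range(20_000)]
```

Because the walk only runs when the descent has already landed on a zero-priority leaf, it leaves the normal sampling path untouched; in those rare cases it slightly favors the nearest preceding nonzero leaf, which is the trade-off this fix accepts instead of trying to eliminate rounding error altogether.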