-
Notifications
You must be signed in to change notification settings - Fork 9
adding an EpisodesBuffer #43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #43 +/- ##
==========================================
+ Coverage 69.94% 70.63% +0.68%
==========================================
Files 13 15 +2
Lines 579 630 +51
==========================================
+ Hits 405 445 +40
- Misses 174 185 +11
... and 1 file with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
I changed the approach. Regarding sampleable steps, the EB now simply keeps a bit array with 1s where steps are valid samples, 0s otherwise. This array is then used as weights with the weighted sampling functionalities of StatsBase, basically valid steps all have weight 1 and invalid steps weight 0. This certainly adds overhead to sampling but I'm not expecting it to be significant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look like a really elegant solution to a tricky problem. Just one comment about namespaces, above.
Latest commits include changes that were needed to work with the new run loop. These are
|
All is ready I think. Can you review this a second time ? Then I'll merge and release 0.2 in order to then work on RL.jl to integrate this. |
I would fix the naming now so it's accurate, given that we haven't had concerns about other breaking changes in the past (clearly we'll need to change the version number of downstream projects once we incorporate 0.2), but if all we need to do on the Experiments side is a find and replace, we'll thank ourselves later that we have an 'honest' / 'clear' API... ;) |
Fair enough. I applied that change. I'll merge as soon as tests are finished. Thanks for your comments. |
@JuliaRegistrator register |
Comments on pull requests will not trigger Registrator, as it is disabled. Please try commenting on a commit or issue. |
This PR adds an EpisodesBuffer.
Why is it useful/necessary? First there is the issue raised in #42 that shows how necessary it is to track episodes. Second, this also allows the use of multi-step batch sampling non-episodic environments that do not stop at a terminal state (they don't have any); or the early stopping of any environment due to say a limited number of steps. Currently multisteps are truncated using the terminal flag.
This PR is not over though. It is just a concept implementation of a wrapper to Traces (EpisodesBuffer) that will store a queue of Episode objects. An Episode is just the start and end indices (kind of a View) of one episode, its length, and whether it is terminated.
Termination is automatically detected when partial insertion happens. Indeed, in the run loop, before acting for the first time, only a (state, action) pair is pushed to the trajectory, then it's (reward, terminal, next_state, next_action).
This is the choice I made at the moment BUT, it is not compatible with the current run loop as the partial inserting only happens for the very first episode. Then at the first step of any other episode, it pushes
(reward_prev_ep, terminal_prev_ep, state_new_ep, action_new_ep)
, that is, the last state of the previous episode is not pushed. This is related to the problem we discussed in #913.The other way I thought is instead of detecting with partial inserting, we add program PostEpisodeStage to send a termination status to the last episode (/!\ terminal != terminated). I did not choose this because it kinda means that RLTrajectories must rely on whatever package that uses it to send the correct information. Whereas the partial insertion detection is self-contained.
In any case, my proposal is to accompany this PR with another one in RL.jl that would change the run loop as follows.
This way, plan! is executed at the end of the loop and the trajectory is always complete. As explained in #42, the completeness of the trajectory is ensured by inserting dummy elements between episodes, at the indices that should be excluded from sampling.
This is a big PR and it already took me a lot of time, I'd like to know what you think @jeremiahpslewis before I move on with the...
...Remaining stuff to do:
How to make this work with MultiThreadedEnvs ?MTE does not even work with the current implementation of Trajectories.Only appending another trajectory can work as we cannot infer the valid indices from a named tupleAnswer is no because MultiplexTraces do not currently support appending. Appending is just not generally supported at the moment and needs to be implemented in a dedicated PR. Appending will be mostly useful for distributed agents that have their own trajectories to send to a main trajectory.All of the following are to be done in the RL.jl PR.