Refactor TRPO and VPG with EpisodesSampler #952
Conversation
CI can't pass until Traj 0.3.3 is released.
I can't get Buildkite to rerun now that Traj 0.3.3 is released, but the tests pass. Note that this PR makes no promise about the correctness of the two algorithms.
Found some issues. Do not review yet.
I concluded that the algorithms are broken, but for additional reasons that are not tied to the refactor. Hence, I kept the algorithms commented out, but at least the optimise! functions are refactored and may serve as an example for other refactors in the future. That is all I can do in this PR. Further fixes to the policy gradient algorithms require contributors who know how they work.
Needs a new approval due to conflicts.
see above
Co-authored-by: Jeremiah <[email protected]>
PR Checklist