
init #1


Merged — 7 commits merged into JuliaReinforcementLearning:main on May 3, 2022

Conversation

findmyway
Member

No description provided.

@HenriDeh
Member

Hello! I just wanted to pop in to discuss a certain point.

In actor-critic algorithms, we often must sample from the trajectory once to update the critic, and then a second time to update the actor. The same may hold for model-based methods if sampling is needed to train the model.
Depending on the algorithm, the critic may need a different kind of batch than the actor/policy. Retrace, for example, needs N batched (state, action, reward) tuples to update the critic, while the actor needs only a single state, so their samplers must differ. It would be nice to think about how to support specifying multiple samplers. Maybe Sampler could allow for a Dict like so:

:policy => BatchSampler
:qnetwork => NStepBatchSampler (3 steps)
:model => NStepBatchSampler (100 steps)

And then algorithms would call e.g. `inds, batch = traj.sampler[:policy](traj)` when updating the policy, but `inds, batch = traj.sampler[:qnetwork](traj)` when updating a Q-network.

Each algorithm could check that the required sampler is present in the Dict during the check phase.
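The proposal above could be sketched roughly like this. All names here (`BatchSampler`, `NStepBatchSampler`, the `Trajectory` fields) are hypothetical stand-ins for illustration, not the package's actual API:

```julia
# Hypothetical sampler types standing in for the package's real ones.
struct BatchSampler
    batchsize::Int
end

struct NStepBatchSampler
    n::Int          # number of steps per sampled transition
    batchsize::Int
end

# A trajectory that carries one sampler per component, as proposed.
struct Trajectory
    container::Vector{Any}        # stand-in for the real storage
    sampler::Dict{Symbol,Any}
end

traj = Trajectory(
    Any[],
    Dict{Symbol,Any}(
        :policy   => BatchSampler(32),
        :qnetwork => NStepBatchSampler(3, 32),
        :model    => NStepBatchSampler(100, 32),
    ),
)

# During its check phase, an algorithm verifies its required sampler is present:
haskey(traj.sampler, :qnetwork) || error("missing :qnetwork sampler")
```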

@findmyway
Member Author

Thanks for providing early feedback.

Yes, it should be easy to support. We just need a meta sampler 😉
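One rough sketch of what such a meta sampler might look like, wrapping per-component samplers and dispatching by key. This is an assumption about the eventual design, not the merged implementation; `MetaSampler`, `sample`, and `BatchSampler` below are all hypothetical:

```julia
# Hypothetical MetaSampler: wraps several samplers and forwards requests by key.
struct MetaSampler{T}
    samplers::Dict{Symbol,T}
end

# Delegate a sampling request to the sampler registered under key `k`.
sample(ms::MetaSampler, traj, k::Symbol) = sample(ms.samplers[k], traj)

# A minimal concrete sampler so the sketch runs end to end:
struct BatchSampler
    batchsize::Int
end
sample(s::BatchSampler, traj) = traj[1:min(length(traj), s.batchsize)]

ms = MetaSampler(Dict(:policy => BatchSampler(2)))
sample(ms, collect(1:5), :policy)   # → [1, 2]
```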

@findmyway findmyway merged commit dc220ed into JuliaReinforcementLearning:main May 3, 2022