
init #1


Merged — 7 commits merged into JuliaReinforcementLearning:main on May 3, 2022

Conversation

findmyway
Member

No description provided.

@HenriDeh
Member

Hello! I just wanted to pop in to discuss a certain point.

In actor-critic algorithms, we often must sample from the trajectory once to update the critic, and then a second time to update the actor. The same may hold for model-based methods if sampling is needed to train the model.
Depending on the algorithm, the critic may need a different kind of batch than the actor/policy. Retrace, for example, needs N batched (state, action, reward) tuples to update the critic, while the actor needs only a single state, so their samplers must differ. It would be nice to think about how to support specifying multiple samplers. Maybe Sampler could allow for a Dict like so:

:policy => BatchSampler
:qnetwork => NStepBatchSampler (3 steps)
:model => NStepBatchSampler (100 steps)

And then algorithms would call e.g. `inds, batch = traj.sampler[:policy](traj)` when updating the policy, but `inds, batch = traj.sampler[:qnetwork](traj)` when updating a Q-network.

Each algorithm could check that the required sampler is present in the Dict during the check phase.
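The proposal above could be sketched roughly like this. All names here (`BatchSampler`, `NStepBatchSampler`, the `Trajectory` fields) are hypothetical stand-ins for illustration, not the package's actual API:

```julia
# Hypothetical sampler types standing in for the package's real ones.
struct BatchSampler
    batchsize::Int
end

struct NStepBatchSampler
    n::Int          # number of steps per sampled transition
    batchsize::Int
end

# A trajectory that carries one sampler per component, as proposed.
struct Trajectory
    container::Vector{Any}        # stand-in for the real storage
    sampler::Dict{Symbol,Any}
end

traj = Trajectory(
    Any[],
    Dict{Symbol,Any}(
        :policy   => BatchSampler(32),
        :qnetwork => NStepBatchSampler(3, 32),
        :model    => NStepBatchSampler(100, 32),
    ),
)

# During its check phase, an algorithm verifies its required sampler is present:
haskey(traj.sampler, :qnetwork) || error("missing :qnetwork sampler")
```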

@findmyway
Member Author

Thanks for providing early feedback.

Yes, it should be easy to support. We just need a meta sampler 😉
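One rough sketch of what such a meta sampler might look like, wrapping per-component samplers and dispatching by key. This is an assumption about the eventual design, not the merged implementation; `MetaSampler`, `sample`, and `BatchSampler` below are all hypothetical:

```julia
# Hypothetical MetaSampler: wraps several samplers and forwards requests by key.
struct MetaSampler{T}
    samplers::Dict{Symbol,T}
end

# Delegate a sampling request to the sampler registered under key `k`.
sample(ms::MetaSampler, traj, k::Symbol) = sample(ms.samplers[k], traj)

# A minimal concrete sampler so the sketch runs end to end:
struct BatchSampler
    batchsize::Int
end
sample(s::BatchSampler, traj) = traj[1:min(length(traj), s.batchsize)]

ms = MetaSampler(Dict(:policy => BatchSampler(2)))
sample(ms, collect(1:5), :policy)   # → [1, 2]
```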

@findmyway findmyway merged commit dc220ed into JuliaReinforcementLearning:main May 3, 2022