Adjust to RL.jl #6

findmyway · 2022-05-04T04:19:17Z

No description provided.

HenriDeh · 2022-05-04T13:34:41Z

I looked at the implementation. I really like the design; I think it's clever. Regarding the API I don't see any obvious problem with your choices. Here are my comments and questions.

Episodes: wouldn't the field name is_terminated would be more consistent with the RLBase API? is_done might be confusing to users that may think they are named differently to highlight that they mean different things.
Trajectory: if traces is a Episodes, length(t) will return the number of episodes, not the number of steps stored. This may be problematic that the API does not return the same information.
Episode: why use a Ref{Bool} instead of a Bool? I assume you made that choice for a reason but I'm curious to know.

findmyway · 2022-05-04T14:36:36Z

Episodes: wouldn't the field name is_terminated would be more consistent with the RLBase API? is_done might be confusing to users that may think they are named differently to highlight that they mean different things.

Good point. I'll keep using is_terminated later.

Trajectory: if traces is a Episodes, length(t) will return the number of episodes, not the number of steps stored. This may be problematic that the API does not return the same information.

Nice catch, this is a mistake and I have removed that defintion in my local code. I'll update this PR soon. Essentially, each element in a Trajectory is one step of the experience. When traces is an Episodes, we may need some efficient data structure to allow quickly sampling a batch of experiences from it. This is of a lower priority right now. We can implement it later.

Episode: why use a Ref{Bool} instead of a Bool? I assume you made that choice for a reason but I'm curious to know.

Ah, that's simply because we need to update that field and Episode is defined as an immutable struct.

codecov-commenter · 2022-05-04T16:19:43Z

Codecov Report

❗ No coverage uploaded for pull request base (main@dc220ed). Click here to learn what that means.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main       #6   +/-   ##
=======================================
  Coverage        ?   30.06%           
=======================================
  Files           ?        9           
  Lines           ?      316           
  Branches        ?        0           
=======================================
  Hits            ?       95           
  Misses          ?      221           
  Partials        ?        0

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dc220ed...016b57e. Read the comment docs.

HenriDeh · 2022-05-05T10:17:04Z

When traces is an Episodes, we may need some efficient data structure to allow quickly sampling a batch of experiences from it.

How about

length(e::Episodes) = sum(length(ep) for ep in e)

in the meantime. It's not efficient to do the sum everytime but I don't think it's horrible.
For the sampling we can use the StatsBase weighted sample function (https://juliastats.org/StatsBase.jl/stable/weights/) and (https://juliastats.org/StatsBase.jl/stable/sampling/) to do a two-stage sampling

function rand(e::Episodes, sampler) 
    weights = FrequencyWeights(length.(e))
    episode_samples = sample(e.episodes, weights, sampler.n)
    rand(episode_samples, sampler) #returns the batch with one sample from each sampled episode
end

This means that samplers must implement rand(::Vector{Episode}, sampler).

Regarding the API, I have this one new comment. As discussed here, sample is to get samples from a dataset/collection, while rand is for generating samples from a random variable. I think when we sample from a Trajectory, we are in the former case, so we should use sample.

findmyway · 2022-05-05T10:37:56Z

Both are good suggestions.

For the rand vs sample, I create issue #8 to remind myself.

findmyway · 2022-05-08T10:14:57Z

I think I've finished the API part. The rest things are to move existing trajectories in RL.jl here.

findmyway added 2 commits May 3, 2022 22:36

sync

555f1de

adjust to RL.jl

a72a473

findmyway added 2 commits May 4, 2022 23:27

sync local changes

ec7ac65

add tests for trajectory

98775e0

findmyway added 6 commits May 6, 2022 01:02

sync local change

f462e88

import MLUtils

9fb2994

remove AsyncTrajectory

d63fd75

fix tests

f574b3d

minor modification

5722d21

update readme

016b57e

update readme

edd4c52

findmyway merged commit cc688a2 into JuliaReinforcementLearning:main May 8, 2022

findmyway mentioned this pull request May 8, 2022

Use sample instead of rand #8

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Adjust to RL.jl #6

Adjust to RL.jl #6

Uh oh!

findmyway commented May 4, 2022

Uh oh!

HenriDeh commented May 4, 2022

Uh oh!

findmyway commented May 4, 2022

Uh oh!

codecov-commenter commented May 4, 2022 •

edited

Loading

Uh oh!

HenriDeh commented May 5, 2022

Uh oh!

findmyway commented May 5, 2022

Uh oh!

findmyway commented May 8, 2022

Uh oh!

Uh oh!

Adjust to RL.jl #6

Adjust to RL.jl #6

Uh oh!

Conversation

findmyway commented May 4, 2022

Uh oh!

HenriDeh commented May 4, 2022

Uh oh!

findmyway commented May 4, 2022

Uh oh!

codecov-commenter commented May 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

HenriDeh commented May 5, 2022

Uh oh!

findmyway commented May 5, 2022

Uh oh!

findmyway commented May 8, 2022

Uh oh!

Uh oh!

codecov-commenter commented May 4, 2022 •

edited

Loading