Skip to content

Adjust to RL.jl #6

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
May 8, 2022
Merged

Conversation

findmyway
Copy link
Member

No description provided.

@HenriDeh
Copy link
Member

HenriDeh commented May 4, 2022

I looked at the implementation. I really like the design; I think it's clever. Regarding the API I don't see any obvious problem with your choices. Here are my comments and questions.

  • Episodes: wouldn't the field name is_terminated would be more consistent with the RLBase API? is_done might be confusing to users that may think they are named differently to highlight that they mean different things.

  • Trajectory: if traces is a Episodes, length(t) will return the number of episodes, not the number of steps stored. This may be problematic that the API does not return the same information.

  • Episode: why use a Ref{Bool} instead of a Bool? I assume you made that choice for a reason but I'm curious to know.

@findmyway
Copy link
Member Author

  • Episodes: wouldn't the field name is_terminated would be more consistent with the RLBase API? is_done might be confusing to users that may think they are named differently to highlight that they mean different things.

Good point. I'll keep using is_terminated later.

  • Trajectory: if traces is a Episodes, length(t) will return the number of episodes, not the number of steps stored. This may be problematic that the API does not return the same information.

Nice catch, this is a mistake and I have removed that defintion in my local code. I'll update this PR soon. Essentially, each element in a Trajectory is one step of the experience. When traces is an Episodes, we may need some efficient data structure to allow quickly sampling a batch of experiences from it. This is of a lower priority right now. We can implement it later.

  • Episode: why use a Ref{Bool} instead of a Bool? I assume you made that choice for a reason but I'm curious to know.

Ah, that's simply because we need to update that field and Episode is defined as an immutable struct.

@codecov-commenter
Copy link

codecov-commenter commented May 4, 2022

Codecov Report

❗ No coverage uploaded for pull request base (main@dc220ed). Click here to learn what that means.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main       #6   +/-   ##
=======================================
  Coverage        ?   30.06%           
=======================================
  Files           ?        9           
  Lines           ?      316           
  Branches        ?        0           
=======================================
  Hits            ?       95           
  Misses          ?      221           
  Partials        ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dc220ed...016b57e. Read the comment docs.

@HenriDeh
Copy link
Member

HenriDeh commented May 5, 2022

When traces is an Episodes, we may need some efficient data structure to allow quickly sampling a batch of experiences from it.

How about

length(e::Episodes) = sum(length(ep) for ep in e)

in the meantime. It's not efficient to do the sum everytime but I don't think it's horrible.
For the sampling we can use the StatsBase weighted sample function (https://juliastats.org/StatsBase.jl/stable/weights/) and (https://juliastats.org/StatsBase.jl/stable/sampling/) to do a two-stage sampling

function rand(e::Episodes, sampler) 
    weights = FrequencyWeights(length.(e))
    episode_samples = sample(e.episodes, weights, sampler.n)
    rand(episode_samples, sampler) #returns the batch with one sample from each sampled episode
end

This means that samplers must implement rand(::Vector{Episode}, sampler).

Regarding the API, I have this one new comment. As discussed here, sample is to get samples from a dataset/collection, while rand is for generating samples from a random variable. I think when we sample from a Trajectory, we are in the former case, so we should use sample.

@findmyway
Copy link
Member Author

Both are good suggestions.

For the rand vs sample, I create issue #8 to remind myself.

@findmyway
Copy link
Member Author

I think I've finished the API part. The rest things are to move existing trajectories in RL.jl here.

image

@findmyway findmyway merged commit cc688a2 into JuliaReinforcementLearning:main May 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants