Normalizer Wrapper #12

Merged: 27 commits merged into main from normalization on May 31, 2022
Conversation

HenriDeh
Member

Hello,

I made the Normalizer that I described in #7. It is fairly straightforward to use, but I think documentation will be welcome once this package is integrated into RL.jl. I'm opening an issue as a reminder to do that in due time. Tell me what you think.

Closes #7

@HenriDeh
Member Author

In #7 I mention that this may pose a problem for NStepSampler. With this implementation, NStepSampler will not work properly.
I have two proposals to deal with that:

  • We can add a discounted_sum_of_rewards_normalizer by simply using an FTSeries wrapper to multiply rewards by (1 - gamma^n)/(1 - gamma) before fitting (see the first sketch after this list). The problem with this solution is that if a Trajectory has multiple samplers (remember we discussed the metasampler), say an NStepBatchSampler and a BatchSampler, it is now the latter that will be wrong. So this is not my favourite option.
  • What I think we should do is completely change NStepBatchSampler. To me, NStepBatchSampler should sample N consecutive experiences of all the traces it is asked for, and that's it. Computing the discounted sum of rewards should be done by the algorithm, for example in q_targets (see the second sketch after this list).
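
A minimal sketch of the first proposal, assuming OnlineStats.jl's FTSeries; the values of γ and n are illustrative, not part of any package API:

```julia
using OnlineStats

# Illustrative values; the real γ and n would come from the algorithm/sampler.
γ, n = 0.99, 5

# (1 - γ^n)/(1 - γ) = 1 + γ + … + γ^(n-1): the factor by which a constant
# reward is magnified in an n-step discounted sum.
factor = (1 - γ^n) / (1 - γ)

# Fit the running statistics on rewards scaled by that factor.
reward_stats = FTSeries(Mean(), Variance(); transform = r -> factor * r)
fit!(reward_stats, 1.0)   # each pushed reward is transformed before fitting
```

And a sketch of the second proposal, where the sampler returns the n consecutive raw rewards and the algorithm computes the return itself; the function name nstep_return is hypothetical:

```julia
# Discounted n-step return computed by the algorithm from consecutive raw rewards.
nstep_return(rewards, γ) = sum(γ^(k - 1) * r for (k, r) in enumerate(rewards))

nstep_return([1.0, 0.5, 0.25], 0.99)
```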

@findmyway
Member

Indeed, NStepBatchSampler should be changed in the next version.

@HenriDeh
Member Author

I think I reached something close to a final API. Before merging, I still need to check how this fares with ElasticArrays and Episodes. But I need to learn how they work first.

@codecov-commenter

codecov-commenter commented May 19, 2022

Codecov Report

Merging #12 (9e26766) into main (9095619) will decrease coverage by 1.10%.
The diff coverage is 60.49%.

@@            Coverage Diff             @@
##             main      #12      +/-   ##
==========================================
- Coverage   68.36%   67.25%   -1.11%     
==========================================
  Files           9       10       +1     
  Lines         373      452      +79     
==========================================
+ Hits          255      304      +49     
- Misses        118      148      +30     
Impacted Files Coverage Δ
src/rendering.jl 0.00% <0.00%> (ø)
src/traces.jl 85.23% <0.00%> (ø)
src/normalization.jl 61.53% <61.53%> (ø)
src/samplers.jl 86.66% <100.00%> (ø)
src/trajectory.jl 74.07% <0.00%> (+1.85%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9095619...9e26766.

@findmyway
Member

Great!

I think after merging this, we should register this package first, so that RLCore and RLZoo can be updated to use it.

(I've been kind of busy recently, but I should have plenty of time to work on it in the coming weeks.)

@HenriDeh
Member Author

It's not an essential part of the package anyway. If you want to use Trajectories sooner, that's fine. What I'm more concerned about are the algorithms that need the metasampler.

@HenriDeh
Member Author

So I eventually went with a NormalizedTraces (with an 's') wrapper. The reason is that otherwise we cannot use it with preconstructed traces such as CircularSARTTraces. I suspect this would have been problematic with Episodes too.
I removed fetch and settled for a simple overload of sample(::BatchSampler). Note that if we add a multistep batch sampler, we will have to implement a specific method for normalization.
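
For context, here is a conceptual sketch of what the wrapper does, assuming OnlineStats.jl for the running statistics. This is not the package's actual API; the names RunningNormalizer, fit_norm!, and normalize_obs are hypothetical:

```julia
using OnlineStats, Statistics

# Hypothetical running normalizer: fitted when experiences are pushed,
# applied when a batch is sampled (which is what the overloaded
# sample(::BatchSampler) method takes care of).
struct RunningNormalizer
    stat::Moments                # online estimate of mean and variance
end
RunningNormalizer() = RunningNormalizer(Moments())

fit_norm!(n::RunningNormalizer, x) = (fit!(n.stat, x); n)
normalize_obs(n::RunningNormalizer, x) = (x .- mean(n.stat)) ./ sqrt(var(n.stat) + 1e-8)

n = RunningNormalizer()
foreach(s -> fit_norm!(n, s), randn(1000))   # pushing experiences updates the statistics
normalize_obs(n, randn(32))                  # sampling a batch returns normalized values
```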

One question that we might want to address: currently, the two names of MultiplexTrace share the same normalizer, meaning that pushing a next_state will update the state normalizer, and sampled next_states will be normalized (as we want).
We must keep that in mind to avoid estimation errors. Currently in RL.jl, pushing a next_state is not a thing, so it should be OK.
The other failure case is pushing dummy states; this should be avoided when using normalization, since it is already a problem for non-episodic environments (as discussed in #621).
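
To illustrate the shared-normalizer point with the hypothetical RunningNormalizer sketched above (again, not the actual API):

```julia
# state and next_state are multiplexed over the same underlying trace, so they
# would share one normalizer: fitting on a pushed next_state also changes how
# subsequently sampled states are normalized.
shared = RunningNormalizer()
fit_norm!(shared, 1.0)       # pushing a state
fit_norm!(shared, 100.0)     # pushing a next_state updates the same statistics
normalize_obs(shared, 1.0)   # sampled states use the combined statistics
```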

This may be ready to merge, though I have not tested whether it works well with Episodes.

@findmyway
Member

Great!

We can merge this first and then address some other corner cases with Episodes later.

findmyway merged commit 8c6b304 into main on May 31, 2022
HenriDeh deleted the normalization branch on May 31, 2022 11:15