[OSPP-Week3] Fix the problems in the NFSP implementation
#386
peterchen96
announced in
Archive
Replies: 0 comments
Thanks to @findmyway for pointing out the problems in the current NFSP implementation. I'll list them below and fix them in order. #375

- Use reservoir_trajectory to collect data for sl_agent: Supplement functions in ReservoirTrajectory and BehaviorCloningPolicy #390
  - CircularArrayBuffer may not be suitable for sl_agent. I should use reservoir_trajectory instead, which randomly replaces an old element with a new one once the buffer is full.
- Replace average_learner with BehaviorCloningPolicy: Supplement functions in ReservoirTrajectory and BehaviorCloningPolicy #390
  - average_learner looks similar to BehaviorCloningPolicy. Also, sl_agent only needs to collect states and actions rather than SARTS, so supplementing a few BehaviorCloningPolicy functions may be enough for sl_agent to use it.
- Other problems about ease of reuse: Implementation of NFSP and NFSP_KuhnPoker experiment #402 (in progress)
  - The state-encoding work can move to the specific env file (`state_space`).
  - Check the `run` function for NFSPAgentManager, including assertions about the available environment.
  - Modify the experiment file, including designing a suitable hook and correcting format errors.
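To illustrate the first two items, here is a minimal sketch of a reservoir-style buffer that stores only (state, action) pairs, which is all the supervised-learning agent needs. It is written in Python to stay independent of the package's real API. The class name `ReservoirBuffer` and its methods are hypothetical, and it uses standard reservoir sampling (Algorithm R), which I am assuming matches the replacement rule intended for ReservoirTrajectory.

```python
import random


class ReservoirBuffer:
    """Fixed-capacity buffer filled by reservoir sampling (Algorithm R).

    Hypothetical illustration only; not the ReservoirTrajectory API.
    """

    def __init__(self, capacity, seed=None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items = []  # stored (state, action) pairs, no reward/terminal
        self.seen = 0    # total number of pairs offered so far
        self.rng = random.Random(seed)

    def push(self, state, action):
        """Offer one (state, action) pair to the buffer."""
        self.seen += 1
        if len(self.items) < self.capacity:
            # Buffer not full yet: always keep the new pair.
            self.items.append((state, action))
        else:
            # Buffer full: keep the new pair with probability
            # capacity / seen, overwriting a uniformly chosen old pair.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = (state, action)

    def sample(self, batch_size):
        """Draw a batch without replacement for behavior cloning."""
        k = min(batch_size, len(self.items))
        return self.rng.sample(self.items, k)


if __name__ == "__main__":
    buf = ReservoirBuffer(capacity=4, seed=0)
    for t in range(1000):
        buf.push(state=t, action=t % 2)
    print(len(buf.items), buf.seen)
```

Unlike a circular buffer, which always evicts the oldest entry and so keeps only recent behavior, this scheme leaves every pair seen so far equally likely to be in the buffer, which is what an average-policy learner wants.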