Commit 32749f1

make a doc page
1 parent 1e051fe commit 32749f1

2 files changed: +31 -0 lines changed

docs/make.jl

Lines changed: 1 addition & 0 deletions
@@ -49,6 +49,7 @@ makedocs(
     "How to implement a new algorithm?" => "How_to_implement_a_new_algorithm.md",
     "How to use hooks?" => "How_to_use_hooks.md",
     "Which algorithm should I use?" => "Which_algorithm_should_I_use.md",
+    "Episodic vs. Non-episodic environments" => "non_episodic.md",
 ],
 "FAQ" => "FAQ.md",
 experiments,

docs/src/non_episodic.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
# Episodic vs Non-episodic environments

## Episodic environments
By default, `run(policy, env, stop_condition, hook)` steps through `env` until a terminal state is reached, which signals the end of an episode. For this to work, `env` must implement `RLBase.is_terminated(::YourEnvironment)`. This function is called after each step through the environment. When it returns `true`, the trajectory records the terminal state, `RLBase.reset!(::YourEnvironment)` is called, and the environment is set to (one of) its initial state(s).

Because the state is recorded as terminal, its value is set to 0 when values are learned via bootstrapping.

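
To make the effect of the terminal flag concrete, here is a minimal, self-contained sketch of a one-step TD target. The function `td_target` and its arguments are hypothetical names chosen for illustration; they are not part of the package, and the learners in the package may compute their targets differently.

```julia
# One-step TD target: reward + γ * V(next_state).
# At a terminal state there is no next state to bootstrap from,
# so the bootstrap term is replaced by 0.
# `td_target` is a hypothetical helper, not a package function.
function td_target(reward, next_value, terminated; γ = 0.9)
    bootstrap = terminated ? 0.0 : γ * next_value
    return reward + bootstrap
end

non_terminal_target = td_target(1.0, 10.0, false)  # includes γ * next_value
terminal_target = td_target(1.0, 10.0, true)       # immediate reward only
```

If a state is flagged as terminal when it is not, the second form is used and the estimated value is biased towards the immediate reward.
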
## Non-episodic environments

Also called _continuing tasks_ (Sutton & Barto, 2018), non-episodic environments do not have a terminal state and may therefore run forever, or until the `stop_condition` is reached. Sometimes, however, one may want to periodically reset the environment to start fresh. A first possibility is to implement `RLBase.is_terminated(::YourEnvironment)` so that it returns `true` according to an arbitrary condition. This is usually not a good idea: the last state before the reset is not a true _terminal_ state, yet its value will be bootstrapped to 0 during learning, which is not its true value.

To handle this, `run` accepts a reset condition as an additional argument: `run(policy, env, stop_condition, hook, reset_condition = ResetAtTerminal())`. The default, `ResetAtTerminal()`, assumes an episodic environment. Passing `ResetAfterNSteps(n)` instead no longer checks `is_terminated`; it calls `reset!` every `n` steps. This way, the value of the last state is not multiplied by 0 during bootstrapping, and the correct value can be learned.

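
The idea of separating "reset" from "terminated" can be sketched with a toy loop. This is a simplified illustration only, not the package's actual `run` implementation: `ToyEnv`, `step!`, `EveryNSteps` and `toy_run` are hypothetical names, and `is_terminated` and `reset!` are local stand-ins for the `RLBase` functions so that the example runs on its own.

```julia
# Hypothetical continuing environment: a counter with no terminal state.
mutable struct ToyEnv
    state::Int
end

is_terminated(env::ToyEnv) = false           # stand-in for RLBase.is_terminated
reset!(env::ToyEnv) = (env.state = 0; env)   # stand-in for RLBase.reset!
step!(env::ToyEnv) = (env.state += 1; env)   # hypothetical transition

# Hypothetical step-count reset condition, callable as c(policy, env).
mutable struct EveryNSteps
    n::Int
    t::Int
end
EveryNSteps(n) = EveryNSteps(n, 0)

function (c::EveryNSteps)(policy, env)
    c.t += 1
    if c.t >= c.n
        c.t = 0
        return true
    end
    return false
end

# Simplified loop: the reset decision never touches the terminal flag.
function toy_run(policy, env, n_steps, reset_condition)
    n_resets = 0
    terminal_flags = Bool[]
    for _ in 1:n_steps
        step!(env)
        push!(terminal_flags, is_terminated(env))  # what a trajectory would record
        if reset_condition(policy, env)
            reset!(env)
            n_resets += 1
        end
    end
    return n_resets, terminal_flags
end

n_resets, flags = toy_run(nothing, ToyEnv(0), 10, EveryNSteps(5))
```

In this sketch the environment is reset twice in ten steps, yet no recorded transition is flagged as terminal, so a learner would keep bootstrapping from the next state's value.
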
## Custom reset conditions

You can specify a custom `reset_condition` instead of using the built-in ones. Your condition must be callable as `my_condition(policy, env)`. For example, here is a custom condition that checks for a terminal state but also resets if the episode is too long:

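```julia
# Built-in step-count condition with n = 10000, stored in a global variable.
reset_n_steps = ResetAfterNSteps(10000)

function my_condition(policy, env)
    terminal = is_terminated(env)          # a terminal state was reached
    too_long = reset_n_steps(policy, env)  # the step limit was reached
    return terminal || too_long
end

# `agent`, `env`, `stop_condition` and `hook` are assumed to be defined beforehand.
run(agent, env, stop_condition, hook, my_condition)
```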

We could also have defined a callable struct to avoid the global variable `reset_n_steps`.
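
A struct-based variant might look like the following sketch. The type `ResetAtTerminalOrAfter` and the environment `DummyEnv` are hypothetical, and `is_terminated` is defined locally as a stand-in for `RLBase.is_terminated` so that the example runs without the package. Unlike the function above, this version restarts its step counter whenever it triggers a reset, which is a design choice rather than documented package behavior.

```julia
# Hypothetical environment whose terminal flag is set by hand.
struct DummyEnv
    done::Bool
end
is_terminated(env::DummyEnv) = env.done  # stand-in for RLBase.is_terminated

# Callable struct that carries its own step counter instead of using a global.
mutable struct ResetAtTerminalOrAfter
    max_steps::Int
    steps::Int
end
ResetAtTerminalOrAfter(max_steps) = ResetAtTerminalOrAfter(max_steps, 0)

function (c::ResetAtTerminalOrAfter)(policy, env)
    c.steps += 1
    if is_terminated(env) || c.steps >= c.max_steps
        c.steps = 0  # start counting afresh after every reset
        return true
    end
    return false
end

condition = ResetAtTerminalOrAfter(3)
by_step_limit = [condition(nothing, DummyEnv(false)) for _ in 1:3]
by_terminal = condition(nothing, DummyEnv(true))
```

Assuming the calling convention described above, an instance such as `ResetAtTerminalOrAfter(10000)` would be passed to `run` in place of `my_condition`.
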
