-
Notifications
You must be signed in to change notification settings - Fork 11
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks a lot.
Does it give the exact same results as the gym implementation?
Can you also add this environment to test/environments.jl
?
I have a few minor comments below. Performance improvements are not critical; feel free to ignore them.
@@ -0,0 +1,244 @@ | |||
using Random | |||
using OrdinaryDiffEq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OrdinaryDiffEq
needs to be added to Project.toml
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, no problem.
|
||
mutable struct AcrobotEnv{T,R<:AbstractRNG} <: AbstractEnv | ||
params::AcrobotEnvParams{T} | ||
action_space::DiscreteSpace{UnitRange{Int64}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Int64
-> Int
.
# augmented state for derivative function | ||
s_augmented = [env.state..., torque] | ||
|
||
ode = ODEProblem(dsdt, s_augmented, (0., env.params.dt), env) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it doesn't really matter, if this environment isn't performance tuned.
But if you want to improve performance, you could replace state
by ode
in AcrobotEnv
and set ode.u0[1:end-1] .= ns
below and ode.u0[end] = torque
above.
ns = solve(ode, RK4()) | ||
# only care about final timestep of integration returned by integrator | ||
ns = ns.u[end] | ||
ns = ns[1:4] # omit action |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to improve performance, I would avoid this extra allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I had given this a thought as well. It was more because the eventual statements related to clipping the state would become cumbersome.
Regarding matching results with Python, I have compared the derivative functions both the Julia and Python implementation for a fixed, non-zero state and they match exactly to displayed precision. However, there are differences in how the timestep is chosen in the RK4 implementation between the two and thus, there was a slight discrepancy in the final states after integration |
I've added the suggested changes and provided the residual verification. |
Awesome, thanks a lot. |
Oh, one last thing: can you also include the new environment in the README.md? |
Done! |
Great, thanks! |
This is in reference to #9
Do let me know if you have any suggestions :)