Skip to content
This repository was archived by the owner on May 6, 2021. It is now read-only.

added acrobot code #59

Merged
merged 4 commits into from
Jun 2, 2020
Merged

added acrobot code #59

merged 4 commits into from
Jun 2, 2020

Conversation

gpavanb1
Copy link
Contributor

@gpavanb1 gpavanb1 commented Jun 1, 2020

This is in reference to #9

Do let me know if you have any suggestions :)

Copy link
Contributor

@jbrea jbrea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks a lot.

Does it give the exact same results as the gym implementation?

Can you also add this environment to test/environments.jl?

I have a few minor comments below. Performance improvements are not critical; feel free to ignore them.

@@ -0,0 +1,244 @@
using Random
using OrdinaryDiffEq
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OrdinaryDiffEq needs to be added to Project.toml.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, no problem.


mutable struct AcrobotEnv{T,R<:AbstractRNG} <: AbstractEnv
params::AcrobotEnvParams{T}
action_space::DiscreteSpace{UnitRange{Int64}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Int64 -> Int.

# augmented state for derivative function
s_augmented = [env.state..., torque]

ode = ODEProblem(dsdt, s_augmented, (0., env.params.dt), env)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it doesn't really matter, if this environment isn't performance tuned.
But if you want to improve performance, you could replace state by ode in AcrobotEnv and set ode.u0[1:end-1] .= ns below and ode.u0[end] = torque above.

ns = solve(ode, RK4())
# only care about final timestep of integration returned by integrator
ns = ns.u[end]
ns = ns[1:4] # omit action
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to improve performance, I would avoid this extra allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I had given this a thought as well. It was more because the eventual statements related to clipping the state would become cumbersome.

@gpavanb1
Copy link
Contributor Author

gpavanb1 commented Jun 2, 2020

Nice, thanks a lot.

Does it give the exact same results as the gym implementation?

Can you also add this environment to test/environments.jl?

I have a few minor comments below. Performance improvements are not critical; feel free to ignore them.

Regarding matching results with Python, I have compared the derivative functions both the Julia and Python implementation for a fixed, non-zero state and they match exactly to displayed precision.

For state,
state

the Python derivative
python_du

the Julia derivative
julia_du

However, there are differences in how the timestep is chosen in the RK4 implementation between the two and thus, there was a slight discrepancy in the final states after integration

Python observation
python_obs

Julia observation
julia_obs

@gpavanb1
Copy link
Contributor Author

gpavanb1 commented Jun 2, 2020

I've added the suggested changes and provided the residual verification.
Do let me know your comments :)

@jbrea
Copy link
Contributor

jbrea commented Jun 2, 2020

Awesome, thanks a lot.

@jbrea
Copy link
Contributor

jbrea commented Jun 2, 2020

Oh, one last thing: can you also include the new environment in the README.md?

@gpavanb1
Copy link
Contributor Author

gpavanb1 commented Jun 2, 2020

Done!

@jbrea
Copy link
Contributor

jbrea commented Jun 2, 2020

Great, thanks!

@jbrea jbrea merged commit 8e3e271 into JuliaReinforcementLearning:master Jun 2, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants