added acrobot code #59

gpavanb1 · 2020-06-01T07:27:05Z

This is in reference to #9

Do let me know if you have any suggestions :)

jbrea

Nice, thanks a lot.

Does it give the exact same results as the gym implementation?

Can you also add this environment to test/environments.jl?

I have a few minor comments below. Performance improvements are not critical; feel free to ignore them.

jbrea · 2020-06-01T09:57:41Z

src/environments/classic_control/acrobot.jl

@@ -0,0 +1,244 @@
+using Random
+using OrdinaryDiffEq


OrdinaryDiffEq needs to be added to Project.toml.

Sure, no problem.

jbrea · 2020-06-01T09:58:09Z

src/environments/classic_control/acrobot.jl

+
+mutable struct AcrobotEnv{T,R<:AbstractRNG} <: AbstractEnv
+    params::AcrobotEnvParams{T}
+    action_space::DiscreteSpace{UnitRange{Int64}}


Int64 -> Int.
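For context, a minimal sketch of what this suggestion is about, assuming only Base Julia: `Int` is an alias for the platform's native integer type, so a range built from integer literals is always a `UnitRange{Int}` but is a `UnitRange{Int64}` only on 64-bit systems. The variable `r` is purely illustrative and not part of the PR.

```julia
# Int is an alias for the platform's native signed integer type:
# Int64 on 64-bit systems, Int32 on 32-bit systems.
println("Int is ", Int, " on a ", Sys.WORD_SIZE, "-bit platform")

# Integer literals, and ranges built from them, use Int.
r = 1:3
println(typeof(r) == UnitRange{Int})    # holds on every platform

# This comparison holds only where Int and Int64 coincide.
println(typeof(r) == UnitRange{Int64})
```

Annotating the field with `Int` therefore keeps the declared type in line with what integer literals produce on the host platform.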

jbrea · 2020-06-01T10:04:25Z

src/environments/classic_control/acrobot.jl

+    # augmented state for derivative function
+    s_augmented = [env.state..., torque]
+
+    ode = ODEProblem(dsdt, s_augmented, (0., env.params.dt), env)


I guess it doesn't really matter if this environment isn't performance-tuned.
But if you want to improve performance, you could replace state by ode in AcrobotEnv, then set ode.u0[1:end-1] .= ns below and ode.u0[end] = torque above.

jbrea · 2020-06-01T10:16:29Z

src/environments/classic_control/acrobot.jl

+    ns = solve(ode, RK4())
+    # only care about final timestep of integration returned by integrator
+    ns = ns.u[end]
+    ns = ns[1:4]  # omit action


If you want to improve performance, I would avoid this extra allocation.

Yes, I had considered this as well. I kept the extra copy mainly because the subsequent statements that clip the state would otherwise become cumbersome.
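As a rough illustration of the allocation being discussed, here is a sketch that uses only Base Julia. The vector `u_end` is a made-up stand-in for the final augmented solution (four state entries plus the torque), and `state` is a hypothetical preallocated buffer. Neither name comes from the PR.

```julia
# Made-up stand-in for the final augmented solution: four state entries + torque.
u_end = [0.1, -0.2, 0.3, -0.4, 1.0]

# Indexing with a range allocates a new 4-element vector.
ns_copy = u_end[1:4]

# A view refers to the same memory without copying the elements.
ns_view = @view u_end[1:4]

# Broadcasting into a preallocated buffer reuses existing storage.
state = zeros(4)
state .= ns_view

println(ns_copy == state)          # same values either way
println(parent(ns_view) === u_end) # the view shares memory with u_end
```

Clipping can then be applied to `state` in place (for example with `clamp!`), which may address the concern about cumbersome clipping statements, though whether it is worth the extra code for this environment is a judgment call.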

gpavanb1 · 2020-06-02T07:00:59Z

Nice, thanks a lot.

Does it give the exact same results as the gym implementation?

Can you also add this environment to test/environments.jl?

I have a few minor comments below. Performance improvements are not critical; feel free to ignore them.

Regarding matching the Python results: I compared the derivative functions of the Julia and Python implementations for a fixed, non-zero state, and they match exactly to the displayed precision.

For state,

the Python derivative

the Julia derivative

However, the two RK4 implementations choose their timesteps differently, so there is a slight discrepancy in the final states after integration.

Python observation

Julia observation
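To illustrate how the step choice alone can produce such a discrepancy, the following sketch integrates the same right-hand side with a hand-written classical RK4, once with a single step and once with 20 substeps over the same interval. It is not the code from either implementation: the pendulum dynamics, the interval `dt = 0.2`, and the helper names `rk4_step` and `rk4_integrate` are all assumptions chosen for illustration.

```julia
# One classical fourth-order Runge-Kutta step of size h for u' = f(u, t).
function rk4_step(f, u, t, h)
    k1 = f(u, t)
    k2 = f(u .+ (h / 2) .* k1, t + h / 2)
    k3 = f(u .+ (h / 2) .* k2, t + h / 2)
    k4 = f(u .+ h .* k3, t + h)
    return u .+ (h / 6) .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
end

# Integrate from t0 to t1 using n equal RK4 steps.
function rk4_integrate(f, u0, t0, t1, n)
    h = (t1 - t0) / n
    u = copy(u0)
    for i in 0:(n - 1)
        u = rk4_step(f, u, t0 + i * h, h)
    end
    return u
end

# Stand-in dynamics (a simple pendulum), NOT the Acrobot equations.
pendulum(u, t) = [u[2], -9.8 * sin(u[1])]

u0 = [1.0, 0.0]
dt = 0.2  # hypothetical integration interval

one_step   = rk4_integrate(pendulum, u0, 0.0, dt, 1)
many_steps = rk4_integrate(pendulum, u0, 0.0, dt, 20)
max_diff   = maximum(abs.(one_step .- many_steps))

println("single step : ", one_step)
println("20 substeps : ", many_steps)
println("max abs diff: ", max_diff)
```

Both runs use the same derivative function, so any difference in the final state comes purely from the step size. This is consistent with derivatives that match exactly while the integrated observations differ slightly.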

gpavanb1 · 2020-06-02T07:10:42Z

I've added the suggested changes and provided the residual verification.
Do let me know your comments :)

jbrea · 2020-06-02T07:58:48Z

Awesome, thanks a lot.

jbrea · 2020-06-02T08:00:24Z

Oh, one last thing: can you also include the new environment in the README.md?

gpavanb1 · 2020-06-02T08:17:35Z

Done!

jbrea · 2020-06-02T08:19:57Z

Great, thanks!

added acrobot code

9fc4a90

findmyway assigned jbrea Jun 1, 2020

jbrea reviewed Jun 1, 2020

added OrdinaryDiffEq to Project.toml

86797d5

suggested changes, verified residuals, equations formatted

59bdcd9

modified README

9d90811

jbrea merged commit 8e3e271 into JuliaReinforcementLearning:master Jun 2, 2020
