add MADDPG algorithm #444
Conversation
You can simply add those words after line 123 of `ReinforcementLearning.jl/.cspell/cspell.json` (as of commit 4973762).
cc @pilgrimygy
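For example, a minimal sketch of the addition (the surrounding file contents are omitted here, not actual entries):

```json
{
    "words": [
        "MADDPG"
    ]
}
```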
Thanks! I'd also like to ask for suggestions on the implementation and on any code mistakes or style issues.
I'll review it later tonight 😃
Looks fine to me
```julia
function (π::MADDPGManager)(::PreEpisodeStage, ::AbstractEnv)
    for (_, agent) in π.agents
        if length(agent.trajectory) > 0
            pop!(agent.trajectory[:state])
            pop!(agent.trajectory[:action])
            if haskey(agent.trajectory, :legal_actions_mask)
                pop!(agent.trajectory[:legal_actions_mask])
            end
        end
    end
end

function (π::MADDPGManager)(::PreActStage, env::AbstractEnv, actions)
    # update each agent's trajectory
    for (player, agent) in π.agents
        push!(agent.trajectory[:state], state(env, player))
        push!(agent.trajectory[:action], actions[player])
        if haskey(agent.trajectory, :legal_actions_mask)
            lasm = legal_action_space_mask(env, player)
            push!(agent.trajectory[:legal_actions_mask], lasm)
        end
    end

    # update policy
    update!(π)
end

function (π::MADDPGManager)(::PostActStage, env::AbstractEnv)
    for (player, agent) in π.agents
        push!(agent.trajectory[:reward], reward(env, player))
        push!(agent.trajectory[:terminal], is_terminated(env))
    end
end

function (π::MADDPGManager)(::PostEpisodeStage, env::AbstractEnv)
    # collect state and dummy action to each agent's trajectory
    for (player, agent) in π.agents
        push!(agent.trajectory[:state], state(env, player))
        push!(agent.trajectory[:action], rand(action_space(env)))
        if haskey(agent.trajectory, :legal_actions_mask)
            lasm = legal_action_space_mask(env, player)
            push!(agent.trajectory[:legal_actions_mask], lasm)
        end
    end

    # update policy
    update!(π)
end
```
How about dispatching to the inner agent's corresponding methods? Like calling `agent(stage, env, action)` in the `for` loop.
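For illustration, a hedged sketch of that suggestion (assuming each value in `π.agents` is a standard `Agent` whose stage methods already do the trajectory bookkeeping above):

```julia
# Forward every stage hook to the wrapped agents so that Agent's own
# trajectory handling is reused instead of re-implemented in the manager.
function (π::MADDPGManager)(stage::AbstractStage, env::AbstractEnv)
    for (_, agent) in π.agents
        agent(stage, env)
    end
end

function (π::MADDPGManager)(stage::PreActStage, env::AbstractEnv, actions)
    for (player, agent) in π.agents
        agent(stage, env, actions[player])  # inner Agent pushes state & action
    end
    update!(π)  # then run the centralized MADDPG update
end
```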
Can you take a look at `NamedPolicy` and see whether we can reuse existing code as much as possible? See also `MultiAgentManager`.
```julia
temp_player = rand(keys(π.agents))
t = π.agents[temp_player].trajectory
```
Simply use the first agent?
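That is, something like:

```julia
# Take any one agent deterministically; assuming all trajectories have the
# same length, sampling a random key adds nothing here.
_, some_agent = first(π.agents)
t = some_agent.trajectory
```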
```julia
temp_player = rand(keys(π.agents))
t = π.agents[temp_player].trajectory
inds = rand(π.rng, 1:length(t), π.batch_size)
batches = Dict((player, RLCore.fetch!(BatchSampler{SARTS}(π.batch_size), agent.trajectory, inds))
    for (player, agent) in π.agents)
```
The hardcoded `SARTS` will make the algorithm work only on environments of `MINIMAL_ACTION_SET`.
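One way to avoid the hardcoding could be to pick the trace layout from the trajectory itself (a sketch; it assumes `SLARTSL` is the mask-carrying layout defined in RLCore):

```julia
# Choose SLARTSL when the trajectory carries legal-action masks, so that
# FULL_ACTION_SET environments are supported as well.
traces = haskey(t, :legal_actions_mask) ? SLARTSL : SARTS
batches = Dict(
    player => RLCore.fetch!(BatchSampler{traces}(π.batch_size), agent.trajectory, inds)
    for (player, agent) in π.agents
)
```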
```julia
s = vcat((batches[player][1] for (player, _) in π.agents)...)
a = vcat((batches[player][2] for (player, _) in π.agents)...)
s′ = vcat((batches[player][5] for (player, _) in π.agents)...)
```
`vcat` is not very efficient here. Try `Flux.batch`?
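For instance (a sketch; note that `Flux.batch` stacks along a new trailing dimension rather than concatenating along the first one like `vcat`, so downstream shapes may need adjusting):

```julia
using Flux

# Stack the per-player batches without the splatting overhead of vcat((...)...).
s  = Flux.batch([batches[player][1] for (player, _) in π.agents])
a  = Flux.batch([batches[player][2] for (player, _) in π.agents])
s′ = Flux.batch([batches[player][5] for (player, _) in π.agents])
```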
```julia
s, a, s′ = send_to_host((s, a, s′))
mu_actions = send_to_host(mu_actions)
new_actions = send_to_host(new_actions)
```
Are they required here?
Thanks for your kind reviews! I'll check and update my code later today.
This is still a simple version of MADDPG.
PR Checklist
The description of the implementation is in discussion #404.
Here `MADDPG` raises an `unknown word` error... How can I fix it? @findmyway