add MADDPG algorithm #444

Merged
6 commits merged into JuliaReinforcementLearning:master on Aug 12, 2021

Conversation

@peterchen96 (Member) commented Aug 9, 2021:

PR Checklist

  • Update NEWS.md?

The description of the implementation is in discussion #404.

Here MADDPG raises an unknown word error... How can I fix it? @findmyway

@findmyway (Member):

You can simply add those words after

cc @pilgrimygy

@peterchen96 (Member, Author):

Thanks! I'd also like to ask for suggestions about the implementation and about any mistakes or style issues in the code.

@findmyway (Member):

I'll review it later tonight 😃

@findmyway (Member) left a comment:

Looks fine to me

Comment on lines 31 to 78
function (π::MADDPGManager)(::PreEpisodeStage, ::AbstractEnv)
    for (_, agent) in π.agents
        if length(agent.trajectory) > 0
            pop!(agent.trajectory[:state])
            pop!(agent.trajectory[:action])
            if haskey(agent.trajectory, :legal_actions_mask)
                pop!(agent.trajectory[:legal_actions_mask])
            end
        end
    end
end

function (π::MADDPGManager)(::PreActStage, env::AbstractEnv, actions)
    # update each agent's trajectory
    for (player, agent) in π.agents
        push!(agent.trajectory[:state], state(env, player))
        push!(agent.trajectory[:action], actions[player])
        if haskey(agent.trajectory, :legal_actions_mask)
            lasm = legal_action_space_mask(env, player)
            push!(agent.trajectory[:legal_actions_mask], lasm)
        end
    end

    # update policy
    update!(π)
end

function (π::MADDPGManager)(::PostActStage, env::AbstractEnv)
    for (player, agent) in π.agents
        push!(agent.trajectory[:reward], reward(env, player))
        push!(agent.trajectory[:terminal], is_terminated(env))
    end
end

function (π::MADDPGManager)(::PostEpisodeStage, env::AbstractEnv)
    # collect state and dummy action to each agent's trajectory
    for (player, agent) in π.agents
        push!(agent.trajectory[:state], state(env, player))
        push!(agent.trajectory[:action], rand(action_space(env)))
        if haskey(agent.trajectory, :legal_actions_mask)
            lasm = legal_action_space_mask(env, player)
            push!(agent.trajectory[:legal_actions_mask], lasm)
        end
    end

    # update policy
    update!(π)
end
@findmyway (Member):

How about dispatching to the inner agent's corresponding methods?

Like calling agent(stage, env, action) in the for loop.
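
For reference, a minimal sketch of what that dispatch could look like (purely illustrative, assuming the same MADDPGManager fields as in the quoted snippet and the usual Agent stage methods from RLCore; making the inner agent query the per-player state is what the NamedPolicy suggestion below addresses):

# Hypothetical sketch, not the merged code: forward each stage to the
# wrapped Agent, which already maintains its own trajectory.
function (π::MADDPGManager)(stage::AbstractStage, env::AbstractEnv)
    for (_, agent) in π.agents
        agent(stage, env)
    end
end

function (π::MADDPGManager)(stage::PreActStage, env::AbstractEnv, actions)
    for (player, agent) in π.agents
        agent(stage, env, actions[player])
    end
    update!(π)
end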

@findmyway (Member):

Can you take a look at the NamedPolicy and see whether we can reuse existing code as much as possible? See also the MultiAgentManager
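
As a purely illustrative sketch of the reuse idea (the constructor forms below are assumptions and should be checked against RLCore; policy_for and trajectory_for are placeholders), each per-player policy could be wrapped in a NamedPolicy so the generic Agent code can ask for that player's view of the environment, much like MultiAgentManager does:

# Hypothetical sketch: one Agent per player, carrying its player name via
# NamedPolicy so that state(env, player) / reward(env, player) refer to the
# right player. policy_for and trajectory_for are placeholder helpers.
agents = Dict(
    player => Agent(
        policy = NamedPolicy(player => policy_for(player)),
        trajectory = trajectory_for(player),
    )
    for player in players(env)
)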

Comment on lines 91 to 92
temp_player = rand(keys(π.agents))
t = π.agents[temp_player].trajectory
@findmyway (Member):

Simply use the first agent?
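
For illustration, the suggestion amounts to something like this sketch (field names follow the quoted snippet; it assumes all trajectories have the same length):

# Take the first agent's trajectory instead of sampling a random player.
_, first_agent = first(π.agents)
t = first_agent.trajectory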

temp_player = rand(keys(π.agents))
t = π.agents[temp_player].trajectory
inds = rand(π.rng, 1:length(t), π.batch_size)
batches = Dict((player, RLCore.fetch!(BatchSampler{SARTS}(π.batch_size), agent.trajectory, inds)) for (player, agent) in π.agents)
@findmyway (Member):

The hardcoded SARTS will make the algorithm work only on environments of MINIMAL_ACTION_SET.
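
For illustration, one possible way to lift that restriction later (not part of this PR, and assuming RLCore's SLARTSL trace constant) would be to pick the traces based on whether the trajectory carries a legal-actions mask:

# Hypothetical sketch: choose the trace layout per trajectory instead of
# hardcoding SARTS, so FULL_ACTION_SET environments could also be sampled.
traces = haskey(t, :legal_actions_mask) ? SLARTSL : SARTS
batches = Dict(
    player => RLCore.fetch!(BatchSampler{traces}(π.batch_size), agent.trajectory, inds)
    for (player, agent) in π.agents
)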

Comment on lines 98 to 100
s = vcat((batches[player][1] for (player, _) in π.agents)...)
a = vcat((batches[player][2] for (player, _) in π.agents)...)
s′ = vcat((batches[player][5] for (player, _) in π.agents)...)
@findmyway (Member):

vcat is not very efficient here. Try Flux.batch?
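
For reference, a small illustration of the difference (whether Flux.batch is a drop-in replacement here depends on the layout the critic expects):

using Flux

xs = [rand(Float32, 3) for _ in 1:4]   # e.g. one state vector per player (3 features, 4 players)
vcat(xs...)        # 12-element vector: concatenates along the first dimension
reduce(vcat, xs)   # same result, but avoids splatting a long argument list
Flux.batch(xs)     # 3×4 matrix: stacks along a new trailing dimension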

Comment on lines 167 to 169
s, a, s′ = send_to_host((s, a, s′))
mu_actions = send_to_host(mu_actions)
new_actions = send_to_host(new_actions)
@findmyway (Member):

Are they required here?

@peterchen96 (Member, Author):

Thanks for the kind review! I'll check and update my code later today.

@peterchen96 (Member, Author):

This is still a simple version of MADDPG, which only supports environments with MINIMAL_ACTION_SET.

@findmyway merged commit 4e5d258 into JuliaReinforcementLearning:master on Aug 12, 2021