MarlGrid
We use kandouss/marlgrid commit
e88c40bad07653575ac11fe2f3a115e4de3d13e9 as the reference implementation.
MarlGrid is a multi-agent gridworld based on the original MiniGrid codebase.
Its GoalCycle environments were used in the 2021 paper by Ndousse and
coauthors, Emergent Social Learning via Multi-agent Reinforcement Learning.
Options
task_id (str): see the available tasks below;
num_envs (int): how many environments you would like to create;
batch_size (int): the expected batch size for returned environments, defaults to num_envs;
num_threads (int): the maximum thread number for executing the actual env.step, defaults to batch_size;
seed (int | Sequence[int]): the environment seed. When a sequence is provided, it must contain exactly one seed per environment. Defaults to 42;
max_num_players (int): maximum number of players in one environment. Each registered task defaults this to its number of agents;
prestige_coloring (bool): use the kandouss/marlgrid prestige cue for agent rendering. This option defaults to False to preserve the fixed agent colors used by existing tasks. When enabled, positive rewards move an agent color from red toward blue, negative rewards reset it to red, and prestige decays after each active agent step;
prestige_beta (float): per-step prestige decay factor, defaults to 0.95;
prestige_scale (float): reward-history scale used when mapping prestige to color, defaults to 2.0.
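The default chain and the prestige cue described in the options can be sketched in plain Python. This is an illustration only, not EnvPool's implementation: resolve_options and PrestigeSketch are hypothetical names, and the additive reward accumulation and the tanh color mapping are assumptions. The options only state that positive rewards move the color toward blue, negative rewards reset it to red, and prestige decays each step.

```python
import math
from collections.abc import Sequence


def resolve_options(num_envs, batch_size=None, num_threads=None, seed=42):
    """Hypothetical helper mirroring the documented defaults."""
    # batch_size falls back to num_envs, num_threads falls back to batch_size.
    batch_size = num_envs if batch_size is None else batch_size
    num_threads = batch_size if num_threads is None else num_threads
    # A seed sequence must contain exactly one seed per environment.
    if isinstance(seed, Sequence) and len(seed) != num_envs:
        raise ValueError(f"expected {num_envs} seeds, got {len(seed)}")
    return {
        "num_envs": num_envs,
        "batch_size": batch_size,
        "num_threads": num_threads,
        "seed": seed,
    }


class PrestigeSketch:
    """Hypothetical model of the prestige cue for a single agent."""

    def __init__(self, beta=0.95, scale=2.0):
        self.beta = beta      # prestige_beta: per-step decay factor
        self.scale = scale    # prestige_scale: reward-history scale
        self.value = 0.0      # zero prestige renders as pure red

    def on_reward(self, reward):
        if reward > 0:
            self.value += reward  # assumption: rewards accumulate additively
        elif reward < 0:
            self.value = 0.0      # negative rewards reset the color to red

    def on_step(self):
        self.value *= self.beta   # decay after each active agent step

    def color(self):
        # Assumption: tanh squashes prestige into [0, 1) before blending.
        t = math.tanh(self.value / self.scale)
        return (round(255 * (1 - t)), 0, round(255 * t))


opts = resolve_options(num_envs=4)
cue = PrestigeSketch()
cue.on_reward(1.0)
cue.on_step()
```

With these assumptions, a larger prestige_scale means an agent needs more accumulated reward before its color shifts noticeably toward blue.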
Observation Space
MarlGrid returns one RGB partial-view image per player. The default registered
tasks use view_tile_size=8 and expose obs as a uint8 tensor with shape
(view_tile_size * view_size, view_tile_size * view_size, 3) per player.
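As a quick check of the shape formula above, the following sketch computes the per-player observation shape. The helper name player_obs_shape is hypothetical, and view_size=7 is only an illustrative value, since the view size of each registered task is not stated here.

```python
def player_obs_shape(view_size, view_tile_size=8):
    """Return the (height, width, channels) of one player's RGB view."""
    side = view_tile_size * view_size  # pixels per side of the partial view
    return (side, side, 3)


# view_size=7 is an illustrative assumption, not a documented task setting.
shape = player_obs_shape(view_size=7)
print(shape)  # (56, 56, 3)
```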
Player metadata is returned under info["players"]:
id: player index inside the environment;
done: per-player completion flag;
active: whether the player currently renders and acts;
pos: player position in the full grid;
dir: player direction in [0, 3].
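The player metadata fields can be combined to find the cell each acting player is facing. The sketch below uses a hand-written stand-in for info["players"] covering a single environment, with plain lists where EnvPool would return arrays. It also assumes the MiniGrid direction convention (0 = right, 1 = down, 2 = left, 3 = up) and (x, y) ordering for pos, neither of which is confirmed on this page.

```python
# Assumption: MiniGrid-style direction vectors as (dx, dy), with y growing downward.
DIR_TO_VEC = ((1, 0), (0, 1), (-1, 0), (0, -1))


def front_cells(players):
    """Map player id -> cell in front of it, for active players that are not done."""
    cells = {}
    for pid, done, active, pos, direction in zip(
        players["id"], players["done"], players["active"],
        players["pos"], players["dir"],
    ):
        if done or not active:
            continue  # skip players that no longer act
        dx, dy = DIR_TO_VEC[direction]
        cells[pid] = (pos[0] + dx, pos[1] + dy)
    return cells


# Hand-written stand-in for info["players"] in one environment.
players = {
    "id": [0, 1, 2],
    "done": [False, False, True],
    "active": [True, False, True],
    "pos": [(1, 1), (3, 2), (5, 5)],
    "dir": [0, 3, 2],
}
print(front_cells(players))  # {0: (2, 1)}
```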
Action Space
Actions are per-player discrete values in [0, 6]:
0: turn left;
1: turn right;
2: move forward;
3: pick up;
4: drop;
5: toggle / interact;
6: done.
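For readability in client code, the action indices above can be wrapped in an enum. MarlGridAction and validate_actions are hypothetical names for illustration and are not part of EnvPool or MarlGrid.

```python
from enum import IntEnum


class MarlGridAction(IntEnum):
    """Hypothetical names for the seven discrete actions listed above."""
    TURN_LEFT = 0
    TURN_RIGHT = 1
    FORWARD = 2
    PICK_UP = 3
    DROP = 4
    TOGGLE = 5
    DONE = 6


def validate_actions(actions):
    """Convert raw integers to MarlGridAction, raising ValueError outside [0, 6]."""
    return [MarlGridAction(int(a)) for a in actions]


checked = validate_actions([2, 2, 5])
```

Because IntEnum members compare equal to their integer values, they can be placed directly into an integer action array.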
Multi-agent tasks accept EnvPool’s player-shaped action format, for example:
import envpool
import numpy as np

env = envpool.make_gymnasium("MarlGrid-3AgentCluttered11x11-v0", num_envs=2)
obs, info = env.reset()
# Send action 2 (move forward) for every player in every environment.
obs, reward, terminated, truncated, info = env.step({
    "players": {
        "env_id": info["players"]["env_id"],
        "action": np.full(info["players"]["env_id"].shape, 2, dtype=np.int32),
    }
})
Available Tasks
Task IDs follow the pinned upstream registry. Note that the upstream ID
MarlGrid-1AgentCluttered15x15-v0 says 15x15, but its config in the pinned
registry uses grid_size=11.
MarlGrid-1AgentCluttered15x15-v0
MarlGrid-3AgentCluttered11x11-v0
MarlGrid-3AgentCluttered15x15-v0
MarlGrid-2AgentEmpty9x9-v0
MarlGrid-3AgentEmpty9x9-v0
MarlGrid-4AgentEmpty9x9-v0
Goalcycle-demo-solo-v0
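Since most task IDs encode the agent count and grid size, a small parser can recover them from the name. This is a sketch: describe_task and GRID_SIZE_OVERRIDES are hypothetical names, and it assumes the size in the name matches the configured grid_size for every task except the one mismatch noted above. IDs that do not follow the MarlGrid-<N>Agent<Layout><W>x<H>-v<K> pattern, such as Goalcycle-demo-solo-v0, return None.

```python
import re

TASK_ID_PATTERN = re.compile(
    r"^MarlGrid-(?P<agents>\d+)Agent(?P<layout>[A-Za-z]+)"
    r"(?P<w>\d+)x(?P<h>\d+)-v(?P<version>\d+)$"
)

# The one documented case where the name and the pinned config disagree.
GRID_SIZE_OVERRIDES = {"MarlGrid-1AgentCluttered15x15-v0": 11}


def describe_task(task_id):
    """Parse a MarlGrid task ID; return None when the ID has another shape."""
    match = TASK_ID_PATTERN.match(task_id)
    if match is None:
        return None
    named_size = int(match["w"])
    return {
        "agents": int(match["agents"]),
        "layout": match["layout"],
        "named_size": named_size,
        # Assumption: the named size is the real grid_size unless overridden.
        "grid_size": GRID_SIZE_OVERRIDES.get(task_id, named_size),
    }


solo = describe_task("MarlGrid-1AgentCluttered15x15-v0")
trio = describe_task("MarlGrid-3AgentCluttered11x11-v0")
```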