MarlGrid
========

We use ``kandouss/marlgrid`` commit
``e88c40bad07653575ac11fe2f3a115e4de3d13e9`` as the reference implementation.
MarlGrid is a multi-agent gridworld based on the original MiniGrid codebase.
Its GoalCycle environments were used in the 2021 paper by Ndousse and
coauthors, `Emergent Social Learning via Multi-agent Reinforcement Learning `_.

.. image:: ../_static/render_samples/marlgrid_official_compare.png
    :align: center


Options
-------

* ``task_id (str)``: see the available tasks below;
* ``num_envs (int)``: how many environments you would like to create;
* ``batch_size (int)``: the expected batch size for returned environments,
  default to ``num_envs``;
* ``num_threads (int)``: the maximum thread number for executing the actual
  ``env.step``, default to ``batch_size``;
* ``seed (int | Sequence[int])``: the environment seed. When a sequence is
  provided, it must contain exactly one seed per environment. Default to
  ``42``;
* ``max_num_players (int)``: maximum number of players in one environment.
  Each registered task defaults this to its number of agents;
* ``prestige_coloring (bool)``: use the ``kandouss/marlgrid`` prestige cue for
  agent rendering. This option defaults to ``False`` to preserve the fixed
  agent colors used by existing tasks. When enabled, positive rewards move an
  agent color from red toward blue, negative rewards reset it to red, and
  prestige decays after each active agent step;
* ``prestige_beta (float)``: per-step prestige decay factor, default to
  ``0.95``;
* ``prestige_scale (float)``: reward-history scale used when mapping prestige
  to color, default to ``2.0``.


Observation Space
-----------------

MarlGrid returns one RGB partial-view image per player. The default registered
tasks use ``view_tile_size=8`` and expose ``obs`` as a uint8 tensor with shape
``(view_tile_size * view_size, view_tile_size * view_size, 3)`` per player.
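
As a quick check of the shape formula above, the following sketch computes the
per-player observation shape. The helper name ``marlgrid_obs_shape`` and the
example value ``view_size=7`` are illustrative assumptions, not values taken
from the task registry.

.. code-block:: python

    import numpy as np


    def marlgrid_obs_shape(view_size: int, view_tile_size: int = 8) -> tuple:
        """Return (height, width, channels) of one player's RGB view."""
        side = view_tile_size * view_size
        return (side, side, 3)


    # Hypothetical view_size; check the task config for the real value.
    shape = marlgrid_obs_shape(view_size=7)

    # A placeholder buffer with the dtype the section describes.
    dummy_obs = np.zeros(shape, dtype=np.uint8)
    print(dummy_obs.shape, dummy_obs.dtype)

A buffer like ``dummy_obs`` can be handy for preallocating storage before the
first ``reset`` call.
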

Player metadata is returned under ``info["players"]``:

* ``id``: player index inside the environment;
* ``done``: per-player completion flag;
* ``active``: whether the player currently renders and acts;
* ``pos``: player position in the full grid;
* ``dir``: player direction in ``[0, 3]``.


Action Space
------------

Actions are per-player discrete values in ``[0, 6]``:

* ``0``: turn left;
* ``1``: turn right;
* ``2``: move forward;
* ``3``: pick up;
* ``4``: drop;
* ``5``: toggle / interact;
* ``6``: done.

Multi-agent tasks accept EnvPool's player-shaped action format, for example:

.. code-block:: python

    import envpool
    import numpy as np

    env = envpool.make_gymnasium("MarlGrid-3AgentCluttered11x11-v0", num_envs=2)
    obs, info = env.reset()
    obs, reward, terminated, truncated, info = env.step({
        "players": {
            "env_id": info["players"]["env_id"],
            "action": np.full(info["players"]["env_id"].shape, 2, dtype=np.int32),
        }
    })


Available Tasks
---------------

Task IDs follow the pinned upstream registry. Note that upstream names
``MarlGrid-1AgentCluttered15x15-v0`` as ``15x15`` even though that pinned
registry config uses ``grid_size=11``.

* ``MarlGrid-1AgentCluttered15x15-v0``
* ``MarlGrid-3AgentCluttered11x11-v0``
* ``MarlGrid-3AgentCluttered15x15-v0``
* ``MarlGrid-2AgentEmpty9x9-v0``
* ``MarlGrid-3AgentEmpty9x9-v0``
* ``MarlGrid-4AgentEmpty9x9-v0``
* ``Goalcycle-demo-solo-v0``
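
When scripting against any of these tasks, the action indices from the Action
Space section can be given names rather than passed as bare integers. The
sketch below is one possible convenience layer: ``MarlGridAction`` and
``uniform_player_action`` are hypothetical names that are not part of EnvPool
or MarlGrid, and the sample ``env_id`` array only stands in for
``info["players"]["env_id"]``.

.. code-block:: python

    from enum import IntEnum

    import numpy as np


    class MarlGridAction(IntEnum):
        """Names for the discrete action indices documented above."""

        TURN_LEFT = 0
        TURN_RIGHT = 1
        FORWARD = 2
        PICKUP = 3
        DROP = 4
        TOGGLE = 5
        DONE = 6


    def uniform_player_action(env_id, action):
        """Build a player-shaped action dict giving every player one action."""
        env_id = np.asarray(env_id)
        return {
            "players": {
                "env_id": env_id,
                "action": np.full(env_id.shape, int(action), dtype=np.int32),
            }
        }


    # Assumed stand-in for info["players"]["env_id"]; the real array comes
    # from env.reset() or env.step().
    example_env_id = np.array([0, 0, 0, 1, 1, 1], dtype=np.int32)
    batch = uniform_player_action(example_env_id, MarlGridAction.FORWARD)
    print(batch["players"]["action"])

The resulting dictionary has the same layout as the one passed to ``env.step``
in the example above.
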