Python Interface
================

envpool.make
------------

The main interface is ``envpool.make``, which takes a task id and some other
arguments (as specified below) and generates different configurations of
batched environments:

* ``task_id (str)``: task id; use ``envpool.list_all_envs()`` to see all
  supported tasks;
* ``env_type (str)``: which interface to generate, either a Gym /
  Gymnasium-compatible wrapper or the ``dm_env.Environment`` interface;
  available options are ``dm``, ``gym``, and ``gymnasium``;
* ``num_envs (int)``: how many envs are in the envpool, defaults to ``1``;
* ``batch_size (int)``: async configuration (see the Batch Size section
  below), defaults to ``num_envs``;
* ``num_threads (int)``: the maximum number of threads for executing the actual
  ``env.step``, defaults to ``batch_size``;
* ``seed (int | Sequence[int])``: the seed(s) for all environments. If an int
  is provided, the i-th environment is seeded with ``seed + i``. If a sequence
  is provided, it must contain exactly one seed per environment. The default is
  ``42``;
* ``max_episode_steps (int)``: set the max steps in one episode. This value is
  env-specific (27000 steps or 27000 * 4 = 108000 frames in Atari for
  example);
* ``max_num_players (int)``: the maximum number of players in one env, useful
  in multi-agent envs. In single-agent environments, it is always ``1``;
* ``thread_affinity_offset (int)``: the starting core id used when binding
  threads. ``-1`` means not to use thread affinity in the thread pool, and
  this is the default behavior;
* ``reward_threshold (float)``: the reward threshold for solving this
  environment; this option comes from ``env.spec.reward_threshold`` in the
  Gymnasium API, although some environments do not define it;
* ``gym_reset_return_info (bool)``: a deprecated compatibility flag kept in the
  config schema. EnvPool's ``gym`` wrapper follows Gymnasium reset semantics and
  always returns ``(obs, info)``; passing ``False`` raises ``ValueError``;
* ``render_mode (str | None)``: render behavior exposed by the Python wrapper.
  Available options are ``None`` (default), ``"rgb_array"``, and ``"human"``;
* ``render_env_id (int)``: default env id used by ``env.render()`` when
  ``env_ids`` is omitted, defaults to ``0``;
* ``render_width`` / ``render_height``: fixed output size for ``render()``.
  If omitted, the environment-specific default render size is used;
* ``render_camera_id (int)``: default camera id used by ``render()``, defaults
  to ``-1``;
* ``from_pixels (bool)``: for MuJoCo tasks, expose native C++
  render-backed pixel observations through the normal observation API instead
  of the environment's state observations;
* for other configurations, such as ``img_height`` / ``img_width`` /
  ``stack_num`` / ``frame_skip`` / ``noop_max`` for Atari envs or
  ``reward_metric`` / ``lmp_save_dir`` for ViZDoom envs, please refer to the
  corresponding pages.
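
The ``seed`` rule above can be restated in plain Python. ``expand_seeds`` is a
hypothetical helper written for this page, not an EnvPool function; it only
shows how an int or a sequence maps to per-environment seeds.

```python
from typing import List, Sequence, Union


def expand_seeds(seed: Union[int, Sequence[int]], num_envs: int) -> List[int]:
    """Hypothetical helper mirroring the documented ``seed`` rule."""
    if isinstance(seed, int):
        # An int seeds the i-th environment with seed + i.
        return [seed + i for i in range(num_envs)]
    seeds = list(seed)
    if len(seeds) != num_envs:
        # A sequence must provide exactly one seed per environment.
        raise ValueError(f"expected {num_envs} seeds, got {len(seeds)}")
    return seeds


print(expand_seeds(42, 4))         # [42, 43, 44, 45]
print(expand_seeds([7, 3, 9], 3))  # [7, 3, 9]
```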

The observation space and action space of the resulting environment describe a
single environment's space, but the first dimension of every returned
observation and every passed action is always equal to ``num_envs`` (sync mode)
or ``batch_size`` (async mode).
For Gymnasium compatibility, the Gymnasium wrapper also exposes
``num_envs``, ``is_vector_env``, ``single_observation_space``, and
``single_action_space`` for vector-aware wrappers such as Gymnasium's vector
``NormalizeObservation``. EnvPool keeps ``observation_space`` and
``action_space`` as the single-environment spaces for backward compatibility.

``envpool.make_gym``, ``envpool.make_dm``, and ``envpool.make_gymnasium`` are
shortcuts for ``envpool.make(..., env_type="gym" | "dm" | "gymnasium")``,
respectively.

envpool.make_spec
-----------------

If you want to get the observation / action space without creating an actual
environment, use ``envpool.make_spec``. It accepts the same arguments as
``envpool.make``, and you can use

- ``spec.observation_space`` gym's observation space;
- ``spec.action_space`` gym's action space;
- ``spec.observation_spec()`` dm_env's observation spec;
- ``spec.action_spec()`` dm_env's action spec;

to get the desired spec.

Extended API
------------

We mainly change the semantics of two functions, ``reset`` and ``step``, and
add two more primitives, ``send`` and ``recv``:

* ``reset(id: Union[np.ndarray, None]) -> TimeStep``: reset the envs with the
  given ``id`` and return the corresponding observations;
* ``async_reset() -> None``: only send the reset command to the executor and
  return nothing;
* ``send(action: Any, env_id: Optional[np.ndarray] = None) -> None``: send the
  action with the corresponding env ids to the executor (thread pool).
  ``action`` can be a numpy array (single-array action) or a dict (multiple
  key-value action fields);
* ``recv() -> Union[TimeStep, Tuple[Any, np.ndarray, np.ndarray, np.ndarray,
  Dict[str, Any]]]``: receive the finished env ids (in
  ``timestep.observation.obs.env_id`` (dm) or ``info["env_id"]`` (gym /
  gymnasium)) and the corresponding results from the executor;
* ``step(action: Any, env_id: Optional[np.ndarray] = None) -> Union[TimeStep,
  Tuple[Any, np.ndarray, np.ndarray, np.ndarray, Dict[str, Any]]]``: given an
  action and an env (possibly with player) id list where
  ``len(action) == len(env_id)``, EnvPool puts these requests into a thread
  pool; then, once the condition described in the Batch Size section below is
  met, it returns the results of the envs that finished stepping, together with
  their env ids.

In short, ``step(action, env_id)`` == ``send(action, env_id); return recv()``
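
To make this equivalence concrete, here is a toy, single-threaded model of the
contract. ``ToyPool`` is hypothetical and much simpler than EnvPool: every
request finishes immediately, the "result" is just the echoed action standing
in for the env's output, and ``recv`` hands back exactly ``batch_size``
finished entries (a real pool blocks until they are ready, whereas this toy
would raise ``IndexError``).

```python
from collections import deque
from typing import Any, List, Optional, Sequence, Tuple


class ToyPool:
    """Hypothetical single-threaded model of the send / recv / step contract."""

    def __init__(self, num_envs: int, batch_size: Optional[int] = None) -> None:
        self.num_envs = num_envs
        self.batch_size = num_envs if batch_size is None else batch_size
        self._finished = deque()  # (env_id, result) pairs in finish order

    def send(self, action: Sequence[Any],
             env_id: Optional[Sequence[int]] = None) -> None:
        ids = list(range(self.num_envs)) if env_id is None else list(env_id)
        if len(action) != len(ids):
            raise ValueError("len(action) must equal len(env_id)")
        # A real pool steps these on worker threads; here each one finishes
        # immediately and the echoed action stands in for the env's result.
        for i, a in zip(ids, action):
            self._finished.append((i, a))

    def recv(self) -> Tuple[List[Any], List[int]]:
        # Hand back exactly batch_size finished results plus their env ids.
        batch = [self._finished.popleft() for _ in range(self.batch_size)]
        return [r for _, r in batch], [i for i, _ in batch]

    def step(self, action: Sequence[Any],
             env_id: Optional[Sequence[int]] = None) -> Tuple[List[Any], List[int]]:
        # step(action, env_id) == send(action, env_id); return recv()
        self.send(action, env_id)
        return self.recv()


pool = ToyPool(num_envs=4, batch_size=2)
pool.send(["a0", "a1", "a2", "a3"])
print(pool.recv())  # (['a0', 'a1'], [0, 1])
print(pool.recv())  # (['a2', 'a3'], [2, 3])
```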


Rendering
---------

EnvPool exposes rendering through the Python wrapper. When creating an env with
``render_mode="rgb_array"``, calling ``render()`` returns a batch of RGB frames
with shape ``(B, H, W, 3)`` and data type ``uint8``. Even a single env render keeps
the batch dimension, so ``env.render()`` returns ``(1, H, W, 3)`` by default.

``render_mode="human"`` uses the same renderer, but displays the frame through
OpenCV in Python and returns ``None``. Human mode currently supports only a
single env id per call.

The render API is:

* ``render(env_ids: int | Sequence[int] | None = None, camera_id: int | None = None)``

If ``env_ids`` is omitted, EnvPool renders ``render_env_id``. The output size
is fixed when the env is created via ``render_width`` / ``render_height``;
``render()`` itself does not take a runtime resize argument.
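
The shape rules above can be condensed into a small function. ``render_shape``
is a hypothetical helper, not part of EnvPool; it only encodes the documented
defaults (``render_env_id`` when ``env_ids`` is omitted, a batch dimension
even for a single env, RGB channels last).

```python
from typing import Optional, Sequence, Tuple, Union


def render_shape(env_ids: Union[int, Sequence[int], None], render_env_id: int,
                 height: int, width: int) -> Tuple[int, int, int, int]:
    """Hypothetical helper: shape of the rgb_array batch returned by render()."""
    if env_ids is None:
        ids = [render_env_id]   # fall back to the configured default env
    elif isinstance(env_ids, int):
        ids = [env_ids]         # a single id still keeps the batch dimension
    else:
        ids = list(env_ids)
    # Frames are uint8 RGB images stacked along a leading batch dimension.
    return (len(ids), height, width, 3)


print(render_shape(None, render_env_id=0, height=480, width=480))    # (1, 480, 480, 3)
print(render_shape([0, 2], render_env_id=0, height=480, width=480))  # (2, 480, 480, 3)
```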

Example:
::

    import envpool

    env = envpool.make(
        "Ant-v5",
        env_type="gymnasium",
        num_envs=4,
        render_mode="rgb_array",
        render_width=480,
        render_height=480,
    )
    env.reset()
    frames = env.render(env_ids=[0, 2])
    assert frames.shape == (2, 480, 480, 3)

    viewer = envpool.make(
        "WalkerWalk-v1",
        env_type="gymnasium",
        num_envs=1,
        render_mode="human",
        render_env_id=0,
    )
    viewer.reset()
    viewer.render()

The images below show representative first-frame comparisons for the EnvPool
families that support rendering. In each panel, EnvPool is on the left and the
reference output is on the right. For Box2D, Classic Control, MiniGrid, MuJoCo,
Gymnasium-Robotics, and MyoSuite, the reference is the upstream Python
renderer. For Atari, Procgen, and ViZDoom, the reference is the exact in-tree
render oracle used by the test suite. Google Research Football is
intentionally excluded because its render API is unsupported.

.. image:: ../_static/render_samples/atari_oracle_compare.png
    :width: 900px
    :align: center

.. image:: ../_static/render_samples/box2d_official_compare.png
    :width: 900px
    :align: center

.. image:: ../_static/render_samples/classic_control_official_compare.png
    :width: 900px
    :align: center

.. image:: ../_static/render_samples/minigrid_official_compare.png
    :width: 900px
    :align: center

.. image:: ../_static/render_samples/procgen_oracle_compare.png
    :width: 900px
    :align: center

.. image:: ../_static/render_samples/mujoco_gym_official_compare.png
    :width: 900px
    :align: center

.. image:: ../_static/render_samples/gymnasium_robotics_official_compare.png
    :width: 900px
    :align: center

.. image:: ../_static/render_samples/mujoco_dmc_official_compare.png
    :width: 900px
    :align: center

.. image:: ../_static/render_samples/myosuite_myobase_official_compare.png
    :width: 900px
    :align: center

.. image:: ../_static/render_samples/myosuite_myochallenge_official_compare.png
    :width: 900px
    :align: center

.. image:: ../_static/render_samples/myosuite_myodm_official_compare.png
    :width: 900px
    :align: center

.. image:: ../_static/render_samples/vizdoom_oracle_compare.png
    :width: 900px
    :align: center

Pixel Observations
------------------

When ``from_pixels=True`` is passed to ``envpool.make`` / ``make_dm`` /
``make_gym`` / ``make_gymnasium`` for MuJoCo tasks, EnvPool uses a native C++
render path to populate the public observation API.

For Gym and Gymnasium wrappers, the returned observation becomes a ``uint8``
tensor in channel-first layout: ``(B, 3, H, W)`` when ``frame_stack == 1`` and
``(B, 3 * frame_stack, H, W)`` otherwise. The ``info`` dictionary remains
unchanged.

For the dm_env wrapper, the timestep observation keeps the usual info fields
such as ``env_id`` while replacing the state observation payload with a
``pixels`` field.

``frame_stack`` is applied on the channel dimension, so the observation layout
matches the usual PyTorch ``BCHW`` convention directly. If ``render_width`` /
``render_height`` are omitted, EnvPool defaults both to ``84`` so that the
resulting observation spec is fully defined up front.
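
The resulting Gym / Gymnasium pixel observation shape can be computed ahead of
time. ``pixel_obs_shape`` is a hypothetical helper that only restates the
rules above: three RGB channels per stacked frame, channel-first layout, and
``84`` for any omitted render size.

```python
from typing import Optional, Tuple


def pixel_obs_shape(batch: int, frame_stack: int = 1,
                    render_height: Optional[int] = None,
                    render_width: Optional[int] = None) -> Tuple[int, int, int, int]:
    """Hypothetical helper: BCHW shape of a from_pixels observation batch."""
    # Both sizes default to 84 when omitted.
    h = 84 if render_height is None else render_height
    w = 84 if render_width is None else render_width
    # Stacked RGB frames are concatenated along the channel axis.
    return (batch, 3 * frame_stack, h, w)


print(pixel_obs_shape(4))                  # (4, 3, 84, 84)
print(pixel_obs_shape(4, frame_stack=3))   # (4, 9, 84, 84)
```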


Action Input Format
-------------------

EnvPool supports two action formats in ``send`` and ``step``:

- ``(action: np.ndarray, env_id: Optional[np.ndarray] = None)``: for
  single-array action input;
- ``action: Dict[str, Any]``: for multi-key action input, or for multi-player
  envs where the players' action spaces differ.

For example, in Atari games, we can use the following action formats:
::

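    import numpy as np

    # env = envpool.make("Pong-v5", env_type="gym", num_envs=batch_size)
    # Any one of the following three calls sends one action per env:
    env.send(np.ones(batch_size, dtype=np.int32))
    env.send(np.ones(batch_size, dtype=np.int32), env_id=np.arange(batch_size))
    env.send({
        # note: please be careful with dtype here
        "action": np.ones(batch_size, dtype=np.int32),
        "env_id": np.arange(batch_size, dtype=np.int32),
    })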

For the first and second cases, use ``env.step(action, env_id)``; for the
third case, use ``env.step(action)`` where action is a dictionary.
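
The relationship between the paired form and the dict form can be sketched
with plain lists. ``to_action_dict`` is a hypothetical helper, not an EnvPool
function: it assumes the ``"action"`` / ``"env_id"`` keys from the Atari
example, and treating an omitted ``env_id`` as "one action per env, in id
order" is an assumption of this sketch. With EnvPool itself you would pass
NumPy arrays with the dtypes of the action spec instead.

```python
from typing import Any, Dict, List, Optional, Sequence


def to_action_dict(action: Sequence[Any],
                   env_id: Optional[Sequence[int]] = None) -> Dict[str, List[Any]]:
    """Hypothetical helper: rewrite the (action, env_id) form as the dict form."""
    # Assumption: omitted env ids mean one action per env, in id order.
    ids = list(range(len(action))) if env_id is None else list(env_id)
    if len(ids) != len(action):
        raise ValueError("len(action) must equal len(env_id)")
    # Key names follow the Atari example; other tasks may use other keys.
    return {"action": list(action), "env_id": ids}


print(to_action_dict([1, 1, 1]))              # {'action': [1, 1, 1], 'env_id': [0, 1, 2]}
print(to_action_dict([3, 0], env_id=[2, 5]))  # {'action': [3, 0], 'env_id': [2, 5]}
```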


.. _output_format:

Data Output Format
------------------

.. list-table::
   :header-rows: 1

   * - function
     - gym / gymnasium
     - dm
   * - reset
     - ``(obs, info)`` where ``obs`` is an obs array or dict and
       ``info["env_id"]`` stores the finished env ids
     - env_id -> TimeStep(FIRST, obs|info|env_id, rew=0, discount or 1)
   * - step
     - ``(obs, rew, terminated, truncated, info)`` where
       ``info["env_id"]`` stores the finished env ids
     - TimeStep(StepType, obs|info|env_id, rew, discount or 1 - done)

Note: ``reset()`` of the gym / gymnasium wrappers is not suited to the async
setting, so it's better to use the low-level APIs such as ``async_reset``,
``send``, and ``recv`` there.
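
Because each async batch may cover a different subset of envs, per-env
bookkeeping should be indexed by the returned env ids (``info["env_id"]`` for
gym / gymnasium) rather than by position in the batch. The sketch below shows
this for episode returns; ``accumulate_returns`` is a hypothetical helper and
the rewards are made-up values.

```python
from typing import List, Sequence


def accumulate_returns(totals: List[float], rew: Sequence[float],
                       env_id: Sequence[int]) -> List[float]:
    """Hypothetical helper: add each reward to the env that produced it."""
    for r, i in zip(rew, env_id):
        # Index by env id, not by the position inside the batch.
        totals[i] += r
    return totals


totals = [0.0] * 4
# Two batches arriving with different env ids, as reported by info["env_id"].
accumulate_returns(totals, rew=[1.0, 0.5], env_id=[2, 0])
accumulate_returns(totals, rew=[1.0, 1.0], env_id=[0, 3])
print(totals)  # [1.5, 0.0, 1.0, 1.0]
```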


Batch Size
----------

In the asynchronous setting, ``batch_size`` means that the result is returned
as soon as the number of envs that have finished stepping reaches
``batch_size``. The figure below demonstrates this idea (``waitnum`` is the
same as ``batch_size``):

.. image:: ../_static/images/async.png
    :width: 500px
    :align: center

Synchronous stepping is a special case of the above API where
``batch_size == num_envs`` and ``env_id`` always contains the ids of all envs.
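
The waiting rule can be modeled by ordering envs by when they finish.
``first_finished`` is hypothetical and deliberately ignores threads and
re-sending: it assumes each env was sent one action and takes a known,
made-up amount of time to step, then picks the ids a single ``recv`` would
return. The order inside the returned list is a property of this model only.

```python
import heapq
from typing import Dict, List


def first_finished(finish_time: Dict[int, float], batch_size: int) -> List[int]:
    """Hypothetical model of which env ids one recv() call returns."""
    # recv() returns once batch_size envs have finished stepping, so the
    # batch holds the batch_size earliest finishers (ordered by time here).
    earliest = heapq.nsmallest(batch_size, ((t, i) for i, t in finish_time.items()))
    return [i for _, i in earliest]


# Made-up step durations for four envs.
times = {0: 3.0, 1: 1.0, 2: 4.0, 3: 2.0}
print(first_finished(times, batch_size=2))  # async: [1, 3]
print(first_finished(times, batch_size=4))  # sync: every env id, [1, 3, 0, 2]
```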


Auto Reset
----------

EnvPool enables auto-reset by default. Suppose an environment has
``max_episode_steps = 3``. When we call ``env.step(action)`` five consecutive
times, the following would happen:

1. the first call would trigger ``env.reset()`` and return with
   ``done = False`` and ``reward = 0``, i.e., the action will be discarded;
2. the second call would trigger ``env.step(action)`` and elapsed step is 1;
3. the third call would trigger ``env.step(action)`` and elapsed step is 2;
4. the fourth call would trigger ``env.step(action)`` and elapsed step is 3.
   At this time it returns ``truncated = True``;
5. the fifth call would trigger ``env.reset()`` since the last episode has
   finished, and return with ``done = False`` and ``reward = 0``, i.e., the
   action will be discarded.

+---+-------------+-------------+---------+-----------------------+
| # | User Call   | Actual      | Elapsed | Misc                  |
+===+=============+=============+=========+=======================+
| 1 | env.step(a) | env.reset() | 0       |                       |
+---+-------------+-------------+---------+-----------------------+
| 2 | env.step(a) | env.step(a) | 1       |                       |
+---+-------------+-------------+---------+-----------------------+
| 3 | env.step(a) | env.step(a) | 2       |                       |
+---+-------------+-------------+---------+-----------------------+
| 4 | env.step(a) | env.step(a) | 3       | Hit max_episode_steps |
+---+-------------+-------------+---------+-----------------------+
| 5 | env.step(a) | env.reset() | 0       |                       |
+---+-------------+-------------+---------+-----------------------+
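
The table can be reproduced by a small state machine for one env.
``simulate_auto_reset`` is a hypothetical model, not EnvPool code: it only
tracks truncation by ``max_episode_steps``, whereas a real env would also
schedule a reset after a terminal step.

```python
from typing import List, Tuple


def simulate_auto_reset(max_episode_steps: int,
                        num_calls: int) -> List[Tuple[str, int, bool]]:
    """Hypothetical model of auto-reset: (actual call, elapsed, truncated)."""
    rows = []
    elapsed = 0
    needs_reset = True  # a fresh env starts by resetting
    for _ in range(num_calls):
        if needs_reset:
            # The user's action is discarded; the env is reset instead.
            elapsed, truncated, needs_reset = 0, False, False
            rows.append(("reset", elapsed, truncated))
            continue
        elapsed += 1
        truncated = elapsed >= max_episode_steps
        # Once the episode has ended, the *next* call resets the env.
        needs_reset = truncated
        rows.append(("step", elapsed, truncated))
    return rows


for row in simulate_auto_reset(max_episode_steps=3, num_calls=5):
    print(row)
```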