ViZDoom
=======

We use ``vizdoom==1.3.0`` as the codebase. See
https://github.com/Farama-Foundation/ViZDoom/tree/1.3.0


Render Compare
--------------

Representative first-frame comparisons for VizDoom tasks. In each panel,
EnvPool is on the left and the exact screen-buffer oracle used by the test
suite is on the right. Scenarios that cannot be rendered stably in the current
local test setup are shown as placeholders.

.. image:: ../_static/render_samples/vizdoom_oracle_compare.png
    :width: 900px
    :align: center


Env Wrappers
------------

The following wrappers are currently included: ``frame-skip`` /
``episodic-life`` / ``action-repeat`` / ``image-resize`` / ``reward-config``.


Options
-------

* ``task_id (str)``: see available tasks below;
* ``num_envs (int)``: how many environments you would like to create;
* ``batch_size (int)``: the expected batch size of the returned results,
  default to ``num_envs``;
* ``num_threads (int)``: the maximum number of threads used to execute the
  actual ``env.step``, default to ``batch_size``;
* ``seed (int | Sequence[int])``: the environment seed. When a sequence is
  provided, it must contain exactly one seed per environment. Default to
  ``42``;
* ``max_episode_steps (int)``: the maximum number of steps for one episode,
  default to ``525``;
* ``img_height (int)``: the desired observation image height, default to
  ``84``;
* ``img_width (int)``: the desired observation image width, default to ``84``;
* ``stack_num (int)``: the number of frames to stack for a single observation,
  default to ``4``;
* ``frame_skip (int)``: the number of frames for which each action is repeated;
  only the last frame is kept. Default to ``4``;
* ``use_inter_area_resize (bool)``: whether to use ``cv::INTER_AREA`` for
  image resize, default to ``True``;
* ``episodic_life (bool)``: treat end-of-life as end-of-episode, but only reset
  on a true game over. This helps value estimation. Default to ``False``;
* ``use_combined_action (bool)``: whether to use a discrete action space as
  action input (for doom game engine), see :ref:`vizdoom_action_space`,
  default to ``False`` (use raw action space);
* ``force_speed (bool)``: if the ``SPEED`` button is available, press it in
  every frame. Only available when using combined action. Default to ``False``;
* ``lmp_save_dir (str)``: the directory to save ``.lmp`` files for recording
  and replay (see tests/vizdoom/replay.py), default to ``""`` (no lmp saving);
* ``vzd_path (str)``: the vizdoom binary path, default to
  ``vizdoom/bin/vizdoom``;
* ``cfg_path (str)``: the ``.cfg`` file path, used in customized env setup,
  default to ``""``;
* ``wad_path (str)``: the ``.wad`` file path, used in customized env setup,
  default to ``""``;
* ``iwad_path (str)``: the rendering resource package path, default to
  ``vizdoom/bin/freedoom2``;
* ``map_id (str)``: the vizdoom map id, see `setDoomMap
  <https://github.com/Farama-Foundation/ViZDoom/blob/1.3.0/doc/DoomGame.md#setDoomMap>`_,
  available options are ``"map01", "map02", ...``, default to ``"map01"``;
* ``game_args (str)``: the args string for vizdoom game, see `addGameArgs
  <https://github.com/Farama-Foundation/ViZDoom/blob/1.3.0/doc/DoomGame.md#addGameArgs>`_,
  default to ``""``;
* ``reward_config (Dict[str, Tuple[float, float]])``: how to calculate the
  reward (see below), default to ``{"FRAGCOUNT": [1, -1.5], "KILLCOUNT": [1, 0],
  "DEATHCOUNT": [-0.75, 0.75], "HITCOUNT": [0.01, -0.01], "DAMAGECOUNT":
  [0.003, -0.003], "HEALTH": [0.005, -0.003], "ARMOR": [0.005, -0.001], "AMMO2":
  [0.0002, -0.0001]}``;

The original vizdoom env computes the reward from only the living reward and
the death penalty. Our preliminary results show that these two rewards have a
negative effect on agent training, whereas rewards tied to game variables are
very useful. You can therefore pass a custom reward config to the vizdoom env.
Each item in this dictionary has the format ``NAME: [pos_reward, neg_reward]``.
Taking ``HEALTH: [pos_health, neg_health]`` as an example:

::

    delta = current[HEALTH] - last[HEALTH]
    if delta >= 0:
        reward += delta * pos_health
    else:
        reward -= delta * neg_health

where ``last[*]`` is the corresponding value at the last timestep.
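
The same rule can be applied to every entry of ``reward_config``. The following
is a minimal Python sketch of that loop, not EnvPool's actual implementation;
the function name ``shaped_reward`` and the sample values are hypothetical and
only illustrate the formula above.

```python
def shaped_reward(current, last, reward_config):
    """Sum the per-variable reward terms described above.

    `current` and `last` map game-variable names to their values at this
    timestep and the previous one (hypothetical inputs for illustration).
    """
    reward = 0.0
    for name, (pos, neg) in reward_config.items():
        delta = current[name] - last[name]
        if delta >= 0:
            reward += delta * pos
        else:
            # delta is negative here, so a negative `neg` yields a penalty
            reward -= delta * neg
    return reward


# Illustrative values: the agent loses 10 health and scores one kill.
last = {"HEALTH": 100, "KILLCOUNT": 0}
current = {"HEALTH": 90, "KILLCOUNT": 1}
config = {"HEALTH": [0.005, -0.003], "KILLCOUNT": [1, 0]}
total = shaped_reward(current, last, config)
```

With these values the health drop contributes a penalty of about ``0.03`` and
the kill contributes ``1``, so ``total`` is roughly ``0.97``. Note that because
the negative branch subtracts ``delta * neg``, a negative ``neg`` coefficient
penalizes a decrease while a positive one rewards it.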

* ``weapon_duration``: the minimum number of timesteps a weapon must be held
  before its selected-weapon reward applies (see
  ``selected_weapon_reward_config``), default to ``5``;
* ``selected_weapon_reward_config (Dict[int, float])``: the available keys are
  ``2 / 3 / ... / 7``. If the agent holds the ``i``-th weapon for at least
  ``weapon_duration`` timesteps, ``selected_weapon_reward_config[i]`` is added
  to the reward;
* ``delta_button_config (Dict[str, Tuple[int, float, float]])``: see
  :ref:`vizdoom_action_space`.
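
The options above are passed as keyword arguments when creating the
environment. The sketch below collects a few of them in a dictionary and
forwards them to ``envpool.make``; the chosen task, the ``"gym"`` env type and
all values are illustrative assumptions, and the helper name ``make_env`` is
hypothetical.

```python
num_envs = 4

# A few of the options listed above; the values are illustrative only.
options = {
    "num_envs": num_envs,
    "batch_size": num_envs,
    # When a sequence is given, it needs exactly one seed per environment.
    "seed": [42 + i for i in range(num_envs)],
    "frame_skip": 4,
    "stack_num": 4,
    "use_combined_action": True,
}


def make_env(task_id="D3Battle-v1", env_type="gym"):
    """Create the environment (requires EnvPool to be installed)."""
    import envpool  # imported lazily so `options` can be inspected on its own

    return envpool.make(task_id, env_type=env_type, **options)
```

Calling ``make_env()`` returns the vectorized environment; options that are not
listed in the dictionary keep the defaults documented above.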


Customized VizDoom Env
----------------------

Use ``VizdoomCustom-v1`` with ``cfg_path`` and ``wad_path``:

::

    env = envpool.make(
        "VizdoomCustom-v1",
        cfg_path="xxx.cfg",
        wad_path="xxx.wad",
        ...
    )


Observation Space
-----------------

The number of observation channels is defined in the configuration file (e.g.,
``GRAY8`` or ``CRCGCB``).

The game variables defined in the configuration file are returned in the
observation (dm) or in ``info`` (gym), each under its own key. Currently these
keys are:
``"AMMO2", "AMMO3", "AMMO4", "AMMO5", "AMMO6", "AMMO7", "ARMOR",
"DAMAGECOUNT", "DEATHCOUNT", "FRAGCOUNT", "HEALTH", "HITCOUNT",
"KILLCOUNT", "SELECTED_WEAPON", "SELECTED_WEAPON_AMMO", "USER2"``

.. _vizdoom_action_space:

Action Space
------------

If ``use_combined_action`` is set to ``False``, the environment only accepts
the original action input (a list of floats). Otherwise, all buttons, including
delta buttons, are treated as discrete, and EnvPool directly generates the
combined actions with the following rules:

1. The agent can select at most one weapon at a time

   * can only select at most one of ``SELECT_WEAPON0`` ... ``SELECT_WEAPON9``
     buttons

2. Some buttons are paired and cannot both be pressed at once (allowed:
   ``FF, TF, FT``)

   * ``MOVE_LEFT`` and ``MOVE_RIGHT``, ``MOVE_FORWARD`` and ``MOVE_BACKWARD``,
     ``TURN_LEFT`` and ``TURN_RIGHT``, ``LOOK_UP`` and ``LOOK_DOWN``,
     ``MOVE_UP`` and ``MOVE_DOWN``, ``SELECT_PREV_WEAPON`` and
     ``SELECT_NEXT_WEAPON``, ``SELECT_PREV_ITEM`` and ``SELECT_NEXT_ITEM``

3. Other non-delta buttons have two choices: ``F`` or ``T``

4. For delta buttons, the given ``delta_button_config`` specifies how the
   action set is built (with format ``[num, min, max]``). For example, passing
   ``delta_button_config={"TURN_LEFT_RIGHT_DELTA": [4, -2.0, 1.0]}`` builds
   4 discrete choices: ``TURN_LEFT_RIGHT_DELTA = [-2.0, -1.0, 0.0, 1.0]``.

For example, with the three buttons ``MOVE_FORWARD``, ``TURN_LEFT`` and
``TURN_RIGHT`` (exactly the health-gathering setting), the rules above yield
2x3=6 discrete actions.


Available Tasks
---------------

* ``Basic-v1``
* ``Cig-v1``
* ``D1Basic-v1``
* ``D2Navigation-v1``
* ``D3Battle-v1``
* ``D4Battle2-v1``
* ``DeadlyCorridor-v1``
* ``Deathmatch-v1``
* ``DefendTheCenter-v1``
* ``DefendTheLine-v1``
* ``HealthGathering-v1``
* ``HealthGatheringSupreme-v1``
* ``MultiDuel-v1``
* ``MyWayHome-v1``
* ``PredictPosition-v1``
* ``RocketBasic-v1``
* ``SimplerBasic-v1``
* ``TakeCover-v1``
* ``VizdoomCustom-v1``