MuJoCo (gym)

We use mujoco==2.2.1 as the codebase. See https://github.com/deepmind/mujoco/tree/2.2.1

The implementation follows the OpenAI gym *-v4 environments (see reference).

Set post_constraint to False to disable the bug fix for this issue; running without the fix is the standard behavior of the *-v3 environments.
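A minimal sketch of how the flag might be passed, assuming these environments are created through envpool.make with post_constraint accepted as a keyword option. The helpers make_kwargs and make_env are hypothetical names used only for illustration, and make_env needs the envpool package to be installed:

```python
def make_kwargs(task_id, num_envs=1, post_constraint=True):
    """Collect keyword arguments for environment creation (hypothetical helper)."""
    return {
        "task_id": task_id,
        "env_type": "gym",
        "num_envs": num_envs,
        "post_constraint": post_constraint,
    }


def make_env(task_id, num_envs=1, post_constraint=True):
    """Create the environment; envpool is imported lazily on first use."""
    # Assumption: envpool is installed and forwards post_constraint to the env config.
    import envpool

    return envpool.make(**make_kwargs(task_id, num_envs, post_constraint))


# Request the *-v3 style behavior (bug fix disabled) for Ant-v4.
kwargs = make_kwargs("Ant-v4", post_constraint=False)
```

Leaving post_constraint at its default keeps the bug fix enabled.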

Ant-v3/v4

gym Ant-v3 source code

gym Ant-v4 source code

  • Observation space (v3): (111), first 13 elements for qpos[2:], next 14 elements for qvel, other elements for clipped cfrc_ext (com-based external force on body, a.k.a. contact force);

  • Observation space (v4): (27), first 13 elements for qpos[2:], next 14 elements for qvel;

  • Action space: (8), with range [-1, 1];

  • frame_skip: 5;

  • max_episode_steps: 1000;

  • reward_threshold: 6000.0;
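The two Ant layouts above share their first 27 elements, so one helper can handle both versions. The following is an illustrative sketch only; split_ant_observation is a hypothetical name, and the observation is assumed to be a flat sequence of floats:

```python
def split_ant_observation(obs):
    """Split a flat Ant observation into named parts (v3: 111 values, v4: 27 values)."""
    if len(obs) not in (27, 111):
        raise ValueError("expected 27 (v4) or 111 (v3) elements, got %d" % len(obs))
    parts = {
        "qpos": list(obs[:13]),    # qpos[2:]
        "qvel": list(obs[13:27]),  # qvel
    }
    if len(obs) == 111:
        # Only the v3 layout carries the clipped contact forces.
        parts["cfrc_ext"] = list(obs[27:])
    return parts


v3_parts = split_ant_observation([0.0] * 111)
v4_parts = split_ant_observation([0.0] * 27)
```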

HalfCheetah-v3/v4

gym HalfCheetah-v3 source code

gym HalfCheetah-v4 source code

  • Observation space: (17), first 8 elements for qpos[1:], next 9 elements for qvel;

  • Action space: (6), with range [-1, 1];

  • frame_skip: 5;

  • max_episode_steps: 1000;

  • reward_threshold: 4800.0;

Hopper-v3/v4

gym Hopper-v3 source code

gym Hopper-v4 source code

  • Observation space: (11), first 5 elements for qpos[1:], next 6 elements for qvel;

  • Action space: (3), with range [-1, 1];

  • frame_skip: 4;

  • max_episode_steps: 1000;

  • reward_threshold: 6000.0;

Humanoid-v3/v4, HumanoidStandup-v2/v4

gym Humanoid-v3 source code

gym Humanoid-v4 source code

gym HumanoidStandup-v2 source code

gym HumanoidStandup-v4 source code

  • Observation space: (376), first 22 elements for qpos[2:], next 23 elements for qvel, next 140 elements for cinert (com-based body inertia and mass), next 84 elements for cvel (com-based velocity [3D rot; 3D tran]), next 23 elements for qfrc_actuator (actuator force), next 84 elements for cfrc_ext (com-based external force on body);

  • Action space: (17), with range [-0.4, 0.4];

  • frame_skip: 5;

  • max_episode_steps: 1000;
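With six segments, the Humanoid observation is easier to handle through named slices than through hand-written offsets. A possible sketch, where HUMANOID_LAYOUT and layout_slices are hypothetical names and the sizes are copied from the list above:

```python
# (name, size) pairs in the order given in the observation-space description.
HUMANOID_LAYOUT = [
    ("qpos", 22),
    ("qvel", 23),
    ("cinert", 140),
    ("cvel", 84),
    ("qfrc_actuator", 23),
    ("cfrc_ext", 84),
]


def layout_slices(layout):
    """Turn (name, size) pairs into a dict of slices plus the total length."""
    slices = {}
    start = 0
    for name, size in layout:
        slices[name] = slice(start, start + size)
        start += size
    return slices, start


slices, total = layout_slices(HUMANOID_LAYOUT)

# Stand-in observation whose values equal their own indices.
obs = list(range(total))
cvel = obs[slices["cvel"]]
```

Checking that total equals 376 is a cheap guard against a mistyped segment size.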

InvertedDoublePendulum-v2/v4

gym InvertedDoublePendulum-v2 source code

gym InvertedDoublePendulum-v4 source code

  • Observation space: (11), first 1 element for qpos[0], next 2 elements for sin(qpos[1:]), next 2 elements for cos(qpos[1:]), next 3 elements for qvel, next 3 elements for qfrc_constraint;

  • Action space: (1), with range [-1, 1];

  • frame_skip: 5;

  • max_episode_steps: 1000;

  • reward_threshold: 9100.0;
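Because the two joint angles are stored as sine and cosine pairs, they can be recovered with atan2 when the raw angle is needed. This is a sketch based only on the layout above; pendulum_angles is a hypothetical helper, and the recovered angles are wrapped to (-pi, pi]:

```python
import math


def pendulum_angles(obs):
    """Recover qpos[1:] from the sin/cos entries of the observation."""
    sin_part = obs[1:3]  # sin(qpos[1:])
    cos_part = obs[3:5]  # cos(qpos[1:])
    return [math.atan2(s, c) for s, c in zip(sin_part, cos_part)]


# Build a stand-in observation from known angles, then recover them.
angles = [0.3, -1.2]
obs = (
    [0.05]                           # qpos[0]
    + [math.sin(a) for a in angles]  # sin(qpos[1:])
    + [math.cos(a) for a in angles]  # cos(qpos[1:])
    + [0.0] * 3                      # qvel
    + [0.0] * 3                      # qfrc_constraint
)
recovered = pendulum_angles(obs)
```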

InvertedPendulum-v2/v4

gym InvertedPendulum-v2 source code

gym InvertedPendulum-v4 source code

  • Observation space: (4), first 2 elements for qpos, next 2 elements for qvel;

  • Action space: (1), with range [-3, 3];

  • frame_skip: 2;

  • max_episode_steps: 1000;

  • reward_threshold: 950.0;

Pusher-v2/v4

gym Pusher-v2 source code

gym Pusher-v4 source code

  • Observation space: (23), first 7 elements for qpos[:7], next 7 elements for qvel[:7], next 3 elements for tips_arm, next 3 elements for object, next 3 elements for goal;

  • Action space: (7), with range [-2, 2];

  • frame_skip: 5;

  • max_episode_steps: 100;

  • reward_threshold: 0.0;
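Policies often emit actions in [-1, 1], while the Pusher range above is [-2, 2]. One way to bridge the two is a linear rescale with clipping, sketched here; scale_action is a hypothetical helper and not part of any library:

```python
def scale_action(unit_action, low, high):
    """Map each component from [-1, 1] to [low, high], clipping out-of-range input."""
    scaled = []
    for u in unit_action:
        u = max(-1.0, min(1.0, u))  # clip to the unit range first
        scaled.append(low + (u + 1.0) * 0.5 * (high - low))
    return scaled


# Seven components, matching the Pusher action dimension; the last two are out of range.
pusher_action = scale_action([-1.0, -0.5, 0.0, 0.5, 1.0, 2.0, -3.0], -2.0, 2.0)
```

The same helper applies to the other ranges on this page, for example low=-0.4 and high=0.4 for Humanoid.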

Reacher-v2/v4

gym Reacher-v2 source code

gym Reacher-v4 source code

  • Observation space: (11), first 2 elements for cos(qpos[:2]), next 2 elements for sin(qpos[:2]), next 2 elements for qpos[2:], next 2 elements for qvel[:2], next 3 elements for dist, a.k.a. fingertip - target;

  • Action space: (2), with range [-1, 1];

  • frame_skip: 2;

  • max_episode_steps: 50;

  • reward_threshold: -3.75;
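The last three elements hold the vector fingertip - target, so its Euclidean norm gives the current distance to the target. A small sketch following the layout above, with hypothetical helper names and made-up sample values:

```python
import math


def split_reacher_observation(obs):
    """Split a flat 11-element Reacher observation into named parts."""
    if len(obs) != 11:
        raise ValueError("expected 11 elements, got %d" % len(obs))
    return {
        "cos_qpos": list(obs[0:2]),   # cos(qpos[:2])
        "sin_qpos": list(obs[2:4]),   # sin(qpos[:2])
        "qpos_tail": list(obs[4:6]),  # qpos[2:]
        "qvel": list(obs[6:8]),       # qvel[:2]
        "dist": list(obs[8:11]),      # fingertip - target
    }


def target_distance(obs):
    """Euclidean length of the fingertip - target vector."""
    dist = split_reacher_observation(obs)["dist"]
    return math.sqrt(sum(d * d for d in dist))


# Illustrative values only; the offset (0.03, -0.04, 0.0) has length 0.05.
sample = [1.0, 1.0, 0.0, 0.0, 0.1, -0.1, 0.0, 0.0, 0.03, -0.04, 0.0]
distance = target_distance(sample)
```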

Swimmer-v3/v4

gym Swimmer-v3 source code

gym Swimmer-v4 source code

  • Observation space: (8), first 3 elements for qpos[2:], next 5 elements for qvel;

  • Action space: (2), with range [-1, 1];

  • frame_skip: 4;

  • max_episode_steps: 1000;

  • reward_threshold: 360.0;

Walker2d-v3/v4

gym Walker2d-v3 source code

gym Walker2d-v4 source code

  • Observation space: (17), first 8 elements for qpos[1:], next 9 elements for qvel;

  • Action space: (6), with range [-1, 1];

  • frame_skip: 4;

  • max_episode_steps: 1000;
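To compare episode lengths across environments, it can help to multiply frame_skip by max_episode_steps. The sketch below assumes the usual gym convention that each environment step advances the simulator by frame_skip steps; the table copies the values listed on this page, and STEP_CONFIG and max_simulation_steps are hypothetical names:

```python
# name -> (frame_skip, max_episode_steps), as listed in the sections above.
STEP_CONFIG = {
    "Ant": (5, 1000),
    "HalfCheetah": (5, 1000),
    "Hopper": (4, 1000),
    "Humanoid": (5, 1000),
    "InvertedDoublePendulum": (5, 1000),
    "InvertedPendulum": (2, 1000),
    "Pusher": (5, 100),
    "Reacher": (2, 50),
    "Swimmer": (4, 1000),
    "Walker2d": (4, 1000),
}


def max_simulation_steps(name):
    """Upper bound on simulator steps in one episode of the named environment."""
    frame_skip, max_episode_steps = STEP_CONFIG[name]
    return frame_skip * max_episode_steps


totals = {name: max_simulation_steps(name) for name in STEP_CONFIG}
```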