MuJoCo (gym)
We use mujoco==3.6.0 as the codebase.
See https://github.com/google-deepmind/mujoco/tree/3.6.0
The implementation follows the Gymnasium *-v4/*-v5 environments; see reference.
EnvPool exposes the current official Gymnasium task IDs (*-v4 and *-v5) and
keeps the historical *-v2/*-v3 IDs as backward-compatible aliases where they
existed previously.
Set post_constraint=False to match the pre-v5 observation behavior where
applicable.
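A minimal sketch of how these options might be passed, assuming EnvPool's `envpool.make` entry point with `env_type="gymnasium"` and assuming `post_constraint` is accepted as a keyword argument as the note above suggests. The helper `build_make_kwargs` is hypothetical and only assembles arguments; the actual call is left commented out because it requires EnvPool to be installed.

```python
def build_make_kwargs(num_envs, pre_v5_observations=False, seed=0):
    """Assemble keyword arguments for envpool.make (hypothetical helper)."""
    kwargs = {"env_type": "gymnasium", "num_envs": num_envs, "seed": seed}
    if pre_v5_observations:
        # Per the note above: post_constraint=False matches the pre-v5
        # observation behavior where applicable.
        kwargs["post_constraint"] = False
    return kwargs


kwargs = build_make_kwargs(num_envs=4, pre_v5_observations=True)
print(kwargs)

# With EnvPool installed, the environment could then be created like this
# (assumed usage, not verified against a specific EnvPool release):
# import envpool
# env = envpool.make("Ant-v4", **kwargs)
# obs, info = env.reset()
```

Leaving `post_constraint` unset keeps whatever default the task ID defines, which is why the helper only adds the key when the pre-v5 behavior is requested.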
Render Compare
Representative first-frame comparisons for MuJoCo gym tasks that support rendering. In each panel, EnvPool is on the left and the Gymnasium reference renderer is on the right.
Ant-v4/v5
Observation space (v4/v5): (27), first 13 elements for qpos[2:], next 14 elements for qvel;
Action space: (8), with range [-1, 1];
frame_skip: 5;
max_episode_steps: 1000;
reward_threshold: 6000.0;
The legacy Ant-v3 alias keeps the historical 111-dimensional observation
that includes clipped cfrc_ext contact-force features.
HalfCheetah-v4/v5
gymnasium HalfCheetah-v4 source code
gymnasium HalfCheetah-v5 source code
Observation space: (17), first 8 elements for qpos[1:], next 9 elements for qvel;
Action space: (6), with range [-1, 1];
frame_skip: 5;
max_episode_steps: 1000;
reward_threshold: 4800.0;
Hopper-v4/v5
gymnasium Hopper-v4 source code
gymnasium Hopper-v5 source code
Observation space: (11), first 5 elements for qpos[1:], next 6 elements for qvel;
Action space: (3), with range [-1, 1];
frame_skip: 4;
max_episode_steps: 1000;
reward_threshold: 6000.0;
Humanoid-v4/v5, HumanoidStandup-v4/v5
gymnasium Humanoid-v4 source code
gymnasium Humanoid-v5 source code
gymnasium HumanoidStandup-v4 source code
gymnasium HumanoidStandup-v5 source code
Observation space: (376), first 22 elements for qpos[2:], next 23 elements for qvel, next 140 elements for cinert (com-based body inertia and mass), next 84 elements for cvel (com-based velocity [3D rot; 3D tran]), next 23 elements for qfrc_actuator (actuator force), next 84 elements for cfrc_ext (com-based external force on body);
Action space: (17), with range [-0.4, 0.4];
frame_skip: 5;
max_episode_steps: 1000;
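Because the Humanoid observation concatenates six blocks, it can help to split a flat observation back into named parts. The following is a minimal sketch that uses only the block sizes listed above; `HUMANOID_LAYOUT` and `split_observation` are illustrative names, not part of EnvPool or Gymnasium.

```python
# Block sizes of the 376-dimensional Humanoid observation, in order.
HUMANOID_LAYOUT = [
    ("qpos", 22),           # qpos[2:]
    ("qvel", 23),
    ("cinert", 140),        # com-based body inertia and mass
    ("cvel", 84),           # com-based velocity
    ("qfrc_actuator", 23),  # actuator force
    ("cfrc_ext", 84),       # com-based external force on body
]


def split_observation(obs, layout=HUMANOID_LAYOUT):
    """Slice a flat observation into named blocks (hypothetical helper)."""
    total = sum(size for _, size in layout)
    if len(obs) != total:
        raise ValueError(f"expected {total} elements, got {len(obs)}")
    parts, start = {}, 0
    for name, size in layout:
        parts[name] = obs[start:start + size]
        start += size
    return parts


# Placeholder observation whose values equal their own indices.
obs = list(range(376))
parts = split_observation(obs)
print({name: len(values) for name, values in parts.items()})
```

The same function works on a single row of a batched observation, since it relies only on `len` and slicing.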
InvertedDoublePendulum-v4/v5
gymnasium InvertedDoublePendulum-v4 source code
gymnasium InvertedDoublePendulum-v5 source code
Observation space: (11), first 1 element for qpos[0], next 2 elements for sin(qpos[1:]), next 2 elements for cos(qpos[1:]), next 3 elements for qvel, next 3 elements for qfrc_constraint;
Action space: (1), with range [-1, 1];
frame_skip: 5;
max_episode_steps: 1000;
reward_threshold: 9100.0;
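The layout above can be sketched as a small function that assembles the 11 elements from the raw state. This is an illustration of the ordering only: `build_observation` is a hypothetical helper, and any clipping or scaling that the actual environment applies to these quantities is omitted.

```python
import math


def build_observation(qpos, qvel, qfrc_constraint):
    """Assemble the 11-element layout described above (illustrative only)."""
    if not (len(qpos) == len(qvel) == len(qfrc_constraint) == 3):
        raise ValueError("qpos, qvel and qfrc_constraint must each have 3 elements")
    return (
        [qpos[0]]                              # cart position
        + [math.sin(a) for a in qpos[1:]]      # sine of the two joint angles
        + [math.cos(a) for a in qpos[1:]]      # cosine of the two joint angles
        + list(qvel)                           # 3 velocities
        + list(qfrc_constraint)                # 3 constraint forces
    )


obs = build_observation(
    qpos=[0.1, 0.0, math.pi / 2],
    qvel=[0.0, 0.0, 0.0],
    qfrc_constraint=[0.0, 0.0, 0.0],
)
print(len(obs))
```

Encoding the angles as sine and cosine pairs avoids the discontinuity that a raw angle would show when it wraps around.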
InvertedPendulum-v4/v5
gymnasium InvertedPendulum-v4 source code
gymnasium InvertedPendulum-v5 source code
Observation space: (4), first 2 elements for qpos, next 2 elements for qvel;
Action space: (1), with range [-3, 3];
frame_skip: 2;
max_episode_steps: 1000;
reward_threshold: 950.0;
Pusher-v4/v5
gymnasium Pusher-v4 source code
gymnasium Pusher-v5 source code
Observation space: (23), first 7 elements for qpos[:7], next 7 elements for qvel[:7], next 3 elements for tips_arm, next 3 elements for object, next 3 elements for goal;
Action space: (7), with range [-2, 2];
frame_skip: 5;
max_episode_steps: 100;
reward_threshold: 0.0;
Gymnasium’s official Pusher-v4 reference env raises ImportError under
mujoco>=3. EnvPool still exposes the official task ID for compatibility,
and the parity test only validates space alignment when Gymnasium’s reference
can be instantiated.
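The guarded check described above could look roughly like the sketch below. It is not EnvPool's actual test code: `check_space_parity` and the two reference factories are hypothetical stand-ins, and the reference is modeled as a callable that returns observation and action shapes.

```python
def check_space_parity(make_reference, envpool_obs_shape, envpool_act_shape):
    """Compare shapes only if the reference can be built (hypothetical helper)."""
    try:
        ref_obs_shape, ref_act_shape = make_reference()
    except ImportError:
        # The reference env is unavailable, so there is nothing to compare.
        return "skipped"
    if ref_obs_shape != envpool_obs_shape:
        raise AssertionError(f"observation shape mismatch: {ref_obs_shape}")
    if ref_act_shape != envpool_act_shape:
        raise AssertionError(f"action shape mismatch: {ref_act_shape}")
    return "ok"


def broken_reference():
    # Stand-in for a reference env that cannot be imported.
    raise ImportError("reference env unavailable under this mujoco version")


def working_reference():
    # Stand-in returning the Pusher shapes listed above.
    return (23,), (7,)


print(check_space_parity(broken_reference, (23,), (7,)))   # skipped
print(check_space_parity(working_reference, (23,), (7,)))  # ok
```

Catching only `ImportError` keeps genuine shape mismatches visible instead of silently skipping them.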
Reacher-v4/v5
gymnasium Reacher-v4 source code
gymnasium Reacher-v5 source code
Observation space: (11), first 2 elements for cos(qpos[:2]), next 2 elements for sin(qpos[:2]), next 2 elements for qpos[2:], next 2 elements for qvel[:2], next 3 elements for dist, a.k.a. fingertip - target;
Action space: (2), with range [-1, 1];
frame_skip: 2;
max_episode_steps: 50;
reward_threshold: -3.75;
Swimmer-v4/v5
gymnasium Swimmer-v4 source code
gymnasium Swimmer-v5 source code
Observation space: (8), first 3 elements for qpos[2:], next 5 elements for qvel;
Action space: (2), with range [-1, 1];
frame_skip: 4;
max_episode_steps: 1000;
reward_threshold: 360.0;
Walker2d-v4/v5
gymnasium Walker2d-v4 source code
gymnasium Walker2d-v5 source code
Observation space: (17), first 8 elements for qpos[1:], next 9 elements for qvel;
Action space: (6), with range [-1, 1];
frame_skip: 4;
max_episode_steps: 1000;
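Across the tasks on this page, frame_skip and max_episode_steps together bound how many MuJoCo simulation steps a single episode can run, since each environment step advances the simulation frame_skip times. The sketch below tabulates that product from the values listed above; the `SPECS` dictionary and `physics_steps_per_episode` are illustrative names, not an EnvPool API.

```python
# (frame_skip, max_episode_steps) per task, copied from the sections above.
SPECS = {
    "Ant": (5, 1000),
    "HalfCheetah": (5, 1000),
    "Hopper": (4, 1000),
    "Humanoid": (5, 1000),
    "HumanoidStandup": (5, 1000),
    "InvertedDoublePendulum": (5, 1000),
    "InvertedPendulum": (2, 1000),
    "Pusher": (5, 100),
    "Reacher": (2, 50),
    "Swimmer": (4, 1000),
    "Walker2d": (4, 1000),
}


def physics_steps_per_episode(task):
    """Upper bound on simulation steps in one episode of the given task."""
    frame_skip, max_episode_steps = SPECS[task]
    return frame_skip * max_episode_steps


for task in SPECS:
    print(f"{task}: {physics_steps_per_episode(task)}")
```

Episodes that terminate early run fewer simulation steps than this bound.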