Benchmark
The following results are generated from four types of machine:
- Personal laptop: 12 core Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz, GTX1060
- Personal workstation: 32 core AMD Ryzen 9 5950X 16-Core Processor, 2x RTX3090
- TPU-VM: 96 core Intel(R) Xeon(R) CPU @ 2.00GHz, 2 NUMA core, TPU v3-8
- DGX-A100: 256 core AMD EPYC 7742 64-Core Processor, 8 NUMA core, 8x A100
The historical numbers below were produced with PongNoFrameskip-v4 and
Ant-v3 on envpool==0.6.1.post1. The current benchmark scripts in this
directory use Gymnasium’s ALE/Pong-v5 and Ant-v5, and the baseline
dependencies plus shared benchmark tooling are installed from
requirements.txt:
$ pip install -r requirements.txt
test_gym.py uses only the packages above. test_envpool.py
additionally expects an installed EnvPool build with native modules, so
install the EnvPool wheel you want to benchmark separately before running it.
To align with other baseline results, the reported FPS is multiplied by frame_skip (4
for Atari and 5 for Mujoco).
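As a minimal sketch of this convention, the reported FPS can be derived from the number of environment steps, the wall-clock time, and the frame skip. The helper name `reported_fps`, the assumption that `total_steps` counts steps across all environments, and the sample numbers are illustrative and not taken from the benchmark scripts.

```python
# Frame-skip factors stated above: 4 for Atari, 5 for Mujoco.
FRAME_SKIP = {"atari": 4, "mujoco": 5}


def reported_fps(env: str, total_steps: int, elapsed_seconds: float) -> float:
    """Return frames per second, counting every skipped frame.

    Hypothetical helper: `total_steps` is assumed to be the number of
    environment steps summed over all environments in the run.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_steps * FRAME_SKIP[env] / elapsed_seconds


if __name__ == "__main__":
    # Illustrative numbers only: 6000 steps finished in 5 seconds.
    print(reported_fps("atari", 6000, 5.0))   # 6000 * 4 / 5
    print(reported_fps("mujoco", 6000, 5.0))  # 6000 * 5 / 5
```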
Highest FPS Overview
| Atari Highest FPS | Laptop (12) | Workstation (32) | TPU-VM (96) | DGX-A100 (256) |
|---|---|---|---|---|
| For-loop | 4,893 | 7,914 | 3,993 | 4,640 |
| Subprocess | 15,863 | 47,699 | 46,910 | 71,943 |
| Sample-Factory | 28,216 | 138,847 | 222,327 | 707,494 |
| EnvPool (sync) | 37,396 | 133,824 | 170,380 | 427,851 |
| EnvPool (async) | 49,439 | 200,428 | 359,559 | 891,286 |
| EnvPool (numa+async) | / | / | 373,169 | 1,069,922 |
| Mujoco Highest FPS | Laptop (12) | Workstation (32) | TPU-VM (96) | DGX-A100 (256) |
|---|---|---|---|---|
| For-loop | 12,861 | 20,298 | 10,474 | 11,569 |
| Subprocess | 36,586 | 105,432 | 87,403 | 163,656 |
| Sample-Factory | 62,510 | 309,264 | 461,515 | 1,573,262 |
| EnvPool (sync) | 66,622 | 380,950 | 296,681 | 949,787 |
| EnvPool (async) | 105,126 | 582,446 | 887,540 | 2,363,864 |
| EnvPool (numa+async) | / | / | 896,830 | 3,134,287 |

Testing Method and Command
All of the scripts are under the benchmark/ folder. When increasing the number of envs, we also adjust the total number of steps so that each test runs for about one minute.
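One way to choose a --total-step value for a target run time is to scale a rough throughput estimate by the desired duration. The sketch below assumes you already have such an estimate (for example from a short trial run); the helper `steps_for_duration` and its rounding rule are hypothetical and not part of the benchmark scripts.

```python
def steps_for_duration(estimated_steps_per_second: float,
                       target_seconds: float = 60.0,
                       round_to: int = 1000) -> int:
    """Suggest a --total-step value so a run lasts roughly target_seconds.

    Hypothetical helper: the throughput estimate is supplied by the caller.
    """
    if estimated_steps_per_second <= 0 or target_seconds <= 0:
        raise ValueError("throughput and duration must be positive")
    raw = estimated_steps_per_second * target_seconds
    # Round to a tidy multiple, but never go below one multiple.
    return max(round_to, int(round(raw / round_to)) * round_to)


if __name__ == "__main__":
    # Illustrative estimate: about 100 steps per second.
    print(steps_for_duration(100.0))
```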
For-loop
Command to run:
# atari
python3 test_gym.py --env atari --num-envs 12 --total-step 6000
# mujoco
python3 test_gym.py --env mujoco --num-envs 12 --total-step 12000
Subprocess (gymnasium.vector)
Command to run:
# atari
python3 test_gym.py --env atari --async_ --num-envs 10 --total-step 20000
# mujoco
python3 test_gym.py --env mujoco --async_ --num-envs 10 --total-step 50000
Sample Factory
Sample Factory remains a historical reference for now. The latest upstream
release still depends on gymnasium<1.0, numpy<2, and an older
ale-py build that is not available in the current development package mirror,
so it is not included in requirements.txt.
Command to run:
# atari
python3 -m sample_factory.run_algorithm --algo=DUMMY_SAMPLER --env=atari_pong --env_frameskip=4 --num_workers=12 --num_envs_per_worker=1 --sample_env_frames=1600000
# mujoco
python3 -m sample_factory.run_algorithm --algo=DUMMY_SAMPLER --env=mujoco_ant --env_frameskip=1 --num_workers=12 --num_envs_per_worker=1 --sample_env_frames=1000000
We found that num_envs_per_worker == 1 is best for all scenarios.
EnvPool
Install an EnvPool wheel for the version you want to benchmark before running the commands below.
sync
# atari
python3 test_envpool.py --env atari --num-envs 12 --batch-size 12
# mujoco
python3 test_envpool.py --env mujoco --num-envs 12 --batch-size 12
async
# atari
python3 test_envpool.py --env atari --num-envs 36 --batch-size 12
# mujoco
python3 test_envpool.py --env mujoco --num-envs 36 --batch-size 12
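The sync and async commands above differ only in how --num-envs relates to --batch-size: the sync runs set them equal, while the async runs use more environments than the batch size. The sketch below builds both argument lists for a given core count. The helper `envpool_args` and the default 3x factor (read off the 36/12 example above) are assumptions for illustration, not values prescribed by the benchmark scripts.

```python
from typing import List


def envpool_args(env: str, cores: int, mode: str,
                 oversubscribe: int = 3) -> List[str]:
    """Build a test_envpool.py command line for sync or async mode.

    Hypothetical helper: `oversubscribe` is an assumed ratio of
    num-envs to batch-size for async runs.
    """
    if mode == "sync":
        num_envs = cores                  # batch size equals number of envs
    elif mode == "async":
        num_envs = cores * oversubscribe  # more envs than the batch size
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ["python3", "test_envpool.py", "--env", env,
            "--num-envs", str(num_envs), "--batch-size", str(cores)]


if __name__ == "__main__":
    print(" ".join(envpool_args("atari", 12, "sync")))
    print(" ".join(envpool_args("atari", 12, "async")))
```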
numa+async
Use numactl -s to determine the number of NUMA cores.
# atari
./numa_test.sh 8 python3 test_envpool.py --env atari --num-envs 100 --batch-size 32 --thread-affinity-offset -1
# mujoco
./numa_test.sh 8 python3 test_envpool.py --env mujoco --num-envs 100 --batch-size 32 --thread-affinity-offset -1
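The contents of numa_test.sh are not shown here. One plausible reading of the command above is that the first argument is the NUMA count and the rest is a command to launch once per NUMA node, each copy pinned with numactl. The sketch below only builds such command lines and does not run them; the helper `per_node_commands` and the exact pinning strategy are assumptions, although --cpunodebind and --membind are standard numactl options.

```python
from typing import List


def per_node_commands(num_nodes: int, command: List[str]) -> List[List[str]]:
    """Return one numactl-pinned copy of `command` per NUMA node.

    Hypothetical helper: this mirrors an assumed behavior of
    numa_test.sh, not its verified implementation.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return [
        ["numactl", f"--cpunodebind={node}", f"--membind={node}", "--", *command]
        for node in range(num_nodes)
    ]


if __name__ == "__main__":
    base = ["python3", "test_envpool.py", "--env", "atari",
            "--num-envs", "100", "--batch-size", "32",
            "--thread-affinity-offset", "-1"]
    for cmd in per_node_commands(8, base):
        print(" ".join(cmd))
```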
Brax and Isaac-gym (Mujoco only)
TODO
Atari and Mujoco Single Environment Tests
The Atari and Mujoco (gym) single-environment tests are the same as above, with --num-envs 1.
For dm_control suite environments, we provide another benchmark script:
python3 test_dmc.py --domain cheetah --task run --total-step 200000
Result
Single Environment Speedup Baseline
| System | Method | Atari Pong-v5 | Mujoco Ant-v3 | dm_control cheetah run |
|---|---|---|---|---|
| Laptop | Python | 4891.65 | 12325.95 | 6235.09 |
| Laptop | EnvPool | 7887.51 | 15641.44 | 11636.45 |
| Laptop | Speedup | 1.61x | 1.27x | 1.87x |
| Workstation | Python | 7739.15 | 19472.04 | 9042.64 |
| Workstation | EnvPool | 12623.93 | 25725.25 | 16691.68 |
| Workstation | Speedup | 1.63x | 1.32x | 1.85x |
| TPU-VM | Python | 3830.19 | 9960.98 | 5369.07 |
| TPU-VM | EnvPool | 7213.41 | 13706.61 | 9987.73 |
| TPU-VM | Speedup | 1.88x | 1.38x | 1.86x |
| DGX-A100 | Python | 4449.38 | 11018.57 | 5024.84 |
| DGX-A100 | EnvPool | 7723.96 | 16024.43 | 10415.87 |
| DGX-A100 | Speedup | 1.74x | 1.45x | 2.07x |
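The Speedup rows are consistent with dividing the EnvPool FPS by the Python FPS for the same system and environment. A short sketch of that arithmetic, using two entries from the table above (the helper name `speedup` is illustrative):

```python
def speedup(python_fps: float, envpool_fps: float) -> str:
    """Format the EnvPool-over-Python FPS ratio as in the table."""
    if python_fps <= 0:
        raise ValueError("python_fps must be positive")
    return f"{envpool_fps / python_fps:.2f}x"


if __name__ == "__main__":
    # Laptop, Atari: Python 4891.65 FPS vs. EnvPool 7887.51 FPS.
    print(speedup(4891.65, 7887.51))   # 1.61x
    # DGX-A100, dm_control: Python 5024.84 FPS vs. EnvPool 10415.87 FPS.
    print(speedup(5024.84, 10415.87))  # 2.07x
```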
Atari
| Atari - Laptop | 1 | 2 | 3 | 4 | 6 | 8 | 10 | 12 |
|---|---|---|---|---|---|---|---|---|
| For-loop | 4745.54 | 4796.03 | 4694.94 | 4776.76 | 4811.98 | 4892.70 | 4795.49 | 4830.31 |
| Subprocess | 4006.04 | 7274.79 | 10028.28 | 11251.66 | 12235.83 | 13280.10 | 15863.42 | 15658.02 |
| Sample-Factory | 5844.7 | 11148.0 | 15567.5 | 18236.7 | 25879.3 | 26695.2 | 28216.4 | 28034.7 |
| EnvPool (sync) | 7887.51 | 14605.92 | 20288.29 | 26427.86 | 33587.28 | 28602.50 | 34311.75 | 37395.68 |
| EnvPool (async) | 10213.75 | 18880.65 | 26599.45 | 36375.89 | 48390.40 | 46921.23 | 47184.54 | 49438.56 |

| Atari - Workstation | 1 | 2 | 4 | 8 | 12 | 16 | 20 | 24 | 28 | 32 |
|---|---|---|---|---|---|---|---|---|---|---|
| For-loop | 7739.15 | 7900.56 | 7853.82 | 7865.10 | 7914.04 | 7855.68 | 7587.67 | 7857.92 | 7635.10 | 7868.14 |
| Subprocess | 7126.57 | 13086.18 | 23402.05 | 33733.84 | 39766.60 | 42567.05 | 30384.52 | 37224.14 | 46132.40 | 47699.40 |
| Sample-Factory | 9259.5 | 18429.2 | 36776.8 | 71435.0 | 101555.5 | 106382.5 | 127522.5 | 131653.0 | 136605.7 | 138847.2 |
| EnvPool (sync) | 12623.93 | 23416.68 | 44527.99 | 78612.10 | 105459.54 | 126382.48 | 106088.13 | 117524.07 | 127986.00 | 133824.37 |
| EnvPool (async) | 14577.17 | 28383.39 | 55106.44 | 106992.10 | 153258.47 | 188554.16 | 192034.45 | 196540.73 | 200427.90 | 199684.50 |

| Atari - TPU-VM | 1 | 2 | 4 | 8 | 16 | 24 | 32 | 48 | 64 | 80 | 96 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| For-loop | 3830.19 | 3942.33 | 3993.01 | 3987.62 | 3967.83 | 3990.12 | 3976.47 | 3986.15 | 3946.44 | 3964.18 | 3973.26 |
| Subprocess | 3361.86 | 6586.32 | 12341.66 | 21547.19 | 34152.83 | 34864.23 | 38675.01 | 45471.75 | 41927.33 | 45893.35 | 46910.45 |
| Sample-Factory | 4906.3 | 9751.2 | 19450.3 | 38828.2 | 76206.7 | 108471.7 | 137571.6 | 203113.6 | 210596.9 | 217512.9 | 222327.4 |
| EnvPool (sync) | 7213.41 | 13827.95 | 27057.69 | 47143.35 | 71660.49 | 98892.99 | 123136.03 | 148110.55 | 141873.23 | 159635.70 | 170380.26 |
| EnvPool (async) | 8836.44 | 17815.91 | 35524.72 | 69888.53 | 127106.74 | 184798.27 | 246497.85 | 352195.40 | 354203.40 | 356793.59 | 359558.61 |
| EnvPool (numa+async) | / | 17976.26 | 35761.01 | 71967.27 | 136663.09 | 196424.25 | 253789.56 | 368680.81 | 371798.47 | 373169.33 | 362744.14 |

| Atari - DGX-A100 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 96 | 128 | 160 | 192 | 224 | 256 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| For-loop | 4449.38 | 4587.37 | 4620.44 | 4635.26 | 4617.21 | 4639.16 | 4618.30 | 4594.96 | 4629.90 | 4616.15 | 4640.20 | 4596.57 | 4620.50 |
| Subprocess | 4052.06 | 7832.98 | 12460.71 | 18306.28 | 24754.34 | 33336.38 | 43208.56 | 52435.64 | 42449.85 | 32958.90 | 45312.39 | 45767.11 | 71942.74 |
| Sample-Factory | 5563.2 | 11003.0 | 21976.3 | 43891.1 | 87702.0 | 175408.8 | 350855.5 | 476048.4 | 505494.8 | 616958.7 | 651428.8 | 679186.5 | 707494.3 |
| EnvPool (sync) | 7723.96 | 14865.81 | 28499.79 | 52681.02 | 91970.45 | 155386.07 | 243231.45 | 304423.24 | 358549.95 | 367559.69 | 388419.70 | 427851.27 | 427395.89 |
| EnvPool (async) | 8790.69 | 17866.75 | 36089.43 | 70749.63 | 139540.29 | 278186.45 | 451858.26 | 677504.68 | 817738.45 | 838174.97 | 881210.42 | 891286.00 | 874802.04 |
| EnvPool (numa+async) | / | / | / | 70629.88 | 140528.93 | 279113.15 | 555426.41 | 762417.99 | 936443.47 | 955620.20 | 998668.02 | 1032953.80 | 1069921.98 |

Mujoco
| Mujoco - Laptop | 1 | 2 | 3 | 4 | 6 | 8 | 10 | 12 |
|---|---|---|---|---|---|---|---|---|
| For-loop | 12325.95 | 12453.54 | 12861.30 | 12517.09 | 12467.92 | 12447.57 | 12631.33 | 12576.39 |
| Subprocess | 8377.65 | 14851.20 | 18479.33 | 23137.12 | 26667.67 | 29260.77 | 36586.01 | 31952.74 |
| Sample-Factory | 13270.0 | 25452.0 | 34882.0 | 41666.5 | 58892.0 | 60657.5 | 62509.5 | 60847.0 |
| EnvPool (sync) | 15641.44 | 30409.65 | 40063.78 | 43126.54 | 58395.28 | 53269.71 | 63424.83 | 66622.24 |
| EnvPool (async) | 20922.70 | 41279.93 | 57362.56 | 73119.43 | 95542.45 | 105126.36 | 100771.24 | 101603.31 |

| Mujoco - Workstation | 1 | 2 | 4 | 8 | 12 | 16 | 20 | 24 | 28 | 32 |
|---|---|---|---|---|---|---|---|---|---|---|
| For-loop | 19472.04 | 19251.41 | 19902.03 | 20076.99 | 19959.82 | 19513.40 | 19460.23 | 19724.42 | 20297.76 | 19797.03 |
| Subprocess | 14428.85 | 26943.13 | 48700.27 | 71303.02 | 89901.77 | 102833.40 | 93676.48 | 97473.05 | 105432.15 | 102533.10 |
| Sample-Factory | 20854.0 | 40113.5 | 78408.5 | 156563.0 | 225075.0 | 268005.5 | 284237.5 | 296082.5 | 305235.0 | 309264.5 |
| EnvPool (sync) | 25725.25 | 50531.72 | 90808.85 | 180372.40 | 212389.98 | 309341.24 | 282954.27 | 326454.83 | 357376.48 | 380950.25 |
| EnvPool (async) | 34500.65 | 68382.03 | 133496.84 | 265710.65 | 383015.28 | 478845.88 | 511142.63 | 538558.16 | 566014.54 | 582445.50 |

| Mujoco - TPU-VM | 1 | 2 | 4 | 8 | 16 | 24 | 32 | 48 | 64 | 80 | 96 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| For-loop | 9960.98 | 10239.58 | 10186.08 | 10473.73 | 10201.70 | 10370.85 | 10454.78 | 10460.48 | 10455.71 | 10360.71 | 10386.68 |
| Subprocess | 7236.32 | 13788.93 | 25054.73 | 40668.40 | 64148.06 | 60409.58 | 70747.21 | 78947.79 | 87403.16 | 79734.62 | 81964.35 |
| Sample-Factory | 11008.0 | 21368.0 | 42730.0 | 83475.5 | 153976.0 | 222311.5 | 280664.5 | 406916.5 | 432212.0 | 449143.0 | 461515.0 |
| EnvPool (sync) | 13706.61 | 26587.92 | 49074.86 | 92444.28 | 155288.26 | 181397.00 | 231293.39 | 283748.86 | 250586.54 | 268296.99 | 296680.68 |
| EnvPool (async) | 18195.81 | 37359.25 | 78337.13 | 148284.57 | 259915.75 | 386448.09 | 512987.78 | 745083.58 | 801768.88 | 857586.18 | 887539.80 |
| EnvPool (numa+async) | / | 35804.57 | 75467.72 | 147281.29 | 284323.79 | 412165.16 | 516120.17 | 755509.66 | 816405.50 | 868455.12 | 896830.21 |

| Mujoco - DGX-A100 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 96 | 128 | 160 | 192 | 224 | 256 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| For-loop | 11018.57 | 11269.45 | 11059.39 | 11250.06 | 11505.15 | 11328.79 | 11568.72 | 11485.74 | 11245.55 | 11478.49 | 11430.16 | 11151.71 | 11199.28 |
| Subprocess | 8814.10 | 17201.64 | 27106.27 | 44383.63 | 62785.60 | 83054.19 | 151352.88 | 158797.86 | 148815.92 | 116200.41 | 163656.36 | 147653.41 | 161599.97 |
| Sample-Factory | 11870.0 | 24602.0 | 48577.0 | 96826.5 | 193800.5 | 381208.5 | 761752.0 | 985909.0 | 1249369.5 | 1332128.5 | 1397427.5 | 1318249.0 | 1573262.0 |
| EnvPool (sync) | 16024.43 | 31899.44 | 61605.04 | 114488.28 | 228492.88 | 388624.94 | 656277.80 | 832101.96 | 949787.15 | 858298.85 | 945808.57 | 813799.36 | 849410.96 |
| EnvPool (async) | 21177.71 | 44025.65 | 92312.35 | 176135.82 | 354006.02 | 700052.08 | 1167838.03 | 1678787.71 | 1730102.62 | 2052844.58 | 2185146.77 | 2355604.96 | 2363863.67 |
| EnvPool (numa+async) | / | / | / | 170348.47 | 340269.34 | 693793.45 | 1388410.00 | 1920762.84 | 2341562.20 | 2569997.03 | 2776143.15 | 2964886.91 | 3134286.77 |
