Benchmark

The following results were generated on four types of machines:

  1. Personal laptop: 12-core Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz, GTX1060

  2. Personal workstation: 32-core AMD Ryzen 9 5950X 16-Core Processor, 2x RTX3090

  3. TPU-VM: 96-core Intel(R) Xeon(R) CPU @ 2.00GHz, 2 NUMA nodes, TPU v3-8

  4. DGX-A100: 256-core AMD EPYC 7742 64-Core Processor, 8 NUMA nodes, 8x A100

We benchmark Atari with PongNoFrameskip-v4 (using the environment wrappers from OpenAI Baselines) and Mujoco with Ant-v3, both with envpool==0.6.1.post1. The versions of all other packages are listed in requirements.txt:

$ pip install -r requirements.txt

To align with other baseline results, FPS is multiplied by frame_skip (4 for PongNoFrameskip-v4 and 5 for Ant-v3).
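As a rough illustration of this convention, the reported FPS can be read as environment steps per second multiplied by frame_skip. The helper name `reported_fps` and the sample step count and timing are hypothetical and are not taken from the benchmark scripts.

```python
# Frame-skip values stated above for the two benchmark environments.
FRAME_SKIP = {"PongNoFrameskip-v4": 4, "Ant-v3": 5}


def reported_fps(env_steps: int, elapsed_seconds: float, task: str) -> float:
    """Convert raw environment steps into frames per second.

    Hypothetical helper: each step advances the simulator by frame_skip
    frames, so frames = env_steps * frame_skip.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return env_steps * FRAME_SKIP[task] / elapsed_seconds


# Made-up measurement: 60,000 steps in 48 seconds.
print(reported_fps(60_000, 48.0, "PongNoFrameskip-v4"))  # 5000.0
print(reported_fps(60_000, 48.0, "Ant-v3"))  # 6250.0
```

The same raw step rate therefore yields a higher reported FPS for Ant-v3 than for Pong, which is worth keeping in mind when comparing the Atari and Mujoco tables.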

Highest FPS Overview

| Atari Highest FPS | Laptop (12) | Workstation (32) | TPU-VM (96) | DGX-A100 (256) |
| --- | --- | --- | --- | --- |
| For-loop | 4,893 | 7,914 | 3,993 | 4,640 |
| Subprocess | 15,863 | 47,699 | 46,910 | 71,943 |
| Sample-Factory | 28,216 | 138,847 | 222,327 | 707,494 |
| EnvPool (sync) | 37,396 | 133,824 | 170,380 | 427,851 |
| EnvPool (async) | 49,439 | 200,428 | 359,559 | 891,286 |
| EnvPool (numa+async) | / | / | 373,169 | 1,069,922 |

| Mujoco Highest FPS | Laptop (12) | Workstation (32) | TPU-VM (96) | DGX-A100 (256) |
| --- | --- | --- | --- | --- |
| For-loop | 12,861 | 20,298 | 10,474 | 11,569 |
| Subprocess | 36,586 | 105,432 | 87,403 | 163,656 |
| Sample-Factory | 62,510 | 309,264 | 461,515 | 1,573,262 |
| EnvPool (sync) | 66,622 | 380,950 | 296,681 | 949,787 |
| EnvPool (async) | 105,126 | 582,446 | 887,540 | 2,363,864 |
| EnvPool (numa+async) | / | / | 896,830 | 3,134,287 |

image0

Testing Method and Command

All scripts are in the benchmark/ folder. As the number of envs increases, we also adjust the total number of steps so that each test runs for about one minute.

For-loop

Command to run:

# atari
python3 test_gym.py --env atari --num-envs 12 --total-step 6000
# mujoco
python3 test_gym.py --env mujoco --num-envs 12 --total-step 12000
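The step counts in the For-loop commands above can be sanity-checked against the one-minute target with a small estimate. This is only a sketch: it assumes that one unit of --total-step advances all --num-envs environments once, which the text does not state explicitly, and `estimated_seconds` is a hypothetical helper. The FPS values are the Laptop For-loop results at 12 envs from the result tables in this document.

```python
# Frame-skip values stated earlier in this document.
FRAME_SKIP = {"atari": 4, "mujoco": 5}


def estimated_seconds(total_step: int, num_envs: int, env: str, fps: float) -> float:
    """Estimate the wall-clock time of one benchmark run.

    Assumption: each step advances all num_envs environments once, and
    fps is the frame-skip-adjusted FPS reported in the tables.
    """
    frames = total_step * num_envs * FRAME_SKIP[env]
    return frames / fps


# Laptop, For-loop, 12 envs (FPS from the Atari and Mujoco Laptop tables).
atari_seconds = estimated_seconds(6000, 12, "atari", 4830.31)
mujoco_seconds = estimated_seconds(12000, 12, "mujoco", 12576.39)
print(f"atari:  {atari_seconds:.1f} s")
print(f"mujoco: {mujoco_seconds:.1f} s")
```

Under that assumption both runs land close to one minute, which is consistent with how the step counts are described as being chosen.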

Subprocess (gym.vector_env)

Command to run:

# atari
python3 test_gym.py --env atari --async_ --num-envs 10 --total-step 20000
# mujoco
python3 test_gym.py --env mujoco --async_ --num-envs 10 --total-step 50000

Sample Factory

To run with Ant-v3 in Sample Factory, add one line in sample_factory/envs/mujoco/mujoco_utils.py:

 MUJOCO_ENVS = [
+    MujocoSpec('mujoco_ant', 'Ant-v3'),
     MujocoSpec('mujoco_hopper', 'Hopper-v2'),
     MujocoSpec('mujoco_halfcheetah', 'HalfCheetah-v2'),
     MujocoSpec('mujoco_humanoid', 'Humanoid-v2'),
 ]

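Because the Mujoco command below sets --env_frameskip=1, multiply the FPS that Sample Factory reports by 5 (the frame_skip of Ant-v3) to get the final result.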

Command to run:

# atari
python3 -m sample_factory.run_algorithm --algo=DUMMY_SAMPLER --env=atari_pong --env_frameskip=4 --num_workers=12 --num_envs_per_worker=1 --sample_env_frames=1600000
# mujoco
python3 -m sample_factory.run_algorithm --algo=DUMMY_SAMPLER --env=mujoco_ant --env_frameskip=1 --num_workers=12 --num_envs_per_worker=1 --sample_env_frames=1000000

We found that num_envs_per_worker == 1 gives the best result in all scenarios.

EnvPool

sync

# atari
python3 test_envpool.py --env atari --num-envs 12 --batch-size 12
# mujoco
python3 test_envpool.py --env mujoco --num-envs 12 --batch-size 12

async

# atari
python3 test_envpool.py --env atari --num-envs 36 --batch-size 12
# mujoco
python3 test_envpool.py --env mujoco --num-envs 36 --batch-size 12

numa+async

Use numactl -s to determine the number of NUMA nodes.

# atari
./numa_test.sh 8 python3 test_envpool.py --env atari --num-envs 100 --batch-size 32 --thread-affinity-offset -1
# mujoco
./numa_test.sh 8 python3 test_envpool.py --env mujoco --num-envs 100 --batch-size 32 --thread-affinity-offset -1

Brax and Isaac-gym (Mujoco only)

TODO

Atari and Mujoco Single Environment Tests

The Atari and Mujoco (gym) single-environment tests use the same commands as above with --num-envs 1.

For dm_control suite environments, we provide a separate benchmark script:

python3 test_dmc.py --domain cheetah --task run --total-step 200000

Result

Single Environment Speedup Baseline

| System | Method | Atari Pong-v5 | Mujoco Ant-v3 | dm_control cheetah run |
| --- | --- | --- | --- | --- |
| Laptop | Python | 4891.65 | 12325.95 | 6235.09 |
| Laptop | EnvPool | 7887.51 | 15641.44 | 11636.45 |
| Laptop | Speedup | 1.61x | 1.27x | 1.87x |
| Workstation | Python | 7739.15 | 19472.04 | 9042.64 |
| Workstation | EnvPool | 12623.93 | 25725.25 | 16691.68 |
| Workstation | Speedup | 1.63x | 1.32x | 1.85x |
| TPU-VM | Python | 3830.19 | 9960.98 | 5369.07 |
| TPU-VM | EnvPool | 7213.41 | 13706.61 | 9987.73 |
| TPU-VM | Speedup | 1.88x | 1.38x | 1.86x |
| DGX-A100 | Python | 4449.38 | 11018.57 | 5024.84 |
| DGX-A100 | EnvPool | 7723.96 | 16024.43 | 10415.87 |
| DGX-A100 | Speedup | 1.74x | 1.45x | 2.07x |
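The Speedup rows appear to be the EnvPool FPS divided by the Python FPS for the same system and task. The sketch below reproduces the Laptop row from the values above; the function name `speedup` is hypothetical.

```python
# Laptop rows from the single-environment table above (FPS).
python_fps = {
    "Atari Pong-v5": 4891.65,
    "Mujoco Ant-v3": 12325.95,
    "dm_control cheetah run": 6235.09,
}
envpool_fps = {
    "Atari Pong-v5": 7887.51,
    "Mujoco Ant-v3": 15641.44,
    "dm_control cheetah run": 11636.45,
}


def speedup(envpool: float, python: float) -> float:
    """Ratio of EnvPool throughput to the pure-Python baseline."""
    return envpool / python


for task in python_fps:
    ratio = speedup(envpool_fps[task], python_fps[task])
    print(f"{task}: {ratio:.2f}x")
# Atari Pong-v5: 1.61x
# Mujoco Ant-v3: 1.27x
# dm_control cheetah run: 1.87x
```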

Atari

| Atari - Laptop | 1 | 2 | 3 | 4 | 6 | 8 | 10 | 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| For-loop | 4745.54 | 4796.03 | 4694.94 | 4776.76 | 4811.98 | 4892.70 | 4795.49 | 4830.31 |
| Subprocess | 4006.04 | 7274.79 | 10028.28 | 11251.66 | 12235.83 | 13280.10 | 15863.42 | 15658.02 |
| Sample-Factory | 5844.7 | 11148.0 | 15567.5 | 18236.7 | 25879.3 | 26695.2 | 28216.4 | 28034.7 |
| EnvPool (sync) | 7887.51 | 14605.92 | 20288.29 | 26427.86 | 33587.28 | 28602.50 | 34311.75 | 37395.68 |
| EnvPool (async) | 10213.75 | 18880.65 | 26599.45 | 36375.89 | 48390.40 | 46921.23 | 47184.54 | 49438.56 |

image1

| Atari - Workstation | 1 | 2 | 4 | 8 | 12 | 16 | 20 | 24 | 28 | 32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| For-loop | 7739.15 | 7900.56 | 7853.82 | 7865.10 | 7914.04 | 7855.68 | 7587.67 | 7857.92 | 7635.10 | 7868.14 |
| Subprocess | 7126.57 | 13086.18 | 23402.05 | 33733.84 | 39766.60 | 42567.05 | 30384.52 | 37224.14 | 46132.40 | 47699.40 |
| Sample-Factory | 9259.5 | 18429.2 | 36776.8 | 71435.0 | 101555.5 | 106382.5 | 127522.5 | 131653.0 | 136605.7 | 138847.2 |
| EnvPool (sync) | 12623.93 | 23416.68 | 44527.99 | 78612.10 | 105459.54 | 126382.48 | 106088.13 | 117524.07 | 127986.00 | 133824.37 |
| EnvPool (async) | 14577.17 | 28383.39 | 55106.44 | 106992.10 | 153258.47 | 188554.16 | 192034.45 | 196540.73 | 200427.90 | 199684.50 |

image2

| Atari - TPU-VM | 1 | 2 | 4 | 8 | 16 | 24 | 32 | 48 | 64 | 80 | 96 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| For-loop | 3830.19 | 3942.33 | 3993.01 | 3987.62 | 3967.83 | 3990.12 | 3976.47 | 3986.15 | 3946.44 | 3964.18 | 3973.26 |
| Subprocess | 3361.86 | 6586.32 | 12341.66 | 21547.19 | 34152.83 | 34864.23 | 38675.01 | 45471.75 | 41927.33 | 45893.35 | 46910.45 |
| Sample-Factory | 4906.3 | 9751.2 | 19450.3 | 38828.2 | 76206.7 | 108471.7 | 137571.6 | 203113.6 | 210596.9 | 217512.9 | 222327.4 |
| EnvPool (sync) | 7213.41 | 13827.95 | 27057.69 | 47143.35 | 71660.49 | 98892.99 | 123136.03 | 148110.55 | 141873.23 | 159635.70 | 170380.26 |
| EnvPool (async) | 8836.44 | 17815.91 | 35524.72 | 69888.53 | 127106.74 | 184798.27 | 246497.85 | 352195.40 | 354203.40 | 356793.59 | 359558.61 |
| EnvPool (numa+async) | / | 17976.26 | 35761.01 | 71967.27 | 136663.09 | 196424.25 | 253789.56 | 368680.81 | 371798.47 | 373169.33 | 362744.14 |

image3

| Atari - DGX-A100 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 96 | 128 | 160 | 192 | 224 | 256 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| For-loop | 4449.38 | 4587.37 | 4620.44 | 4635.26 | 4617.21 | 4639.16 | 4618.30 | 4594.96 | 4629.90 | 4616.15 | 4640.20 | 4596.57 | 4620.50 |
| Subprocess | 4052.06 | 7832.98 | 12460.71 | 18306.28 | 24754.34 | 33336.38 | 43208.56 | 52435.64 | 42449.85 | 32958.90 | 45312.39 | 45767.11 | 71942.74 |
| Sample-Factory | 5563.2 | 11003.0 | 21976.3 | 43891.1 | 87702.0 | 175408.8 | 350855.5 | 476048.4 | 505494.8 | 616958.7 | 651428.8 | 679186.5 | 707494.3 |
| EnvPool (sync) | 7723.96 | 14865.81 | 28499.79 | 52681.02 | 91970.45 | 155386.07 | 243231.45 | 304423.24 | 358549.95 | 367559.69 | 388419.70 | 427851.27 | 427395.89 |
| EnvPool (async) | 8790.69 | 17866.75 | 36089.43 | 70749.63 | 139540.29 | 278186.45 | 451858.26 | 677504.68 | 817738.45 | 838174.97 | 881210.42 | 891286.00 | 874802.04 |
| EnvPool (numa+async) | / | / | / | 70629.88 | 140528.93 | 279113.15 | 555426.41 | 762417.99 | 936443.47 | 955620.20 | 998668.02 | 1032953.80 | 1069921.98 |

image4

Mujoco

| Mujoco - Laptop | 1 | 2 | 3 | 4 | 6 | 8 | 10 | 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| For-loop | 12325.95 | 12453.54 | 12861.30 | 12517.09 | 12467.92 | 12447.57 | 12631.33 | 12576.39 |
| Subprocess | 8377.65 | 14851.20 | 18479.33 | 23137.12 | 26667.67 | 29260.77 | 36586.01 | 31952.74 |
| Sample-Factory | 13270.0 | 25452.0 | 34882.0 | 41666.5 | 58892.0 | 60657.5 | 62509.5 | 60847.0 |
| EnvPool (sync) | 15641.44 | 30409.65 | 40063.78 | 43126.54 | 58395.28 | 53269.71 | 63424.83 | 66622.24 |
| EnvPool (async) | 20922.70 | 41279.93 | 57362.56 | 73119.43 | 95542.45 | 105126.36 | 100771.24 | 101603.31 |

image5

| Mujoco - Workstation | 1 | 2 | 4 | 8 | 12 | 16 | 20 | 24 | 28 | 32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| For-loop | 19472.04 | 19251.41 | 19902.03 | 20076.99 | 19959.82 | 19513.40 | 19460.23 | 19724.42 | 20297.76 | 19797.03 |
| Subprocess | 14428.85 | 26943.13 | 48700.27 | 71303.02 | 89901.77 | 102833.40 | 93676.48 | 97473.05 | 105432.15 | 102533.10 |
| Sample-Factory | 20854.0 | 40113.5 | 78408.5 | 156563.0 | 225075.0 | 268005.5 | 284237.5 | 296082.5 | 305235.0 | 309264.5 |
| EnvPool (sync) | 25725.25 | 50531.72 | 90808.85 | 180372.40 | 212389.98 | 309341.24 | 282954.27 | 326454.83 | 357376.48 | 380950.25 |
| EnvPool (async) | 34500.65 | 68382.03 | 133496.84 | 265710.65 | 383015.28 | 478845.88 | 511142.63 | 538558.16 | 566014.54 | 582445.50 |

image6

| Mujoco - TPU-VM | 1 | 2 | 4 | 8 | 16 | 24 | 32 | 48 | 64 | 80 | 96 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| For-loop | 9960.98 | 10239.58 | 10186.08 | 10473.73 | 10201.70 | 10370.85 | 10454.78 | 10460.48 | 10455.71 | 10360.71 | 10386.68 |
| Subprocess | 7236.32 | 13788.93 | 25054.73 | 40668.40 | 64148.06 | 60409.58 | 70747.21 | 78947.79 | 87403.16 | 79734.62 | 81964.35 |
| Sample-Factory | 11008.0 | 21368.0 | 42730.0 | 83475.5 | 153976.0 | 222311.5 | 280664.5 | 406916.5 | 432212.0 | 449143.0 | 461515.0 |
| EnvPool (sync) | 13706.61 | 26587.92 | 49074.86 | 92444.28 | 155288.26 | 181397.00 | 231293.39 | 283748.86 | 250586.54 | 268296.99 | 296680.68 |
| EnvPool (async) | 18195.81 | 37359.25 | 78337.13 | 148284.57 | 259915.75 | 386448.09 | 512987.78 | 745083.58 | 801768.88 | 857586.18 | 887539.80 |
| EnvPool (numa+async) | / | 35804.57 | 75467.72 | 147281.29 | 284323.79 | 412165.16 | 516120.17 | 755509.66 | 816405.50 | 868455.12 | 896830.21 |

image7

| Mujoco - DGX-A100 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 96 | 128 | 160 | 192 | 224 | 256 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| For-loop | 11018.57 | 11269.45 | 11059.39 | 11250.06 | 11505.15 | 11328.79 | 11568.72 | 11485.74 | 11245.55 | 11478.49 | 11430.16 | 11151.71 | 11199.28 |
| Subprocess | 8814.10 | 17201.64 | 27106.27 | 44383.63 | 62785.60 | 83054.19 | 151352.88 | 158797.86 | 148815.92 | 116200.41 | 163656.36 | 147653.41 | 161599.97 |
| Sample-Factory | 11870.0 | 24602.0 | 48577.0 | 96826.5 | 193800.5 | 381208.5 | 761752.0 | 985909.0 | 1249369.5 | 1332128.5 | 1397427.5 | 1318249.0 | 1573262.0 |
| EnvPool (sync) | 16024.43 | 31899.44 | 61605.04 | 114488.28 | 228492.88 | 388624.94 | 656277.80 | 832101.96 | 949787.15 | 858298.85 | 945808.57 | 813799.36 | 849410.96 |
| EnvPool (async) | 21177.71 | 44025.65 | 92312.35 | 176135.82 | 354006.02 | 700052.08 | 1167838.03 | 1678787.71 | 1730102.62 | 2052844.58 | 2185146.77 | 2355604.96 | 2363863.67 |
| EnvPool (numa+async) | / | / | / | 170348.47 | 340269.34 | 693793.45 | 1388410.00 | 1920762.84 | 2341562.20 | 2569997.03 | 2776143.15 | 2964886.91 | 3134286.77 |

image8