Benchmark
=========

The following results were generated on four types of machine:

1. Personal laptop: 12-core ``Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz``, GTX1060
2. Personal workstation: 32-core ``AMD Ryzen 9 5950X 16-Core Processor``, 2x RTX3090
3. TPU-VM: 96-core ``Intel(R) Xeon(R) CPU @ 2.00GHz``, 2 NUMA nodes, TPU v3-8
4. DGX-A100: 256-core ``AMD EPYC 7742 64-Core Processor``, 8 NUMA nodes, 8x A100

The historical numbers below were produced with ``PongNoFrameskip-v4`` and ``Ant-v3`` on ``envpool==0.6.1.post1``. The current benchmark scripts in this directory use Gymnasium's ``ALE/Pong-v5`` and ``Ant-v5``, and the baseline dependencies plus shared benchmark tooling are installed from ``requirements.txt``:

.. code:: bash

   $ pip install -r requirements.txt

``test_gym.py`` uses only the packages above. ``test_envpool.py`` additionally expects an installed EnvPool build with native modules, so install the EnvPool wheel you want to benchmark separately before running it.

To align with other baseline results, FPS is multiplied by ``frame_skip`` (4 for Atari and 5 for Mujoco).
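
As a rough illustration of the ``frame_skip`` convention, the sketch below converts a step count and wall-clock time into frame-based FPS. The function name ``reported_fps`` and its arguments are hypothetical and are not taken from the benchmark scripts; the sketch assumes ``env_steps`` counts individual environment transitions summed over all environments.

.. code:: python

   def reported_fps(env_steps: int, elapsed_seconds: float, frame_skip: int) -> float:
       """Return frames per second, counting every skipped frame.

       Hypothetical helper: env_steps is assumed to be the total number of
       environment transitions across all environments.
       """
       if elapsed_seconds <= 0:
           raise ValueError("elapsed_seconds must be positive")
       return env_steps * frame_skip / elapsed_seconds


   # Illustrative numbers only, not measured results:
   # 6000 transitions in 4 seconds with frame_skip=4 (Atari).
   print(reported_fps(6000, 4.0, 4))  # 6000.0

Under this convention, the same raw step rate yields a larger reported FPS for Mujoco (``frame_skip`` of 5) than for Atari (``frame_skip`` of 4).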
Highest FPS Overview
--------------------

==================== =========== ================ =========== ==============
Atari Highest FPS    Laptop (12) Workstation (32) TPU-VM (96) DGX-A100 (256)
==================== =========== ================ =========== ==============
For-loop             4,893       7,914            3,993       4,640
Subprocess           15,863      47,699           46,910      71,943
Sample-Factory       28,216      138,847          222,327     707,494
EnvPool (sync)       37,396      133,824          170,380     427,851
EnvPool (async)      **49,439**  **200,428**      359,559     891,286
EnvPool (numa+async) /           /                **373,169** **1,069,922**
==================== =========== ================ =========== ==============

==================== =========== ================ =========== ==============
Mujoco Highest FPS   Laptop (12) Workstation (32) TPU-VM (96) DGX-A100 (256)
==================== =========== ================ =========== ==============
For-loop             12,861      20,298           10,474      11,569
Subprocess           36,586      105,432          87,403      163,656
Sample-Factory       62,510      309,264          461,515     1,573,262
EnvPool (sync)       66,622      380,950          296,681     949,787
EnvPool (async)      **105,126** **582,446**      887,540     2,363,864
EnvPool (numa+async) /           /                **896,830** **3,134,287**
==================== =========== ================ =========== ==============

|image0|

Testing Method and Command
--------------------------

All of the scripts are in the ``benchmark/`` folder. When increasing the number of envs, we also adjust the total number of steps so that each test runs for about one minute.

For-loop
~~~~~~~~

Command to run:

.. code:: bash

   # atari
   python3 test_gym.py --env atari --num-envs 12 --total-step 6000
   # mujoco
   python3 test_gym.py --env mujoco --num-envs 12 --total-step 12000

Subprocess (gymnasium.vector)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Command to run:

.. code:: bash

   # atari
   python3 test_gym.py --env atari --async_ --num-envs 10 --total-step 20000
   # mujoco
   python3 test_gym.py --env mujoco --async_ --num-envs 10 --total-step 50000

Sample Factory
~~~~~~~~~~~~~~

Sample Factory remains a historical reference for now. The latest upstream release still depends on ``gymnasium<1.0``, ``numpy<2``, and an older ``ale-py`` build that is not available in the current development package mirror, so it is not included in ``requirements.txt``.

Command to run:

.. code:: bash

   # atari
   python3 -m sample_factory.run_algorithm --algo=DUMMY_SAMPLER --env=atari_pong --env_frameskip=4 --num_workers=12 --num_envs_per_worker=1 --sample_env_frames=1600000
   # mujoco
   python3 -m sample_factory.run_algorithm --algo=DUMMY_SAMPLER --env=mujoco_ant --env_frameskip=1 --num_workers=12 --num_envs_per_worker=1 --sample_env_frames=1000000

We found that ``num_envs_per_worker == 1`` is best for all scenarios.

EnvPool
~~~~~~~

Install an EnvPool wheel for the version you want to benchmark before running the commands below.

sync
^^^^

.. code:: bash

   # atari
   python3 test_envpool.py --env atari --num-envs 12 --batch-size 12
   # mujoco
   python3 test_envpool.py --env mujoco --num-envs 12 --batch-size 12

async
^^^^^

.. code:: bash

   # atari
   python3 test_envpool.py --env atari --num-envs 36 --batch-size 12
   # mujoco
   python3 test_envpool.py --env mujoco --num-envs 36 --batch-size 12

numa+async
^^^^^^^^^^

Use ``numactl -s`` to determine the number of NUMA nodes.

.. code:: bash

   # atari
   ./numa_test.sh 8 python3 test_envpool.py --env atari --num-envs 100 --batch-size 32 --thread-affinity-offset -1
   # mujoco
   ./numa_test.sh 8 python3 test_envpool.py --env mujoco --num-envs 100 --batch-size 32 --thread-affinity-offset -1

Brax and Isaac-gym (Mujoco only)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TODO

Atari and Mujoco Single Environment Tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Atari and Mujoco (gym) single-environment tests are the same as above with ``--num-envs 1``. For dm_control suite environments, we provide a separate benchmark script:

.. code:: bash

   python3 test_dmc.py --domain cheetah --task run --total-step 200000

Result
------

Single Environment Speedup Baseline
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
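
The ``Speedup`` rows in the single-environment table appear to be the EnvPool FPS divided by the Python FPS on the same machine, rounded to two decimals. A minimal sketch of that arithmetic follows; the helper name ``speedup`` is hypothetical, and the inputs are the Laptop values from the table.

.. code:: python

   def speedup(envpool_fps: float, python_fps: float) -> str:
       """Format the EnvPool-over-Python throughput ratio as in the table."""
       return f"{envpool_fps / python_fps:.2f}x"


   # Laptop, Atari: EnvPool 7887.51 FPS vs. Python 4891.65 FPS
   print(speedup(7887.51, 4891.65))    # 1.61x
   # Laptop, Mujoco: EnvPool 15641.44 FPS vs. Python 12325.95 FPS
   print(speedup(15641.44, 12325.95))  # 1.27x
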
=========== ======= ============= ============= ======================
System      Method  Atari Pong-v5 Mujoco Ant-v3 dm_control cheetah run
=========== ======= ============= ============= ======================
Laptop      Python  4891.65       12325.95      6235.09
Laptop      EnvPool 7887.51       15641.44      11636.45
Laptop      Speedup 1.61x         1.27x         1.87x
Workstation Python  7739.15       19472.04      9042.64
Workstation EnvPool 12623.93      25725.25      16691.68
Workstation Speedup 1.63x         1.32x         1.85x
TPU-VM      Python  3830.19       9960.98       5369.07
TPU-VM      EnvPool 7213.41       13706.61      9987.73
TPU-VM      Speedup 1.88x         1.38x         1.86x
DGX-A100    Python  4449.38       11018.57      5024.84
DGX-A100    EnvPool 7723.96       16024.43      10415.87
DGX-A100    Speedup 1.74x         1.45x         2.07x
=========== ======= ============= ============= ======================

Atari
~~~~~

=============== ======== ======== ======== ======== ======== ======== ======== ========
Atari - Laptop  1        2        3        4        6        8        10       12
=============== ======== ======== ======== ======== ======== ======== ======== ========
For-loop        4745.54  4796.03  4694.94  4776.76  4811.98  4892.70  4795.49  4830.31
Subprocess      4006.04  7274.79  10028.28 11251.66 12235.83 13280.10 15863.42 15658.02
Sample-Factory  5844.7   11148.0  15567.5  18236.7  25879.3  26695.2  28216.4  28034.7
EnvPool (sync)  7887.51  14605.92 20288.29 26427.86 33587.28 28602.50 34311.75 37395.68
EnvPool (async) 10213.75 18880.65 26599.45 36375.89 48390.40 46921.23 47184.54 49438.56
=============== ======== ======== ======== ======== ======== ======== ======== ========

|image1|
=================== ======== ======== ======== ========= ========= ========= ========= ========= ========= =========
Atari - Workstation 1        2        4        8         12        16        20        24        28        32
=================== ======== ======== ======== ========= ========= ========= ========= ========= ========= =========
For-loop            7739.15  7900.56  7853.82  7865.10   7914.04   7855.68   7587.67   7857.92   7635.10   7868.14
Subprocess          7126.57  13086.18 23402.05 33733.84  39766.60  42567.05  30384.52  37224.14  46132.40  47699.40
Sample-Factory      9259.5   18429.2  36776.8  71435.0   101555.5  106382.5  127522.5  131653.0  136605.7  138847.2
EnvPool (sync)      12623.93 23416.68 44527.99 78612.10  105459.54 126382.48 106088.13 117524.07 127986.00 133824.37
EnvPool (async)     14577.17 28383.39 55106.44 106992.10 153258.47 188554.16 192034.45 196540.73 200427.90 199684.50
=================== ======== ======== ======== ========= ========= ========= ========= ========= ========= =========

|image2|

==================== ======= ======== ======== ======== ========= ========= ========= ========= ========= ========= =========
Atari - TPU-VM       1       2        4        8        16        24        32        48        64        80        96
==================== ======= ======== ======== ======== ========= ========= ========= ========= ========= ========= =========
For-loop             3830.19 3942.33  3993.01  3987.62  3967.83   3990.12   3976.47   3986.15   3946.44   3964.18   3973.26
Subprocess           3361.86 6586.32  12341.66 21547.19 34152.83  34864.23  38675.01  45471.75  41927.33  45893.35  46910.45
Sample-Factory       4906.3  9751.2   19450.3  38828.2  76206.7   108471.7  137571.6  203113.6  210596.9  217512.9  222327.4
EnvPool (sync)       7213.41 13827.95 27057.69 47143.35 71660.49  98892.99  123136.03 148110.55 141873.23 159635.70 170380.26
EnvPool (async)      8836.44 17815.91 35524.72 69888.53 127106.74 184798.27 246497.85 352195.40 354203.40 356793.59 359558.61
EnvPool (numa+async) /       17976.26 35761.01 71967.27 136663.09 196424.25 253789.56 368680.81 371798.47 373169.33 362744.14
==================== ======= ======== ======== ======== ========= ========= ========= ========= ========= ========= =========

|image3|

==================== ======= ======== ======== ======== ========= ========= ========= ========= ========= ========= ========= ========== ==========
Atari - DGX-A100     1       2        4        8        16        32        64        96        128       160       192       224        256
==================== ======= ======== ======== ======== ========= ========= ========= ========= ========= ========= ========= ========== ==========
For-loop             4449.38 4587.37  4620.44  4635.26  4617.21   4639.16   4618.30   4594.96   4629.90   4616.15   4640.20   4596.57    4620.50
Subprocess           4052.06 7832.98  12460.71 18306.28 24754.34  33336.38  43208.56  52435.64  42449.85  32958.90  45312.39  45767.11   71942.74
Sample-Factory       5563.2  11003.0  21976.3  43891.1  87702.0   175408.8  350855.5  476048.4  505494.8  616958.7  651428.8  679186.5   707494.3
EnvPool (sync)       7723.96 14865.81 28499.79 52681.02 91970.45  155386.07 243231.45 304423.24 358549.95 367559.69 388419.70 427851.27  427395.89
EnvPool (async)      8790.69 17866.75 36089.43 70749.63 139540.29 278186.45 451858.26 677504.68 817738.45 838174.97 881210.42 891286.00  874802.04
EnvPool (numa+async) /       /        /        70629.88 140528.93 279113.15 555426.41 762417.99 936443.47 955620.20 998668.02 1032953.80 1069921.98
==================== ======= ======== ======== ======== ========= ========= ========= ========= ========= ========= ========= ========== ==========

|image4|

Mujoco
~~~~~~
=============== ======== ======== ======== ======== ======== ========= ========= =========
Mujoco - Laptop 1        2        3        4        6        8         10        12
=============== ======== ======== ======== ======== ======== ========= ========= =========
For-loop        12325.95 12453.54 12861.30 12517.09 12467.92 12447.57  12631.33  12576.39
Subprocess      8377.65  14851.20 18479.33 23137.12 26667.67 29260.77  36586.01  31952.74
Sample-Factory  13270.0  25452.0  34882.0  41666.5  58892.0  60657.5   62509.5   60847.0
EnvPool (sync)  15641.44 30409.65 40063.78 43126.54 58395.28 53269.71  63424.83  66622.24
EnvPool (async) 20922.70 41279.93 57362.56 73119.43 95542.45 105126.36 100771.24 101603.31
=============== ======== ======== ======== ======== ======== ========= ========= =========

|image5|

==================== ======== ======== ========= ========= ========= ========= ========= ========= ========= =========
Mujoco - Workstation 1        2        4         8         12        16        20        24        28        32
==================== ======== ======== ========= ========= ========= ========= ========= ========= ========= =========
For-loop             19472.04 19251.41 19902.03  20076.99  19959.82  19513.40  19460.23  19724.42  20297.76  19797.03
Subprocess           14428.85 26943.13 48700.27  71303.02  89901.77  102833.40 93676.48  97473.05  105432.15 102533.10
Sample-Factory       20854.0  40113.5  78408.5   156563.0  225075.0  268005.5  284237.5  296082.5  305235.0  309264.5
EnvPool (sync)       25725.25 50531.72 90808.85  180372.40 212389.98 309341.24 282954.27 326454.83 357376.48 380950.25
EnvPool (async)      34500.65 68382.03 133496.84 265710.65 383015.28 478845.88 511142.63 538558.16 566014.54 582445.50
==================== ======== ======== ========= ========= ========= ========= ========= ========= ========= =========

|image6|
==================== ======== ======== ======== ========= ========= ========= ========= ========= ========= ========= =========
Mujoco - TPU-VM      1        2        4        8         16        24        32        48        64        80        96
==================== ======== ======== ======== ========= ========= ========= ========= ========= ========= ========= =========
For-loop             9960.98  10239.58 10186.08 10473.73  10201.70  10370.85  10454.78  10460.48  10455.71  10360.71  10386.68
Subprocess           7236.32  13788.93 25054.73 40668.40  64148.06  60409.58  70747.21  78947.79  87403.16  79734.62  81964.35
Sample-Factory       11008.0  21368.0  42730.0  83475.5   153976.0  222311.5  280664.5  406916.5  432212.0  449143.0  461515.0
EnvPool (sync)       13706.61 26587.92 49074.86 92444.28  155288.26 181397.00 231293.39 283748.86 250586.54 268296.99 296680.68
EnvPool (async)      18195.81 37359.25 78337.13 148284.57 259915.75 386448.09 512987.78 745083.58 801768.88 857586.18 887539.80
EnvPool (numa+async) /        35804.57 75467.72 147281.29 284323.79 412165.16 516120.17 755509.66 816405.50 868455.12 896830.21
==================== ======== ======== ======== ========= ========= ========= ========= ========= ========= ========= =========

|image7|
==================== ======== ======== ======== ========= ========= ========= ========== ========== ========== ========== ========== ========== ==========
Mujoco - DGX-A100    1        2        4        8         16        32        64         96         128        160        192        224        256
==================== ======== ======== ======== ========= ========= ========= ========== ========== ========== ========== ========== ========== ==========
For-loop             11018.57 11269.45 11059.39 11250.06  11505.15  11328.79  11568.72   11485.74   11245.55   11478.49   11430.16   11151.71   11199.28
Subprocess           8814.10  17201.64 27106.27 44383.63  62785.60  83054.19  151352.88  158797.86  148815.92  116200.41  163656.36  147653.41  161599.97
Sample-Factory       11870.0  24602.0  48577.0  96826.5   193800.5  381208.5  761752.0   985909.0   1249369.5  1332128.5  1397427.5  1318249.0  1573262.0
EnvPool (sync)       16024.43 31899.44 61605.04 114488.28 228492.88 388624.94 656277.80  832101.96  949787.15  858298.85  945808.57  813799.36  849410.96
EnvPool (async)      21177.71 44025.65 92312.35 176135.82 354006.02 700052.08 1167838.03 1678787.71 1730102.62 2052844.58 2185146.77 2355604.96 2363863.67
EnvPool (numa+async) /        /        /        170348.47 340269.34 693793.45 1388410.00 1920762.84 2341562.20 2569997.03 2776143.15 2964886.91 3134286.77
==================== ======== ======== ======== ========= ========= ========= ========== ========== ========== ========== ========== ========== ==========

|image8|

.. |image0| image:: ../_static/images/throughput/throughput.png
.. |image1| image:: ../_static/images/throughput/Atari_Laptop.png
.. |image2| image:: ../_static/images/throughput/Atari_Workstation.png
.. |image3| image:: ../_static/images/throughput/Atari_TPU-VM.png
.. |image4| image:: ../_static/images/throughput/Atari_DGX-A100.png
.. |image5| image:: ../_static/images/throughput/Mujoco_Laptop.png
.. |image6| image:: ../_static/images/throughput/Mujoco_Workstation.png
.. |image7| image:: ../_static/images/throughput/Mujoco_TPU-VM.png
.. |image8| image:: ../_static/images/throughput/Mujoco_DGX-A100.png