The technical community has been looking forward to seeing how NVIDIA’s monster Hopper H100 Tensor Core GPU would perform ever since its March announcement at GTC 2022. No one was disappointed when its results in the latest round of MLPerf artificial intelligence tests, MLPerf Inference v2.1, were published last week.
MLPerf is a benchmark suite from MLCommons. It provides benchmarking for machine learning models, software, and hardware, with energy consumption as an optional measurement. It is the industry benchmark for deep learning, AI training, AI inference, and HPC. This specific test, MLPerf Inference v2.1, measures inference performance: how fast a system can process inputs and produce results using a trained model.
Each benchmark test is defined by its dataset and quality target. The table above summarizes benchmarks used in MLPerf v2.1. Also shown in the NVIDIA results is DLRM (Deep Learning Recommendation Model), a recommendations model introduced by Facebook.
The NVIDIA H100 is NVIDIA’s ninth-generation data center GPU. Compared to NVIDIA’s previous generation, the A100 GPU, the H100 provides an order-of-magnitude greater performance for large-scale AI and HPC. Although the H100 brings substantial software and architectural efficiency improvements, its major design focus has been carried over from the A100.
H100 Supercharges NVIDIA AI
In the Data Center category, the NVIDIA H100 Tensor Core GPU delivered the highest per-accelerator performance across every workload for both the Server and Offline tests. It had up to 4.5x more performance in the Offline scenario and up to 3.9x more in the Server scenario than the A100 Tensor Core GPU.
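To make the "per-accelerator" comparison concrete, here is a minimal sketch of how a system-level throughput figure can be normalized to a single accelerator and then turned into a speedup ratio. The function names and the throughput numbers are hypothetical placeholders chosen only so that the arithmetic lands on a 4.5x ratio; they are not MLPerf results.

```python
def per_accelerator(system_throughput: float, num_accelerators: int) -> float:
    """Normalize a system-level throughput to a single accelerator."""
    if num_accelerators <= 0:
        raise ValueError("num_accelerators must be positive")
    return system_throughput / num_accelerators


def speedup(new: float, baseline: float) -> float:
    """Return how many times faster `new` is than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return new / baseline


# Hypothetical placeholder figures (samples per second), not published results.
h100 = per_accelerator(system_throughput=9000.0, num_accelerators=1)
a100 = per_accelerator(system_throughput=16000.0, num_accelerators=8)

ratio = speedup(h100, a100)
print(f"Per-accelerator speedup: {ratio:.1f}x")
```

Normalizing first matters because submissions can use different numbers of accelerators, so raw system totals are not directly comparable.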
NVIDIA attributes part of the superior performance of the H100 on the BERT NLP model to its Transformer Engine. The new engine, combined with NVIDIA Hopper FP8 Tensor Cores, delivers up to 9x faster AI training and 30x faster AI inference on large language models compared to the prior generation.
Speed is crucial because huge AI models can have trillions of parameters. These models are so large that training one can take months. NVIDIA’s Transformer Engine provides additional speed by combining 16-bit floating-point precision with a new 8-bit floating-point data format, which doubles Tensor Core throughput and halves memory requirements compared to 16-bit floating-point.
Those improvements, plus advanced Hopper software algorithms, speed up AI performance and allow models to be trained within days or hours instead of months. The faster a model becomes operational, the earlier its return on investment begins and the sooner operational improvements can be implemented.
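The 2x memory reduction follows directly from the storage width of each format: a 16-bit value occupies two bytes and an 8-bit value occupies one. The sketch below works through that arithmetic for a model with one trillion parameters. It counts weight storage only and ignores activations, optimizer state, and any scaling metadata, so treat it as a rough illustration and not a sizing tool. The function name and the parameter count are assumptions made for the example.

```python
# Bytes needed to store one value in each floating-point format.
BYTES_PER_VALUE = {"fp16": 2, "fp8": 1}


def weight_bytes(num_parameters: int, fmt: str) -> int:
    """Return the bytes needed to store the weights alone in format `fmt`."""
    return num_parameters * BYTES_PER_VALUE[fmt]


# Illustrative model size: one trillion parameters.
params = 1_000_000_000_000

fp16 = weight_bytes(params, "fp16")
fp8 = weight_bytes(params, "fp8")

print(f"FP16 weights: {fp16 / 1e12:.1f} TB")
print(f"FP8 weights:  {fp8 / 1e12:.1f} TB")
print(f"Reduction:    {fp16 // fp8}x")
```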
NVIDIA A100 continues high-level superior performance
Although the H100 is the latest GPU generation, a check of MLPerf v2.1 results confirms that NVIDIA’s prior-generation A100 GPU is still producing record results and high performance.
Orin at the edge continues energy improvements
NVIDIA Jetson AGX Orin Gets a Big Efficiency Boost
Edge computing is key to the success of many emerging applications with exponential growth. Orin is built for edge AI and robotic applications. In the previous MLPerf round, Orin performed up to 5x faster than its prior-generation Jetson AGX Xavier module. At the same time, Orin delivered an average of 2x better energy efficiency.
For MLPerf v2.1, NVIDIA Orin ran every MLPerf benchmark in edge computing, winning more tests than any other low-power system-on-a-chip. Although it was not the most energy-efficient entrant, full-stack improvements gave Orin additional energy efficiency gains of up to 50% compared to its earlier MLPerf results in April. Its efficiency is expected to continue to improve.
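Energy efficiency in this context is commonly expressed as throughput per watt, so a gain can come from higher throughput, lower power draw, or both. The sketch below shows how a 50% improvement could be computed between two submissions. The function names and all throughput and power figures are hypothetical placeholders, not Orin measurements.

```python
def energy_efficiency(throughput: float, power_watts: float) -> float:
    """Return throughput per watt (for example, inferences per second per watt)."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return throughput / power_watts


def percent_improvement(new: float, old: float) -> float:
    """Return the relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Hypothetical placeholder figures for an earlier and a later submission.
earlier = energy_efficiency(throughput=1000.0, power_watts=20.0)
later = energy_efficiency(throughput=1200.0, power_watts=16.0)

gain = percent_improvement(later, earlier)
print(f"Efficiency gain: {gain:.0f}%")
```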
In MLPerf Inference v2.1, despite several significant model and dataset changes from v2.0, the first NVIDIA H100 submission set new per-accelerator performance records on all workloads in the data center scenario and delivered up to 4.5x higher performance than the A100. Its increased performance resulted from many Hopper architectural breakthroughs and software optimizations that leveraged the new capabilities. We look forward to seeing its results in the next round of MLPerf testing.
Paul Smith-Goodson is Vice President and Principal Analyst for quantum computing, artificial intelligence and space at Moor Insights and Strategy. You can follow him on Twitter for more current information on quantum, AI, and space.
Note: Moor Insights & Strategy writers and editors may have contributed to this article.
Moor Insights & Strategy, like all research and tech industry analyst firms, provides or has provided paid services to technology companies. These services include research, analysis, advising, consulting, benchmarking, acquisition matchmaking, and speaking sponsorships. The company has had or currently has paid business relationships with 8×8, Accenture, A10 Networks, Advanced Micro Devices, Amazon, Amazon Web Services, Ambient Scientific, Anuta Networks, Applied Brain Research, Applied Micro, Apstra, Arm, Aruba Networks (now HPE), Atom Computing, AT&T, Aura, Automation Anywhere, AWS, A-10 Strategies, Bitfusion, Blaize, Box, Broadcom, C3.AI, Calix, Campfire, Cisco Systems, Clear Software, Cloudera, Clumio, Cognitive Systems, CompuCom, Cradlepoint, CyberArk, Dell, Dell EMC, Dell Technologies, Diablo Technologies, Dialogue Group, Digital Optics, Dreamium Labs, D-Wave, Echelon, Ericsson, Extreme Networks, Five9, Flex, Foundries.io, Foxconn, Frame (now VMware), Fujitsu, Gen Z Consortium, Glue Networks, GlobalFoundries, Revolve (now Google), Google Cloud, Graphcore, Groq, Hiregenics, Hotwire Global, HP Inc., Hewlett Packard Enterprise, Honeywell, Huawei Technologies, IBM, Infinidat, Infosys, Inseego, IonQ, IonVR, Inseego, Infosys, Infiot, Intel, Interdigital, Jabil Circuit, Keysight, Konica Minolta, Lattice Semiconductor, Lenovo, Linux Foundation, Lightbits Labs, LogicMonitor, Luminar, MapBox, Marvell Technology, Mavenir, Marseille Inc, Mayfair Equity, Meraki (Cisco), Merck KGaA, Mesophere, Micron Technology, Microsoft, MiTEL, Mojo Networks, MongoDB, MulteFire Alliance, National Instruments, Neat, NetApp, Nightwatch, NOKIA (Alcatel-Lucent), Nortek, Novumind, NVIDIA, Nutanix, Nuvia (now Qualcomm), onsemi, ONUG, OpenStack Foundation, Oracle, Palo Alto Networks, Panasas, Peraso, Pexip, Pixelworks, Plume Design, PlusAI, Poly (formerly Plantronics), Portworx, Pure Storage, Qualcomm, Quantinuum, Rackspace, Rambus, Rayvolt E-Bikes, Red Hat, Renesas, Residio, Samsung Electronics, Samsung Semi, SAP, SAS, Scale Computing, Schneider Electric, SiFive, Silver Peak (now Aruba-HPE), SkyWorks, SONY Optical Storage, Splunk, Springpath (now Cisco), Spirent, Splunk, Sprint (now T-Mobile), Stratus Technologies, Symantec, Synaptics, Syniverse, Synopsys, Tanium, Telesign, TE Connectivity, TensTorrent, Tobii Technology, Teradata, T-Mobile, Treasure Data, Twitter, Unity Technologies, UiPath, Verizon Communications, VAST Data, Ventana Micro Systems, Vidyo, VMware, Wave Computing, Wellsmith, Xilinx, Zayo, Zebra, Zededa, Zendesk, Zoho, Zoom, and Zscaler. Moor Insights & Strategy founder, CEO, and Chief Analyst Patrick Moorhead is an investor in dMY Technology Group Inc. VI, Dreamium Labs, Groq, Luminar Technologies, MemryX, and Movandi.