Gaudi 3 Competes On Inference Efficiency

Author: Ayush Jain

In the datacenter AI market, NVIDIA has taken several leaps forward: first with Ampere, then Hopper, and now Blackwell. Other AI accelerator vendors have tried to keep pace, and some have come close; Intel’s Gaudi 2, for example, offered performance comparable to NVIDIA’s A100. To gain a foothold in the AI accelerator market, Intel is back with its next-generation accelerator, Gaudi 3. In its maximum configuration, the chip delivers twice the FP8 performance of its predecessor while pushing open accelerator module (OAM) power consumption to a relatively modest 900 W. Gaudi 3 is expected to ship in Q3 2024, and Intel expects it to generate $500 million in sales this year.

Gaudi 3 uses a heterogeneous architecture with two compute dies, connected via an interconnect and built on a 5 nm process. Across the two dies, the new accelerator quadruples the number of matrix multiplication engines (MMEs) and contains 2.7 times the core count of Gaudi 2.

Gaudi 3 contains 96 MB of on-die SRAM and increases the amount of high-bandwidth memory (HBM) by 33% to 128 GB, while providing 1.5 times the HBM bandwidth. The added compute and memory also raise TDP to 900 W in the standard mezzanine form factor, 50% more than Gaudi 2. The networking subsystem is upgraded to twenty-four 200-GbE network interface controller (NIC) ports offering an aggregate bandwidth of 4.8 Tb/s in each direction, optimized for training large networks. The NIC ports on the accelerator module provide scale-out support, connecting multiple Intel Gaudi 3 accelerators in a server.
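The figures in this paragraph can be cross-checked with a little arithmetic. The short Python sketch below is illustrative only: it multiplies the port count by the per-port line rate to get the aggregate NIC bandwidth, and it backs out the Gaudi 2 baselines implied by the stated percentage increases. The function and variable names are hypothetical, and the Gaudi 2 values are derived from the ratios quoted here rather than taken from a datasheet.

```python
# Gaudi 3 figures as quoted in the article.
NIC_PORTS = 24            # number of Ethernet NIC ports
PORT_SPEED_GBPS = 200     # line rate per port, in gigabits per second (200 GbE)
HBM_CAPACITY_GB = 128     # HBM capacity in gigabytes
TDP_WATTS = 900           # OAM thermal design power


def aggregate_nic_bandwidth_gbps(ports: int, port_speed_gbps: int) -> int:
    """Aggregate one-direction NIC bandwidth in gigabits per second."""
    return ports * port_speed_gbps


def implied_baseline(new_value: float, increase_fraction: float) -> float:
    """Previous-generation value implied by 'new is X% more than old'."""
    return new_value / (1.0 + increase_fraction)


agg_gbps = aggregate_nic_bandwidth_gbps(NIC_PORTS, PORT_SPEED_GBPS)
agg_tbps = agg_gbps / 1000      # terabits per second
agg_gbytes_per_s = agg_gbps / 8  # gigabytes per second (8 bits per byte)

# Baselines implied by the article's "33% more HBM" and "50% more power".
gaudi2_hbm_gb = round(implied_baseline(HBM_CAPACITY_GB, 0.33))
gaudi2_tdp_watts = implied_baseline(TDP_WATTS, 0.50)

print(f"Aggregate NIC bandwidth: {agg_tbps} Tb/s ({agg_gbytes_per_s} GB/s) per direction")
print(f"Implied Gaudi 2 HBM capacity: {gaudi2_hbm_gb} GB")
print(f"Implied Gaudi 2 TDP: {gaudi2_tdp_watts} W")
```

Note that Ethernet line rates are specified in bits, so twenty-four 200-GbE ports sum to terabits per second; dividing by eight gives the equivalent figure in bytes.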

The company offers Gaudi 3 as a custom OAM module or a standard PCIe card; the latter delivers similar peak performance for all supported datatypes at 600 W. OAM modules can scale to clusters of up to 8,192 accelerators across 1,024 nodes using standard Ethernet switches. Projected MLPerf benchmarks show Gaudi 3 OAM clusters achieving 25% to 40% faster time-to-train than H100 clusters on large-scale pretraining of large language models (LLMs). The Gaudi 3 accelerator will reach the market at about the same time as NVIDIA’s Blackwell GPUs.
