Habana Gaudi2 Triples Performance
The latest chip from the Intel subsidiary offers a big performance jump from the initial Gaudi, putting it into the same class as Nvidia’s new Hopper GPU.
With its recent Hopper announcement, Nvidia dramatically raised the bar to compete in the data-center AI market, more than doubling its lead relative to other vendors. Intel’s Habana subsidiary is the first to reach this new height, delivering its second-generation AI accelerator with performance and power efficiency similar to Hopper’s. Habana expects Gaudi2 systems to arrive by the end of this year, only a few months after Hopper. It also previewed its next-generation low-power accelerator, Greco.
Taking advantage of a shrink to 7nm manufacturing, Gaudi2 delivers huge improvements over the first-generation design. It triples the core count and features 48MB of on-chip SRAM, twice as much as Gaudi. The new accelerator triples the amount of High Bandwidth Memory (HBM) to 96GB and provides 2.5x the bandwidth. Gaudi2 adds features such as video decoding and support for emerging FP8 data types. The greater compute and memory capacity triples Gaudi’s ResNet performance but also raises the TDP to 600W, which is still less than Hopper’s.
Habana announced the original Gaudi in 2019, just before its acquisition by Intel, but the 16nm training chip didn’t reach the market until last year; Amazon now offers it in an AWS instance. Gaudi outperforms Nvidia’s 12nm V100 but falls well behind the 7nm A100. Yet the A100 uses more power than Gaudi and, on the basis of AWS pricing, costs more than double, giving Habana the lead in performance per dollar.
Gaudi2 is sampling. According to Habana’s initial tests, the new chip doubles the A100’s training throughput on ResNet-50. This increase should put Gaudi2 within range of the Hopper H100. Habana will sell Gaudi2 on an OAM module; it also offers a server baseboard that can hold eight accelerators.
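The relative figures quoted above can be combined into a rough back-of-the-envelope check. The sketch below is illustrative only: it assumes that the "triples Gaudi" and "doubles the A100" claims refer to comparable ResNet-50 training measurements, which the vendor data does not confirm, and it treats "more than double" the AWS price as exactly 2x, a lower bound. The variable and function names are hypothetical.

```python
# Rough consistency check on the relative claims; not measured data.
# Assumption: both ratios describe comparable ResNet-50 training throughput.
GAUDI2_VS_GAUDI = 3.0      # Gaudi2 "triples Gaudi's ResNet performance"
GAUDI2_VS_A100 = 2.0       # Gaudi2 "doubles the A100's training throughput"
A100_PRICE_VS_GAUDI = 2.0  # A100 "costs more than double"; 2.0 is a lower bound


def implied_speedup(gaudi2_vs_gaudi: float, gaudi2_vs_a100: float) -> float:
    """A100 throughput relative to first-generation Gaudi implied by the two ratios."""
    return gaudi2_vs_gaudi / gaudi2_vs_a100


def perf_per_dollar_advantage(speedup: float, price_ratio: float) -> float:
    """Gaudi's performance per dollar relative to a rival that is
    `speedup` times faster and `price_ratio` times more expensive."""
    return price_ratio / speedup


a100_vs_gaudi = implied_speedup(GAUDI2_VS_GAUDI, GAUDI2_VS_A100)
gaudi_advantage = perf_per_dollar_advantage(a100_vs_gaudi, A100_PRICE_VS_GAUDI)

print(f"Implied A100 throughput vs. Gaudi: {a100_vs_gaudi:.2f}x")
print(f"Gaudi perf-per-dollar vs. A100: at least {gaudi_advantage:.2f}x")
```

Under these assumptions, the A100 would be about 1.5x faster than the original Gaudi, and Gaudi would keep a performance-per-dollar lead for as long as the A100's price premium exceeds that speedup.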