Untether Boqueria Targets AI Lead
The startup’s unique at-memory architecture targets an eye-popping 30 teraflop/s per watt and 12,000 teraflop/s on a single card. The second-generation chip is due to sample in 1H23.
Nothing succeeds like excess. Untether expects its second-generation AI-accelerator card to deliver a stunning 12,000 teraflop/s at an efficiency of 30 teraflop/s per watt. That would be 6x the performance of Nvidia’s new Hopper H100 and 10x the power efficiency. To achieve this feat, the new card will have six chips. The new chip design is branded SpeedAI240 but code-named Boqueria after Barcelona’s popular public market. Customers will have to be patient, however, as Untether plans to sample the new parts in 1H23, with production late that year.
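The headline figures imply several numbers the company has not stated here. The sketch below works them out in Python from the values quoted above. It assumes the throughput and efficiency targets describe the same operating point at the card level and that the six chips share the load evenly; neither assumption is confirmed by Untether. The H100 values are simply back-calculated from the quoted 6x and 10x ratios, not taken from Nvidia's specifications, and the function name is illustrative.

```python
# Figures quoted in the article (targets, not measured results).
CARD_TFLOPS = 12_000              # teraflop/s per card
EFFICIENCY_TFLOPS_PER_WATT = 30   # teraflop/s per watt
CHIPS_PER_CARD = 6
PERF_RATIO_VS_H100 = 6            # "6x the performance"
EFFICIENCY_RATIO_VS_H100 = 10     # "10x the power efficiency"


def implied_figures() -> dict:
    """Derive the numbers implied by the quoted targets.

    Assumes throughput and efficiency apply at the same operating point
    and that work is split evenly across the chips on the card.
    """
    card_power_w = CARD_TFLOPS / EFFICIENCY_TFLOPS_PER_WATT
    h100_tflops = CARD_TFLOPS / PERF_RATIO_VS_H100
    h100_tflops_per_watt = EFFICIENCY_TFLOPS_PER_WATT / EFFICIENCY_RATIO_VS_H100
    return {
        "card_power_w": card_power_w,
        "chip_tflops": CARD_TFLOPS / CHIPS_PER_CARD,
        "chip_power_w": card_power_w / CHIPS_PER_CARD,
        "h100_tflops": h100_tflops,
        "h100_tflops_per_watt": h100_tflops_per_watt,
        "h100_power_w": h100_tflops / h100_tflops_per_watt,
    }


if __name__ == "__main__":
    for name, value in implied_figures().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions the card would draw about 400W, each chip would deliver about 2,000 teraflop/s, and the comparison implies an H100 baseline of roughly 2,000 teraflop/s at about 3 teraflop/s per watt.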
Boqueria extends the company’s first-generation architecture, more than tripling performance per watt. Crucial to its efficiency gain is a move to 8-bit floating-point (FP8), which requires less power than 8-bit integers (INT8) but delivers nearly the same results for neural-network inference. To further reduce power, the chip will employ ultra-low voltages, custom circuit design, and a shrink from 16nm to 7nm. The finer process also enables a physically larger design with more compute cores and more memory. New RISC-V CPUs make the compute cores more flexible, and new interfaces boost core-to-core bandwidth to support the greater compute power.
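To see why an 8-bit floating-point format can stand in for INT8 in inference, it helps to compare how each one rounds values of different magnitudes. The Python sketch below is a simplified illustration of the numeric trade-off only; it says nothing about power. It assumes 3 fraction bits for the floating-point case and ignores exponent range, subnormals, and saturation, because the FP8 format Boqueria uses is not specified here. The function names are illustrative.

```python
import math


def round_to_float(x: float, mantissa_bits: int) -> float:
    """Round x to the nearest value with `mantissa_bits` fraction bits.

    Simplified model: exponent range, subnormals, and saturation
    are deliberately ignored.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    steps = 2 ** (mantissa_bits + 1)  # implicit leading bit plus fraction bits
    return math.ldexp(round(m * steps) / steps, e)


def round_to_int8(x: float, max_abs: float) -> float:
    """Symmetric INT8 quantization of [-max_abs, max_abs] onto [-127, 127]."""
    scale = max_abs / 127
    q = max(-127, min(127, round(x / scale)))
    return q * scale


def relative_error(approx: float, exact: float) -> float:
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    for value in (0.9, 0.1, 0.01):
        fp = round_to_float(value, mantissa_bits=3)
        i8 = round_to_int8(value, max_abs=1.0)
        print(f"{value}: float error {relative_error(fp, value):.2%}, "
              f"int8 error {relative_error(i8, value):.2%}")
```

In this model the integer format is more precise near the top of its range, while the floating-point format keeps its relative error roughly constant for small values, which is one reason a float format can tolerate tensors with a wide spread of magnitudes.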
Untether disclosed the design at this week’s Hot Chips conference. To fund the new design, the Toronto-based startup raised $125 million last summer from investors including venture firm Tracker Capital, the Canada Pension Plan, General Motors, and Intel. The chip builds on Untether’s unique at-memory architecture, which tightly knits compute units and SRAM to minimize the power necessary to move data around the chip. The company’s first product, called RunAI200, reached production last quarter, about a year behind the original schedule. Now that its software is fully functional, Untether expects the second-generation chip to reach production more smoothly.