TPUv4 Adds Large On-Chip Memory
The TPUv4 is now generally available through Google Cloud, although the company has used it internally for a year. The ASIC doubles the number of matrix units relative to the TPUv3.
For more than a year, Google has broadly deployed the fourth-generation TPU chip for its internal workloads, but it only recently made the AI accelerator generally available to cloud customers. The company has trickled out details of the TPUv4 design, which delivers much better performance per watt than its predecessors.
The TPUv4 doubles the number of matrix units per core relative to the TPUv3, raising peak performance to 275 trillion operations per second (TOPS) using the Bfloat16 format. It also adds a large shared memory, reducing the number of power-wasting accesses to the external High Bandwidth Memory (HBM). This change, along with a shrink to 7nm, helps slash the chip’s power.
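The Bfloat16 format the matrix units operate on keeps float32's 8-bit exponent (and thus its dynamic range) but trims the mantissa to 7 bits. A minimal Python sketch of the conversion, assuming simple truncation of the low mantissa bits rather than the round-to-nearest behavior hardware typically implements:

```python
import struct

def to_bfloat16(x: float) -> float:
    """Approximate a value as bfloat16 by zeroing the low 16 bits of its
    float32 encoding (bfloat16 = float32's sign and 8-bit exponent, but
    only the top 7 mantissa bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# Precision drops to roughly 2-3 decimal digits...
print(to_bfloat16(3.141592653589793))  # -> 3.140625
# ...but the float32 exponent range survives, so values near 1e38 do not
# overflow the way they would in IEEE float16 (max ~6.5e4).
print(to_bfloat16(1e38))
```

Tolerating this precision loss is what lets each matrix unit pack far more multipliers into the same area than float32 would allow, which is how the chip reaches its 275-TOPS peak.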
Google began designing its own AI accelerators in 2014, using a small team to quickly produce the first TPU. The success of that design led the company to deploy the TPUv2 in 2017, a more complex chip that could handle both training and inference. The TPUv3 was a fast upgrade, doubling the number of matrix units per core. Google debuted its next architecture in the TPUv4i, a single-core chip optimized for inference, and then in the dual-core TPUv4, which mainly targets training. It offered the TPUv4 to cloud customers "in preview" for several months before making it broadly available.
The company employs TPUs for all its AI training and inference work, eliminating purchases of Nvidia GPUs. According to MLPerf results, however, Nvidia’s A100 performs similarly to the TPUv4 in both large and small clusters, and its new H100 sets a high bar for the next TPU.
Subscribers can view the full article in the TechInsights Platform.