Google Discloses TPUv4 Details

Google’s TPUv4 excels at AI models employing embeddings owing to its sea of SparseCores that supplement its two main cores. Targeting inference, the TPUv4i has only a single larger core to reduce power.

Joseph Byrne

Flowers are a sign of spring, and Google’s TPUv4 disclosure is a sign that it’s replacing the chip with its successor. A recent paper, to be presented in June at the International Symposium on Computer Architecture (ISCA), sheds light on how the company developed the AI processor and the supercomputer based on it.

For the fourth generation, Google developed two 7 nm chips: the TPUv4i for inference and the TPUv4 for training. The big VLIW TensorCores these chips employ have more matrix units than those in the TPUv3, and the new AI accelerators add a large common memory. The main difference between the two chips is core count: the TPUv4 integrates two TensorCores, like the TPUv2 and TPUv3, whereas the TPUv4i implements only one to enable air cooling. In contrast to competing inference-focused accelerators that emphasize INT8 throughput, Google sees accuracy benefits from eschewing quantization and sticking with the same floating-point formats for inference as for training.
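To illustrate why skipping quantization can preserve accuracy, here is a minimal sketch of a symmetric INT8 round trip. It is a generic textbook scheme, not Google's or any competitor's method; the function names and sample values are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale.

    Assumes at least one value is nonzero (otherwise the scale would be 0).
    """
    scale = max(abs(v) for v in values) / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the INT8 codes."""
    return [q * scale for q in quantized]


# Hypothetical weights: one is much smaller than the largest magnitude.
weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Per-element round-trip error; small values can collapse to zero.
errors = [abs(w - r) for w, r in zip(weights, restored)]
```

In this toy case the smallest weight rounds to code 0 and is lost entirely, the kind of error that a floating-point format with its own exponent per value avoids.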

Google’s ISCA paper also discusses the TPU’s SparseCores. First included in the TPUv2, these engines have proven more useful as Google has used them to process recommendation and language models that employ embeddings (vectors representing items such as words in a text block or videos watched). They’re simpler than the main VLIW core, enabling a TPU to instantiate many of them in a sea of parallel cores. The company reports that SparseCores accelerate embedding-based models by 5x–7x but use only 5% of a chip’s area and power.
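For readers unfamiliar with embeddings, the following minimal sketch shows the basic operation: gathering a few rows from a table by item ID and pooling them into one dense vector. The table contents, sizes, and function names are hypothetical, and the code says nothing about how SparseCores implement the operation in hardware.

```python
# Hypothetical toy table: four items, each mapped to a 4-dimensional vector.
# Production tables hold millions of rows, so each lookup touches a tiny,
# irregular slice of memory.
EMBEDDING_TABLE = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [3.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 1.0],
]


def lookup(table, ids):
    """Gather one row per ID (the sparse, memory-bound step)."""
    return [table[i] for i in ids]


def sum_pool(vectors):
    """Reduce the gathered rows to one vector by element-wise sum."""
    return [sum(column) for column in zip(*vectors)]


def embed(table, ids):
    """Turn a list of item IDs (e.g. videos watched) into one dense vector."""
    return sum_pool(lookup(table, ids))


# A user who watched item 0 once and item 2 twice.
user_vector = embed(EMBEDDING_TABLE, [0, 2, 2])
```

The gather step does little arithmetic but makes scattered memory accesses, which is why such workloads are a poor match for large matrix units and a plausible fit for many small, simple cores.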

Google hasn’t yet disclosed information about the TPUv5. The recent paper alludes to it and hints that it’s a 4 nm chip first deployed in 2023, three years after the TPUv4. By contrast, the TPUv4, TPUv3, and TPUv2 each followed its predecessor by only one year. The long life of the TPUv4 demonstrates that it was flexible enough to adapt to the company’s evolving workloads and that it left little room to improve performance, efficiency, and scalability.
