A Decade in the Making: Google TPU v7 Reshapes AI Computing, Meta Embraces, NVIDIA Responds
Worth Watching — 2025-12-01 11:57 Beijing

Overview
The TPU is Google’s longest-standing, most deeply integrated, and most strategically significant asset of the AI era. It is now the main engine driving:
- Google’s market cap growth
- The rise of its cloud business
- The reshaping of AI business models
In 2025, Google introduced the seventh-generation TPU chip — Ironwood. This chip:
- Directly challenges NVIDIA’s flagship products in performance
- Reshapes AI infrastructure competition with ultra-large-scale system advantages
Ten years ago, the TPU was born as a “self-rescue chip” to counter a looming data center compute and power crisis. Today it is an economic pillar for Google, and a chip that even companies like Meta reportedly plan to deploy in the near future.
---
NVIDIA vs. Google TPU — The Current Flashpoint
> "We are ahead of the entire industry by one generation." — NVIDIA
Key background events:
- Warren Buffett’s Berkshire Hathaway buys Alphabet stock for the first time.
- Market rumors emerge that Meta will:
  - Deploy Google TPUs in data centers in 2027
  - Rent TPU compute via Google Cloud in 2026
- NVIDIA responds:
  - Asserts GPUs beat ASICs in performance, generality, and portability
  - States TPUs cannot replace GPUs’ flexibility
- Google states:
  - Continues cooperation with NVIDIA
  - Commits to supporting both TPUs and NVIDIA GPUs
Outcome: TPU has evolved from a crisis-born project into Google’s backbone, aiming not for single-board dominance but for a wholly different, ultra-large-scale system philosophy.
---
01 — TPU’s Past and Present

TPU v1 — Origins
- Initiated in 2015 to address an impending crisis:
  - Rising deep learning adoption threatened to drive global data center power costs up by as much as 10×.
  - GPUs were ideal for training but inefficient for real-time inference.
- Goal: build energy-efficient ASIC accelerators, not general-purpose chips.
- TPU v1 launched in 2016 and was used for Google Translate and parts of Search.
- The TPU proved a natural fit for the Transformer architecture introduced in 2017.
Full-Stack Philosophy
Google pursued a closed-loop integration model:
- Software frameworks
- Compilers
- Chip architecture
- Network topology
- Cooling systems
Commercialization & Scale
- TPU v2/v3: Opened TPU access to Google Cloud customers.
- TPU v4 (2021):
  - 4,096 chips in a 2D/3D torus topology (see the torus sketch after this list)
  - Trained the 540B-parameter PaLM model
  - Enabled near-lossless communication at pod scale
- TPU v5p:
  - Doubled v4 performance
  - Flexible node design scaling to roughly 9,000 chips
  - Attracted interest from Meta and Anthropic
- TPU v6 (Trillium, 2024):
  - Optimized for inference-era workloads
  - 67% improvement in energy efficiency
  - FP8 throughput boost, doubled SRAM, KV cache optimization
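To make the torus idea concrete, here is a minimal sketch of how chips find their neighbors in a hypothetical 3D torus of 16 × 16 × 16 = 4,096 chips. The dimensions and layout are illustrative assumptions, not Google's actual arrangement; the point is only that every chip has a fixed set of direct neighbors and the wrap-around links avoid long edge-to-edge paths.

```python
# Minimal sketch: neighbors of a chip in a hypothetical 3D torus.
# The 16 x 16 x 16 layout is illustrative only, not Google's actual topology.

DIMS = (16, 16, 16)  # 16 * 16 * 16 = 4,096 chips, matching the TPU v4 pod size


def torus_neighbors(coord, dims=DIMS):
    """Return the six direct neighbors of a chip, with wrap-around at the edges."""
    x, y, z = coord
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            pos = [x, y, z]
            pos[axis] = (pos[axis] + step) % size  # modulo gives the torus wrap-around
            neighbors.append(tuple(pos))
    return neighbors


# A corner chip still has six neighbors because the torus wraps around.
print(torus_neighbors((0, 0, 0)))
# [(15, 0, 0), (1, 0, 0), (0, 15, 0), (0, 1, 0), (0, 0, 15), (0, 0, 1)]
```

The wrap-around links are what keep worst-case hop counts low without a central switch, which is the property the article credits for near-lossless communication at 4,096-chip scale.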
---
02 — TPU v7 Ironwood: Entering the “Offensive Era”

Architectural Highlights:
- First dedicated inference chip in TPU history.
- Designed for ultra-large-scale online inference.
- Competes directly with NVIDIA Blackwell in flagship metrics.
Single-chip specs:
- FP8 dense compute: 4.6 petaFLOPS (vs. NVIDIA B200’s 4.5 petaFLOPS)
- Memory: 192 GB HBM3e, 7.4 TB/s bandwidth
- Inter-chip bandwidth: 9.6 Tbps (lower than Blackwell’s 14.4 Tbps), though Google’s system-level design compensates.
Scalability:
- One Ironwood pod: 9,216 chips
- FP8 peak: 42.5 exaFLOPS (a quick check of this figure follows the list)
- 118× the FP8 compute of the nearest competing system on specific FP8 workloads
- Uses a refined 2D/3D torus plus Optical Circuit Switching (OCS) networking
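A back-of-envelope check, using only the figures quoted in this article, shows how the pod-level numbers follow from the single-chip specs (small rounding differences are expected):

```python
# Back-of-envelope check of the pod-level figures quoted above.
CHIPS_PER_POD = 9_216
FP8_PER_CHIP_PFLOPS = 4.6      # petaFLOPS per Ironwood chip (FP8 dense)
HBM_PER_CHIP_GB = 192          # GB of HBM3e per chip

pod_fp8_exaflops = CHIPS_PER_POD * FP8_PER_CHIP_PFLOPS / 1_000   # PFLOPS -> EFLOPS
pod_hbm_pb = CHIPS_PER_POD * HBM_PER_CHIP_GB / 1_000_000         # GB -> PB (decimal)

print(f"Pod FP8 peak: {pod_fp8_exaflops:.1f} exaFLOPS")   # ~42.4, close to the quoted 42.5
print(f"Pod HBM pool: {pod_hbm_pb:.2f} PB")               # ~1.77 PB, the figure cited below
```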
OCS advantages:
- MEMS-based optical switching reconfigures links without adding per-hop latency
- Faulty nodes can be bypassed within milliseconds
- Availability: 99.999% (about five minutes of downtime per year; see the arithmetic below)
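The downtime figure follows directly from the availability target; a one-line check (assuming a 365-day year) gives roughly five minutes per year:

```python
# Converting an availability target into expected annual downtime.
availability = 0.99999                      # "five nines"
minutes_per_year = 365 * 24 * 60            # 525,600 minutes
downtime_min = (1 - availability) * minutes_per_year
print(f"{downtime_min:.2f} minutes of downtime per year")  # ~5.26 minutes
```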
Inference efficiency:
- Roughly 1.77 PB of high-bandwidth HBM shared across the pod
- 30–40% lower inference costs vs. GPU systems
- Software optimizations: MaxText, GKE topology-aware scheduling, and prefix-cache-aware routing (a minimal routing sketch follows)
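The article names prefix-cache-aware routing without detail. Below is a minimal, hypothetical sketch of the idea, not Google's implementation: requests that share a prompt prefix are routed to the same serving replica so that replica's KV cache for the prefix can be reused instead of recomputed. The replica count, prefix length, and word-based "tokens" are all illustrative assumptions.

```python
# Hypothetical sketch of prefix-cache-aware routing (not Google's implementation).
import hashlib

NUM_REPLICAS = 8      # illustrative replica count (assumption)
PREFIX_WORDS = 8      # route on the first 8 whitespace-separated words (stand-in for real tokens)


def pick_replica(prompt: str, num_replicas: int = NUM_REPLICAS) -> int:
    """Hash the prompt prefix so identical prefixes map to the same replica."""
    prefix = " ".join(prompt.split()[:PREFIX_WORDS])
    digest = hashlib.sha256(prefix.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_replicas


shared = "You are a helpful assistant. Summarize the following document:"
print(pick_replica(shared + " doc A"))   # same replica as the next call...
print(pick_replica(shared + " doc B"))   # ...because the first 8 words match
print(pick_replica("Translate this sentence into French: hello"))  # may land elsewhere
```

A real scheduler would route on tokenizer output and balance load across replicas; the point is simply that KV-cache reuse is a routing decision at the system level, not only a chip feature.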
---
03 — Three-Way Crossroads: Google, NVIDIA, Amazon

NVIDIA:
- General-purpose GPU dominance via CUDA ecosystem control
- Comparatively weaker on inference cost-efficiency
- High “NVIDIA tax” pricing
Google:
- Specialized for Transformer workloads
- Vertical full-stack integration — chips, models, frameworks, compilers, networks, cooling, data centers
- Optimized for system-level efficiency
Amazon:
- Cost-reduction focus for AWS via Trainium & Inferentia
- Not a unified system; targeted at bringing AWS infrastructure costs down
---
04 — Escaping the “CUDA Tax”

CUDA Tax Definition:
- NVIDIA GPU production cost: roughly a few thousand USD per chip
- Selling price: tens of thousands of USD
- Gross margins above 80%, an unavoidable cost for companies that depend on GPUs (see the margin arithmetic below)
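To make the margin claim concrete, here is the arithmetic with illustrative numbers; the specific prices are assumptions drawn from the ranges above, not disclosed figures:

```python
# Illustrative gross-margin arithmetic; the prices below are assumptions,
# not disclosed NVIDIA figures.
production_cost = 3_000      # "a few thousand USD" per GPU (assumed)
selling_price = 30_000       # "tens of thousands of USD" per GPU (assumed)

gross_margin = (selling_price - production_cost) / selling_price
print(f"Gross margin: {gross_margin:.0%}")   # 90% with these assumed numbers
```

With cost at the low end and price at the high end of the quoted ranges, the margin comfortably exceeds the 80% figure the article cites.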
Google’s TPU edge:
- Self-developed, full-chain control: design → manufacturing → networking → software → data center
- Internal cost savings + lower prices for Google Cloud clients
- Services reportedly offered at roughly 20% of OpenAI’s cost structure
TPU@Premises:
- Deploying TPUs directly in enterprise data centers for local inference
- Further latency reduction and cost optimization
---
05 — TPU as Google’s “Economic Pillar”
- TPUs co-evolved with the Gemini model family for both training and inference.
- Helped boost Cloud AI revenue to roughly $44B annualized.
- Foundational in shifting Google from cloud laggard to AI infrastructure leader.
- Inference era will be cost-driven, scale-focused, and integration-led.
