Cross-vendor comparison normalised to dense FLOPS, with no structured-sparsity inflation (the conversion is sketched below the table). HBM bandwidth gates inference; scale-up domain size gates large-model training. Together they explain why an NVL72 rack and a TPU v7p pod are not interchangeable units even when per-chip throughput looks similar.
| Chip | Vendor | Launch | FP8 dense (PFLOPS) | FP4 dense (PFLOPS) | HBM (GB) | HBM BW (TB/s) | Scale-up domain (chips) | Power (W) |
|---|---|---|---|---|---|---|---|---|
| TPU 8I (Inference) | Google | 2026-Q1 | 5.5 | 10.5 | 256 | 9.0 | 12,288 | 750 |
| TPU v7p (Ironwood) | Google | 2025-Q4 | 4.6 | — | 192 | 7.4 | 9,216 | 720 |
| MI355X | AMD | 2025-Q4 | 5.0 | 10.1 | 288 | 8.0 | 8 | 1,400 |
| B200 | NVIDIA | 2024-Q4 | 4.5 | 9.0 | 192 | 8.0 | 72 | 1,000 |
| GB200 (Grace+B200) | NVIDIA | 2024-Q4 | 9.0 | 18.0 | 384 | 16.0 | 72 | 2,700 |
| TPU v6 (Trillium) | Google | 2024-Q4 | 0.92 | — | 32 | 1.6 | 256 | 350 |
| MI325X | AMD | 2024-Q4 | 2.6 | — | 256 | 6.0 | 8 | 1,000 |
| H200 | NVIDIA | 2024-Q1 | 2.0 | — | 141 | 4.8 | 8 | 700 |
| H100 | NVIDIA | 2022-Q3 | 2.0 | — | 80 | 3.4 | 8 | 700 |
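The normalisation itself is trivial but worth making explicit: NVIDIA datasheets headline FP8/FP4 throughput with 2:4 structured sparsity, which exactly doubles the dense rate. A minimal sketch of the conversion, using the published H100 sparse FP8 figure (~3.96 PFLOPS) as the example input:

```python
# Vendor datasheets often quote FP8/FP4 throughput with 2:4 structured
# sparsity, which doubles the dense rate. The table above normalises
# everything to dense, i.e. halves sparse figures.

SPARSITY_FACTOR = 2  # 2:4 structured sparsity doubles advertised FLOPS

def dense_pflops(advertised_pflops: float, is_sparse: bool) -> float:
    """Normalise an advertised throughput figure to dense PFLOPS."""
    return advertised_pflops / SPARSITY_FACTOR if is_sparse else advertised_pflops

# H100 datasheet: ~3.96 PFLOPS sparse FP8 -> ~2.0 dense, matching the row above.
print(dense_pflops(3.96, is_sparse=True))
```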
Memory bandwidth, not compute throughput, gates inference decode: each generated token streams the model's weights (and KV cache) out of HBM, so a chip with twice the FP8 PFLOPS but the same HBM bandwidth runs decode at the same speed. See Precision and Bandwidth for why.
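A minimal roofline sketch of that claim, assuming batch size 1, weights resident in HBM, and a full weight read per token; the 70B model size and 1 byte/param (FP8) are illustrative assumptions, while the bandwidth numbers come from the table above:

```python
# Bandwidth roofline for single-stream LLM decode: tokens/s is bounded
# by (HBM bytes/s) / (bytes streamed per token). FP8 PFLOPS never
# appears, so doubling compute leaves this bound unchanged.

def decode_tokens_per_sec(params_billions: float,
                          bytes_per_param: float,
                          hbm_bw_tb_s: float) -> float:
    """Upper bound on decode speed for a memory-bound model."""
    bytes_per_token = params_billions * 1e9 * bytes_per_param
    return (hbm_bw_tb_s * 1e12) / bytes_per_token

# Hypothetical 70B-parameter model served in FP8 (1 byte/param):
for chip, bw in [("H100", 3.4), ("B200", 8.0), ("TPU v7p", 7.4)]:
    print(f"{chip}: ~{decode_tokens_per_sec(70, 1.0, bw):.0f} tok/s")
```

On these assumptions the H100 tops out near 49 tok/s and the B200 near 114, in proportion to their 3.4 vs 8.0 TB/s, despite a 2.25x gap in FP8 compute.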
The number of chips that can be treated as one logical accelerator via NVLink, ICI, or Infinity Fabric. NVL72 = 72 GPUs as one. TPU v7p pod = 9,216 chips as one. That ceiling shapes what models you can train without crossing slow scale-out networks.
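To see why the ceiling matters, here is a rough sizing check. The memory model is the standard ~16 bytes/param rule of thumb for mixed-precision Adam training (FP16 weights and gradients plus FP32 master weights and optimizer moments, activations ignored), and the 400B model size is an illustrative assumption; HBM capacities and domain sizes come from the table:

```python
# Does a model's training state fit entirely inside one scale-up
# domain, so no gradient/weight traffic crosses the slower
# scale-out network?

BYTES_PER_PARAM = 16  # rule of thumb for mixed-precision Adam training

def fits_in_domain(params_billions: float,
                   hbm_gb_per_chip: float,
                   domain_chips: int):
    """Compare training-state footprint to the domain's pooled HBM."""
    need_gb = params_billions * 1e9 * BYTES_PER_PARAM / 1e9
    have_gb = hbm_gb_per_chip * domain_chips
    return need_gb <= have_gb, need_gb, have_gb

# Hypothetical 400B-parameter model:
for name, hbm, domain in [("H100 x8", 80, 8),
                          ("B200 NVL72", 192, 72),
                          ("TPU v7p pod", 192, 9216)]:
    ok, need, have = fits_in_domain(400, hbm, domain)
    verdict = "fits" if ok else "spills to scale-out"
    print(f"{name}: need {need:,.0f} GB, have {have:,.0f} GB -> {verdict}")
```

Under these assumptions a 400B model needs ~6.4 TB of state: it spills past an 8-GPU H100 domain but shards comfortably inside an NVL72 rack or a TPU v7p pod, which is the interchangeability gap the caption describes.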