# Which AI chip should I care about?
The answer depends on the job: frontier training, cheap inference, long context, software portability, supply assurance, or power-limited deployment.
There is no universal best chip. There is a best chip for a constraint set, and the constraint set changes as the bottleneck moves across power, memory, fabric, and software.
## Start with the workload
A frontier training run wants scale-up bandwidth, reliable collectives, mature kernels, and enough software flexibility for researchers to change the model. A high-volume inference service wants cost per token, HBM bandwidth, predictable latency, and power efficiency.
Those are different jobs. Treating every accelerator as a generic "GPU equivalent" erases the decision that actually matters.
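To make the inference side concrete, here is a rough back-of-envelope sketch (my own, not from the article): in memory-bound decode, per-chip throughput is capped by how fast the weights can be streamed out of HBM for each token. The model size, precision, and bandwidth figures below are illustrative assumptions.

```python
def decode_tokens_per_s(params_b: float, bytes_per_param: float, hbm_tb_s: float) -> float:
    """Upper bound on single-stream decode tokens/s when every token
    requires streaming the full weight set from HBM (memory-bound regime)."""
    model_bytes = params_b * 1e9 * bytes_per_param  # weight bytes read per token
    return (hbm_tb_s * 1e12) / model_bytes

# Assumed example: a 70B-parameter model at FP8 (1 byte/param)
# on a 4.8 TB/s part (H200-class HBM bandwidth):
print(round(decode_tokens_per_s(70, 1.0, 4.8), 1))  # ~68.6 tokens/s ceiling
```

This is why the inference column to watch is HBM bandwidth rather than peak FLOPS: in this regime, doubling bandwidth roughly doubles the decode ceiling, while extra compute sits idle.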
## NVIDIA is the default because the system works
NVIDIA sells more than silicon. It sells CUDA, NVLink, rack designs, reference systems, libraries, debugging tools, and an ecosystem that lets teams move quickly.
That default can be expensive and still rational. When the cost of delay is a missed frontier cycle, software risk is not a footnote.
## Custom silicon is how hyperscalers claw back margin
Google TPUs, AWS Trainium, Meta silicon, and other custom programs are attempts to own the cost curve for repeatable internal workloads. They do not need to win every workload. They need to win the workloads their owners serve at massive scale.
The more inference stabilizes, the more custom chips matter. The more model architecture changes, the more flexible accelerators keep their premium.
## The comparator is a constraint map
Read the rows as trade-offs. HBM bandwidth points at inference. Scale-up domain points at training shape. TDP points at site power and cooling. Release date points at supply risk and software maturity.
The best buyer asks which line item binds their roadmap first, then works backward from that line item to the chip.
| Chip | Vendor | Release | Peak PFLOPS (FP8) | Peak PFLOPS (FP4) | HBM (GB) | HBM BW (TB/s) | Scale-up domain (chips) | TDP (W) |
|---|---|---|---|---|---|---|---|---|
| TPU 8I (Inference) | Google | 2026-Q1 | 5.5 | 10.5 | 256 | 9.0 | 12,288 | 750 |
| TPU v7p (Ironwood) | Google | 2025-Q4 | 4.6 | — | 192 | 7.4 | 9,216 | 720 |
| MI355X | AMD | 2025-Q4 | 5.0 | 10.1 | 288 | 8.0 | 8 | 1,400 |
| B200 | NVIDIA | 2024-Q4 | 4.5 | 9.0 | 192 | 8.0 | 72 | 1,000 |
| GB200 (Grace+B200) | NVIDIA | 2024-Q4 | 9.0 | 18.0 | 384 | 16.0 | 72 | 2,700 |
| TPU v6 (Trillium) | Google | 2024-Q4 | 0.92 | — | 32 | 1.6 | 256 | 350 |
| MI325X | AMD | 2024-Q4 | 2.6 | — | 256 | 6.0 | 8 | 1,000 |
| H200 | NVIDIA | 2024-Q1 | 2.0 | — | 141 | 4.8 | 8 | 700 |
| H100 | NVIDIA | 2022-Q3 | 2.0 | — | 80 | 3.4 | 8 | 700 |