From the H100 to Blackwell.
The silicon that eats power and spits out intelligence.
- **NVIDIA H100**: The current gold standard. The chip that started the generative AI boom. Built on TSMC's 4N process.
- **NVIDIA B200 (Blackwell)**: The new king. Two reticle-sized dies connected by a 10 TB/s link. Designed to run trillion-parameter models.
- **AMD MI300X**: The contender. Massive memory capacity and bandwidth advantages. Ideal for inference on large models.
- **Google TPU v5p**: The cloud native. Built for massive pod-scale training over Google's proprietary optical Inter-Chip Interconnect (ICI).
A CPU (the Ferrari) is designed to do one complex thing very quickly. It's great for running your operating system or opening an app (sequential logic).
A GPU (the Bus) is slower at any single task, but it can move 1,000 passengers (pixels or numbers) at the exact same time.
Neural networks are just massive matrices of numbers. Training or running them means doing billions of tiny, independent multiply-adds simultaneously, and that is why the "Bus" won (see the sketch below).
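To make the difference concrete, here is a minimal sketch in plain Python with NumPy, run on a CPU: a one-number-at-a-time scalar loop versus a single vectorized matrix multiply. The matrix size and timings are illustrative only; a GPU extends the same parallel idea to tens of thousands of concurrent lanes, so the gap grows by further orders of magnitude.

```python
import time
import numpy as np

n = 128  # kept small on purpose: the scalar loop below is slow
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

def matmul_scalar(a, b):
    """The 'Ferrari' way: one multiply-add at a time, in strict order."""
    out = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                out[i, j] += a[i, k] * b[k, j]
    return out

t0 = time.perf_counter()
matmul_scalar(a, b)
t_scalar = time.perf_counter() - t0

# The 'Bus' way: one call that pushes all n*n*n multiply-adds
# through wide parallel hardware at the same time.
t0 = time.perf_counter()
a @ b
t_parallel = time.perf_counter() - t0

print(f"scalar loop:     {t_scalar:.3f} s")
print(f"parallel matmul: {t_parallel:.6f} s")
```

Even on a CPU, the vectorized call typically wins by a factor of thousands; the GPU simply takes that bet much further.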
| Model | Vendor | Memory | Bandwidth | Peak TFLOPS | TDP |
|---|---|---|---|---|---|
| H100 SXM | NVIDIA | 80 GB (HBM3) | 3.35 TB/s | 3,958 (FP8) | 700W |
| H200 SXM | NVIDIA | 141 GB (HBM3e) | 4.8 TB/s | 3,958 (FP8) | 700W |
| B200 | NVIDIA | 192 GB (HBM3e) | 8.0 TB/s | 20,000 (FP4) | 1000W |
| MI300X | AMD | 192 GB (HBM3) | 5.3 TB/s | 5,229 (FP8) | 750W |
| TPU v5p | Google | 95 GB (HBM2e) | 2.77 TB/s | 459 (BF16) | - |
| Trainium 2 | AWS | 96 GB (HBM3e) | 2.9 TB/s | 1,299 (FP8) | - |
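One way to read the Bandwidth column: for single-stream inference, generating each token requires streaming roughly every model weight through the chip once, so memory bandwidth, not TFLOPS, sets the ceiling on tokens per second. Here is a rough sketch of that ceiling, assuming a hypothetical 70B-parameter model quantized to FP8 (1 byte per parameter), batch size 1, and ignoring KV-cache and activation traffic:

```python
# Bandwidth-bound ceiling on single-stream decode throughput.
# Assumption (hypothetical): a 70B-parameter model in FP8 = 1 byte/param,
# batch size 1, KV-cache and activation traffic ignored.

MEM_BANDWIDTH_TBPS = {   # memory bandwidth per chip, from the table above
    "H100 SXM":   3.35,
    "H200 SXM":   4.8,
    "B200":       8.0,
    "MI300X":     5.3,
    "TPU v5p":    2.77,
    "Trainium 2": 2.9,
}

model_bytes = 70e9 * 1  # 70B params x 1 byte each (FP8)

for chip, tbps in MEM_BANDWIDTH_TBPS.items():
    ceiling = (tbps * 1e12) / model_bytes  # tokens/s upper bound
    print(f"{chip:>10}: <= {ceiling:5.1f} tokens/s")
```

This is the arithmetic behind the MI300X's "ideal for inference" pitch: memory capacity decides whether a large model fits on one chip at all, and bandwidth decides how fast its weights can be read back out.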