Precision and Bandwidth
The questionA 70B model is 140 GB.
A 70B model is 140 GB.
A B200 GPU has 192 GB of HBM.
So why does serving it sometimes need a rack of GPUs? And why has every frontier launch in 2025 quietly shipped at FP8 or FP4 instead of FP16?
The answer is bandwidth, not capacity. Tap to walk through it →
1 / 9Tap to advance