3D Torus interconnect
A torus wires every chip to its six nearest neighbours, with wraparound on each axis. Google's TPU pods have used this 3D shape since v4 (v2 and v3 pods used a 2D torus). Drag to rotate; the cyan ring shows the path of one all-reduce step.
Why a torus?
For collective operations like all-reduce, which dominate large-model training, the limiting resource is bandwidth rather than hop count. A torus gives every chip six identically provisioned links, so a collective can be split across multiple disjoint rings that run simultaneously and saturate the available bandwidth.
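To see why hop count drops out, here is a minimal cost-model sketch for a bandwidth-bound ring all-reduce. The function name, link speed, and latency figure are illustrative assumptions, not TPU specifications; the point is that splitting the payload across disjoint rings divides the dominant bandwidth term.

```python
# A minimal cost model for a bandwidth-bound ring all-reduce
# (reduce-scatter followed by all-gather). Hop count only appears in the
# latency term, which is negligible for the large tensors moved in training.

def ring_allreduce_seconds(bytes_total, ring_size, link_gbps, num_rings=1,
                           link_latency_s=1e-6):
    """Estimate all-reduce time with the payload split evenly across
    `num_rings` disjoint rings running in parallel."""
    bytes_per_ring = bytes_total / num_rings
    link_bytes_per_s = link_gbps * 1e9 / 8
    steps = 2 * (ring_size - 1)            # reduce-scatter + all-gather
    bandwidth_term = steps * (bytes_per_ring / ring_size) / link_bytes_per_s
    latency_term = steps * link_latency_s
    return bandwidth_term + latency_term

# 1 GiB of gradients over rings of 8 chips with 100 Gb/s links (assumed figures):
one = ring_allreduce_seconds(2**30, ring_size=8, link_gbps=100)
six = ring_allreduce_seconds(2**30, ring_size=8, link_gbps=100, num_rings=6)
print(f"one ring: {one*1e3:.0f} ms   six disjoint rings: {six*1e3:.0f} ms")
```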
A 3D mesh arranges the chips the same way, but without wraparound the corner chips have only three neighbours while interior chips have six. The wraparound is what makes it a torus rather than a mesh: every node sees the same fabric.
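A quick way to see the difference is to count the links each node actually has. This sketch is plain Python with nothing TPU-specific: it tallies node degrees for a mesh and a torus of the same shape.

```python
from collections import Counter
from itertools import product

def degree_histogram(dims, wraparound):
    """Tally how many links each node has in a 3D mesh (no wraparound)
    or a 3D torus (wraparound) of shape dims = (X, Y, Z)."""
    hist = Counter()
    for node in product(*(range(d) for d in dims)):
        degree = 0
        for axis, size in enumerate(dims):
            for step in (-1, 1):
                if wraparound:
                    degree += 1                     # wrap links always exist
                elif 0 <= node[axis] + step < size:
                    degree += 1                     # mesh: edge nodes lose links
        hist[degree] += 1
    return dict(sorted(hist.items()))

print("8x8x8 mesh :", degree_histogram((8, 8, 8), wraparound=False))
print("8x8x8 torus:", degree_histogram((8, 8, 8), wraparound=True))
```

For an 8×8×8 mesh only 216 of the 512 chips keep all six links; in the torus all 512 do.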
What scaling each axis does
Doubling one axis doubles the chip count and doubles the diameter along that axis. So an 8×8×16 pod has a worse diameter (16 hops) than 8×8×8 (12 hops), but twice the chips and twice the aggregate link bandwidth. For training, more bandwidth usually beats fewer hops.
That's why TPU pods grew lopsided: 8×8×16 became the standard pod shape because the bandwidth-vs-diameter trade flipped in favour of bandwidth at modern model sizes.
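The arithmetic behind that trade is easy to check. This sketch computes chip count, hop diameter, and aggregate link bandwidth for the two shapes above; the 100 Gb/s per-link figure is an assumed placeholder, not a real TPU spec.

```python
def torus_stats(dims, link_gbps=100):
    """Chip count, hop diameter, and aggregate link bandwidth of a 3D torus.
    On a wraparound ring of length k, the farthest node is k // 2 hops away."""
    chips = 1
    for d in dims:
        chips *= d
    diameter = sum(d // 2 for d in dims)
    links = 3 * chips                       # each chip owns one +x, +y, +z link
    return chips, diameter, links * link_gbps / 1000

for shape in [(8, 8, 8), (8, 8, 16)]:
    chips, diameter, tbps = torus_stats(shape)
    print(f"{shape}: {chips} chips, diameter {diameter} hops, ~{tbps:.0f} Tb/s of links")
```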
Where it shows up
See the chip comparator for scale-up domain sizes. Anything above ~64 chips in a single domain almost certainly uses a torus, dragonfly, or fat-tree underneath. NVIDIA's NVL72 is the exception: instead of a torus it uses NVLink Switch, a flat, single-tier switched fabric that gives full all-to-all bandwidth across the 72 GPUs.