
3D Torus interconnect

A torus wires every chip to its six nearest neighbours, with wraparound links on each axis. Google's TPU pods have used this shape since v4. Drag to rotate; the cyan ring shows the path of one all-reduce step.

512 chips · TPU v5p slice · 8×8×8 torus, wraparound on each axis

Topology: 3D torus. Each chip wired to its 6 nearest neighbours; edges wrap around.
Diameter: 12 hops, the worst-case node-to-node hop count.
All-reduce: 2 directions; a bidirectional ring along each axis maximises bandwidth.

Why a torus?

For collective operations like all-reduce — which dominate large-model training — bandwidth is what matters, not lowest hop count. A torus arranges chips so each one has six identical neighbours and every collective can be split across multiple disjoint rings simultaneously, saturating the available bandwidth.
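
To make that concrete, here is a back-of-envelope alpha-beta cost sketch comparing one flat ring over 512 chips with a dimension-order all-reduce on the 8×8×8 torus. The latency constant, link bandwidth, and buffer size are illustrative assumptions, not measured TPU figures:

```python
from math import prod

# Alpha-beta sketch: one flat 512-chip ring versus a dimension-order
# all-reduce on an 8x8x8 torus. ALPHA, LINK_BW, and the buffer size
# are illustrative assumptions, not measured TPU numbers.

ALPHA = 1e-6      # per-step latency in seconds (assumption)
LINK_BW = 100e9   # bytes/s per link direction (assumption)

def flat_ring(n_chips, nbytes):
    """Bidirectional ring over every chip: only 2 links active per chip."""
    steps = 2 * (n_chips - 1)                       # reduce-scatter + all-gather
    volume = 2 * nbytes * (n_chips - 1) / n_chips   # bytes moved per chip
    return steps * ALPHA + volume / (2 * LINK_BW)

def torus_allreduce(shape, nbytes):
    """Dimension-order all-reduce; splitting the buffer across the three
    axes (rotating the axis order per chunk) keeps all 6 links busy."""
    steps = sum(2 * (n - 1) for n in shape)
    n_chips = prod(shape)
    volume = 2 * nbytes * (n_chips - 1) / n_chips
    return steps * ALPHA + volume / (6 * LINK_BW)

buf = 8 * 2**30  # an 8 GiB gradient buffer (assumption)
print(f"flat 512-chip ring: {flat_ring(512, buf) * 1e3:.1f} ms")
print(f"8x8x8 torus:        {torus_allreduce((8, 8, 8), buf) * 1e3:.1f} ms")
```

Under these assumptions the flat ring lands around 87 ms and the torus around 29 ms: the latency term collapses from 1022 steps to 42, and driving all six links instead of two roughly triples the effective bandwidth.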

A 3D mesh would wire chips the same way, but without wraparound the corner chips have three links where the interior chips have six, and the worst-case path nearly doubles (n−1 hops per axis instead of n/2). The wraparound is what makes it a torus rather than a mesh: every node sees the same fabric.
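
A quick sketch makes the difference visible; the helpers below are ad hoc, written just for this comparison:

```python
# Mesh vs torus on an 8x8x8 grid: worst-case hop counts, and how many
# links a chip actually gets at different positions in a mesh.

def mesh_diameter(shape):
    return sum(n - 1 for n in shape)    # must cross each axis end to end

def torus_diameter(shape):
    return sum(n // 2 for n in shape)   # wraparound halves every axis

def mesh_links(coord, shape):
    """Count surviving links at a mesh coordinate: each axis offers a
    minus-direction link unless at 0, and a plus link unless at the edge."""
    return sum((c > 0) + (c < n - 1) for c, n in zip(coord, shape))

shape = (8, 8, 8)
print(mesh_diameter(shape))             # 21 hops
print(torus_diameter(shape))            # 12 hops
print(mesh_links((0, 0, 0), shape))     # corner chip: 3 links
print(mesh_links((4, 4, 4), shape))     # interior chip: 6 links
```

In the torus every coordinate gets the full 6 links, which is what lets the same collective schedule run unchanged on every chip.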

What scaling each axis does

Doubling one axis doubles the chip count and doubles that axis's contribution to the diameter. So an 8×8×16 pod has a worse diameter (16 hops) than 8×8×8 (12 hops), but twice the chips and twice the aggregate link bandwidth, since every chip still brings the same six links to the fabric. For training, more bandwidth usually beats fewer hops.
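
The trade is easy to tabulate with the torus diameter rule from above, where each axis of length n contributes n/2 hops:

```python
from math import prod

# Chip count vs diameter as the pod shape grows. Each axis of length n
# contributes n // 2 hops to the diameter, thanks to wraparound.

def stats(shape):
    return prod(shape), sum(n // 2 for n in shape)

for shape in [(8, 8, 8), (8, 8, 16), (16, 16, 16)]:
    chips, diameter = stats(shape)
    print(f"{shape}: {chips:5d} chips, diameter {diameter} hops")

# (8, 8, 8):    512 chips, diameter 12 hops
# (8, 8, 16):  1024 chips, diameter 16 hops
# (16, 16, 16): 4096 chips, diameter 24 hops
```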

That's why TPU pods grew lopsided: 8×8×16 became the standard pod shape because the bandwidth-vs-diameter trade flipped in favour of bandwidth at modern model sizes.

Where it shows up

See the chip comparator for scale-up domain sizes. Anything above ~64 chips in a single scale-up domain almost certainly uses a torus, dragonfly, or fat-tree underneath. NVIDIA's NVL72 is a notable exception; it uses a custom switch fabric (NVLink Switch) closer to a fat-tree.