Storage · 8 of 8

Where do hard drives still matter in AI infrastructure?

Hard drives sit at the bottom of the storage hierarchy: slowest per access, cheapest per terabyte, and where the cold corpus of a frontier lab actually lives.

Where the binding constraint sits today

HDDs are slow, but for petabyte-scale archives the cost gap with NAND is still real. The open question is whether HAMR and MAMR can keep the density curve alive long enough to preserve that gap.

Magnetic platters and moving heads

A hard drive stores bits as tiny magnetic regions on a spinning platter. Read and write heads fly above the surface and sense or flip the magnetic state.

Access time is bounded by physics: the head has to seek to the right track, and the platter has to rotate until the right sector passes under it. That makes HDDs orders of magnitude slower than NAND for random access, but they remain several times cheaper per terabyte at scale.
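To make that physical bound concrete, here is a rough back-of-the-envelope sketch in Python. The 7,200 RPM spindle speed and the 8 ms average seek time are illustrative assumptions, not figures from this piece, and transfer time and queuing are ignored.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the target sector: half of one revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Approximate random reads per second for one drive.

    Ignores transfer time and command queuing, so treat it as a rough estimate.
    """
    access_ms = avg_seek_ms + avg_rotational_latency_ms(rpm)
    return 1000.0 / access_ms


if __name__ == "__main__":
    rpm = 7200.0        # assumed spindle speed
    avg_seek_ms = 8.0   # assumed average seek time
    print(f"rotational latency: {avg_rotational_latency_ms(rpm):.2f} ms")
    print(f"random IOPS: {random_iops(rpm, avg_seek_ms):.0f}")
```

Under these assumptions a single drive manages on the order of a hundred random reads per second, which is why HDDs suit large sequential reads far better than random access.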

Where they live in an AI cluster

A large training run keeps the hot dataset on NAND and the cold dataset on HDD. Crawl archives, video corpora, raw sensor logs, historical model checkpoints, and replay buffers for offline reinforcement learning are all HDD-friendly workloads.

For inference, HDDs hold the cold weight library: every previous model version a lab might want to revive, plus the raw data behind the evaluations.
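One way to picture the hot/cold split is a placement rule keyed on how recently a dataset was read. The sketch below is hypothetical: the `Dataset` type, the 30-day window, and the example names and sizes are assumptions for illustration, not a description of any lab's actual policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Dataset:
    name: str
    size_tb: float
    last_read: datetime


def choose_tier(ds: Dataset, now: datetime,
                hot_window: timedelta = timedelta(days=30)) -> str:
    """Return "nand" for recently read data and "hdd" for everything else."""
    return "nand" if now - ds.last_read <= hot_window else "hdd"


def plan_placement(datasets, now: datetime) -> dict:
    """Total the capacity (in TB) that lands on each tier."""
    totals = {"nand": 0.0, "hdd": 0.0}
    for ds in datasets:
        totals[choose_tier(ds, now)] += ds.size_tb
    return totals


if __name__ == "__main__":
    now = datetime(2025, 1, 1)
    datasets = [
        # Hypothetical datasets and sizes, for illustration only.
        Dataset("current-training-mix", 200.0, now - timedelta(days=1)),
        Dataset("crawl-archive", 5000.0, now - timedelta(days=400)),
        Dataset("old-checkpoints", 800.0, now - timedelta(days=90)),
    ]
    print(plan_placement(datasets, now))
```

A real policy would likely weigh read patterns and retrieval deadlines as well as recency; the point here is only that the cold tier ends up holding most of the bytes.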

The density curve is alive

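For years, HDD areal density growth slowed because magnetic grains hit physical limits: shrink a grain too far and it can no longer hold its state reliably. Heat-assisted magnetic recording (HAMR) and microwave-assisted magnetic recording (MAMR) address that by briefly making the medium easier to write, HAMR with a laser that heats the spot being written and MAMR with a microwave field generated at the head. That lets drives use media whose grains stay stable at smaller sizes.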

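Seagate has shipped HAMR drives at 30 terabytes and above. Western Digital has so far shipped its own energy-assisted recording (ePMR), which grew out of its MAMR development, and is moving toward HAMR. Continued density growth keeps the cost-per-terabyte gap against NAND open for cold-data workloads.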

The vendor map

  • Seagate. Pure-play HDD vendor, leading on HAMR.
  • Western Digital. HDD-focused after the SanDisk spin-out; ships energy-assisted (ePMR) drives, with HAMR on its roadmap.
  • Toshiba. Third HDD player, primarily enterprise; ships MAMR-based drives.

Strategic read

HDDs are not interesting at the frontier of speed, but they are still interesting at the frontier of cost. For petabyte-scale cold storage, NAND costs roughly four to five times as much per terabyte as HDD today.
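The arithmetic behind that gap is simple enough to sketch. In the example below, the HDD price per terabyte and the 10 PB capacity are placeholder assumptions, not figures from this piece; only the four-to-five-times ratio comes from the text.

```python
def nand_cost_range(capacity_tb: float, hdd_usd_per_tb: float,
                    ratio_low: float = 4.0, ratio_high: float = 5.0):
    """Return (hdd_cost, nand_cost_low, nand_cost_high) in USD for raw capacity."""
    hdd_cost = capacity_tb * hdd_usd_per_tb
    return hdd_cost, hdd_cost * ratio_low, hdd_cost * ratio_high


if __name__ == "__main__":
    capacity_tb = 10 * 1000   # 10 PB expressed in TB (decimal units), assumed
    hdd_usd_per_tb = 15.0     # placeholder price, assumption only
    hdd, nand_lo, nand_hi = nand_cost_range(capacity_tb, hdd_usd_per_tb)
    print(f"HDD:  ${hdd:,.0f}")
    print(f"NAND: ${nand_lo:,.0f} to ${nand_hi:,.0f}")
```

The sketch covers raw media cost only. Power, enclosures, redundancy, and replacement cycles would shift the totals, so treat the ratio as a first-order estimate.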

For Pere, the useful frame is this: any analysis of where lab data lives, how training corpora are stored, or how long-term checkpoint history is preserved has to include the HDD line. It is the layer where multi-petabyte storage stays affordable.