What is NAND flash, and where does it sit in the AI Stack?
NAND flash is non-volatile storage built from charge-trapping transistors stacked vertically. It holds the datasets, checkpoints, and weights that DRAM can never fit at once.
NAND is the layer where the working set ends and the dataset begins. Capacity matters more than peak speed, but write endurance and stacking economics drive the cost curve.
A transistor that remembers
A NAND cell traps electrons in an insulated region near the gate of a transistor. Once trapped, the charge stays for years without power. That is what makes the storage non-volatile.
Most modern NAND uses charge-trap cells arranged in 3D stacks of more than two hundred layers. Density now grows by adding layers rather than by shrinking the cell.
Where it lives in AI
Inside an AI cluster, NAND shows up as the SSDs in each server and as larger storage tiers in dedicated nodes. Training checkpoints, intermediate activations during long runs, and the working slice of the training dataset all live here.
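As a rough illustration of why checkpoints end up on NAND rather than in DRAM, the footprint can be estimated from parameter count. This is a minimal sketch, not a sizing tool: the function name, the 14 bytes per parameter (weights plus optimizer state), and the retention count are all hypothetical assumptions, not figures from this article.

```python
def checkpoint_footprint_bytes(num_params, bytes_per_param, checkpoints_kept):
    """Estimate the storage needed to retain several training checkpoints.

    All inputs are illustrative assumptions supplied by the caller.
    """
    if num_params < 0 or bytes_per_param < 0 or checkpoints_kept < 0:
        raise ValueError("inputs must be non-negative")
    one_checkpoint = num_params * bytes_per_param
    return one_checkpoint * checkpoints_kept


if __name__ == "__main__":
    # Hypothetical run: 70e9 parameters, 14 bytes each, 5 checkpoints retained.
    total = checkpoint_footprint_bytes(70e9, 14, 5)
    print(f"{total / 1e12:.1f} TB of checkpoint storage")
```

Under these assumed numbers a single checkpoint is just under 1 TB and five of them need about 4.9 TB, which is SSD territory rather than DRAM territory.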
For inference, NAND holds model weights that are not currently loaded into HBM, along with the cached state of long-running agents. Cold-start time depends on how fast the weights can move from NAND into DRAM and then into HBM.
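The cold-start dependency can be sketched as a two-stage transfer. This is a back-of-envelope model under stated assumptions: the function name and the bandwidth figures are hypothetical, and real loaders add file-system, deserialization, and allocation overhead that this ignores.

```python
def cold_start_seconds(model_bytes, nand_to_dram_bps, dram_to_hbm_bps, pipelined=False):
    """Estimate the time to move weights NAND -> DRAM -> HBM.

    pipelined=False: the two stages run one after the other.
    pipelined=True:  the stages overlap, so the slower stage dominates.
    """
    if model_bytes < 0 or nand_to_dram_bps <= 0 or dram_to_hbm_bps <= 0:
        raise ValueError("size must be non-negative and bandwidths positive")
    nand_stage = model_bytes / nand_to_dram_bps
    hbm_stage = model_bytes / dram_to_hbm_bps
    if pipelined:
        return max(nand_stage, hbm_stage)
    return nand_stage + hbm_stage


if __name__ == "__main__":
    # Hypothetical numbers: 140 GB of weights, 10 GB/s from SSD, 50 GB/s into HBM.
    size = 140e9
    print(f"sequential: {cold_start_seconds(size, 10e9, 50e9):.1f} s")
    print(f"pipelined:  {cold_start_seconds(size, 10e9, 50e9, pipelined=True):.1f} s")
```

With these assumed figures the NAND stage takes 14 seconds and the HBM stage under 3, so the flash read is the bottleneck whether or not the stages overlap.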
The vendor map
- Samsung. The largest NAND maker by share, vertically integrated.
- Kioxia. Toshiba memory spin-off, partnered on fabs with SanDisk (formerly Western Digital's flash business).
- SK Hynix and Solidigm. SK Hynix bought Intel's NAND business and now runs it as a subsidiary, Solidigm.
- Micron. One of the smaller major suppliers, but a credible alternative source.
- SanDisk. Standalone NAND and SSD vendor after the Western Digital split.
The HBF question
High-bandwidth flash is the idea of stacking NAND like HBM and putting it close to the compute die, expanding the memory tier between HBM and SSD.
Irrational Analysis is skeptical for endurance reasons: NAND wears out under writes, and a layer that close to the compute would see workloads it cannot survive at scale. The argument is that if flash needs to sit near compute, it should be socketable next to a CXL controller, not stacked.
Source: Irrational Analysis interview, Chris Barber, May 2026
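The endurance argument can be made concrete with simple arithmetic. The sketch below is an illustration under assumptions, not a vendor model: the function name, the 3,000 program/erase cycle rating, the capacity, and both write rates are hypothetical values chosen only to show how lifetime scales with write bandwidth.

```python
SECONDS_PER_DAY = 24 * 3600


def days_until_worn_out(capacity_bytes, rated_pe_cycles, write_bytes_per_sec,
                        write_amplification=1.0):
    """Estimate how long flash lasts under a sustained write load.

    Total writable bytes = capacity * rated program/erase cycles, reduced by
    write amplification (extra internal writes per host write).
    """
    if capacity_bytes <= 0 or rated_pe_cycles <= 0 or write_bytes_per_sec <= 0:
        raise ValueError("capacity, cycles, and write rate must be positive")
    if write_amplification < 1.0:
        raise ValueError("write amplification cannot be below 1.0")
    total_writable = capacity_bytes * rated_pe_cycles / write_amplification
    return total_writable / write_bytes_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical 1 TB of flash rated for 3,000 P/E cycles.
    ssd_like = days_until_worn_out(1e12, 3000, 1e9)        # 1 GB/s sustained
    near_compute = days_until_worn_out(1e12, 3000, 100e9)  # 100 GB/s sustained
    print(f"SSD-like write rate:     {ssd_like:.1f} days")
    print(f"near-compute write rate: {near_compute * 24:.1f} hours")
```

Under these assumed numbers the same flash that lasts about 35 days at a sustained 1 GB/s wears out in roughly 8 hours at 100 GB/s. Real workloads are rarely pure sustained writes, but the scaling is the point.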
Strategic read
NAND is in shortage today, but Irrational Analysis warns it is more prone to oversupply cycles than DRAM. The capex story is more cyclical, the moat is shallower, and demand is more elastic.
For Pere, the useful frame is that NAND supply maps onto long-context and agent workloads more than onto frontier training. The layer matters most when applications keep state for hours or days.