How much power does AI actually take?
A single query is a fraction of a watt-hour. A frontier training run is a hundred-plus megawatts for months. A 2030 cluster is a small US state running flat-out.
One inference call
The most-cited number in this space — “a ChatGPT query uses ten times more energy than a Google search” — is roughly right and roughly useless on its own. The better framing: a typical chat completion of a few hundred tokens, on a frontier model, costs on the order of 1–10 watt-hours. That is a few seconds to half a minute of running a microwave.
Per query the number is small. The fleet effect is not. ChatGPT alone serves on the order of a billion queries per day in 2026. At ~3 Wh/query that is ~3 GWh/day for one product, before you count Anthropic, Google, and the rest of the field, before you count the heavier video and code-agent workloads, and before you count the much larger training side of the equation.
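The arithmetic is worth making explicit. A minimal sketch, assuming the mid-range 3 Wh/query figure and the billion-query volume above (both order-of-magnitude estimates, not measurements):

```python
# Back-of-envelope fleet arithmetic for one inference product.
# Both inputs are rough assumptions from the text, not measured values.
WH_PER_QUERY = 3            # middle of the 1-10 Wh range
QUERIES_PER_DAY = 1e9       # order-of-magnitude ChatGPT volume, 2026

daily_wh = WH_PER_QUERY * QUERIES_PER_DAY
daily_gwh = daily_wh / 1e9            # 1 GWh = 1e9 Wh
avg_power_mw = daily_wh / 24 / 1e6    # spread over 24 hours, in MW

print(f"{daily_gwh:.1f} GWh/day")                 # 3.0 GWh/day
print(f"~{avg_power_mw:.0f} MW continuous draw")  # ~125 MW
```

That implied ~125 MW of continuous draw is a whole mid-sized data center running on inference for a single product.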
One frontier training run
Training a frontier model is a single continuous process that runs for months on a tightly coupled cluster. The rough arithmetic:
A 150 MW run for six months consumes roughly 650 GWh of energy. That is the annual electricity consumption of about 60,000 US households, spent training a single model. The trend line matters more than the exact number: each generation has used 4× to 10× more energy than the one before, and there is no public roadmap that bends that curve in the next two cycles.
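Spelled out, with the household comparison made explicit (the ~10.7 MWh/year US-average household figure is an assumption added here for the conversion):

```python
# Rough training-run energy, using the figures from the text.
POWER_MW = 150
HOURS = 6 * 30 * 24                    # ~6 months ≈ 4,320 hours
HOUSEHOLD_MWH_PER_YEAR = 10.7          # assumed US-average consumption

energy_mwh = POWER_MW * HOURS          # 648,000 MWh
energy_gwh = energy_mwh / 1e3
households = energy_mwh / HOUSEHOLD_MWH_PER_YEAR

print(f"~{energy_gwh:.0f} GWh")               # ~648 GWh
print(f"~{households:,.0f} household-years")  # ~60,561
```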
One hyperscale data center
A modern hyperscale AI data center is the unit of buildout. Today’s sites cluster around 100–500 MW of IT load. The newest announced builds — OpenAI’s Stargate sites, Microsoft’s Atlanta campus, Meta’s Hyperion-class campuses — are designed around 1–2 GW per site. The 2030 roadmaps target 5–10 GW per cluster.
For scale: a single large nuclear reactor produces about 1 GW continuous. So a 5 GW cluster needs the output of five dedicated reactors, or a comparable mix of gas, fuel cells, and solar-plus-batteries, plus the high-voltage wires that move that energy the last few hundred meters without turning the substation into a fireball.
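A quick sanity check on the reactor comparison, assuming a typical ~90% nuclear capacity factor (an assumption added here, not from the text):

```python
# Scale check for a 5 GW cluster running flat-out.
CLUSTER_GW = 5
REACTOR_GW = 1.0                 # one large reactor's continuous output
NUCLEAR_CAPACITY_FACTOR = 0.90   # assumed typical US fleet average

reactors_needed = CLUSTER_GW / (REACTOR_GW * NUCLEAR_CAPACITY_FACTOR)
annual_twh = CLUSTER_GW * 8760 / 1e3    # 8,760 hours in a year

print(f"~{reactors_needed:.1f} reactors")   # ~5.6
print(f"~{annual_twh:.0f} TWh/year")        # ~44 TWh/year
```

Forty-plus terawatt-hours a year is on the order of a small European country’s annual electricity consumption, for one cluster.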
The fleet
Adding it up across the big four US hyperscalers (Amazon, Microsoft, Google, Meta) plus xAI and the lab-internal builds, current public commitments imply that AI-dedicated compute will draw roughly 10–15% of US electricity by 2030, up from ~3% today. Independent grid operators (PJM, ERCOT, MISO) have begun re-baselining their long-term load forecasts after a decade of flat demand. The new line is steep.
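In absolute terms, assuming a US total of roughly 4,200 TWh/year (a rough figure added here for the conversion):

```python
# What the projected shares of US electricity mean in absolute terms.
US_TWH_PER_YEAR = 4200      # assumed rough US annual consumption

for share in (0.03, 0.10, 0.15):
    twh = US_TWH_PER_YEAR * share
    avg_gw = twh * 1e3 / 8760           # TWh -> GWh, over 8,760 hours
    print(f"{share:.0%}: ~{twh:.0f} TWh/year, ~{avg_gw:.0f} GW continuous")

# 3%:  ~126 TWh/year, ~14 GW
# 10%: ~420 TWh/year, ~48 GW
# 15%: ~630 TWh/year, ~72 GW
```

Going from ~14 GW to ~50–70 GW of continuous load in four years means adding dozens of reactor-scale power sources to a grid that spent the 2010s flat.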
Globally the picture is more uneven. Ireland already has data center load above 20% of national consumption. Singapore has paused new DC permits. Northern Virginia has rate cases pending in 2026 that explicitly target who pays for AI's transmission upgrades. Each of these is a forward indicator of what other jurisdictions will face in 2027–2028.
The doubling curve and where it bumps physics
AI compute has been doubling roughly every 6–10 months since 2018, faster than Moore's Law. The energy behind that compute has been doubling every 18–24 months — chips have absorbed some of the demand by improving performance-per-watt, but not all of it.
Naive extrapolation of the energy curve hits a wall in the mid-2030s: AI alone would want more electricity than the entire US grid produces today (a rough version of that extrapolation is sketched after the list below). Long before that, three things will bend the curve in some combination:
- Algorithmic efficiency. Mixture-of-experts routing, sparsity, distillation, and post-training improvements reduce energy per useful token by 2–10× per generation. Historically this has done most of the bending.
- Inference-side efficiency. Speculative decoding, KV-cache compression, and quantization already deliver step-function improvements every 6 months.
- Hard limits on power and wires. If neither algorithms nor hardware bend the curve fast enough, the build rate is capped by transmission and generation buildout — the slowest variable in the system.
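The extrapolation referenced above, in minimal form. It anchors on the document's own numbers (12.5% of US electricity as the 2030 midpoint, an 18–24 month energy doubling); the exact crossing year is illustrative:

```python
# Naive doubling extrapolation: when would AI's share of US
# electricity, left unbent, cross 100% of today's grid output?
import math

SHARE_2030 = 0.125                     # midpoint of the 10-15% range

for doubling_months in (18, 24):
    doublings = math.log2(1.0 / SHARE_2030)    # 3 doublings to 100%
    years = doublings * doubling_months / 12
    print(f"{doubling_months}-month doubling: crosses ~100% around "
          f"{2030 + years:.0f}")

# 18-month doubling: crosses ~100% around 2034
# 24-month doubling: crosses ~100% around 2036
```

The point of the exercise is not the exact year but how close the crossing is: one of the three mechanisms above has to engage this decade.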
The strategic read: the curve will bend. The interesting question is which mechanism bends it. If algorithms do most of the work, the model layer becomes the binding constraint. If hardware does, the chip layer does. If neither does fast enough, power stays the bottleneck and capability growth slows. Each of those worlds has different winners.