Calculator

What does an AI query cost?

Tokens, watts, dollars. Plug in the workload shape (model size, precision, tokens per day) and get back the estimated load in MW, capex, annual GPU rental, water draw, and cost per million tokens. The math is transparent planning arithmetic: useful for comparisons, dangerous when read as precise.

Workload assumptions

Lab preset

Tokens per day

Model type

Total parameters (B)

Active parameters (B)

Precision

Expected utilization (0–1)

Latency tier

Power cost ($/MWh)

Capex per MW ($)

GPU rental per MW-year ($)

Cooling water (gal/MWh)

Estimated load

0.01 MW

Memory footprint

1.6K GB

Cost / 1M tokens

$0.424

Annual water

21.9K gal

Parameter reading

1.6T total parameters is the full expert weight pool. 49B active parameters is the approximate path used per token. The total number matters for memory and deployment footprint; the active number is closer to the per-token compute bill. This implies about 32.7x total-to-active sparsity.
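The two readings above can be sketched in a few lines. This is a minimal illustration, not the calculator's own code: the function names are hypothetical, and the memory estimate assumes FP8 weights take one byte per parameter and counts weights only (no KV cache or activations).

```python
def sparsity_ratio(total_params_b: float, active_params_b: float) -> float:
    """Total-to-active ratio for an MoE model (both inputs in billions)."""
    return total_params_b / active_params_b


def weight_memory_gb(total_params_b: float, bytes_per_param: float) -> float:
    """Resident weight memory in decimal GB.

    One billion parameters at one byte each is 1 GB, so the result is
    simply billions of parameters times bytes per parameter.
    ASSUMPTION: weights only; KV cache and activations are ignored.
    """
    return total_params_b * bytes_per_param


total_b, active_b = 1600.0, 49.0   # 1.6T total, 49B active
fp8_bytes = 1.0                    # ASSUMPTION: FP8 = 1 byte per parameter

print(f"sparsity: {sparsity_ratio(total_b, active_b):.1f}x")      # 32.7x
print(f"weights:  {weight_memory_gb(total_b, fp8_bytes):,.0f} GB")  # 1,600 GB
```

Under these assumptions the sketch reproduces the 32.7x sparsity figure and the 1.6K GB memory footprint shown above.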

Economics

Annual energy: 87.6 MWh

Annual power cost: $4.82K

Annual GPU rental: $150K

Capex estimate: $600K
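The economics figures follow from the estimated load by simple multiplication. The sketch below is one way to reproduce them and should be read as an assumption, not the calculator's actual formula: the function name is hypothetical, the $55/MWh power price and 250 gal/MWh water rate are back-calculated from the displayed totals rather than stated on the page, and the cost per million tokens is taken as annual power plus annual GPU rental divided by annual tokens (capex excluded), which happens to match the displayed $0.424.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours


def annual_economics(load_mw, tokens_per_day, power_cost_per_mwh,
                     capex_per_mw, rental_per_mw_year, water_gal_per_mwh):
    """Scale per-MW assumptions by a constant load to get annual figures."""
    energy_mwh = load_mw * HOURS_PER_YEAR
    power_cost = energy_mwh * power_cost_per_mwh
    gpu_rental = load_mw * rental_per_mw_year
    capex = load_mw * capex_per_mw
    water_gal = energy_mwh * water_gal_per_mwh
    million_tokens_per_year = tokens_per_day * 365 / 1e6
    # ASSUMPTION: operating cost = power + rental; capex is not amortized in.
    cost_per_million = (power_cost + gpu_rental) / million_tokens_per_year
    return {
        "energy_mwh": energy_mwh,
        "power_cost": power_cost,
        "gpu_rental": gpu_rental,
        "capex": capex,
        "water_gal": water_gal,
        "cost_per_million_tokens": cost_per_million,
    }


result = annual_economics(
    load_mw=0.01,                  # displayed estimated load
    tokens_per_day=1_000_000_000,
    power_cost_per_mwh=55.0,       # ASSUMPTION: inferred, not shown on the page
    capex_per_mw=60_000_000,
    rental_per_mw_year=15_000_000,
    water_gal_per_mwh=250.0,       # ASSUMPTION: inferred, not shown on the page
)
for name, value in result.items():
    print(f"{name}: {value:,.3f}")
```

Because the displayed load is rounded to 0.01 MW, small differences from the calculator's internal values are possible.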

Serving posture

GPU tier: reserved high-memory GPU pool

Hosting tier: managed inference with committed capacity

Active compute: 1.134 PF-days/day

Active params: 49.0B
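The active compute figure is consistent with the common rule of thumb of roughly 2 FLOPs per active parameter per token for inference. The sketch below applies that rule; the function name is hypothetical, and the 2x multiplier is an assumption that matches the displayed value, not a confirmed detail of the calculator.

```python
SECONDS_PER_DAY = 86_400
PETAFLOP = 1e15  # FLOP/s


def active_compute_pf_days_per_day(active_params: float,
                                   tokens_per_day: float,
                                   flops_per_param_per_token: float = 2.0) -> float:
    """Daily inference compute expressed in petaFLOP/s-days.

    ASSUMPTION: about 2 FLOPs per active parameter per generated token;
    attention and other overheads are ignored.
    """
    flops_per_day = flops_per_param_per_token * active_params * tokens_per_day
    one_pf_day = PETAFLOP * SECONDS_PER_DAY  # FLOPs in one PF-day
    return flops_per_day / one_pf_day


pf_days = active_compute_pf_days_per_day(active_params=49e9, tokens_per_day=1e9)
print(f"{pf_days:.3f} PF-days/day")  # 1.134 PF-days/day
```

Only the 49B active parameters enter this estimate; the 1.6T total affects memory, not per-token compute.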

Assumptions

1,000,000,000 tokens per day.
MoE model with 1.6T total and 49B active parameters.
FP8 precision at 45% expected utilization.
Interactive latency target at a generic US site.
$60,000,000 capex per MW and $15,000,000 GPU rental per MW-year.

Caveats

This is a planning model, not a vendor quote. It estimates order-of-magnitude economics from workload shape and infrastructure assumptions.
It excludes networking, storage, engineering labor, redundancy, margin, reserved-capacity discounts, and model-specific kernel efficiency.
For MoE models, total parameters drive resident memory footprint while active parameters approximate per-token compute.