Loading
Loading
Tokens, watts, dollars. Plug in workload shape (model size, precision, monthly tokens) and get back the MW, capex, annual GPU rental, water draw, and cost per million tokens. Transparent planning math, useful for comparisons and dangerous as fake precision.
1.6T total parameters is the full expert weight pool. 49B active parameters is the approximate path used per token. The total number matters for memory and deployment footprint; the active number is closer to the per-token compute bill. This implies about 32.7x total-to-active sparsity.