# Which model should I use for which job?
A model choice is a workflow choice: capability, latency, cost, context, privacy, tool use, modality, and failure tolerance all matter.
The best model is the smallest, cheapest, most reliable model that clears the workflow threshold. Anything above that is usually margin donated to the inference bill.
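The "smallest model that clears the threshold" rule can be made mechanical once you have per-model eval scores. A minimal sketch, with entirely illustrative model names, scores, and prices:

```python
# Minimal sketch: pick the cheapest model whose eval score clears the
# workflow threshold. All names and numbers are illustrative placeholders.

def pick_model(candidates, threshold):
    """Return the cheapest candidate that clears the eval threshold."""
    qualified = [m for m in candidates if m["eval_score"] >= threshold]
    if not qualified:
        raise ValueError("no model clears the threshold; revisit the workflow")
    return min(qualified, key=lambda m: m["cost_per_mtok"])

candidates = [
    {"name": "small",    "eval_score": 0.81, "cost_per_mtok": 0.15},
    {"name": "mid",      "eval_score": 0.90, "cost_per_mtok": 1.00},
    {"name": "frontier", "eval_score": 0.96, "cost_per_mtok": 8.00},
]

print(pick_model(candidates, threshold=0.88)["name"])  # → mid
```

Note that the frontier model is never selected here unless the threshold demands it, which is the point: capability above the threshold is paid-for margin.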
## Start with the failure mode
For coding, the failure mode may be a broken build. For finance, it may be a subtle assumption error. For support, it may be an angry customer. For robotics, it may be physical damage.
The model choice should start with the cost of being wrong, not with the leaderboard.
## Use frontier models where judgment is scarce
The strongest models earn their premium when tasks require planning, ambiguity resolution, tool orchestration, or high-stakes synthesis.
They are wasteful when the task is extraction, classification, formatting, or a narrow workflow with cheap verification.
## Use smaller models where volume dominates
At high volume, small cost differences compound. A model that is slightly weaker but much cheaper can win if the task is constrained and failures are caught downstream.
This is why routing matters. One product may need several models: a cheap classifier, a mid-tier workhorse, and a frontier escalation path.
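The three-tier setup can be sketched as a simple router. The tier names, the keyword heuristic in `classify()`, and the model labels are all hypothetical placeholders standing in for a real cheap classifier model:

```python
# Hedged sketch of a three-tier router: a cheap classifier picks the tier,
# and judgment-heavy requests escalate to a frontier model.

def classify(request: str) -> str:
    """Toy stand-in for a cheap classifier model (keyword heuristic only)."""
    if any(word in request for word in ("refund", "legal", "outage")):
        return "high_stakes"
    if len(request) > 200:
        return "complex"
    return "routine"

ROUTES = {
    "routine":     "small-model",     # extraction, formatting, FAQ
    "complex":     "mid-tier-model",  # multi-step but cheaply verifiable
    "high_stakes": "frontier-model",  # judgment-heavy, escalate
}

def route(request: str) -> str:
    return ROUTES[classify(request)]

print(route("What are your hours?"))                  # → small-model
print(route("I need a refund for a double charge."))  # → frontier-model
```

In production the heuristic would itself be a small model, and the escalation path would carry the full conversation context upward.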
## Open weights change control, not necessarily cost
Open-weight models give teams more control over hosting, privacy, fine-tuning, and deployment geography. They can also impose a real operational burden.
The right comparison includes engineering labor, serving stack maturity, utilization, and risk, not just API price.
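A back-of-envelope version of that comparison makes the trade-off concrete. Every number below is an illustrative assumption, not a quote; substitute your own traffic, GPU rates, and labor costs:

```python
# Back-of-envelope comparison: API pricing vs self-hosted open weights.
# All figures are illustrative assumptions.

def api_monthly_cost(tokens_millions, price_per_mtok):
    """API bill scales linearly with traffic."""
    return tokens_millions * price_per_mtok

def self_host_monthly_cost(gpu_hourly, num_gpus, eng_hours, eng_rate):
    """Self-hosting pays for the cluster and the engineers regardless of traffic."""
    return gpu_hourly * num_gpus * 24 * 30 + eng_hours * eng_rate

traffic = 2_000  # million tokens served per month
api = api_monthly_cost(traffic, price_per_mtok=1.00)
hosted = self_host_monthly_cost(gpu_hourly=2.0, num_gpus=8,
                                eng_hours=40, eng_rate=150.0)

print(f"API: ${api:,.0f}/mo   self-host: ${hosted:,.0f}/mo")
```

At this assumed traffic the API wins; the largely fixed self-hosting cost only pays off once utilization is high enough to amortize it, which is why API price alone is the wrong comparator.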
## The comparison table is a starting point
Context window, architecture, open-weight status, and release cadence help narrow the field. The final choice still needs task evals on your own data.
The strategic lesson is simple: model selection is no longer brand selection. It is capacity allocation at the workflow level.
| Model | Architecture | Developer | Release | Total params | Active params | Context | Training tokens | Weights |
|---|---|---|---|---|---|---|---|---|
| Mythos | MoE · 45× compression | Anthropic | 2026-04 | 10.0T? | 220B? | 1M | — | closed |
| DeepSeek V4 | MoE · 20× compression | DeepSeek | 2026-04 | 1.6T | 80B | 256k | 18T | open |
| Claude 4.6 Sonnet | dense | Anthropic | 2026-03 | 175B? | 175B? | 200k | — | closed |
| Qwen 3 235B | MoE · 11× compression | Alibaba | 2026-03 | 235B | 22B | 128k | 36T | open |
| Gemini 3 Pro | MoE · 25× compression | Google | 2026-02 | 2.0T? | 80B? | 2M | — | closed |
| Grok 4 | dense | xAI | 2026-01 | 314B? | 314B | 128k | — | closed |
| Mistral Large 3 | dense | Mistral | 2026-01 | 123B | 123B | 128k | — | closed |
| GPT-5 | MoE · 20× compression | OpenAI | 2025-12 | 5.0T? | 250B? | 256k | — | closed |
| Llama 4 | MoE · 4× compression | Meta | 2025-08 | 400B | 100B | 1M | 30T | open |
| Kimi K2 | MoE · 31× compression | Moonshot | 2025-07 | 1.0T | 32B | 200k | 15T | open |
| DeepSeek V3 | MoE · 18× compression | DeepSeek | 2024-12 | 671B | 37B | 128k | 14.8T | open |
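The "N× compression" figures above are just total parameters divided by active parameters per token. A quick sanity check against two of the disclosed rows, with the parameter counts copied from the table in billions:

```python
# Compression ratio for an MoE model: total params / active params per token.
# Values below are taken from the comparison table (in billions).

def compression(total_b, active_b):
    return total_b / active_b

# DeepSeek V3: 671B total, 37B active → ≈18×
assert round(compression(671, 37)) == 18
# Kimi K2: 1.0T total, 32B active → ≈31×
assert round(compression(1000, 32)) == 31

print("compression ratios match the table")
```

Dense models are the degenerate case: total equals active, so compression is 1× and every token pays for the full parameter count.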