# Which model should I use for which job?
A model choice is a workflow choice: capability, latency, cost, context, privacy, tool use, modality, and failure tolerance all matter.
The best model is the smallest, cheapest, most reliable model that clears the workflow threshold. Anything above that is usually margin donated to the inference bill.
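The "smallest model that clears the threshold" rule can be made mechanical once you have per-model eval scores. A minimal sketch, with entirely illustrative model names, scores, and prices:

```python
# Minimal sketch: pick the cheapest model whose eval score clears the
# workflow threshold. All names and numbers are illustrative placeholders.

def pick_model(candidates, threshold):
    """Return the cheapest candidate that clears the eval threshold."""
    qualified = [m for m in candidates if m["eval_score"] >= threshold]
    if not qualified:
        raise ValueError("no model clears the threshold; revisit the workflow")
    return min(qualified, key=lambda m: m["cost_per_mtok"])

candidates = [
    {"name": "small",    "eval_score": 0.81, "cost_per_mtok": 0.15},
    {"name": "mid",      "eval_score": 0.90, "cost_per_mtok": 1.00},
    {"name": "frontier", "eval_score": 0.96, "cost_per_mtok": 8.00},
]

print(pick_model(candidates, threshold=0.88)["name"])  # → mid
```

Note that the frontier model is never selected here unless the threshold demands it, which is the point: capability above the threshold is paid-for margin.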
## Start with the failure mode
For coding, the failure mode may be a broken build. For finance, it may be a subtle assumption error. For support, it may be an angry customer. For robotics, it may be physical damage.
The model choice should start with the cost of being wrong, not with the leaderboard.
## Use frontier models where judgment is scarce
The strongest models earn their premium when tasks require planning, ambiguity resolution, tool orchestration, or high-stakes synthesis.
They are wasteful when the task is extraction, classification, formatting, or a narrow workflow with cheap verification.
## Use smaller models where volume dominates
At high volume, small cost differences compound. A model that is slightly weaker but much cheaper can win if the task is constrained and failures are caught downstream.
This is why routing matters. One product may need several models: a cheap classifier, a mid-tier workhorse, and a frontier escalation path.
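The three-tier setup can be sketched as a simple router. The tier names, the keyword heuristic in `classify()`, and the model labels are all hypothetical placeholders standing in for a real cheap classifier model:

```python
# Hedged sketch of a three-tier router: a cheap classifier picks the tier,
# and judgment-heavy requests escalate to a frontier model.

def classify(request: str) -> str:
    """Toy stand-in for a cheap classifier model (keyword heuristic only)."""
    if any(word in request for word in ("refund", "legal", "outage")):
        return "high_stakes"
    if len(request) > 200:
        return "complex"
    return "routine"

ROUTES = {
    "routine":     "small-model",     # extraction, formatting, FAQ
    "complex":     "mid-tier-model",  # multi-step but cheaply verifiable
    "high_stakes": "frontier-model",  # judgment-heavy, escalate
}

def route(request: str) -> str:
    return ROUTES[classify(request)]

print(route("What are your hours?"))                  # → small-model
print(route("I need a refund for a double charge."))  # → frontier-model
```

In production the heuristic would itself be a small model, and the escalation path would carry the full conversation context upward.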
## Open weights change control, not necessarily cost
Open-weight models give teams more control over hosting, privacy, fine-tuning, and deployment geography. They can also impose a real operational burden.
The right comparison includes engineering labor, serving stack maturity, utilization, and risk, not just API price.
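A back-of-envelope version of that comparison makes the trade-off concrete. Every number below is an illustrative assumption, not a quote; substitute your own traffic, GPU rates, and labor costs:

```python
# Back-of-envelope comparison: API pricing vs self-hosted open weights.
# All figures are illustrative assumptions.

def api_monthly_cost(tokens_millions, price_per_mtok):
    """API bill scales linearly with traffic."""
    return tokens_millions * price_per_mtok

def self_host_monthly_cost(gpu_hourly, num_gpus, eng_hours, eng_rate):
    """Self-hosting pays for the cluster and the engineers regardless of traffic."""
    return gpu_hourly * num_gpus * 24 * 30 + eng_hours * eng_rate

traffic = 2_000  # million tokens served per month
api = api_monthly_cost(traffic, price_per_mtok=1.00)
hosted = self_host_monthly_cost(gpu_hourly=2.0, num_gpus=8,
                                eng_hours=40, eng_rate=150.0)

print(f"API: ${api:,.0f}/mo   self-host: ${hosted:,.0f}/mo")
```

At this assumed traffic the API wins; the largely fixed self-hosting cost only pays off once utilization is high enough to amortize it, which is why API price alone is the wrong comparator.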
## The comparison table is a starting point
Context window, architecture, open-weight status, and release cadence help narrow the field. The final choice still needs task evals on your own data.
The strategic lesson is simple: model selection is no longer brand selection. It is capacity allocation at the workflow level.
| Model | Architecture | Developer | Release | Total params | Active params | Context | Training tokens | Weights |
|---|---|---|---|---|---|---|---|---|
| Mythos | MoE · 45× compression | Anthropic | 2026-04 | 10.0T? | 220B? | 1M | — | closed |
| DeepSeek V4 | MoE · 20× compression | DeepSeek | 2026-04 | 1.6T | 80B | 256k | 18T | open |
| Claude 4.6 Sonnet | dense | Anthropic | 2026-03 | 175B? | 175B? | 200k | — | closed |
| Qwen 3 235B | MoE · 11× compression | Alibaba | 2026-03 | 235B | 22B | 128k | 36T | open |
| Gemini 3 Pro | MoE · 25× compression | Google | 2026-02 | 2.0T? | 80B? | 2M | — | closed |
| Grok 4 | dense | xAI | 2026-01 | 314B? | 314B | 128k | — | closed |
| Mistral Large 3 | dense | Mistral | 2026-01 | 123B | 123B | 128k | — | closed |
| GPT-5 | MoE · 20× compression | OpenAI | 2025-12 | 5.0T? | 250B? | 256k | — | closed |
| Llama 4 | MoE · 4× compression | Meta | 2025-08 | 400B | 100B | 1M | 30T | open |
| Kimi K2 | MoE · 31× compression | Moonshot | 2025-07 | 1.0T | 32B | 200k | 15T | open |
| DeepSeek V3 | MoE · 18× compression | DeepSeek | 2024-12 | 671B | 37B | 128k | 14.8T | open |
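The "N× compression" figures above are just total parameters divided by active parameters per token. A quick sanity check against two of the disclosed rows, with the parameter counts copied from the table in billions:

```python
# Compression ratio for an MoE model: total params / active params per token.
# Values below are taken from the comparison table (in billions).

def compression(total_b, active_b):
    return total_b / active_b

# DeepSeek V3: 671B total, 37B active → ≈18×
assert round(compression(671, 37)) == 18
# Kimi K2: 1.0T total, 32B active → ≈31×
assert round(compression(1000, 32)) == 31

print("compression ratios match the table")
```

Dense models are the degenerate case: total equals active, so compression is 1× and every token pays for the full parameter count.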