Build without repeating these

Past AI Fails Checklist

How building with AI goes wrong — named incidents and the reusable anti-patterns behind them. Read it yourself, or point your agent at it, before you ship. Run each card's checklist.

8 principles · 0 verified incidents

PrincipleUnreviewed outputgeneral

Never ship un-reviewed model output as a deliverable

Generated text, code, or analysis goes out the door — to a client, a regulator, production, or the public — without a human verifying it first.

Root cause

Treating a model’s draft as a finished work product. Fluency reads as authority, so the review step quietly gets skipped under deadline pressure.

How to avoid it

Require human sign-off before any external-facing or high-stakes artifact ships.
Verify every citation, figure, and quote actually resolves to a real source.
Make the reviewer accountable for the output as if they wrote it.
Log which artifacts were model-generated so they get extra scrutiny.

PrinciplePrompt injectionsecurity

Treat everything the model reads as untrusted input

A model that reads web pages, emails, documents, or user text can be steered by instructions hidden in that content (prompt injection), then made to leak data or take actions.

Root cause

Assuming the model only follows your instructions. Anything in its context window can act as an instruction.

How to avoid it

Never let model output trigger a privileged or destructive action without a deterministic check in between.
Don’t concatenate untrusted content into system prompts; isolate it and label it as data.
Constrain tools to least privilege; sandbox anything the model can call.
Assume retrieved/third-party content is adversarial and validate any action it implies.

PrincipleCapability overtrusthealthcare

Put a human gate on precision-critical, low-verifiability tasks

Tasks where an error is silent and expensive — medical record transcription, legal citations, financial figures — are handed to a model and trusted directly.

Root cause

Ignoring jagged intelligence: AI is reliable in verifiable domains (code with tests) and unreliable where mistakes are subtle and hard to catch — exactly the domains that demand zero error.

Classic jagged edge: superhuman at some verifiable tasks, quietly wrong on precision-critical ones a careful human would get right.

How to avoid it

Score every task by two axes: how verifiable is the output, and how costly is a silent error.
Require human verification for medical, legal, and financial outputs — no exceptions.
Prefer extraction-with-citation over free generation when accuracy is non-negotiable.
Design the UI so verifying the answer is cheaper than producing it.

PrincipleHallucinationgeneral

Fluent is not the same as correct

A confident, well-written answer is taken as authoritative when it contains fabricated facts, citations, or numbers.

Root cause

Confidence and correctness are independent in language models. The more polished the prose, the more the reader’s guard drops.

How to avoid it

Ground answers in retrieved sources and show the citations inline.
Surface uncertainty instead of hiding it; let the model say "I don’t know".
Never present generated facts as authoritative without a verifiable source.
Spot-check numbers — models interpolate plausible-looking figures.

PrincipleEval gapdev-tools

Test on adversarial and long-tail inputs before you ship

An AI feature is evaluated only on the happy path, then fails on the edge cases and hostile inputs that real users hit.

Root cause

Demos are built on representative inputs; production is dominated by the long tail. No red-teaming, no edge-case evals, no staged rollout.

How to avoid it

Red-team the system before launch; try to make it fail.
Build an eval set that over-weights edge cases and known failure modes.
Roll out in stages with canaries and a kill switch.
Monitor for incidents and wire alerts to the people who can roll back.

PrincipleOver-automationdev-tools

Don’t give an agent destructive access without guardrails

An autonomous agent is granted write, deploy, or delete permissions beyond what its reliability justifies, and one bad action causes real damage.

Root cause

Granting autonomy that outruns the model’s reliability curve, with no blast-radius limits.

How to avoid it

Least privilege by default; expand scope only as reliability is proven.
Plan-then-apply: have the agent propose, a human or deterministic check approve.
Require explicit confirmation for irreversible operations.
Cap blast radius (rate limits, scoped credentials, staged rollout) and keep a rollback path.

PrincipleData leakgeneral

Assume the model can leak whatever you feed it

Sensitive data is sent to a model or used in training, then surfaces in outputs, logs, or to the wrong tenant.

Root cause

Treating the model as a sealed box. Data in context can come back out; data in training can be memorized.

How to avoid it

Minimize PII and secrets in prompts; redact before sending.
Never train on secrets or one tenant’s data that another could surface.
Filter outputs for leaked secrets/PII; enforce tenant isolation.
Review what gets logged — prompts and completions often contain sensitive data.

PrincipleCapability overtrustgeneral

Match the job to the model’s reliability curve

AI is deployed where it’s weak (subtle, precision-critical, low-feedback tasks) instead of where it’s strong (verifiable, fast-feedback tasks).

Root cause

Jagged intelligence: capability is uneven. Picking the wrong task is the failure, before any bug is written.

The whole checklist read through one lens — automate confidently where output is cheap to verify; demand a human where it isn’t.

How to avoid it

Lean on AI in verifiable, consistent domains (e.g. coding behind tests, structured extraction).
Demand verification where precision matters and errors are silent.
Where reliability is unproven, ship assistive (suggests) before autonomous (acts).
Re-evaluate as models improve — the jagged edge moves.

How this list works

Principles are the reusable anti-patterns — the checklist you can run before shipping anything.
Incidents are real, named, public failures. We only publish one once it has a primary source — a checklist about AI mistakes shouldn't contain unverified claims.
This grows over time. The jagged-intelligence lens behind it — why AI is reliable in verifiable domains and risky in precision-critical ones — gets its own explainer soon.

More builder tools

Build without repeating these

Past AI Fails Checklist

How building with AI goes wrong — named incidents and the reusable anti-patterns behind them. Read it yourself, or point your agent at it, before you ship. Run each card's checklist.

8 principles · 0 verified incidents

PrincipleUnreviewed outputgeneral

Never ship un-reviewed model output as a deliverable

Generated text, code, or analysis goes out the door — to a client, a regulator, production, or the public — without a human verifying it first.

Root cause

Treating a model’s draft as a finished work product. Fluency reads as authority, so the review step quietly gets skipped under deadline pressure.

How to avoid it

Require human sign-off before any external-facing or high-stakes artifact ships.
Verify every citation, figure, and quote actually resolves to a real source.
Make the reviewer accountable for the output as if they wrote it.
Log which artifacts were model-generated so they get extra scrutiny.

PrinciplePrompt injectionsecurity

Treat everything the model reads as untrusted input

A model that reads web pages, emails, documents, or user text can be steered by instructions hidden in that content (prompt injection), then made to leak data or take actions.

Root cause

Assuming the model only follows your instructions. Anything in its context window can act as an instruction.

How to avoid it

Never let model output trigger a privileged or destructive action without a deterministic check in between.
Don’t concatenate untrusted content into system prompts; isolate it and label it as data.
Constrain tools to least privilege; sandbox anything the model can call.
Assume retrieved/third-party content is adversarial and validate any action it implies.

PrincipleCapability overtrusthealthcare

Put a human gate on precision-critical, low-verifiability tasks

Tasks where an error is silent and expensive — medical record transcription, legal citations, financial figures — are handed to a model and trusted directly.

Root cause

Ignoring jagged intelligence: AI is reliable in verifiable domains (code with tests) and unreliable where mistakes are subtle and hard to catch — exactly the domains that demand zero error.

Classic jagged edge: superhuman at some verifiable tasks, quietly wrong on precision-critical ones a careful human would get right.

How to avoid it

Score every task by two axes: how verifiable is the output, and how costly is a silent error.
Require human verification for medical, legal, and financial outputs — no exceptions.
Prefer extraction-with-citation over free generation when accuracy is non-negotiable.
Design the UI so verifying the answer is cheaper than producing it.

PrincipleHallucinationgeneral

Fluent is not the same as correct

A confident, well-written answer is taken as authoritative when it contains fabricated facts, citations, or numbers.

Root cause

Confidence and correctness are independent in language models. The more polished the prose, the more the reader’s guard drops.

How to avoid it

Ground answers in retrieved sources and show the citations inline.
Surface uncertainty instead of hiding it; let the model say "I don’t know".
Never present generated facts as authoritative without a verifiable source.
Spot-check numbers — models interpolate plausible-looking figures.

PrincipleEval gapdev-tools

Test on adversarial and long-tail inputs before you ship

An AI feature is evaluated only on the happy path, then fails on the edge cases and hostile inputs that real users hit.

Root cause

Demos are built on representative inputs; production is dominated by the long tail. No red-teaming, no edge-case evals, no staged rollout.

How to avoid it

Red-team the system before launch; try to make it fail.
Build an eval set that over-weights edge cases and known failure modes.
Roll out in stages with canaries and a kill switch.
Monitor for incidents and wire alerts to the people who can roll back.

PrincipleOver-automationdev-tools

Don’t give an agent destructive access without guardrails

An autonomous agent is granted write, deploy, or delete permissions beyond what its reliability justifies, and one bad action causes real damage.

Root cause

Granting autonomy that outruns the model’s reliability curve, with no blast-radius limits.

How to avoid it

Least privilege by default; expand scope only as reliability is proven.
Plan-then-apply: have the agent propose, a human or deterministic check approve.
Require explicit confirmation for irreversible operations.
Cap blast radius (rate limits, scoped credentials, staged rollout) and keep a rollback path.

PrincipleData leakgeneral

Assume the model can leak whatever you feed it

Sensitive data is sent to a model or used in training, then surfaces in outputs, logs, or to the wrong tenant.

Root cause

Treating the model as a sealed box. Data in context can come back out; data in training can be memorized.

How to avoid it

Minimize PII and secrets in prompts; redact before sending.
Never train on secrets or one tenant’s data that another could surface.
Filter outputs for leaked secrets/PII; enforce tenant isolation.
Review what gets logged — prompts and completions often contain sensitive data.

PrincipleCapability overtrustgeneral

Match the job to the model’s reliability curve

AI is deployed where it’s weak (subtle, precision-critical, low-feedback tasks) instead of where it’s strong (verifiable, fast-feedback tasks).

Root cause

Jagged intelligence: capability is uneven. Picking the wrong task is the failure, before any bug is written.

The whole checklist read through one lens — automate confidently where output is cheap to verify; demand a human where it isn’t.

How to avoid it

Lean on AI in verifiable, consistent domains (e.g. coding behind tests, structured extraction).
Demand verification where precision matters and errors are silent.
Where reliability is unproven, ship assistive (suggests) before autonomous (acts).
Re-evaluate as models improve — the jagged edge moves.

How this list works

Principles are the reusable anti-patterns — the checklist you can run before shipping anything.
Incidents are real, named, public failures. We only publish one once it has a primary source — a checklist about AI mistakes shouldn't contain unverified claims.
This grows over time. The jagged-intelligence lens behind it — why AI is reliable in verifiable domains and risky in precision-critical ones — gets its own explainer soon.

More builder tools