How building with AI goes wrong — named incidents and the reusable anti-patterns behind them. Read it yourself, or point your agent at it, before you ship. Run each card's checklist.
8 principles · 0 verified incidents
PrincipleUnreviewed outputgeneral
Never ship un-reviewed model output as a deliverable
Generated text, code, or analysis goes out the door — to a client, a regulator, production, or the public — without a human verifying it first.
Root cause
Treating a model’s draft as a finished work product. Fluency reads as authority, so the review step quietly gets skipped under deadline pressure.
How to avoid it
Require human sign-off before any external-facing or high-stakes artifact ships.
Verify every citation, figure, and quote actually resolves to a real source.
Make the reviewer accountable for the output as if they wrote it.
Log which artifacts were model-generated so they get extra scrutiny.
PrinciplePrompt injectionsecurity
Treat everything the model reads as untrusted input
A model that reads web pages, emails, documents, or user text can be steered by instructions hidden in that content (prompt injection), then made to leak data or take actions.
Root cause
Assuming the model only follows your instructions. Anything in its context window can act as an instruction.
How to avoid it
Never let model output trigger a privileged or destructive action without a deterministic check in between.
Don’t concatenate untrusted content into system prompts; isolate it and label it as data.
Constrain tools to least privilege; sandbox anything the model can call.
Assume retrieved/third-party content is adversarial and validate any action it implies.
PrincipleCapability overtrusthealthcare
Put a human gate on precision-critical, low-verifiability tasks
Tasks where an error is silent and expensive — medical record transcription, legal citations, financial figures — are handed to a model and trusted directly.
Root cause
Ignoring jagged intelligence: AI is reliable in verifiable domains (code with tests) and unreliable where mistakes are subtle and hard to catch — exactly the domains that demand zero error.
Classic jagged edge: superhuman at some verifiable tasks, quietly wrong on precision-critical ones a careful human would get right.
How to avoid it
Score every task by two axes: how verifiable is the output, and how costly is a silent error.
Require human verification for medical, legal, and financial outputs — no exceptions.
Prefer extraction-with-citation over free generation when accuracy is non-negotiable.
Design the UI so verifying the answer is cheaper than producing it.
PrincipleHallucinationgeneral
Fluent is not the same as correct
A confident, well-written answer is taken as authoritative when it contains fabricated facts, citations, or numbers.
Root cause
Confidence and correctness are independent in language models. The more polished the prose, the more the reader’s guard drops.
How to avoid it
Ground answers in retrieved sources and show the citations inline.
Surface uncertainty instead of hiding it; let the model say "I don’t know".
Never present generated facts as authoritative without a verifiable source.
Test on adversarial and long-tail inputs before you ship
An AI feature is evaluated only on the happy path, then fails on the edge cases and hostile inputs that real users hit.
Root cause
Demos are built on representative inputs; production is dominated by the long tail. No red-teaming, no edge-case evals, no staged rollout.
How to avoid it
Red-team the system before launch; try to make it fail.
Build an eval set that over-weights edge cases and known failure modes.
Roll out in stages with canaries and a kill switch.
Monitor for incidents and wire alerts to the people who can roll back.
PrincipleOver-automationdev-tools
Don’t give an agent destructive access without guardrails
An autonomous agent is granted write, deploy, or delete permissions beyond what its reliability justifies, and one bad action causes real damage.
Root cause
Granting autonomy that outruns the model’s reliability curve, with no blast-radius limits.
How to avoid it
Least privilege by default; expand scope only as reliability is proven.
Plan-then-apply: have the agent propose, a human or deterministic check approve.
Require explicit confirmation for irreversible operations.
Cap blast radius (rate limits, scoped credentials, staged rollout) and keep a rollback path.
PrincipleData leakgeneral
Assume the model can leak whatever you feed it
Sensitive data is sent to a model or used in training, then surfaces in outputs, logs, or to the wrong tenant.
Root cause
Treating the model as a sealed box. Data in context can come back out; data in training can be memorized.
How to avoid it
Minimize PII and secrets in prompts; redact before sending.
Never train on secrets or one tenant’s data that another could surface.
Filter outputs for leaked secrets/PII; enforce tenant isolation.
Review what gets logged — prompts and completions often contain sensitive data.
PrincipleCapability overtrustgeneral
Match the job to the model’s reliability curve
AI is deployed where it’s weak (subtle, precision-critical, low-feedback tasks) instead of where it’s strong (verifiable, fast-feedback tasks).
Root cause
Jagged intelligence: capability is uneven. Picking the wrong task is the failure, before any bug is written.
The whole checklist read through one lens — automate confidently where output is cheap to verify; demand a human where it isn’t.
How to avoid it
Lean on AI in verifiable, consistent domains (e.g. coding behind tests, structured extraction).
Demand verification where precision matters and errors are silent.
Where reliability is unproven, ship assistive (suggests) before autonomous (acts).
Re-evaluate as models improve — the jagged edge moves.
How this list works
Principles are the reusable anti-patterns — the checklist you can run before shipping anything.
Incidents are real, named, public failures. We only publish one once it has a primary source — a checklist about AI mistakes shouldn't contain unverified claims.
This grows over time. The jagged-intelligence lens behind it — why AI is reliable in verifiable domains and risky in precision-critical ones — gets its own explainer soon.