Digital AI

Growing Google One

A full-funnel playbook for driving paid Google One subscriptions, cohort by cohort, from awareness through monetization.

Metrics that matter (intelligence)

Real evals: task completion over benchmark scores

Jensen at Stanford CS153: design real evals, because otherwise teams optimise the number rather than the capability. An agent's task completion rate says more than its MMLU score.

Pass@k: calibrated capability sampling

How often the model solves a task within k attempts, counting the task as solved if at least one attempt succeeds. Pass@1 measures reliability; pass@k at larger k measures whether the capability exists at all.
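As a minimal sketch of how pass@k is usually computed in practice: sample n attempts per task, count the c that succeed, and estimate the chance that a random draw of k attempts contains at least one success, which is 1 - C(n-c, k) / C(n, k). The function name and arguments below are illustrative, not taken from any particular eval framework.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task from n sampled attempts, c of which succeeded.

    Returns the probability that at least one of k attempts, drawn without
    replacement from the n samples, is a success.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        # Fewer than k failures exist, so every draw of k includes a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical task: 10 samples, 3 successes.
reliability = pass_at_k(n=10, c=3, k=1)   # pass@1, the "does it work first time" number
capability = pass_at_k(n=10, c=3, k=5)    # pass@5, the "can it do this at all" number
```

Averaging this value over all tasks in a suite gives the suite-level pass@k. A large gap between pass@1 and pass@k suggests the capability exists but is not yet reliable.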

Benchmark saturation: when scores stop carrying signal

MMLU has saturated and GPQA is approaching saturation. HLE and ARC-AGI are the current frontier evals. Each cohort of benchmarks lasts roughly 18 months before it saturates.

Tokens-per-task: the cost-of-intelligence metric

How many tokens the model consumes to complete a typical task. It predicts inference cost. Reasoning models consume 10-100x the tokens of conventional ones at the same accuracy.
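A rough sketch of how tokens-per-task turns into a cost figure, assuming per-task input and output token counts are logged and that the provider bills reasoning tokens as output. The `TaskUsage` type, the token counts, and the prices are all hypothetical placeholders, not real rates.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TaskUsage:
    input_tokens: int
    output_tokens: int  # assumed to include any reasoning tokens


def tokens_per_task(usages: list[TaskUsage]) -> float:
    """Mean total tokens consumed per completed task."""
    return mean(u.input_tokens + u.output_tokens for u in usages)


def cost_per_task(
    usages: list[TaskUsage],
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Mean cost per task, given prices per million tokens."""
    return mean(
        (u.input_tokens * input_price_per_mtok
         + u.output_tokens * output_price_per_mtok) / 1_000_000
        for u in usages
    )


# Hypothetical logs for the same task suite run on two models.
conventional = [TaskUsage(2_000, 500), TaskUsage(2_000, 700)]
reasoning = [TaskUsage(2_000, 25_000), TaskUsage(2_000, 35_000)]

# Hypothetical prices: 1.00 per million input tokens, 4.00 per million output tokens.
ratio = tokens_per_task(reasoning) / tokens_per_task(conventional)
cost_gap = cost_per_task(reasoning, 1.0, 4.0) / cost_per_task(conventional, 1.0, 4.0)
```

Because output tokens are typically priced higher than input tokens, the cost gap can be wider than the raw token ratio, which is why the two are worth tracking separately.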

Time-on-task: the latency metric

End-to-end latency to complete a real workflow. Includes prefill, decode, tool calls, retries. The metric a human user actually feels.
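A small sketch of how the components add up, assuming each attempt's prefill, decode, and tool-call durations are measured separately, and that tool calls and retries run one after another rather than in parallel. The `Attempt` type and the timings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Attempt:
    prefill_s: float                 # time to process the prompt
    decode_s: float                  # time to generate the response
    tool_call_s: list[float] = field(default_factory=list)  # one entry per tool call

    def total_s(self) -> float:
        return self.prefill_s + self.decode_s + sum(self.tool_call_s)


def time_on_task(attempts: list[Attempt]) -> float:
    """End-to-end seconds for one workflow; every retry counts in full."""
    return sum(a.total_s() for a in attempts)


# Hypothetical workflow: the first attempt fails after two tool calls,
# and the retry succeeds after one.
workflow = [
    Attempt(prefill_s=0.4, decode_s=2.0, tool_call_s=[1.5, 0.5]),
    Attempt(prefill_s=0.4, decode_s=1.8, tool_call_s=[1.2]),
]
felt_latency = time_on_task(workflow)
```

In this example decode time is well under half of the total, which is the point of the metric: a faster model alone does not fix a workflow dominated by tool calls and retries.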

Digital AI, in five questions

When the model is the software

For 60 years, software was prerecorded: write code, compile, ship, run.

Tokens as thought and action

Every token a model generates is either consumed by the model itself (thought) or sent to the world (action).

Domain representations: language is one example, not the only one

A language model works because it learns a representation of words, characters, and syntax.

The present-tense case for AI: where value compounds today

AI is not just a futuristic promise or a simple corporate headcount trim.

Where does AI actually replace work?

AI replaces work where the task has enough digital context, cheap verification, clear handoff, and economic value to justify the failure handling.

Ch. 06

Why do some AI agents stick?

Agents stick when they own a repeated job with context, tools, permissions, feedback, and a visible business outcome.

Ch. 07

Is AI going to take my job?

The honest answer is: probably not your whole job, but almost certainly some of your tasks.

Ch. 08

Where do AI app moats come from?

The moat is rarely the model call. It is workflow ownership, proprietary context, distribution, trust, feedback data, and the cost of switching the operating loop.

Ch. 09

Which AI apps should I care about?

Care about the apps that own painful workflows, improve with use, route models intelligently, and move from experiment budget to operating budget.

Building agents, in seven questions

What is an agent — and what isn't?

Most things called 'agents' in 2026 are chatbots with extra steps.

How do you scope what an agent should do?

The hardest part of building a useful agent is picking the right job for it.

An agent without tools is a chatbot. An agent with too many tools is paralyzed. Tool design is the most underrated engineering decision in agent building: each tool is a small API the agent has to understand, call correctly, handle errors from, and combine with others. The principles below are what separate agents that work in production from agents that work only in demos.

Memory — the agent's notebook

An agent that forgets what it did yesterday is severely limited.

Evals — how do you know it's good?

A demo proves the agent can succeed once. An eval proves the agent succeeds reliably, at known rates, on known kinds of inputs. Most agent projects fail because the team did not invest in evals early enough. Evals are the engineering discipline that separates an agent that works in a demo from an agent that works in production.
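One way to make "known rates on known kinds of inputs" concrete is to tag each eval case with a category and report a success rate with an uncertainty range per category. The sketch below assumes results arrive as (category, passed) pairs and uses the Wilson score interval for the range. The category names and data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a success rate (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


def summarize(results):
    """Group (category, passed) pairs and report n, rate, and interval per category."""
    counts = defaultdict(lambda: [0, 0])  # category -> [successes, trials]
    for category, passed in results:
        counts[category][1] += 1
        if passed:
            counts[category][0] += 1
    return {
        cat: {"n": n, "rate": s / n, "ci95": wilson_interval(s, n)}
        for cat, (s, n) in counts.items()
    }


# Hypothetical eval run: 10 refund cases (8 pass), 1 escalation case (1 pass).
results = [("refund", True)] * 8 + [("refund", False)] * 2 + [("escalation", True)]
report = summarize(results)
```

The single escalation case reports a 100% rate but a very wide interval, which is the difference between a demo and an eval: one success says little about the rate.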

Ch. 06

Failure modes and guardrails

Every agent in production will fail. The question is not whether but what shape the failure takes, who notices, and how the system recovers. The taxonomy below covers the failures that matter; ignore them and your agent will discover them for you in front of a customer.

Ch. 07

Worked examples — three agents, end to end

Three real agent designs, walked through using the framework from chapters 1-6.

Agentic commerce, in five questions

API Infrastructure: Picks and Shovels of the Agentic Economy

If models keep getting cheaper to deploy, the rate-limiting step for the AI-built economy is not generating code or content — it is coordinating transaction workflows with the real world.

What does Stripe's data say about AI-built businesses?

Stripe is one of the only entities with a near-real-time view of how many AI-built businesses are forming, where they cluster, and how fast they monetize.

Multi-agent commerce — when buyers send agents to negotiate

The shift from human buyers to agent buyers is not a UX change.

How do sellers price when agents do the buying?

When a buyer is a human, sellers can learn the buyer's customer lifetime value over months of behavior.

Who else owns rails in the agent era?

Stripe is not the only candidate to win agentic payments.