Foundations · 3 of 5

What is a CPU? From desktop to cloud to AI agent

A CPU has done three very different jobs in 40 years. The desktop CPU sped up serial code. The cloud CPU stacked cores for parallel tenants. The AI-era CPU is being redesigned to cut the latency of tool calls issued by billion-dollar GPU systems.

What a CPU actually does

A central processing unit is the part of a computer that executes general program code. It fetches an instruction from memory, decodes it into electrical control signals, executes the operation on a small set of registers, writes the result back, and moves to the next instruction. Modern CPUs do this billions of times per second with elaborate machinery for branch prediction, caching, and parallel issue.
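The fetch, decode, execute, write-back loop described above can be sketched as a toy interpreter. This is an illustration only: the three-instruction set (LOAD, ADD, HALT), the register names, and the run function are all invented here. A real CPU also overlaps these stages in a pipeline rather than finishing one instruction before starting the next.

```python
# Toy fetch-decode-execute loop. The instruction set and register
# names are hypothetical, invented for illustration only.

def run(program):
    registers = {"r0": 0, "r1": 0, "r2": 0}
    pc = 0  # program counter: index of the next instruction

    while True:
        instruction = program[pc]          # fetch
        opcode, *operands = instruction    # decode
        pc += 1                            # advance to the next instruction

        if opcode == "LOAD":               # execute, then write back
            dest, value = operands
            registers[dest] = value
        elif opcode == "ADD":
            dest, src_a, src_b = operands
            registers[dest] = registers[src_a] + registers[src_b]
        elif opcode == "HALT":
            return registers
        else:
            raise ValueError(f"unknown opcode: {opcode}")


program = [
    ("LOAD", "r0", 2),
    ("LOAD", "r1", 3),
    ("ADD", "r2", "r0", "r1"),
    ("HALT",),
]
result = run(program)
print(result["r2"])  # prints 5
```

Branch prediction, caching, and parallel issue all exist to keep a loop like this from stalling between the fetch and the write-back.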

A consumer CPU in 2026 typically has 4 to 16 cores. Each core is highly optimised for serial throughput: keep the pipeline full, predict branches accurately, and hide memory latency with hierarchical caches. The design goal is "make one thread go as fast as possible."

The cloud CPU: many cores, tenant isolation

A cloud server CPU looks very different. AMD's top EPYC parts ship with 96 to 192 cores. Intel's Xeon 6 generation reaches 288 efficiency cores in Sierra Forest and 128 performance cores in Granite Rapids. The job is no longer to make one thread fast. The job is to host many independent workloads, each renting a slice of the chip.

Each core can be slightly slower than a consumer CPU's core as long as the total throughput across all cores is high. Cache hierarchies, memory channels, and inter-socket interconnects are sized for fan-out rather than for single-thread depth. This is the architecture that built the public cloud business.
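The trade-off can be put in rough numbers. The sketch below assumes aggregate throughput is simply core count times per-core performance, which ignores memory and interconnect limits. The per-core figures (1.0 and 0.7) and the 16-core consumer part are hypothetical values chosen only to show the shape of the argument.

```python
# Aggregate throughput = cores x per-core performance.
# All figures are hypothetical, chosen only to show the trade-off.

def aggregate_throughput(cores, per_core_perf):
    """Total work per unit time if every core is kept busy."""
    return cores * per_core_perf

# Hypothetical consumer part: few cores, each fast (normalised to 1.0).
consumer = aggregate_throughput(cores=16, per_core_perf=1.0)

# Hypothetical server part: many cores, each 30% slower.
server = aggregate_throughput(cores=192, per_core_perf=0.7)

print(consumer)                     # 16.0
print(round(server, 1))             # 134.4
print(round(server / consumer, 1))  # 8.4
```

Under these assumptions the server part does about eight times the total work even though any single thread on it runs slower.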

The AI-era CPU: low latency for tool dispatch

The new shape is different again. In an agentic system, a multi-billion-dollar GPU cluster generates a request to call a tool (execute a database query, run a function, scrape a page) and the tool runs on a CPU. While that CPU executes the tool, the entire GPU system sits idle, waiting for the answer.

The bottleneck is single-thread latency on the CPU, not core count. NVIDIA designed the Vera CPU explicitly for this: fewer cores than a cloud CPU, each optimised for single-thread performance, and paired tightly with the GPU memory hierarchy so tool results return fast. Intel and AMD are pivoting the same way for their AI-rack offerings.
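A small model shows why latency matters more than core count here. It assumes one agent step is GPU generation followed by a single serial tool call, with no other work for the GPU to overlap. The timings and the helper names (serial_tool_seconds, gpu_utilisation) are hypothetical placeholders, not measurements of any real system.

```python
# Simple model of one agent step: the GPU generates a tool call, then
# waits while one CPU thread runs the tool. Timings are hypothetical.

def serial_tool_seconds(work, single_thread_speed, cores=1):
    """Time for one serial tool call. `cores` is deliberately unused:
    a single-threaded task cannot spread across extra cores."""
    return work / single_thread_speed

def gpu_utilisation(gpu_seconds, tool_seconds):
    """Fraction of the step during which the GPU does useful work."""
    return gpu_seconds / (gpu_seconds + tool_seconds)

gpu_seconds = 0.5  # assumed GPU generation time per step
work = 0.5         # assumed tool work, in seconds at speed 1.0

baseline = gpu_utilisation(gpu_seconds, serial_tool_seconds(work, 1.0, cores=64))
more_cores = gpu_utilisation(gpu_seconds, serial_tool_seconds(work, 1.0, cores=128))
faster_core = gpu_utilisation(gpu_seconds, serial_tool_seconds(work, 2.0, cores=64))

print(baseline)               # 0.5
print(more_cores)             # 0.5
print(round(faster_core, 3))  # 0.667
```

In this model, doubling the core count leaves GPU utilisation unchanged, while doubling single-thread speed lifts it from one half to two thirds.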

Source: Jensen Huang, Stanford CS153 Frontier Systems lecture, May 13, 2026; Intel Q1 2026 earnings call commentary on CPU:GPU ratio shift.

Why the CPU is back as an investable story

For most of the last decade, the CPU was a commodity. Cloud workloads compressed margins; ARM ate the low end; GPUs ate the AI-relevant compute. The narrative was "CPUs are over."

That narrative is being revised. In agentic systems, the CPU:GPU ratio is climbing back from 1:8 toward 1:3 or 1:4 (one CPU for every three or four GPUs) because tool dispatch and orchestration are CPU-bound. Whichever vendor wins the AI-era CPU socket reclaims a structurally larger share of the rack bill of materials (BOM). The CPU's economic role is shifting from "background compute" to "latency-critical companion to the accelerator."
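To see what the ratio shift means in sockets, the arithmetic can be run for a single rack. The 72-GPU rack size is a hypothetical example and cpus_needed is an illustrative helper. The ratios are the ones quoted above.

```python
# CPUs needed for a given GPU count at different CPU:GPU ratios.
# The 72-GPU rack size is a hypothetical example.
import math

def cpus_needed(gpus, gpus_per_cpu):
    """Round up so every GPU has a CPU to dispatch its tool calls."""
    return math.ceil(gpus / gpus_per_cpu)

gpus = 72
for gpus_per_cpu in (8, 4, 3):
    print(f"1:{gpus_per_cpu} -> {cpus_needed(gpus, gpus_per_cpu)} CPUs")
# 1:8 -> 9 CPUs
# 1:4 -> 18 CPUs
# 1:3 -> 24 CPUs
```

For the same GPU count, moving from 1:8 to 1:3 puts roughly 2.7 times as many CPU sockets in the rack.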