Semiconductors · 12 of 12

What does a modern AI chip look like, end-to-end?

The capstone. We walk through one real product — NVIDIA's GB200 Grace-Blackwell superchip and the NVL72 rack it ships in — and name every major component, where it was designed, where it was fabbed, where it was packaged, who supplies it, and what could disrupt it. By the end the reader should be able to read any chip datasheet and place every line in its actual supply chain.

Where the binding constraint sits today

Every component on a modern AI chip is the output of a different oligopoly. Understanding which oligopoly controls which piece is how you understand the actual chokepoints — which are rarely 'the chip is hard to make' and almost always 'one supplier is the bottleneck for one component'.

What you are actually buying

An NVIDIA GB200 is not a chip. It is an assembled module. Each module contains two Blackwell B200 GPUs (the compute), one Grace CPU (the host), 192 GB of HBM3e memory per GPU (the working set), and an interposer under each GPU that ties its dies and memory together. The module then sits on a board called the Bianca; the board sits in a rack called the NVL72, which contains 36 of these modules (72 GPUs, hence the name) connected by NVLink fabric switches and cooled by liquid running through cold plates. The whole rack draws ~120 kW and costs in the low millions of dollars at list. Order lead times in 2025 were six to twelve months for the early customers, longer for everyone else.

The thing to internalize is that each of those components — the compute dies, the CPU, the memory, the interposer, the NVLink switches, the optics, the board, the cold plate, the power-conversion stack — is sourced from a different supplier ecosystem. The chip you read about in the press is the visible name on the package. The supply-chain reality is a stack of two-to-four-supplier oligopolies, any one of which can become the binding constraint in a given quarter.

Component-by-component

Walking from compute outward:

Compute (Blackwell B200 GPU, 2× per module). Designed in Santa Clara by NVIDIA. Fabbed by TSMC on its 4NP process, an NVIDIA-customized variant of the N4P node (a refined 5 nm-class node, despite the 4 nm marketing name). Each B200 package combines two dies of about 800 mm² each into a single logical accelerator via a 10 TB/s die-to-die interconnect. Substitutes: none. AMD MI300X and the hyperscaler custom chips are competitive on paper but not interchangeable in software.
Host CPU (Grace, 1× per module). Also designed by NVIDIA, also fabbed at TSMC, on the 4N process (an N4-class variant). 72 Arm Neoverse cores, optimized for the high-bandwidth coherent interconnect to the Blackwell GPUs. The nearest substitute is an x86 host (Intel Xeon, AMD EPYC), but the coherent interconnect on Grace is part of why the module performs the way it does; swap the CPU and you have a different product.
HBM3e memory (192 GB per GPU across 8 stacks). Supplied by SK Hynix, Micron, and Samsung in different proportions. SK Hynix is the largest supplier in 2025; Micron has been ramping; Samsung has had yield issues that limited its share. Each stack is a tower of DRAM dies bonded with through-silicon vias; at 24 GB per stack and 3 GB per 24-Gb die, that is an 8-high stack. HBM is the most supply-constrained input in the whole assembly; pricing rose more than 50% during 2024 and contracts now book 12-18 months out. This is the single most likely component to bottleneck an NVIDIA roadmap.
Silicon interposer (CoWoS-L, 1× per GPU package). This is the packaging layer, built at TSMC, that physically connects the Blackwell dies and HBM stacks. CoWoS (Chip-on-Wafer-on-Substrate) comes in several variants; CoWoS-L uses local silicon interconnect (LSI) bridges and is the variant Blackwell needs. Capacity exists at TSMC only; the company has been doubling CoWoS capacity year-over-year but cannot meet demand. Substitutes (Intel EMIB, Samsung X-Cube, ASE FOCoS) exist but are years behind on density and reliability.
NVLink 5 fabric switch ASICs. Designed by NVIDIA, fabbed at TSMC. The NVL72 rack contains 9 fabric switch trays with two of these ASICs each, 18 in total, which give every GPU in the rack non-blocking access to every other GPU at 1.8 TB/s. These chips are themselves at the same node and substrate complexity as the GPUs.
Optical transceivers (rack-to-rack). Each NVL72 connects to other racks via optical fiber. The transceivers come from a fragmented market: Coherent (formerly II-VI, which absorbed Finisar), Innolight, Eoptolink, and designs built around Marvell's Inphi silicon. The optics use indium phosphide (InP) lasers, which are themselves a constrained category.
Bianca board. The PCB assembly that holds the module and routes power and signals. PCB suppliers are highly fragmented (Unimicron, Zhen Ding, AT&S, TTM, Compeq) and not the bottleneck most quarters.
Cold plates and liquid cooling. Vertiv, LiquidStack, and a small set of specialist suppliers. The cold plates are precision-machined copper or copper-alloy parts. Demand has grown faster than supply over 2024-25 as the industry shifts from air to direct-liquid cooling.
Power conversion (analog content). Dozens of power-management ICs per module — Monolithic Power Systems, Vicor, Infineon, Texas Instruments. Voltage regulators that step the 48 V (or, increasingly, 400 V) DC input down to the sub-1 V supply the GPU dies need. This is where the analog industry described in chapter 9 actually shows up in the rack.
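
The step-down described in the power-conversion entry is easier to appreciate as current. The sketch below applies I = P / V to the ~120 kW rack figure quoted earlier and to the two bus voltages named above. It ignores conversion losses, the function name is illustrative, and the 1,000 W load at 0.8 V in the last case is a hypothetical round number chosen only to show the sub-1 V end of the chain, not a published GPU specification.

```python
def current_amps(power_watts: float, volts: float) -> float:
    """Ideal DC current for a load at a given voltage (I = P / V, losses ignored)."""
    if volts <= 0:
        raise ValueError("volts must be positive")
    return power_watts / volts


RACK_POWER_W = 120_000  # ~120 kW rack figure from the text

bus_48v = current_amps(RACK_POWER_W, 48)    # 2500 A
bus_400v = current_amps(RACK_POWER_W, 400)  # 300 A

# Hypothetical illustration only: 1,000 W delivered at 0.8 V.
die_level = current_amps(1_000, 0.8)        # 1250 A

print(f"48 V bus:   {bus_48v:,.0f} A")
print(f"400 V bus:  {bus_400v:,.0f} A")
print(f"0.8 V load: {die_level:,.0f} A per 1,000 W (hypothetical load)")
```

The same rack power needs roughly eight times less bus current at 400 V than at 48 V, and the last conversion stage has to deliver on the order of a thousand amps per kilowatt, which is why the regulator count per module is so high.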

Where the chokepoints live

Out of all of these components, two are the most likely binding constraints on NVIDIA's roadmap in any given quarter. The first is HBM. Each Blackwell GPU needs 192 GB of HBM3e, which at 3 GB per 24-Gb die means 64 good dies. SK Hynix and Micron together produce on the order of 30-50 thousand wafers per month of HBM, and each wafer yields roughly 25-30 working 24-Gb HBM3e dies. Multiplied out, those figures give 9-18 million dies a year, or roughly 140-280 thousand Blackwell-equivalent HBM bundles per year at current capacity. Demand exceeds this. NVIDIA has reportedly prepaid both SK Hynix and Micron in the billions of dollars to lock in capacity through 2026.
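
The multiplication in the paragraph above can be checked in a few lines. This is a minimal sketch that takes the quoted wafer volumes and good-die counts as inputs; the only added step is unit conversion (a 24-Gb die holds 3 GB, so a 192 GB bundle needs 64 good dies). The function and parameter names are illustrative, not taken from any supplier disclosure.

```python
def hbm_bundles_per_year(
    wafers_per_month: float,
    good_dies_per_wafer: float,
    die_gbit: float = 24.0,    # 24-Gb HBM3e die, as quoted in the text
    bundle_gb: float = 192.0,  # HBM3e per Blackwell GPU, as quoted in the text
) -> float:
    """Blackwell-equivalent 192 GB HBM bundles per year from wafer volume."""
    die_gb = die_gbit / 8                 # 24 Gb -> 3 GB
    dies_per_bundle = bundle_gb / die_gb  # 192 GB / 3 GB = 64 dies
    dies_per_year = wafers_per_month * 12 * good_dies_per_wafer
    return dies_per_year / dies_per_bundle


low = hbm_bundles_per_year(30_000, 25)   # 140,625
high = hbm_bundles_per_year(50_000, 30)  # 281,250
print(f"{low:,.0f} to {high:,.0f} bundles per year")
```

The result scales linearly with the good-dies-per-wafer input, which is the least certain number in the chain, so it is worth rerunning with your own estimate rather than trusting any single headline figure.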

The second is CoWoS-L packaging at TSMC. Each Blackwell GPU package consumes a CoWoS-L interposer of approximately 2,500 mm², substantially larger than the H100's CoWoS-S interposer. TSMC's total CoWoS output in 2025 was estimated at around 65,000 wafers, and that total is split among CoWoS-S, CoWoS-R, and CoWoS-L, serving multiple NVIDIA generations, AMD MI300/MI350, and a handful of other customers. TSMC has been adding ~50% capacity year over year through new facilities in Chiayi and Longtan, but the lag from breaking ground to producing usable wafers is 12-18 months.
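
To turn the interposer area and wafer count above into packages, a common geometric approximation for gross dies per round wafer is wafer area divided by die area, minus an edge-loss term. The sketch below applies it to the ~2,500 mm² interposer on a 300 mm wafer. It assumes a roughly square interposer, ignores yield and scribe lines entirely, and treats the share of CoWoS output going to this product as a free parameter; the function names and the `share` argument are illustrative.

```python
import math


def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Geometric estimate: wafer area / die area, minus an edge-loss correction."""
    radius = wafer_diameter_mm / 2
    area_term = math.pi * radius**2 / die_area_mm2
    edge_term = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(area_term - edge_term)


def packages_per_year(wafers_per_year: float, interposer_area_mm2: float, share: float) -> int:
    """Upper bound on packages if `share` of the wafers go to this interposer size."""
    return int(wafers_per_year * share * gross_dies_per_wafer(interposer_area_mm2))


per_wafer = gross_dies_per_wafer(2500)            # 14 interposers per wafer
ceiling = packages_per_year(65_000, 2500, 1.0)    # 910,000 if every wafer were CoWoS-L
print(per_wafer, f"{ceiling:,}")
```

With only about 14 interposer sites per wafer before yield, each percentage point of CoWoS allocation moves the package count noticeably, which is why the allocation split across customers matters as much as the headline capacity.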

Either of these two — HBM or CoWoS-L — can independently delay an NVIDIA shipment schedule by 3-6 months. They have done so multiple times in 2023-24. Other components rarely make the news because they have either more suppliers or more slack capacity.
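
One way to keep the component walk in your head is as a small table of component to named suppliers, sorted by how few names appear. The sketch below encodes only the suppliers named in this chapter; the dictionary keys and the function name are illustrative. Supplier count is a crude proxy: it captures the "more suppliers" half of the argument but not the "more slack capacity" half, which is why HBM, with three suppliers, can still be the binding constraint.

```python
# Suppliers as named in this chapter; fab and packaging entries list the foundry.
SUPPLIERS = {
    "Compute die fab": ["TSMC"],
    "Grace CPU fab": ["TSMC"],
    "CoWoS-L packaging": ["TSMC"],
    "NVLink switch ASIC fab": ["TSMC"],
    "HBM3e": ["SK Hynix", "Micron", "Samsung"],
    "Optical transceivers": ["Coherent", "Innolight", "Eoptolink"],
    "PCB": ["Unimicron", "Zhen Ding", "AT&S", "TTM", "Compeq"],
    "Liquid cooling": ["Vertiv", "LiquidStack"],
    "Power-management ICs": [
        "Monolithic Power Systems", "Vicor", "Infineon", "Texas Instruments",
    ],
}


def rank_by_supplier_count(suppliers: dict[str, list[str]]) -> list[tuple[str, list[str]]]:
    """Most concentrated components first; ties broken alphabetically by name."""
    return sorted(suppliers.items(), key=lambda item: (len(item[1]), item[0]))


for component, names in rank_by_supplier_count(SUPPLIERS):
    print(f"{len(names)}  {component}: {', '.join(names)}")
```

Every single-supplier row resolves to TSMC, which matches the chapter's point that fab and advanced packaging share one upstream dependency.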

Source: TSMC investor disclosures Q2-Q4 2024; SK Hynix HBM capacity guidance Q3 2024 earnings; Micron 2024 HBM revenue disclosures; industry reporting from TrendForce and Yole Développement.

Reading the datasheet

Once you have the supply-chain decomposition above in your head, a chip datasheet stops looking like marketing and starts looking like a sourcing map. When NVIDIA announces that Rubin (the post-Blackwell generation, due 2026) will use HBM4 instead of HBM3e, you know that means a new supplier qualification cycle at SK Hynix, Micron, and Samsung — and you know which of the three will hit production first based on their public technology roadmaps. When NVIDIA discloses that Rubin moves to TSMC N3P, you know that means the per-die area is going down and the wafer cost is going up roughly proportionally, and you can read the gross-margin guidance against that fact.
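
The "area down, wafer cost up roughly proportionally" reading can be made concrete with a first-order model in which cost per die is wafer cost times the fraction of the wafer the die occupies. Every number below is a hypothetical index value chosen for illustration (a 20% area shrink against a 25% wafer-price increase), not a disclosed TSMC or NVIDIA figure, and the model ignores edge loss and yield.

```python
import math

WAFER_AREA_MM2 = math.pi * 150**2  # 300 mm wafer, edge loss ignored


def cost_per_die(wafer_cost: float, die_area_mm2: float) -> float:
    """First-order cost per die: wafer cost times the die's share of wafer area."""
    return wafer_cost * die_area_mm2 / WAFER_AREA_MM2


# Hypothetical index values: older node = 100 per wafer, 800 mm² die.
old = cost_per_die(wafer_cost=100, die_area_mm2=800)
# Hypothetical newer node: die shrinks 20% (640 mm²), wafer costs 25% more (125).
new = cost_per_die(wafer_cost=125, die_area_mm2=640)

print(f"old: {old:.3f}  new: {new:.3f}  ratio: {new / old:.2f}")
```

When the two effects offset like this, cost per die is flat and any margin change has to come from price, yield, or the non-die content of the module (HBM, packaging), which is the comparison to make against gross-margin guidance.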

Same for competitors. When AMD says MI400 uses TSMC N3 with HBM3e+, you can read the supply-chain implications directly: same memory supplier mix as NVIDIA, same TSMC node, different packaging variant (CoWoS-S for AMD vs CoWoS-L for NVIDIA), different customer mix (Microsoft, Meta, OpenAI, etc.). When a hyperscaler announces a custom chip, the press release will say 'designed in-house' but the chip is being fabbed at TSMC, packaged with CoWoS, paired with HBM from the same three suppliers, and validated against the same broader ecosystem.

Strategic read — the question every operator should be able to answer

Which single component shortage delays an NVIDIA roadmap by 6 months? In 2024 the answer was HBM3e from SK Hynix. In late 2024 into 2025 the answer was CoWoS-L capacity at TSMC. In 2026 the answer may shift to High-NA EUV scanner deliveries from ASML if the Rubin generation moves to nodes that require them. The answer is rarely the GPU die itself; it is almost always a non-NVIDIA-controlled component upstream.

This is the synthesis the rest of the track has been building toward. The AI compute supply chain is not 'NVIDIA makes chips.' It is a stack of dependencies — TSMC for fab, ASML for lithography, Zeiss for EUV optics, SK Hynix/Micron/Samsung for HBM, Shin-Etsu/SUMCO for wafers, Applied Materials/Lam/Tokyo Electron for deposition and etch, Infineon/Wolfspeed for SiC, MPS/Vicor for VRMs — each with its own production cadence, its own capacity buildout, its own geopolitical constraints. The supply of frontier AI chips is the supply of the slowest of these. Anyone who is making investment, hiring, deployment, or strategic decisions against the AI buildout should be able to name the current binding constraint and have a view on the next one.

If you have read all twelve chapters, you now have that map. It is the most durable mental model the Compute section can give you, because it lets you reason about future moves in the industry as architectural changes ripple through a stable set of supplier relationships, rather than as singular news events you have to react to.