Digital AI foundations · 2 of 4

Tokens as thought and action

Every token a model generates is either consumed by the model itself (thought) or sent to the world (action). The distinction is small, but it clears up a lot of confused thinking about agents, reasoning models, and tool use.

The primitive

Jensen Huang offered the cleanest version of this primitive at Stanford CS153: thinking is generating tokens that you consume internally, and tool use is generating tokens that you consume externally.

Internal tokens never reach a user or an external system. They are scratch work the model uses to reason its way to a better answer. External tokens are actions: API calls, search queries, function invocations, the final response to the user.

Source: Jensen Huang, Stanford CS153 Frontier Systems lecture, April 30, 2026 (https://cs153.stanford.edu/)

Why this disambiguates reasoning models

A reasoning model is one that generates many internal tokens before producing its external output. The cost is in the internal tokens, which is why reasoning models can be 10-100x more expensive per task than a standard model at similar accuracy.

When you read a reasoning model trace, you are reading internal tokens. They are not the product; they are the cost of the product. The product is the final external token sequence the user sees.

Why this disambiguates agents

An agent is just a system where the model alternates between internal tokens (thinking about what to do next) and external tokens (calling a tool, sending a message, taking an action). Each external action returns a result, which becomes part of the next round of internal tokens.
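The alternation described above can be sketched as a short loop. This is a minimal illustration, not any framework's API: `Step`, `Trace`, `run_agent`, `toy_policy`, and the `add` tool are all hypothetical names, and the "model" is a hard-coded stand-in policy rather than a real language model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    thought: str                  # internal tokens: read only by the model
    action: Optional[str] = None  # external tokens: tool name, or None to answer
    argument: str = ""            # tool input, or the final response text


@dataclass
class Trace:
    internal: List[str] = field(default_factory=list)
    external: List[str] = field(default_factory=list)


def run_agent(
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    task: str,
    max_steps: int = 5,
) -> Trace:
    """Alternate between thinking (internal) and acting (external)."""
    trace = Trace()
    context = [task]
    for _ in range(max_steps):
        step = policy(context)
        trace.internal.append(step.thought)
        context.append(step.thought)
        if step.action is None:
            # The final response is also an external token sequence.
            trace.external.append(step.argument)
            break
        trace.external.append(f"{step.action}({step.argument})")
        # The tool result feeds the next round of internal tokens.
        context.append(tools[step.action](step.argument))
    return trace


def toy_policy(context: List[str]) -> Step:
    # Hypothetical stand-in for a model: call the tool once, then answer.
    if len(context) == 1:
        return Step("I should use the calculator.", "add", "2+3")
    return Step("The tool result answers the task.", None,
                f"The answer is {context[-1]}")


# Hypothetical tool: adds integers separated by "+".
tools = {"add": lambda arg: str(sum(int(x) for x in arg.split("+")))}
trace = run_agent(toy_policy, tools, "What is 2+3?")
```

Here `trace.internal` holds the two thoughts and `trace.external` holds the tool call and the final answer, which makes the internal-to-external ratio per action something you can measure directly.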

Once you see agents this way, the design questions become clear. How many internal tokens per action? How expensive is each external action? How does the model decide when to stop thinking and act? These are the parameters that determine whether an agent is fast, cheap, and reliable, or slow, expensive, and erratic.

Why this matters for cost and latency

Pricing AI workloads requires separating internal from external tokens. Internal token cost scales with reasoning depth. External token cost scales with tool-call complexity. A task in which the model generates 50,000 internal tokens to make one $0.01 API call is dominated by inference cost, not the API fee. The shape of the cost depends on the internal-external mix.
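To make the 50,000-token example concrete, here is a rough cost sketch. The price of $10 per million generated tokens and the 200-token external output are illustrative assumptions, not figures from the lecture or any provider's price list, and the sketch assumes internal and external tokens are billed at the same rate. `task_cost` is a hypothetical helper.

```python
def task_cost(internal_tokens, external_tokens, price_per_million, tool_fees):
    """Split one task's cost into internal, external, and tool components.

    Assumes (hypothetically) that internal and external tokens are billed
    at the same per-token rate.
    """
    per_token = price_per_million / 1_000_000
    return {
        "internal": internal_tokens * per_token,
        "external": external_tokens * per_token,
        "tools": sum(tool_fees),
    }


# Assumed numbers: 50,000 thinking tokens, a 200-token tool call plus answer,
# $10 per million generated tokens, and one $0.01 API call.
cost = task_cost(50_000, 200, 10.0, [0.01])
total = sum(cost.values())
internal_share = cost["internal"] / total
```

Under these assumptions the internal tokens cost $0.50 against a $0.01 API fee, so they account for roughly 98% of the total. Changing the assumed price moves the absolute numbers, but the fee only dominates when reasoning is very short or the tool is much more expensive.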

Latency follows the same split. Internal tokens are individually fast, but they are generated one after another, so 50,000 of them is a real wall-clock wait. External tokens incur network round-trips, so a slow tool kills agent throughput. The right architecture matches the right token type to the right hardware.
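The same split can be put into a back-of-the-envelope latency estimate. The decode speed of 100 tokens per second and the three 2-second tool calls are assumed values chosen for illustration. `wall_clock_seconds` is a hypothetical helper, and it assumes strictly sequential execution, with no parallel tool calls and no overlap between thinking and waiting.

```python
def wall_clock_seconds(internal_tokens, decode_tokens_per_second, tool_round_trips_s):
    """Estimate time spent thinking versus waiting on tools, run sequentially."""
    thinking = internal_tokens / decode_tokens_per_second
    waiting = sum(tool_round_trips_s)
    return thinking, waiting


# Assumed numbers: 50,000 internal tokens at 100 tokens/s, three 2 s tool calls.
thinking_s, waiting_s = wall_clock_seconds(50_000, 100.0, [2.0, 2.0, 2.0])
total_s = thinking_s + waiting_s
```

With these inputs the agent spends 500 seconds generating internal tokens and 6 seconds waiting on tools. An agent that thinks briefly but calls a slow tool many times would show the opposite balance, which is why the two terms are worth tracking separately.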