The tokenmaxxing trap

22 May

Token prices have fallen roughly 80% in the past year, yet enterprise AI bills have risen roughly 320% over the same period. The issue is not what tokens cost; it is the culture that has formed around burning through them.
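It is worth doing the arithmetic on those two figures. The sketch below assumes a bill is simply price per token multiplied by tokens consumed, at a single blended price, which is a simplification of how real invoices are built. The function name is illustrative.

```python
def implied_volume_multiplier(price_change: float, bill_change: float) -> float:
    """Return how many times token volume must have grown.

    Assumption: bill = price_per_token * tokens, with one blended price.
    Changes are fractions, e.g. -0.80 for an 80% fall, 3.20 for a 320% rise.
    """
    price_multiplier = 1 + price_change   # an 80% fall leaves 0.2x the old price
    bill_multiplier = 1 + bill_change     # a 320% rise means 4.2x the old bill
    return bill_multiplier / price_multiplier


growth = implied_volume_multiplier(price_change=-0.80, bill_change=3.20)
print(f"Implied token volume: about {growth:.0f}x last year's")
```

Under that assumption, consumption works out to about 21 times what it was a year ago. Falling prices are not the story; volume is.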

Volume is not value

Inside engineering teams, token consumption is being treated as a productivity metric, and engineers are competing on how many tokens they burn through. That looks a lot like the classic metrics of lines of code, tickets and PRs, which we all know don’t work and get used anyway.

Agentic AI has accelerated the problem. A single user task in an agentic workflow can trigger ten to twenty separate model calls, each loading far more context than the task requires. Gartner puts agentic workloads at five to thirty times the token consumption of standard interactions. Have finance teams modelled for this? I doubt it, and I don’t think engineering teams have noticed either.
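To see how quickly the calls add up, here is a rough sketch. It assumes each call in the chain re-sends a fixed base context plus the output of every earlier call, which is one common pattern but not the only one. All the numbers are made up for illustration.

```python
def agentic_task_tokens(calls: int, base_context: int, output_per_call: int) -> int:
    """Total tokens for one task when every call re-sends the growing history.

    Assumption: call i sends the base context plus the outputs of the
    i calls before it, then produces output_per_call new tokens.
    """
    total = 0
    for i in range(calls):
        input_tokens = base_context + i * output_per_call
        total += input_tokens + output_per_call
    return total


# Illustrative figures only: 2,000 tokens of context, 300 tokens of output.
single_call = agentic_task_tokens(calls=1, base_context=2000, output_per_call=300)
ten_calls = agentic_task_tokens(calls=10, base_context=2000, output_per_call=300)
print(single_call, ten_calls, round(ten_calls / single_call, 1))
```

With these example figures, ten chained calls use roughly 16 times the tokens of a single call, because the input grows on every step as well as being repeated.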

The context engineering discipline

Last year, Andrej Karpathy popularised the phrase “context engineering”: the discipline of filling the model’s context window with exactly the right information and nothing more. Here are some principles of context engineering to help reduce your token count:

prompt caching for stable instructions
targeted retrieval, pulling only the relevant chunks of your codebase or documentation
model routing that reserves the expensive models for tasks that genuinely need them
output constraints that stop a model generating three paragraphs when one sentence was the ask
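The four principles above can be sketched in a few lines. This is a minimal illustration, not any provider's API: the request shape, the model names, the relevance scores and the token budgets are all hypothetical, and a real system would take its scores from a retriever and its token counts from a tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float   # relevance score from a retriever (hypothetical)
    tokens: int    # token count for this chunk (hypothetical)


def select_chunks(chunks: list[Chunk], budget_tokens: int) -> list[Chunk]:
    """Targeted retrieval: keep the best-scoring chunks that fit the budget."""
    picked, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if used + chunk.tokens <= budget_tokens:
            picked.append(chunk)
            used += chunk.tokens
    return picked


def route_model(complexity: str) -> str:
    """Model routing: reserve the expensive model for hard tasks."""
    return "large-model" if complexity == "hard" else "small-model"


def build_request(system_prompt: str, chunks: list[Chunk],
                  question: str, complexity: str) -> dict:
    """Assemble a request in a made-up shape that applies all four principles."""
    context = select_chunks(chunks, budget_tokens=1500)
    return {
        "model": route_model(complexity),
        # Stable instructions go first and stay identical between calls,
        # so a provider-side prompt cache (where available) can reuse them.
        "system": system_prompt,
        "context": [c.text for c in context],
        "question": question,
        # Output constraint: cap the answer length to what the task needs.
        "max_output_tokens": 600 if complexity == "hard" else 100,
    }
```

For example, given three chunks of 1,000, 800 and 400 tokens scored 0.9, 0.8 and 0.5, a 1,500-token budget keeps the first and third and drops the second, because adding it would exceed the budget.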

The Scail AI Risk Value Index surfaces exactly where token sprawl is eroding value and where context discipline would change the economics fast.

The capability gap is real and largely unacknowledged: only 37% of organisations have formal AI governance policies. Nobody has clearly decided who owns this, whether engineering leads, FinOps teams, or the CFO, and in the meantime the bills keep coming!

What boards need to see now

Most businesses are already spending on AI at scale. Very few can say whether the outcomes justify the spend, or where the token budget is actually going. The gap between AI activity and AI value has never been wider.

The Scail scorecard gives leaders a structured view across the areas that determine whether AI investment is working, including the commercial and execution layers where token discipline lives. It is a continuous picture of where value is being created, where it is being consumed without return, and what to change first.

AI is no longer just a technology issue. It is a cost issue, a governance issue, a capability issue, and a board issue.

The winners will not be the teams consuming the most tokens. They will be the teams who know what every token is for.

Read more about our AI Risk & Value Scorecard.

Douglas Cole
