Why tokenmaxxing folded — and what it means for your AI operating model

Joel Hauer

For most of 2026, Silicon Valley ran on a simple equation: more tokens, more value. The more your enterprise consumed from the frontier labs, the more "AI-native" you were supposed to be. It even got a name — tokenmaxxing — and for a while it was the proxy everyone reached for when they couldn't measure the actual outcome.

It just folded its tent.

The reason isn't a loss of faith in AI. It's the arrival of the invoice. And the invoice is doing what no strategy deck managed to: forcing leaders to ask whether every process they wired to a frontier model actually needed one.

What tokenmaxxing got wrong

Tokenmaxxing confused activity with value. It's the same error that kills most operating-model shifts: counting inputs because the output is harder to name. Pilots get counted. Demos get counted. Token spend gets counted. None of them is the thing the board is paying for.

Gartner now expects AI spending to reach nearly $2.6 trillion this year, up 47% on the prior year. That number was easy to celebrate when it read as ambition. It reads differently as a run-rate. Uber, Nebius, and Microsoft have all started prioritising efficiency in how they use AI, and both Anthropic and OpenAI are reportedly weighing cuts to token prices. When the suppliers and the heaviest buyers move on price at the same time, the trend isn't cooling. It's being repriced.

How did enterprises get here? The honest answer is the uncomfortable one:

Most teams wired AI into their pipelines out of excitement and fear of being left behind — and built the workflows without ever modelling the downstream cost.

Branding did the rest. "The big labs have done a great job of branding, and people believe they can do things that nobody else can do," says Rob May, CEO of Neurometric.ai. "I don't think that's true." When you assume only the frontier can do the job, you stop asking which jobs actually need the frontier.

Usage was never the metric

This is the part worth sitting with, because it outlasts the trend. Tokenmaxxing folded for the same reason stalled pilots stall: nobody connected the spend to an owner and a measure.

A workflow that burns frontier tokens to summarise internal tickets and one that drafts regulated customer communications show up identically on the usage dashboard. They are not the same bet. The regulated communications belong in your value zone. The ticket summaries belong on a small, local model you control, or nowhere at all.

The discipline that survives the correction is the same one we apply to every pilot: owner, metric, next step. If you can't name the person accountable for a workflow's outcome and the measure that proves it's working, token spend isn't an investment. It's leakage with good branding.

Route the frontier to the value zone — and nothing else

The practical move isn't "use less AI." It's to stop treating every process as equally deserving of the most expensive model on the market. Frontier models are extraordinary, and they're worth their price exactly where the outcome justifies it. Everywhere else, the smarter operating posture is a tiered one:

  • Frontier models for the high-stakes, high-ambiguity work in your value zone — the workflows where a better answer changes a real number.
  • Small language models and open-source models for the high-volume, well-bounded work. Run them locally for far lower cost and far greater control over data and latency.
  • No model at all for the processes that were automated by enthusiasm rather than by a business case.

This isn't a downgrade. It's portfolio management. You fund a portfolio of bets with owners, metrics, and kill criteria — and you let the cheapest tool that clears the bar do the job. The frontier is reserved for the bets that earn it.
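For teams that want to make the tiers explicit, the routing rule above can be written down as a few lines of code. What follows is a minimal sketch, not a prescribed implementation: the `Workflow` fields, the `Tier` names, and the choice to require both high stakes and high ambiguity before reaching for the frontier are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    NONE = "no model"
    SMALL = "small or open-source model"
    FRONTIER = "frontier model"


@dataclass
class Workflow:
    # All fields are hypothetical labels a team would assign by judgment.
    name: str
    has_business_case: bool  # automated for a reason, not by enthusiasm
    high_stakes: bool        # a better answer changes a real number
    high_ambiguity: bool     # open-ended rather than well-bounded work


def route(w: Workflow) -> Tier:
    """Return the cheapest tier that clears the bar for this workflow."""
    if not w.has_business_case:
        return Tier.NONE
    # Assumption: the frontier is reserved for work that is both
    # high-stakes and high-ambiguity; everything else goes to a small model.
    if w.high_stakes and w.high_ambiguity:
        return Tier.FRONTIER
    return Tier.SMALL
```

The value of writing it down is less the code than the forced labelling: someone has to state, per workflow, whether a business case exists and whether the stakes justify the price.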

What this week looks like

You don't need a cost taskforce or another strategy review. You need a forced decision.

  1. List your AI-touching workflows by spend. The top of that list is where your tokenmaxxing exposure lives.
  2. For each one, name the owner and the metric. No owner, no metric? It's a candidate to kill, not optimise.
  3. Ask the routing question: does this outcome need a frontier model, or did it just default to one? Move everything that doesn't into the SLM or open-source column.
  4. Run the survivors through a gate. The Investable Bet Gate gives every workflow a forced verdict — fund, pause, or kill — based on whether a named owner is actually defending the spend.
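Steps 1 to 3 are mechanical enough to run from a spreadsheet export. The sketch below is one hypothetical way to do it. The `WorkflowSpend` fields and the action labels are illustrative assumptions, and it does not model the Investable Bet Gate in step 4, which remains a decision made with a named owner.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WorkflowSpend:
    # Hypothetical record; in practice these come from billing data
    # plus a conversation about who owns each workflow.
    name: str
    monthly_spend: float
    owner: Optional[str]
    metric: Optional[str]
    needs_frontier: bool  # answer to the routing question in step 3


def triage(workflows: List[WorkflowSpend]) -> List[Tuple[str, str]]:
    """Rank workflows by spend (step 1) and attach an action (steps 2 and 3)."""
    ranked = sorted(workflows, key=lambda w: w.monthly_spend, reverse=True)
    results = []
    for w in ranked:
        if not w.owner or not w.metric:
            action = "kill candidate"          # step 2: no owner or no metric
        elif not w.needs_frontier:
            action = "move to smaller model"   # step 3: defaulted to frontier
        else:
            action = "keep on frontier"        # still has to pass the gate
        results.append((w.name, action))
    return results
```

Run against three made-up workflows (a compliance-owned letter drafter at the highest spend, an owned ticket summariser that does not need the frontier, and an unowned demo chatbot), it would list the letter drafter first as "keep on frontier", the summariser as "move to smaller model", and the chatbot as "kill candidate".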

The bubble question, answered operationally

There's a louder debate running underneath all this. OpenAI and Anthropic are approaching trillion-dollar valuations ahead of their public offerings, and xAI's parent SpaceX is reportedly set to debut above $1.7 trillion. Whether that's a bubble is a question for the markets.

The question for operators is narrower and more useful: if enterprises route frontier models only to their most valuable applications — and push everything else to lower-cost models they run themselves — what happens to the value that was priced on the assumption they'd never do that?

You don't have to predict the answer. You just have to make sure your AI operating model isn't built on the assumption that tokenmaxxing was ever the point. It wasn't. Owned outcomes were.

Want to see where your frontier spend is earning its price and where it's leaking? Start with the AI Operating Model Scorecard — it maps your workflows to owners, metrics, and the model tier each one actually needs.

Pilots stalling? Bring the specifics to a 20-minute call.
