Innovation

Burn after prompting

What if the AI tool you think you're paying for is actually billing you for every decision a model makes before the answer even gets to you?

Tiana Cline

01 July 2026

Chris Wright, Red Hat

It took Uber only four months to burn through its 2026 AI budget. In December 2025, Uber rolled out Claude Code; by April, the rideshare platform had nothing left to spend. The reason wasn't rogue engineers or poor planning; it was tokens. Unlike software sold for a fixed licence fee, AI coding tools like Claude Code and GitHub Copilot charge per token, which sounds manageable until you understand how tokens work in practice. A token is the smallest unit of data that a GenAI model processes. When a new model is trained, it is fed all sorts of data, like text, images or audio. The model starts to pick up patterns in the data and breaks this information into small pieces, called tokens, that it can recognise and reuse.

Over time, the model builds up a large internal vocabulary of tokens. Usage is metered in both directions: the tokens you send in, such as prompts, emails or uploaded documents, and the tokens that come back as outputs, like summaries, code suggestions or draft responses. The more tokens you use with a particular model, the more you pay. For Uber, the AI tools were only part of the story. Engineers were ranked on internal leaderboards based on Claude Code usage, creating a direct incentive to consume more tokens. The teams driving adoption were not the same teams watching the spend. With more than 5 000 engineers using the tool, even Uber's CTO reportedly spent $1 200 in a single two-hour session.
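To make the arithmetic concrete, here is a minimal sketch of how per-token billing adds up across a session. The prices, the token counts and the names TokenPrice, request_cost and session_cost are illustrative assumptions, not any vendor's actual rates or API; real pricing varies by model and provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical prices in US dollars per million tokens (assumed values)."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, price: TokenPrice) -> float:
    """Cost of one request: input and output tokens are billed at separate rates."""
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000


def session_cost(requests: list[tuple[int, int]], price: TokenPrice) -> float:
    """Sum the cost of every (input_tokens, output_tokens) pair in a session."""
    return sum(request_cost(i, o, price) for i, o in requests)


# Assumed example rates, for illustration only.
EXAMPLE_PRICE = TokenPrice(input_per_million=3.0, output_per_million=15.0)

if __name__ == "__main__":
    # A coding tool may resend a large context on every step,
    # so input tokens tend to dominate the bill.
    steps = [(200_000, 10_000)] * 40  # 40 hypothetical steps in one session
    print(f"Session cost: ${session_cost(steps, EXAMPLE_PRICE):.2f}")
```

Under these assumed rates, each step costs well under a dollar. The total climbs because a single task can trigger dozens of such steps, each one billed.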
