Global Token Shortage Reshapes AI Computing Landscape
The global artificial intelligence industry is confronting an unexpected bottleneck: a severe shortage of tokens, the fundamental units of information that AI models process. As demand for AI services explodes, major technology companies including Anthropic, Amazon, and OpenAI have been forced to impose usage limits and reallocate computing resources, signaling a structural shift in how the world’s most sought-after computational resource is produced, priced, and consumed.
What Is a Token — and Why Is It Running Out?
A token is the basic unit of information that an AI model processes. Text, code, and images are all broken into tokens before a model can work with them: in text, a token is typically a word or a word fragment a few characters long, while images are divided into small patches that are each encoded as tokens. Every token consumes computation, so a token shortage is a concrete manifestation of computing power scarcity, and the numbers are staggering.
According to data from OpenRouter, a global AI model aggregation platform, token consumption across the models it serves reached 27 trillion tokens by early April 2026, with month-over-month growth of 18.9%. Weekly token consumption quadrupled in the first quarter of 2026 alone. On the platform, Chinese models' weekly token consumption has now exceeded that of U.S. models for five consecutive weeks, driven by cost-effective models handling high-frequency, general-purpose tasks.
Three Drivers Behind the Demand Explosion
The token shortage is not merely a supply-side problem. It reflects a fundamental transformation in how AI is being used.
First, AI is evolving from “tool” to “agent.” Beyond simple chat and content generation, new AI agents can operate computers, write code, and collaborate across software platforms. This shift from “talking to doing” dramatically increases the compute consumed per task, sometimes by orders of magnitude.
Second, leading AI labs are pursuing a “compute for intelligence” strategy. Models now employ “test-time compute scaling” — performing multiple rounds of analysis, reasoning, and verification before answering. A single complex task, such as researching an industry and writing a report, can run for hours and consume millions of tokens.
Third, AI has achieved a commercial breakthrough. As Shen Jianguang, Chief Economist of JD.com, wrote in People’s Daily, AI is now generating real revenue in finance, healthcare, and core enterprise operations. Enterprise demand has shifted from “technology experimentation” to “business necessity.”
Industry Response: Limits, Pricing, and the Compute Rental Boom
Major AI companies are responding with unprecedented measures. Anthropic adjusted its terms of use for Claude to curb peak-hour overuse. Amazon cited “capacity constraints” as dragging on growth. OpenAI paused or slowed non-core projects like Sora to concentrate computing power on revenue-generating operations.
ByteDance’s Doubao assistant now consumes 120 trillion tokens daily, a 1,000-fold increase since its May 2024 launch, and has introduced paid tiers at 68, 200, and 500 yuan per month. About half of the company’s planned 2026 capital expenditure of approximately 160 billion yuan (about $22 billion) is expected to go toward AI chip procurement.
The computing power rental market is booming. One-year lease prices for H100 GPUs rose from $1.70 per card per hour in October 2025 to $2.35 by March 2026, an increase of about 38%. Delivery lead times now stretch to Q1 2027 for H100 GPUs and Q2 2027 for H200 GPUs.
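The percentage change in lease prices can be checked with a few lines of arithmetic. The sketch below uses only the two hourly rates quoted in this section; the annualized per-card figure is an illustrative extrapolation that assumes a card is billed around the clock for 8,760 hours a year, which real lease contracts may not do. The function names are hypothetical.

```python
# Hourly one-year lease rates for an H100, per card, in USD (figures quoted above).
OCT_2025_RATE = 1.70
MAR_2026_RATE = 2.35

# Assumption for illustration only: continuous billing, 24 hours x 365 days.
HOURS_PER_YEAR = 24 * 365


def pct_change(old: float, new: float) -> float:
    """Return the relative change from old to new, as a percentage."""
    return (new - old) / old * 100


def annual_cost(hourly_rate: float, hours: int = HOURS_PER_YEAR) -> float:
    """Return the cost of renting one card for `hours` hours at `hourly_rate`."""
    return hourly_rate * hours


if __name__ == "__main__":
    print(f"Price increase: {pct_change(OCT_2025_RATE, MAR_2026_RATE):.1f}%")
    print(f"Annual cost per card, Oct 2025: ${annual_cost(OCT_2025_RATE):,.0f}")
    print(f"Annual cost per card, Mar 2026: ${annual_cost(MAR_2026_RATE):,.0f}")
```

Run as a script, it reports an increase of about 38.2% and, under the round-the-clock billing assumption, annual per-card costs of roughly $14,892 and $20,586.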
In one of the largest deals of its kind, Anthropic leased the entire Colossus 1 data center from SpaceX — a facility with over 220,000 NVIDIA GPUs and 300 MW of power capacity. The company has also signed approximately 5 GW of compute supply agreements with Amazon and Google/Broadcom, plus a ~$300 billion compute contract with Microsoft Azure, representing total compute commitments in the hundreds of billions of dollars.
China’s Strategic Response: From “Selling Compute” to “Selling Tokens”
China is pursuing a multi-pronged strategy to address the token shortage. The “East Data West Computing” national initiative is accelerating, relocating data processing from crowded coastal cities to energy-rich western regions. In 2026, “computing power and electricity synergy” appeared in the central government work report for the first time.
China’s three major telecom operators have all entered the token business. Shanghai Telecom launched a token computing service priced at 1 yuan per 250,000 tokens, while China Mobile Shanghai offers 1 yuan per 400,000 tokens. This represents a fundamental evolution from commodity hardware rental to value-added AI services, from “selling compute” to “selling tokens.”
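Because the two operators quote different token counts per yuan, normalizing both to a common unit makes them easier to compare. The sketch below converts each quoted rate to yuan per million tokens, a unit many AI API price lists use. The class and function names are hypothetical, and the code assumes flat per-token pricing with no volume tiers or other fees, which the quoted offers may not reflect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenOffer:
    """A flat-rate token offer: `tokens_per_yuan` tokens for 1 yuan."""
    provider: str
    tokens_per_yuan: int


def yuan_per_million(offer: TokenOffer) -> float:
    """Convert a tokens-per-yuan rate into yuan per 1,000,000 tokens."""
    return 1_000_000 / offer.tokens_per_yuan


def cost_in_yuan(offer: TokenOffer, tokens: int) -> float:
    """Cost of consuming `tokens` tokens, assuming a flat per-token price."""
    return tokens / offer.tokens_per_yuan


# The two rates quoted above.
OFFERS = [
    TokenOffer("Shanghai Telecom", 250_000),
    TokenOffer("China Mobile Shanghai", 400_000),
]

if __name__ == "__main__":
    for offer in OFFERS:
        print(f"{offer.provider}: {yuan_per_million(offer):.2f} yuan per million tokens")
```

The conversion gives 4.00 yuan per million tokens for Shanghai Telecom and 2.50 for China Mobile Shanghai. At those rates, a hypothetical workload of 10 million tokens would cost 40 yuan and 25 yuan respectively.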
Northwest China is emerging as a cost-effective AI computing hub. Xinjiang and Gansu leverage cheap green power, at roughly one-third the price of coastal industrial electricity, and cool climates to lower operating costs by over 40% compared with eastern China. Huawei’s Ascend 384 supernode clusters and Enflame AI chips are being deployed at scale in these regions.
“Just as oil is the lifeblood of industry, tokens are the most fundamental fuel for AI development,” said Tang Shuheng, head of the Xinjiang International Integrated Supercomputing Center.
Analysis: Computing Power as Strategic Resource
The token shortage signals a deeper structural transformation. Computing power is becoming a foundational strategic resource akin to water and electricity. The AI industry is transitioning from a “free access” model to a “pay-per-token” utility model, with profound implications for who can afford to participate in the AI revolution.
Global tech giants’ combined AI capital expenditure in 2026 is projected at $725 billion. Yet the “compute divide” between companies with access to massive GPU clusters and those without is widening. NVIDIA’s upcoming Vera Rubin platform, promising 10x performance-per-watt over Blackwell, may ease supply constraints, but domestic Chinese chips still face a significant software ecosystem gap compared with NVIDIA’s CUDA.
What to Watch For
In the near term, token prices will likely continue rising as demand outpaces supply. The computing power rental market will consolidate toward major players. China’s telecom operators will expand token-as-a-service offerings, while northwest computing hubs could contribute up to 70% of new data center orders by 2027.
Longer term, edge computing, more efficient model architectures such as Mixture of Experts (MoE), and the maturation of domestic chips could reshape the landscape. But one thing is clear: the era of free, unlimited AI access is ending. In the intelligent age, tokens are the new oil, and the world is just beginning to feel the pinch.