Research sweep · deep · 2023 – 2026

Token Cost of Ownership

AI token pricing vs true total cost of ownership from January 2023 to 19 April 2026, with emphasis on 2025–2026 signals: lab subsidisation strategies, infrastructure economics (compute, energy, data centres, hardware, security, ops), how user-facing prices have evolved, and analyst and researcher projections for token cost trajectories through 2028.

Claude Opus 4.8
financial
frontier
academic
vc
blogs

Synthesised 2026-04-19

AI Token Pricing vs True Total Cost of Ownership, 2023–2026

Overview

The headline number is simple and misleading. A million input tokens from a flagship model cost roughly $30 in March 2023 and cluster around $1.75 to $5 by early 2026, a drop of roughly 83 to 94% in under three years. Beneath that decline sits a market where published prices and actual serving costs have decoupled almost entirely. The price you pay is set by capital markets and competitive strategy, not by what it costs to run the model.

Sources: Bloomberg (2023) (↗); PYMNTS (2025) (↗)
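The size of the headline decline follows directly from the two price points quoted above. A minimal sketch of that arithmetic, assuming the $30 launch price and the $1.75 to $5 early-2026 range are directly comparable list prices (the function name is illustrative):

```python
def pct_drop(old_price: float, new_price: float) -> float:
    """Percentage decline from old_price to new_price."""
    return (old_price - new_price) / old_price * 100


LAUNCH_PRICE = 30.00          # USD per million input tokens, March 2023
RANGE_2026 = (1.75, 5.00)     # USD per million input tokens, early 2026

smallest_drop = pct_drop(LAUNCH_PRICE, RANGE_2026[1])
largest_drop = pct_drop(LAUNCH_PRICE, RANGE_2026[0])
print(f"{smallest_drop:.1f}% to {largest_drop:.1f}%")  # 83.3% to 94.2%
```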

The defining shift of the past 18 months is the move from training spend to inference spend, and the recognition that inference, not training, is where the durable cost sits. Bloomberg marked this pivot explicitly in April 2025. ACL 2025 research established that inference now accounts for over 90% of LLM lifecycle energy consumption, inverting the popular framing that training dominates. Once inference dominates, electricity tariffs and data centre economics, not one-off training runs, set the long-run price floor.

Sources: Bloomberg (2025) (↗)

This matters now because the capital underwriting the subsidy has reached a scale that forces a reckoning. Big Tech AI capex is projected at roughly $650 billion for 2026 alone, and the associated debt financing has become a $3 trillion market event. OpenAI's leaked financials show inference costs of $8.4 billion in 2025 rising to $14.1 billion in 2026, with cumulative cash outflow projected at $143 billion through 2029 and profitability not targeted until 2030.

Sources: Bloomberg (2026) (↗); Bloomberg (2026) (↗); Fortune (2025) (↗)

The synthesis across all five lanes converges on one thesis: current token prices are strategically rather than economically determined. Where the lanes diverge is on when that ends, what the price floor will be, and whether algorithmic efficiency can keep outrunning rising energy and infrastructure costs through 2028.

Timeline

Key milestones in AI token pricing and TCO, 2023–2026

H1 2023

GPT-4 API launches near $30 per million input tokens
SemiAnalysis publishes first inference cost analysis of search disruption

H2 2023

GPT-4 Turbo cuts price to $10 per million
Google announces TPU v5p and AI Hypercomputer

H1 2024

GPT-4o drops flagship price to $5 per million
Sequoia frames the $600B revenue question
Epoch documents rising training costs

H2 2024

DeepSeek V3 ships near $0.14 per million
a16z coins LLMflation
OpenAI losses and inference spend exposed in reporting

H1 2025

Bloomberg marks the training-to-inference spend pivot
DeepSeek market shock challenges cost assumptions
Google and Anthropic cut prices, Anthropic Opus by 67%

H2 2025

ACL paper finds inference is over 90% of lifecycle energy
Epoch quantifies 9x-900x annual price declines by tier
Centific documents the 25x subscription gap

H1 2026

Big Tech AI capex projected near $650B
OpenAI pauses Stargate UK citing energy costs
arXiv formalises Structural Jevons and token-futures pricing

Key Findings

Training costs rise while inference costs fall, and the two curves are diverging. Epoch AI's foundational work established that frontier training costs grow roughly 2.4x per year, with the largest runs projected to exceed $1 billion by 2027. Against this, Epoch's "Price of Progress" dataset finds inference cost per unit of performance falling around 10x per year. The implication is that the training amortisation burden per token is not shrinking proportionally, even as serving costs collapse.

Sources: arXiv (preprint) (2024) (↗); Epoch AI (2024) (↗); Epoch AI (2025) (↗)
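To see how quickly the two curves separate, the rates can be compounded. This is a sketch only: it assumes both rates (2.4x annual training growth, 10x annual inference decline) hold constant over the window, which the underlying work does not claim.

```python
def compound(factor_per_year: float, years: float) -> float:
    """Cumulative multiplier after compounding an annual factor."""
    return factor_per_year ** years


TRAINING_GROWTH = 2.4     # frontier training cost multiplier per year
INFERENCE_DECLINE = 10.0  # inference cost per unit of performance, fold decline per year

for years in (1, 2, 3):
    training = compound(TRAINING_GROWTH, years)
    inference = 1 / compound(INFERENCE_DECLINE, years)
    # Ratio of the two indices: how far apart the curves have moved
    print(f"year {years}: training x{training:.2f}, "
          f"inference x{inference:.4f}, gap x{training / inference:,.0f}")
```

Under these assumptions the training index reaches about 13.8x after three years while the inference index falls to one thousandth of its starting value.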

Price declines are not uniform; they fracture sharply by capability tier. Epoch's granular tracking shows per-task inference prices falling anywhere from 9x to 900x per year depending on the benchmark. Commodity models have collapsed toward and below $0.10 per million tokens while frontier reasoning models have largely held price. This two-tier structure is the single most important nuance lost in the "tokens are getting cheaper" headline.

Sources: Epoch AI (2025) (↗); Epoch AI (2025) (↗)
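Fold-per-year figures are easier to compare with percentage headlines once converted. A small sketch of the conversion, applied to the 9x and 900x bounds and the roughly 10x central figure quoted in these findings:

```python
def fold_to_pct_decline(fold_per_year: float) -> float:
    """Convert an 'Nx cheaper per year' figure into an annual percentage decline."""
    return (1 - 1 / fold_per_year) * 100


for fold in (9, 10, 900):
    print(f"{fold}x per year = {fold_to_pct_decline(fold):.2f}% annual decline")
```

The spread looks narrow in percentage terms (about 89% versus 99.9%) but is two orders of magnitude in price, which is why the tier split matters.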

The subsidy is real, measured in billions, and not matched by revenue at any frontier lab. The Register reported OpenAI spent $12 billion on inference via Microsoft. Data Center Dynamics estimated combined training and inference costs reaching $7 billion in 2024 against a $5 billion loss. Epoch's profitability analysis estimated OpenAI spent roughly $4 billion serving free users in 2025, nearly half its inference budget. Independent modelling from ScaleDown concluded providers absorb over 90% of each token's true cost.

Sources: The Register (2025) (↗); Data Center Dynamics (2024) (↗); Epoch AI (2025) (↗); ScaleDown (tinyml.substack.com, Substack) (2024) (↗)
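Two of the ratios behind this finding can be checked from figures reported elsewhere in this summary. The sketch below assumes the $4 billion free-user estimate and the $8.4 billion 2025 inference figure refer to the same budget, and uses the $3.7 billion 2024 revenue figure from the Data Center Dynamics entry in the source list:

```python
FREE_USER_SPEND_2025 = 4.0   # USD bn, Epoch estimate of serving free users
INFERENCE_SPEND_2025 = 8.4   # USD bn, reported OpenAI inference cost
COMPUTE_COST_2024 = 7.0      # USD bn, training plus inference
REVENUE_2024 = 3.7           # USD bn

free_user_share = FREE_USER_SPEND_2025 / INFERENCE_SPEND_2025
cost_to_revenue = COMPUTE_COST_2024 / REVENUE_2024

print(f"free users: {free_user_share:.1%} of inference spend")
print(f"2024 compute cost: {cost_to_revenue:.2f}x revenue")
```

The first ratio comes out just under 48%, consistent with "nearly half".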

The AWS parallel is invoked everywhere but breaks on unit economics. Strategy analysts at Stratechery and the VC lanes frame subsidisation through Aggregation Theory: capture the developer layer through API lock-in, eat losses, normalise later, exactly as AWS priced EC2 and S3 below cost from 2006 to roughly 2012. The critical difference is that AWS achieved positive unit economics for large customers early; frontier inference appears genuinely loss-making at scale, funded by equity rather than a profitable core business. The subsidy window has already outlasted early cloud cycles, and the capital required is orders of magnitude larger.

Sources: Stratechery (Ben Thompson) (2026) (↗); Sequoia Capital (2024) (↗)

Energy is becoming the binding constraint, and labs are now acting on it. Bloomberg's April 2026 reporting on OpenAI pausing its UK Stargate data centre named energy cost as the binding limit. arXiv research projects AI data centres reaching 9 to 12% of total US electricity by 2030, and an April 2026 paper shows geographic clustering of data centres producing nonlinear regional grid stress and capacity-market price spikes. Bloomberg Intelligence notes AI queries can consume up to 10x the energy of traditional search.

Sources: Bloomberg (2026) (↗); Bloomberg Intelligence (2025) (↗)

Jevons paradox dominates the demand side: cheaper tokens drive total spend up, not down. Enterprise AI inference spend reached $37 billion in 2025, up 320% year on year, despite the price of tokens at constant capability falling roughly a thousandfold over three years. After DeepSeek's efficiency announcement, Meta raised 2025 AI capex by 50% rather than cutting it. The a16z and OpenRouter study of 100 trillion real tokens found agentic coding prompts routinely exceeding 20,000 tokens. FAccT 2025 research formalised the pattern: efficiency gains consistently produce higher aggregate consumption.

Sources: Andreessen Horowitz (a16z) (2026) (↗); AI Proem (Substack) (2025) (↗)
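The Jevons arithmetic can be made explicit by treating spend as price times volume. This is a deliberately crude sketch: it assumes a single blended price and that enterprise buyers' blended price fell at the headline rate (a thousandfold over three years, so about 10x per year). Tier fracturing makes that unlikely, so the result is better read as an upper bound on implied volume growth than as an estimate.

```python
def implied_volume_growth(spend_growth: float, price_decline: float) -> float:
    """If spend = price * volume, volume growth = spend growth * fold price decline."""
    return spend_growth * price_decline


spend_growth = 1 + 320 / 100              # "up 320%" means 4.2x the prior year
annual_price_decline = 1000 ** (1 / 3)    # thousandfold over three years, annualised

volume_growth = implied_volume_growth(spend_growth, annual_price_decline)
print(f"implied token volume growth: about {volume_growth:.0f}x year on year")
```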

Capability per token is rising even faster than price per token is falling. METR's time-horizon work documents autonomous task-completion horizons doubling roughly every 7 months through early 2025, with a January 2026 update confirming the trend. Combined with falling per-token cost, this yields a cost per unit of useful work declining faster than any prior productivity technology. The paradox is that this makes agentic deployment more attractive, multiplying total token consumption through multi-step chains.

Sources: METR (Model Evaluation & Threat Research) (2025) (↗); METR (Model Evaluation & Threat Research) (2026) (↗); Tiny Empires (Substack) (2025) (↗)
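A doubling time is easier to set beside annual price declines once annualised. The sketch below converts the 7-month doubling into a per-year growth factor and into the time needed for a tenfold increase, assuming the doubling time stays constant:

```python
import math


def annual_factor(doubling_months: float) -> float:
    """Growth multiplier per 12 months for a given doubling time."""
    return 2 ** (12 / doubling_months)


def months_to_multiple(doubling_months: float, multiple: float) -> float:
    """Months needed to grow by `multiple` at a constant doubling time."""
    return doubling_months * math.log2(multiple)


DOUBLING_MONTHS = 7
print(f"annual growth: x{annual_factor(DOUBLING_MONTHS):.2f}")
print(f"months to 10x: {months_to_multiple(DOUBLING_MONTHS, 10):.1f}")
```

On these assumptions the task horizon grows a little over 3x per year and takes just under two years to grow tenfold.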

Hardware efficiency is the supply-side engine, but it lowers the cost ceiling without removing the floor. NVIDIA's Blackwell results in InferenceMAX benchmarking show up to 15x efficiency gains over the H100 generation, and Google's Trillium delivered a 4x performance-per-dollar improvement. SemiAnalysis bottom-up modelling, however, places H100 inference cost floors at roughly $0.50 to $2.00 per million tokens, well above the sub-$0.30 commodity prices of late 2024. The gap between modelled floor and market price is the subsidy made visible.

Sources: NVIDIA (official) (2025) (↗); Google Cloud (official) (2024) (↗); SemiAnalysis (newsletter.semianalysis.com, Substack) (2023) (↗)
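The gap between modelled floor and market price can be expressed as a share of serving cost left uncovered. A minimal sketch, taking $0.30 as the market price; since commodity prices were below that, the shares shown are lower bounds under the modelled floors:

```python
def subsidy_share(price: float, cost: float) -> float:
    """Fraction of serving cost not covered by price (0 if price covers cost)."""
    return max(0.0, 1 - price / cost)


MARKET_PRICE = 0.30              # USD per million tokens, upper bound on commodity price
FLOOR_LOW, FLOOR_HIGH = 0.50, 2.00  # USD per million tokens, modelled H100 cost floor

print(f"at the low floor:  {subsidy_share(MARKET_PRICE, FLOOR_LOW):.0%} uncovered")
print(f"at the high floor: {subsidy_share(MARKET_PRICE, FLOOR_HIGH):.0%} uncovered")
```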

API price is only a fraction of enterprise total cost of ownership. Practitioner and consulting work consistently finds API costs represent 15 to 30% of deployment TCO for complex applications, with the remainder consumed by data preparation, fine-tuning, security, compliance logging, orchestration, and human-in-the-loop validation. a16z's CIO survey found 84% of respondents reporting AI-related margin erosion. At those shares, CFOs see a cost stack roughly three to seven times the raw token bill.

Sources: Andreessen Horowitz (a16z) (2025) (↗); McKinsey & Company (2025) (↗)
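The multiple on the token bill is simply the reciprocal of the API share of TCO. A short sketch, using a hypothetical $100,000 annual token bill purely for illustration:

```python
def tco_multiple(api_share: float) -> float:
    """Total cost as a multiple of the API bill, given API share of TCO."""
    return 1 / api_share


def total_cost(token_bill: float, api_share: float) -> float:
    """Total cost of ownership implied by a token bill and its share of TCO."""
    return token_bill * tco_multiple(api_share)


TOKEN_BILL = 100_000  # USD per year, hypothetical figure for illustration only

for share in (0.30, 0.15):
    print(f"API share {share:.0%}: x{tco_multiple(share):.1f} "
          f"-> total ${total_cost(TOKEN_BILL, share):,.0f}")
```

A 30% API share implies a total about 3.3 times the token bill; a 15% share implies about 6.7 times.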

Evidence & Data

The price trajectory is the factual backbone. GPT-4 launched near $30 per million input tokens in March 2023, GPT-4 Turbo cut that to $10 in November 2023, GPT-4o reached $5 in May 2024, and commodity models fell toward and below $0.10 by late 2024, with DeepSeek V3 near $0.14. Simon Willison's weblog remains the most complete running record of this sequence.

Sources: Simon Willison's Weblog (2023) (↗); LessWrong (2024) (↗)
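The flagship milestones can be turned into an annualised rate of decline. The sketch below uses only the three dated flagship prices in the paragraph above; the day of month is an assumption, since only months are given.

```python
from datetime import date

# (release month, USD per million input tokens); day of month is assumed
MILESTONES = [
    (date(2023, 3, 1), 30.0),   # GPT-4
    (date(2023, 11, 1), 10.0),  # GPT-4 Turbo
    (date(2024, 5, 1), 5.0),    # GPT-4o
]


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def annualised_decline(p0: float, p1: float, months: int) -> float:
    """Fold decline per year implied by a price move over `months`."""
    return (p0 / p1) ** (12 / months)


(first_date, first_price), (last_date, last_price) = MILESTONES[0], MILESTONES[-1]
span = months_between(first_date, last_date)
rate = annualised_decline(first_price, last_price, span)
print(f"{span} months, about {rate:.1f}x cheaper per year")
```

Over the 14 months from GPT-4 to GPT-4o this gives a flagship decline of roughly 4.6x per year, slower than the commodity tier.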

The capex figures define the scale. Big Tech AI computing spend is projected at $650 billion for 2026, Goldman Sachs sees over $500 billion invested in 2026, and McKinsey's "Cost of Compute" projects $3.7 to $7.9 trillion in data centre capex through 2030 with a $5.2 trillion base case. Bain calculated that $2 trillion in new annual revenue must be generated by 2030 to fund the scaling trend, with an $800 billion shortfall even if all enterprise on-premise IT budgets shifted to cloud. Gartner projects $2.5 trillion in worldwide AI spending for 2026, up 44% year on year.

Sources: Bloomberg (2026) (↗); Goldman Sachs (2026) (↗); McKinsey Global Institute (2025) (↗); Bain & Company (2025) (↗); Gartner (2026) (↗)

On the lab side, the OpenAI numbers are the clearest disclosed quantification: $8.4 billion inference spend in 2025 rising to $14.1 billion in 2026, $12 billion on inference via Microsoft, gross margin held below 35%, and spending projected to rise to $115 billion through 2029. ARK Invest pegs inference cost declines near 95% annually and earlier framed AI training cost improvement at 50x the rate of Moore's Law. The on-premise break-even research bounds the subsidy from another angle, finding that self-hosted inference can match commercial API pricing in 0.3 to 3 months for moderate workloads.

Sources: Fortune (2025) (↗); Bloomberg (2025) (↗); ARK Investment Management (2025) (↗); ARK Invest (2023) (↗)
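Two of these figures are easier to interpret after conversion: the year-on-year growth in OpenAI's reported inference spend, and the fold decline implied by ARK's 95% annual figure. A brief sketch of both conversions:

```python
def yoy_growth_pct(previous: float, current: float) -> float:
    """Year-on-year growth in percent."""
    return (current / previous - 1) * 100


def pct_decline_to_fold(pct_decline: float) -> float:
    """Convert an annual percentage decline into an 'Nx cheaper' multiple."""
    return 1 / (1 - pct_decline / 100)


inference_growth = yoy_growth_pct(8.4, 14.1)  # USD bn, 2025 to 2026
ark_fold = pct_decline_to_fold(95)            # ARK's annual cost decline

print(f"inference spend growth: {inference_growth:.1f}%")
print(f"95% annual decline = {ark_fold:.0f}x cheaper per year")
```

Inference spend rising by about 68% while unit costs are said to fall 20x per year is the Jevons effect expressed in one lab's accounts.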

The architectural mechanism behind declining costs is documented: MoE architectures and quantization deliver frontier-quality output at 3 to 5x lower per-token compute than dense models, and a comprehensive TMLR survey catalogues the full optimisation toolbox of pruning, distillation, KV-cache compression, and speculative decoding.

Sources: arXiv (2025)

Signals & Tensions

The clearest tension is between cost ceilings and cost floors. Efficiency researchers (Epoch, a16z's LLMflation thesis) anchor on falling cost ceilings driven by architecture and hardware. Hardware analysts (SemiAnalysis) and energy researchers anchor on a rising floor set by power, capex, and cooling. The contested intersection point, where efficiency gains stop offsetting infrastructure costs, is the central unresolved question for 2026 to 2028 forecasting.

Sources: Andreessen Horowitz (a16z) (2024) (↗); SemiAnalysis (2024) (↗)

Operational excellence may matter more than hardware in the near term. The LessWrong community observed that open-weight model prices vary 10x across providers for identical weights, implying batching, kernel optimisation, and memory management drive near-term cost more than silicon. This complicates clean hardware-floor arguments.

Sources: LessWrong (2024) (↗)

Underreported: the flat-subscription decoupling. OpenAI's CFO signalled $2,000-per-month enterprise tiers as early as December 2024, and Centific documented a 25x gap between flat subscription pricing and actual API cost for heavy users. The strategic move is to decouple revenue from per-token exposure entirely, which the financial press has covered thinly relative to its importance.

Sources: Bloomberg (2024) (↗); Centific (2025) (↗)
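The subscription gap can be sketched as a break-even calculation. The $200 monthly plan price and the $5 per million blended API rate below are hypothetical inputs chosen for illustration; only the 25x multiple comes from the Centific finding.

```python
def api_equivalent_cost(plan_price: float, gap_multiple: float = 25.0) -> float:
    """API-priced value of a heavy user's usage, given the documented gap multiple."""
    return plan_price * gap_multiple


def breakeven_tokens_millions(plan_price: float, blended_rate: float) -> float:
    """Millions of tokens per month at which API cost equals the flat plan price."""
    return plan_price / blended_rate


PLAN_PRICE = 200.0    # USD per month, hypothetical flat subscription
BLENDED_RATE = 5.0    # USD per million tokens, hypothetical blended API rate

print(f"heavy-user API equivalent: ${api_equivalent_cost(PLAN_PRICE):,.0f} per month")
print(f"break-even usage: {breakeven_tokens_millions(PLAN_PRICE, BLENDED_RATE):.0f}M tokens per month")
```

Any user above the break-even volume is being served below API price, which is the exposure that flat enterprise tiers are designed to manage.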

Overhyped: the simple "tokens approaching free" narrative. It ignores tier fracturing, Jevons-driven volume growth, and a true TCO where the API line is a minority of spend. The DeepSeek shock was read by markets as proof costs were collapsing; labs read it as a reason to spend more.

Sources: SemiAnalysis (newsletter.semianalysis.com, Substack) (2025) (↗); Bloomberg (2025) (↗)

A weak signal worth watching: commoditisation and derivatives. A March 2026 arXiv paper models an AI token futures market and derivatives contract design, suggesting compute is heading toward commodity-market pricing. If labs lose pricing power to a spot market, the subsidy ends not by choice but by structure.

Sources: arXiv (preprint) (2026) (↗)

Anthropic's realism versus inflated demand. CNBC's April 2026 perspective argued AI demand is inflated and only Anthropic is being realistic, a minority view that contradicts the broad capex-acceleration consensus and deserves tracking.

Sources: CNBC (2026) (↗)

Open Questions

When does marginal inference turn cash-flow positive? OpenAI targets profitability in 2030, but no lab has disclosed the volume or price at which a frontier token clears cost. Until that point, the subsidy timeline is a function of investor patience, not engineering.

Sources: Fortune (2025) (↗); Epoch AI (2025) (↗)

Where exactly does the energy floor sit? Projections of 9 to 12% of US electricity by 2030 and the prospect of 10x capacity-market price spikes give a range, not a number, and per-token pricing models still largely exclude regional grid stress.

Sources: arXiv (2025)

Does proprietary silicon create a durable cost moat? Google's TPUv7 analysis suggests labs with custom chips face different floors than GPU-spot-market dependents, but no public data confirms the size of that advantage at scale.

Sources: SemiAnalysis (2025) (↗)

Can AI accelerate its own inference optimisation fast enough to matter? METR's RE-Bench shows agents matching human ML experts on short tasks, raising the prospect of a self-reinforcing cost-reduction loop that current models do not price.

Sources: arXiv / METR (2024)

Goldman's 2024 question remains unanswered: will the trillion-dollar-plus infrastructure generate adequate returns? Jim Covello's scepticism has neither been refuted nor confirmed by 2026.

Sources: Goldman Sachs (2024) (↗); Goldman Sachs (2024) (↗)

How much will compliance cost add through 2028? The EU AI Act, NIST AI RMF, and emerging financial-sector governance are projected to grow the audit component, but no credible figure exists for its share of TCO.

If a token futures market materialises, who absorbs the subsidy unwind? Commodity pricing would transfer pricing power from labs to markets, and which providers survive that transition is entirely open.

Sources: arXiv (preprint) (2026) (↗)

The subsidy is not a discount. It is a bet that volume, lock-in, and capability growth will outrun a rising energy floor before the capital markets lose patience. Nobody serving tokens today knows which gives way first.

![[sources-ai-token-pricing-vs-true-total-cost-of-ownership-f]]

Sources

Financial Press

ID	Title	Outlet	Date	Significance
f1	How Much Is Big Tech Spending on AI Computing? A Staggering $650 Billion in 2026	Bloomberg	2026-02	Definitive Bloomberg News quantification of 2026 hyperscaler AI capex at $650B, establishing the scale of infrastructure investment that underpins current token pricing subsidies.
f2	The $3 Trillion AI Data Center Build-Out Becomes All-Consuming For Debt Markets	Bloomberg	2026-02	Bloomberg's deep-dive into debt market financing of AI infrastructure, revealing the financial mechanics behind how data-centre construction costs are being funded and how that cost ultimately flows through to inference economics.
f3	OpenAI Pauses Stargate UK Data Center Citing Energy Costs	Bloomberg	2026-04	Illustrates that energy cost constraints are already forcing project-level decisions at the frontier lab level, confirming that power is emerging as a binding cost floor for token pricing.
f4	AI Spending Boom Shifts From Training Models to Running Them	Bloomberg	2025-04	Pivotal Bloomberg newsletter piece documenting the structural shift in AI capex from model training to inference workloads, the key transition defining the 2025–2026 cost and pricing landscape.
f5	Why AI Bubble Concerns Loom as OpenAI, Microsoft, Meta Ramp Up Spending	Bloomberg	2025-11	Bloomberg synthesises mounting analyst concern that AI infrastructure investment is outpacing monetisation, directly relevant to whether current token prices can ever cover true costs.
f6	OpenAI Says Spending to Rise to $115B Through 2029	Bloomberg	2025-09	Bloomberg reporting on OpenAI's internal spending roadmap, confirming that compute cost trajectories are projected to rise sharply even as token prices are cut, widening the subsidy gap.
f7	Watch AI Cost Assumptions Challenged	Bloomberg	2025-01	Bloomberg live coverage on the day of DeepSeek's market shock, capturing real-time financial market reaction to a rival model achieving near-frontier performance at a fraction of the cost, directly challenging incumbent pricing assumptions.
f8	OpenAI CFO Thinks Business Users Will Pay Thousands For AI Software	Bloomberg	2024-12	Direct executive commentary from OpenAI's CFO on the enterprise pricing strategy, revealing the planned shift toward high-ARPU subscription models as an alternative to per-token revenue to fund infrastructure.
f9	Microsoft Sets Expensive Price Tag for New Corporate AI Products	Bloomberg	2023-07	Early Bloomberg benchmarking of enterprise AI product pricing (Microsoft Copilot at $30/user/month), providing a 2023 baseline to measure how enterprise AI pricing models have evolved.
f10	AI Inferencing at Crossroads	Bloomberg Intelligence	2025	Bloomberg Intelligence analysis of inference as the critical commercial battleground, detailing how model distillation and quantisation are reducing per-token costs while demand scaling offsets margin improvements.
f11	Big Tech 2025 Capex May Hit $200 Billion as Gen-AI Demand Booms	Bloomberg Intelligence	2025	Bloomberg Intelligence capex projection establishing that 2025 hyperscaler infrastructure spend - the cost base that subsidises token pricing - would reach $200B, up sharply from prior years.
f12	AI Accelerator Market Looks Set to Exceed $600 Billion by 2033	Bloomberg Intelligence	2025	Bloomberg Intelligence market-sizing of the AI accelerator chip ecosystem ($116B in 2024 to $604B by 2033), quantifying the hardware cost trajectory underlying all token pricing models.
f13	AI Is a Game Changer for Power Demand	Bloomberg Intelligence	2025	Bloomberg Intelligence analysis of how AI data centres are transforming energy markets, with generative AI queries consuming up to 10x the energy of traditional searches, establishing energy as a rising structural cost component.
f14	Gen AI: Too Much Spend, Too Little Benefit?	Goldman Sachs	2024-06	Goldman Sachs' most-cited AI sceptic report, with head of global equity research questioning whether $1T in AI infrastructure can generate adequate returns; a key reference point for the 'cost vs. benefit' debate in financial markets.
f15	Will the $1 Trillion of Generative AI Investment Pay Off?	Goldman Sachs	2024	Goldman Sachs investment research framing the core financial question around AI infrastructure: whether the capital cycle is commercially justifiable, directly informing how analysts assess the sustainability of below-cost token pricing.
f16	Why AI Companies May Invest More Than $500 Billion in 2026	Goldman Sachs	2026	Goldman Sachs' most current projection on AI infrastructure spending, providing a 2026 financial-market perspective on whether investment momentum is sustainable and what it implies for token cost floors.
f17	The Cost of Compute: A $7 Trillion Race to Scale Data Centers	McKinsey & Company	2025	McKinsey's comprehensive bottom-up analysis of data centre cost structure, projecting $5.2T required investment through 2030, and decomposing the build cost into land, power, cooling, and compute components.
f18	The New Economics of Enterprise Technology in an AI World	McKinsey & Company	2025	McKinsey's enterprise-facing analysis of how AI shifts IT spending from capex to opex, with FinOps and token-level cost visibility emerging as critical for managing true AI deployment TCO beyond API sticker prices.
f19	LLM Inference Prices Have Fallen Rapidly but Unequally Across Tasks	Epoch AI	2025	The most rigorous empirical tracking of token price declines across performance tiers, documenting 9x–900x annual price drops depending on task and showing that frontier reasoning models have not followed commodity price trends.
f20	Inference Economics of Language Models	Epoch AI	2024	Epoch AI's foundational decomposition of what drives LLM inference costs - hardware utilisation, model size, batch size, memory bandwidth - providing the analytical framework cited by financial analysts evaluating token pricing sustainability.
f21	AI Datacenter Energy Dilemma - Race for AI Datacenter Space	SemiAnalysis	2024	SemiAnalysis's deep technical analysis of data-centre power constraints as a structural cost floor for AI inference, widely cited in financial press as the authoritative bottom-up view on infrastructure economics.
f22	Groq Inference Tokenomics: Speed, But At What Cost?	SemiAnalysis	2024	SemiAnalysis cost-per-token breakdown for specialised inference hardware, quantifying real economics of serving tokens and demonstrating the gap between cloud-provider pricing and actual hardware cost.
f23	OpenAI Says It Plans to Report Stunning Annual Losses Through 2028 - and Then Turn Wildly Profitable Just Two Years Later	Fortune	2025-11	Fortune's reporting on leaked OpenAI financial projections showing $44B cumulative losses before 2029 profitability - the definitive document source for quantifying how much investor capital is subsidising current token prices.
f24	Perspective: AI Demand Is Inflated, and Only Anthropic Is Being Realistic	CNBC	2026-04	Most recent (April 2026) financial media critique of AI demand assumptions and token consumption projections, with direct commentary on Anthropic's more conservative pricing and demand forecasting relative to OpenAI and Nvidia.
f25	AI Training Costs Are Improving at 50x the Speed of Moore's Law	ARK Invest	2023	ARK Invest's Wright's Law application to AI compute, projecting that AI training and inference costs decline at 50x the pace of Moore's Law - the bullish analytical counterpoint to Goldman Sachs' scepticism on AI cost trajectories.

Frontier Lab & Model News

ID	Title	Outlet	Date	Significance
t1	METR's GPT-4.5 Pre-Deployment Evaluations	METR (Model Evaluation & Threat Research)	2025-02	Official METR pre-deployment autonomy evaluation of GPT-4.5, finding capabilities between GPT-4o and o1 and assessing risk level relative to existing frontier models.
t2	Details about METR's Preliminary Evaluation of Claude 3.7	METR (Model Evaluation & Threat Research)	2025-04	Pre-deployment autonomy assessment of Claude 3.7 Sonnet, noting impressive AI R&D capabilities on RE-Bench but no evidence of dangerous-level autonomous capabilities.
t3	Details about METR's Evaluation of OpenAI GPT-5	METR (Model Evaluation & Threat Research)	2025-05	METR's autonomy evaluation of OpenAI's flagship GPT-5 model, providing the most current public capability benchmarking for the frontier's leading model.
t4	Details about METR's Preliminary Evaluation of GPT-4o	METR (Model Evaluation & Threat Research)	2024-05	Baseline METR autonomy evaluation for GPT-4o, establishing a reference point against which later models' capability escalations are measured.
t5	Task-Completion Time Horizons of Frontier AI Models - Time Horizon 1.1	METR (Model Evaluation & Threat Research)	2026-01	METR's updated time-horizon dataset showing frontier model autonomous task-completion window doubling roughly every 7 months since 2019, with an expanded task suite giving tighter estimates at longer horizons.
t6	Measuring AI Ability to Complete Long Tasks	METR (Model Evaluation & Threat Research)	2025-03	Introduces METR's methodology for quantifying how long AI agents can sustain productive autonomous work, directly informing the inference-cost implications of extended agentic deployments.
t7	Details about METR's Preliminary Evaluation of DeepSeek and Qwen Models	METR (Model Evaluation & Threat Research)	2025-07	Finds mid-2025 DeepSeek autonomous capability levels comparable to late-2024 frontier models, highlighting how cost-efficient open-weight models are closing the autonomy gap.
t8	Introducing Claude 3.5 Sonnet	Anthropic	2024-06	Official launch announcement establishing Claude 3.5 Sonnet as Anthropic's price-performance flagship, priced at $3/$15 per million tokens - significantly undercutting Claude 3 Opus at $15/$75.
t9	Model Card Addendum: Claude 3.5 Haiku and Upgraded Claude 3.5 Sonnet	Anthropic	2024-10	Official Anthropic model card documenting safety evaluations, capability benchmarks, and technical specifications for the October 2024 Claude 3.5 refresh - a primary technical disclosure.
t10	Google and Anthropic Drop AI Prices and Release New Models	PYMNTS	2025-05	Documents the coordinated 2025 pricing cuts by Google (Gemini) and Anthropic (Claude Opus 4.5, price cut by 67%), illustrating competitive subsidisation dynamics between frontier labs.
t11	OpenAI Has Spent $12B on Inference with Microsoft: Report	The Register	2025-11	Reports OpenAI's cumulative inference spend of $12B on Azure, exposing the massive infrastructure subsidy underpinning user-facing token prices.
t12	OpenAI Training and Inference Costs Could Reach $7bn for 2024, AI Startup Set to Lose $5bn	Data Center Dynamics	2024-09	Key financial disclosure showing OpenAI's 2024 compute cost structure - $7B in training and inference against $3.7B revenue - quantifying the scale of below-cost token pricing.
t13	Exclusive: Here's How Much OpenAI Spends on Inference and Its Revenue Share With Microsoft	Where's Your Ed At (Ed Zitron)	2025-05	Detailed breakdown of OpenAI's leaked internal financials, showing inference costs at $8.4B in 2025 - 66% from paying users - with projections rising to $14.1B in 2026.
t14	OpenAI Faces Financial Growing Pains, Spending Double Its Revenue	DeepLearning.AI – The Batch	2024-10	Concise summary of OpenAI's loss trajectory ($540M in 2022 → $5B in 2024), contextualising why user-facing token prices remain far below true cost.
t15	The Rising Costs of Training Frontier AI Models	arXiv (preprint)	2024-05	Academic analysis quantifying the exponential escalation in frontier model training costs, providing the cost-amortisation context for why labs price tokens below marginal cost.
t16	AI Token Futures Market: Commoditization of Compute and Derivatives Contract Design	arXiv (preprint)	2026-03	Proposes a formal framework for AI token pricing as a tradeable commodity, analysing the structural forces - lab subsidisation, demand elasticity, and market power - driving current API pricing.
t17	Photons = Tokens: The Physics of AI and the Economics of Knowledge	arXiv (preprint)	2026-03	Formalises the Structural Jevons Paradox in AI: as unit token costs fall, firms redesign agent architectures to consume dramatically more compute via deeper reasoning loops and larger context windows.
t18	InferenceMAX™: Open Source Inference Benchmarking	SemiAnalysis	2025-06	SemiAnalysis's open-source benchmark showing NVIDIA Blackwell delivering 15× lower cost per million tokens versus prior generation, setting the hardware efficiency baseline for 2025–2026 pricing floors.
t19	AI Datacenter Energy Dilemma - Race for AI Datacenter Space	SemiAnalysis	2024-12	Detailed infrastructure analysis from SemiAnalysis on power constraints, data-centre construction timelines, and energy costs as the principal rising-cost vector offsetting hardware efficiency gains.
t20	Google TPUv7: The 900lb Gorilla In the Room	SemiAnalysis	2025-08	Deep technical analysis of Google's latest proprietary TPU, showing how vertical compute integration gives Google a structural cost advantage in Gemini inference pricing versus GPU-dependent rivals.
t21	Introducing Cloud TPU v5p and AI Hypercomputer	Google Cloud (official)	2023-12	Google's official announcement of TPU v5p infrastructure powering Gemini training, establishing the proprietary compute stack that underpins Google's inference cost economics.
t22	Trillium TPU Is GA	Google Cloud (official)	2024-11	Announces general availability of Trillium (TPU v6e), offering 4× better performance-per-dollar for inference versus v5e and used to train Gemini 2.0 - quantifying Google's hardware efficiency edge.
t23	NVIDIA Blackwell Raises Bar in New InferenceMAX Benchmarks, Delivering Unmatched Performance and Lowest Cost Per Token	NVIDIA (official)	2025-07	Official NVIDIA benchmark results showing Blackwell architecture's cost-per-token leadership, directly informing the hardware cost floor for frontier labs running GPU-based inference.
t24	Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters	NVIDIA (official)	2025-03	NVIDIA's TCO framework for 'AI factories,' arguing that total cost of ownership - not GPU price - governs real inference economics, encompassing compute, networking, cooling, and utilisation.
t25	The 25× Subscription Trap: Why Frontier Labs Can No Longer Subsidize Your AI	Centific	2025-09	Documents the 25× gap between flat subscription fees and actual API cost for heavy users, providing concrete evidence of the scale of cross-subsidisation in frontier lab pricing models.

VC & Analyst Reports

ID	Title	Outlet	Date	Significance
v1	State of AI: An Empirical 100 Trillion Token Study with OpenRouter	Andreessen Horowitz (a16z)	2026-01	Empirical study of 100T tokens routed via OpenRouter reveals that agentic inference is the fastest-growing use pattern and that developers overwhelmingly optimise for quality over price, with Claude holding ~60% of coding workloads at 20K+ token average prompts - directly illustrating Jevons paradox at the token level.
v2	AI Is Driving A Shift Towards Outcome-Based Pricing (December 2024 Enterprise Newsletter)	Andreessen Horowitz (a16z)	2024-12	Argues that per-token pricing is giving way to outcome-based pricing as AI costs scale, but finds CIOs remain uncomfortable with outcome metrics - a key signal that token-cost opacity is migrating into enterprise contract design.
v3	How 100 Enterprise CIOs Are Building and Buying Gen AI in 2025	Andreessen Horowitz (a16z)	2025	Survey of 100 enterprise CIOs finds 80% missed AI infrastructure cost forecasts by more than 25% and 84% report margin erosion tied to AI workloads, establishing that user-facing token prices grossly understate true enterprise TCO.
v4	CFO Roundtable: AI Growth, Pricing, and Forecasting (June 2025 Fintech Newsletter)	Andreessen Horowitz (a16z)	2025-06	CFO-level discussion on AI unit economics reveals that every token processed is a direct variable cost, and that the newest reasoning models still command relatively high costs despite commodity-model price compression.
v5	AI's $600B Question	Sequoia Capital	2024-06	David Cahn's landmark framework quantifies the annual revenue gap between AI infrastructure investment and actual AI-ecosystem revenue at ~$600B, assumes GPU costs are roughly half of AI data-centre TCO, and explicitly flags rapid GPU depreciation as a structural risk to lab economics.
v6	AI is Now Shovel Ready	Sequoia Capital	2024-12	Designates 2025 as the 'Year of the Data Center,' detailing that average AI data-centre construction takes ~2 years, that Amazon committed $50B+ to new builds in H1 2024, and that capital allocation risk from long lead times is a primary structural constraint on AI supply economics.
v7	AI in 2025: Building Blocks Firmly in Place	Sequoia Capital	2025-01	Annual outlook positions 2025 as an execution year where infrastructure build-out transitions from deal-signing to physical deployment, with cloud service providers competing on GPU cluster scale and pricing as the primary near-term battleground.
v8	AI in 2026: A Tale of Two AIs	Sequoia Capital	2026-01	Identifies a bifurcation between commoditised inference and frontier reasoning models, framing the divergence as a structural price-floor dynamic where frontier capability commands premium pricing while commodity models race toward near-zero marginal cost.
v9	The Cost of Compute: A $7 Trillion Race to Scale Data Centers	McKinsey Global Institute	2025	Projects $3.7–$7.9 trillion in global data-centre capex through 2030 across three demand scenarios, with the base case at $5.2 trillion, and allocates ~60% of spend to computing hardware, ~25% to power and cooling, and ~15% to construction - the most comprehensive public cost-stack decomposition available.
v10	Who's Funding the AI Data Center Boom?	McKinsey Global Institute	2025	Examines the financing structure behind AI data-centre buildout, clarifying that hyperscaler balance sheets, sovereign wealth funds, and private credit are the three capital pools underwriting infrastructure that token prices must eventually recoup.
v11	Issue Brief: AI Infrastructure	McKinsey Global Institute	2025	Frames AI infrastructure as an 'AI factory' model - data and electricity as inputs, tokens and insights as outputs - directly linking compute capex to revenue generation and articulating the economic logic that will ultimately drive token price normalisation.
v12	Beyond Compute: Infrastructure That Powers and Cools AI Data Centers	McKinsey Global Institute	2025	Analyses the non-compute TCO components (power, cooling, backup generation, physical plant) that are often invisible in quoted token prices, projecting 200 incremental GW of AI-related capacity required in the accelerated scenario and flagging energy as a rising, not falling, cost component.
v13	Token Economics, Physical AI, and Beyond: McKinsey Previews NVIDIA GTC	McKinsey Global Institute	2025-03	McKinsey's Chris Smith explicitly adopts 'token economics' as an analytical unit, signalling that major strategy consultancies have shifted from cloud-hour pricing to per-token unit economics as the primary framework for AI infrastructure ROI analysis.
v14	Technology Report 2025: $2 Trillion in New Revenue Needed to Fund AI's Scaling Trend	Bain & Company	2025	Bain's headline finding that $2 trillion in new annual revenue must be generated by 2030 to profitably absorb AI compute demand is the single most cited cost-recovery gap figure in 2025 analyst literature, and directly implies sustained lab subsidisation until that gap closes.
v15	How Can We Meet AI's Insatiable Demand for Compute Power?	Bain & Company	2025	Quantifies that AI compute demand is growing at more than twice the rate of Moore's Law and projects a global $800B infrastructure shortfall even if all enterprise on-premise IT budgets were redirected to cloud and AI data centres.
v16	AI's Trillion-Dollar Opportunity (Global Technology Report 2024)	Bain & Company	2024	Bain's 2024 baseline report that documents unprecedented GenAI adoption speed despite cost roadblocks, establishing the trajectory against which the 2025 $2T gap estimate is benchmarked.
v17	Gartner Says Worldwide AI Spending Will Total $2.5 Trillion in 2026	Gartner	2026-01	Gartner's official forecast of $2.52 trillion in worldwide AI spending for 2026 - a 44% YoY increase - provides the most widely cited market-size anchor for contextualising token-price economics against total infrastructure outlays.
v18	Strategic Predictions for 2026: How AI's Underestimated Influence Is Reshaping Business	Gartner	2025-10	Gartner's top strategic predictions for 2026 and beyond cover AI agent proliferation, agentic spending intermediation, and enterprise cost displacement - providing the Technology Radar framing for how token-cost trajectory intersects with enterprise software budgets.
v19	Gartner Survey Finds 54% of Infrastructure & Operations Leaders Are Adopting AI to Cut Costs	Gartner	2025-10	Survey evidence that more than half of I&O leaders view AI primarily as a cost-reduction tool, creating a circular dynamic where AI's cost is justified by AI's cost savings - a framing that shapes enterprise willingness to absorb rising token bills.
v20	The State of AI Infrastructure: Demand, Costs, and Custom Silicon	ARK Investment Management	2025-12	Using SemiAnalysis InferenceMax benchmarks, ARK calculates that inference costs for capable models are falling by ~95% annually, outpacing the ~75% annual decline in training costs, and identifies custom silicon (Trainium, TPU, MTIA) as the next structural cost lever hyperscalers are deploying to reduce Nvidia dependence.
v21	AI Will Determine the Future of Software and Cloud Spending	ARK Investment Management	2025	ARK projects global data-centre systems investment growing at 30%+ annually to reach $653B in 2026, with AI infrastructure spend tripling to ~$1.5T by 2030, providing the demand-side framework for understanding why token prices cannot fall indefinitely even with hardware efficiency gains.
v22	Can AI Companies Become Profitable?	Epoch AI	2025	Epoch AI's analysis of multiple frontier labs finds compute (R&D plus inference) comprises 54–62% of costs and that spending is currently 2–3× revenue at each lab, with OpenAI alone spending ~$4B serving free users in 2025 - the most rigorous published quantification of frontier-lab subsidisation.
v23	LLM Inference Prices Have Fallen Rapidly but Unequally Across Tasks	Epoch AI	2025	Tracks state-of-the-art model prices across six benchmarks from 2022–2025, finding task-specific price-performance declines ranging from 9× to 900× per year, with the fastest declines post-January 2024 following DeepSeek and open-weight model competition.
v24	Inference Economics of Language Models	Epoch AI	2025	Deep-dives into the unit economics of LLM inference - compute, memory bandwidth, batching efficiency, and hardware utilisation - establishing that electricity is only 10–15% of GPU TCO while capital costs dominate, which sets a structural cost floor on token prices.
v25	How Persistent Is the Inference Cost Burden?	Epoch AI	2025	Examines whether inference cost burdens at frontier labs are structural or transient, finding that rising query complexity (reasoning chains, agentic loops) offsets hardware efficiency gains - directly addressing whether price-per-token declines will continue through 2028.
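
The capex figures quoted for entry v9 can be turned into rough dollar amounts per category. The sketch below is illustrative only: it applies the approximate 60/25/15 split to each of the three scenario totals cited in the table, and it assumes the same split holds across scenarios, which the table does not state. The names `SCENARIOS`, `SHARES`, and `capex_breakdown` are hypothetical.

```python
# Scenario totals (USD trillions) and category shares as quoted for entry v9.
# Assumption: the ~60/25/15 split applies unchanged to every scenario.
SCENARIOS = {"low": 3.7, "base": 5.2, "high": 7.9}
SHARES = {
    "computing hardware": 0.60,
    "power and cooling": 0.25,
    "construction": 0.15,
}


def capex_breakdown(total_trillions):
    """Split a capex total (USD trillions) across the quoted categories."""
    return {
        name: round(total_trillions * share, 2)
        for name, share in SHARES.items()
    }


if __name__ == "__main__":
    for label, total in SCENARIOS.items():
        print(label, total, capex_breakdown(total))
```

Under these assumptions the base case implies a little over $3 trillion for computing hardware alone, which is the portion most exposed to the rapid GPU depreciation flagged in entry v5.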

Blogs & Independent Thinkers

ID	Title	Outlet	Date	Significance
b1	The Unsustainable Economics of LLM APIs: Understanding the Coming Price Realignment	ScaleDown (tinyml.substack.com, Substack)	2024	Bottom-up hardware cost analysis concluding that LLM API providers absorb over 90% of true token costs, framing the current market as a VC-funded 'land-grab phase' structurally analogous to Uber's early subsidised pricing.
b2	The Cost of Inference: Running the Models	ScaleDown (tinyml.substack.com, Substack)	2024	Practitioner-level breakdown of GPU, energy, networking, cooling, and ops overhead that compose the true cost of serving a token, providing the most granular independent infrastructure accounting framework available publicly.
b3	Tokenomics 101: Navigating the Nuances of LLM Product Pricing	ScaleDown (tinyml.substack.com, Substack)	2024	Explains why input/output token price ratios reflect compute and memory bandwidth constraints rather than usage patterns, and quantifies how published API rates relate to underlying unit economics.
b4	The Economics of Building ML Products in the LLM Era	ScaleDown (tinyml.substack.com, Substack)	2024	Examines the total cost of ownership for product builders layering on top of frontier APIs, showing how token costs compound through retrieval, context, and agentic chains to produce effective per-query costs far above headline rates.
b5	The Price of Tokenmaxxing	Aspiring for Intelligence (Substack)	2025	Analyses how Anthropic's API pricing at scale challenges the foundational startup-layer assumption that foundation model costs would remain negligible, arguing the 'cheap token' era is ending for heavy agentic workloads.
b6	The Price Is Wrong	Aspiring for Intelligence (Substack)	2025	Investigates the structural gap between flat-subscription pricing and per-token API rates, arguing Anthropic was cross-subsidising heavy agentic users by more than 5x, a dynamic now forcing explicit pricing architecture decisions.
b7	Groq Inference Tokenomics: Speed, But At What Cost?	SemiAnalysis (newsletter.semianalysis.com, Substack)	2024-02	First-principles cost modelling of Groq's LPU architecture against H100 economics, establishing the benchmark methodology for comparing true cost-per-token across inference hardware generations.
b8	Inference Race To The Bottom - Make It Up On Volume?	SemiAnalysis (newsletter.semianalysis.com, Substack)	2024	Directly addresses whether commodity token prices can persist below true cost at scale, arguing aggressive price competition is structurally unsustainable without volume offsets that current demand does not yet guarantee.
b9	The Inference Cost Of Search Disruption – Large Language Model Cost Analysis	SemiAnalysis (newsletter.semianalysis.com, Substack)	2023	Landmark early analysis estimating what deploying GPT-4-class inference at Google Search scale would cost, establishing a cost-floor analysis that anchored subsequent independent discussion of the scale of lab subsidisation.
b10	DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts	SemiAnalysis (newsletter.semianalysis.com, Substack)	2025-01	Forensic reconstruction of DeepSeek's true training compute costs and the implications for Western lab margins, directly testing how much of the apparent cost advantage is real versus accounting artefact.
b11	AMD vs NVIDIA Inference Benchmark: Who Wins? - Performance & Cost Per Million Tokens	SemiAnalysis (newsletter.semianalysis.com, Substack)	2025-05	Six-month empirical benchmark comparing hardware cost-per-token across real workloads, revealing a 15x cost reduction from Hopper to Blackwell generation and nuanced workload-specific GPU advantage patterns.
b12	InferenceMAX™: Open Source Inference Benchmarking	SemiAnalysis (newsletter.semianalysis.com, Substack)	2025-10	Introduces an independent TCO-per-million-token benchmark - the first to measure total cost of compute across diverse model sizes and real-world scenarios - establishing a replicable methodology for ongoing cost trajectory analysis.
b13	Mythos, Muse, and the Opportunity Cost of Compute	Stratechery (Ben Thompson)	2026	Argues that AI has re-introduced meaningful marginal costs into tech after two decades of near-zero marginal cost software, with direct implications for why token prices have a structural floor and why current pricing is strategically rather than economically motivated.
b14	AI Promise and Chip Precariousness	Stratechery (Ben Thompson)	2025	Examines how DeepSeek and open-weight models create persistent structural pricing pressure, arguing that sustainable margins require either a hardware cost advantage (Google's TPU edge) or aggregation, not just capability differentiation.
b15	Rapidus, The End of Economic Rationality, AI Disruption	Stratechery (Ben Thompson)	2024	Argues that AI capex commitments have entered a regime where strategic imperatives suspend economic rationality, contextualising why labs sustain large operating losses to hold developer market position.
b16	Observations About LLM Inference Pricing	LessWrong	2024	Empirical analysis showing 10x price dispersion for identical open-weight models across providers, inferring that software stack optimisation (batching, kernel efficiency, speculative decoding) drives more of the actual cost variance than hardware alone.
b17	Simon Willison on llm-pricing (tag archive)	Simon Willison's Weblog	2023	Running empirical record of every major LLM pricing event from GPT-4's launch through 2026, with practitioner cost benchmarks (e.g. captioning 68,000 images for $1.68 with Gemini Flash) documenting the ~150x price drop with concrete real-world examples.
b18	Welcome to LLMflation - LLM inference cost is going down fast	Andreessen Horowitz (a16z)	2024-11	Coins 'LLMflation' and quantifies a 10x annual cost decline for equivalent-performance inference over three years - from $60/M tokens in 2021 to $0.06/M by late 2024 - the most-cited single data point in independent discourse on the price-collapse rate.
b19	How persistent is the inference cost burden?	Epoch AI (Substack)	2025	Analyses whether inference costs as a share of lab revenues are structural or transitional, estimating OpenAI's 2024 inference compute spend and modelling future cost burden under different algorithmic efficiency trajectories.
b20	How much does it cost to train frontier AI models?	Epoch AI	2024	Quantifies that frontier model training costs are growing 2–3x per year and projects the largest runs crossing $1 billion by 2027, directly addressing how training capex amortises into per-token inference pricing and why apparent API prices understate true costs.
b21	LLM inference prices have fallen rapidly but unequally across tasks	Epoch AI	2025	Demonstrates that inference price decline rates range from 9× to 900× per year depending on capability tier, with frontier reasoning models holding prices stable while commodity-model prices collapsed - the key bifurcation story of 2024–2025.
b22	The Jevons Paradox in AI Infrastructure: DeepSeek Efficiency Breakthroughs to Drive Energy Demand	AI Proem (Substack)	2025	Applies Jevons Paradox to argue that DeepSeek-style efficiency gains will expand total AI compute demand and energy consumption rather than reduce them, establishing a rising infrastructure cost floor that will eventually pressure token prices upward.
b23	The Jevons Paradox in AI: Why Efficiency Creates More Demand	The Substrate (Substack)	2025	Documents that per-token prices fell a thousandfold in three years yet total enterprise AI spending surged 320% in 2025, with enterprise inference spend reaching $37B - empirically confirming that Jevons effects dominate price reductions at the market level.
b24	AI agents are about to get more expensive	Tiny Empires (Substack)	2025	Argues that multi-step agentic workflows break the 'cheap token' assumption because token consumption compounds with every additional step, making true total cost of ownership for agentic AI materially higher than per-token sticker prices suggest.
b25	AI Pricing Architecture Is Now Strategy	SaaS Intelligence (Substack)	2025	Frames token-based API pricing as a strategic weapon for developer lock-in and market share capture, drawing direct parallels to early AWS subsidised cloud pricing as a land-grab before margin normalisation.
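
Several entries above quote price declines as a multiple over a period (for example, b18's fall from $60 to $0.06 per million tokens between 2021 and late 2024). A minimal sketch of how such a figure converts to an annualised rate follows. It assumes a constant geometric decline over the period, which is a simplification, and the function names are hypothetical.

```python
def annual_decline_factor(start_price, end_price, years):
    """Constant yearly factor by which price must fall to go from start to end.

    Assumes a smooth geometric decline; real price cuts arrive in steps.
    """
    if start_price <= 0 or end_price <= 0 or years <= 0:
        raise ValueError("prices and years must be positive")
    return (start_price / end_price) ** (1.0 / years)


def factor_to_percent_decline(factor):
    """Convert an N-fold yearly decline into a percentage price drop per year."""
    return (1.0 - 1.0 / factor) * 100.0


if __name__ == "__main__":
    # Figures quoted for entry b18: $60/M tokens to $0.06/M over three years.
    factor = annual_decline_factor(60.0, 0.06, 3)
    print(f"annual factor: {factor:.2f}x")
    print(f"annual decline: {factor_to_percent_decline(factor):.1f}%")
```

With the b18 figures, the thousandfold drop over three years works out to roughly a tenfold fall per year, or about a 90% price cut annually, consistent with the "10x annual" framing in that entry.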