Research · Blogs & Independent Thinkers

Back to sweep

Research sweep · deep · 2025 – 2026

Comparative LLM Usage Across Sectors

Comparative real-world usage of LLMs and adjacent AI technologies from June 2025 to June 2026: which models (GPT-5, Claude, Gemini, Llama, Mistral, DeepSeek, Qwen) dominate which sectors, how they are deployed (hosted API, Bedrock/Azure, self-hosted vLLM/Ollama, RAG, agents, fine-tuning), what workloads they serve, and how organisations measure, budget, and publicly report token cost and actual spend.

  • Claude Opus 4.8
  • financial
  • frontier
  • academic
  • vc
  • blogs
  • tech

Synthesised 2026-06-20

Narrative

The clearest quantitative window into enterprise LLM adoption comes from Menlo Ventures' paired 2025 reports. The mid-year report recorded enterprise LLM API spend doubling from $3.5 billion to $8.4 billion in a single six-month stretch, driven by inference overtaking training as the dominant cost category. By year-end, Anthropic had captured 40% of the enterprise LLM API market, up from 12% in 2023, with OpenAI falling to 27% and Google reaching 21%. Code generation was the catalyst: Claude held 42% developer share in that workload, double OpenAI's figure. These are survey-derived numbers from roughly 500 US enterprise decision-makers, and Menlo is an investor in Anthropic, which warrants scepticism about precision, but the directional claim of a rapid share shift is corroborated by the OpenRouter empirical dataset.

OpenRouter's 100 trillion-token study, published in partnership with a16z in December 2025, offers the most granular observed-behaviour dataset available to independent analysts. The platform, serving over five million developers across 300-plus models, recorded proprietary models handling the majority of tokens throughout the study period, but open-weight models grew to roughly one-third of usage by late 2025. Within the open-weight segment, DeepSeek alone accumulated 14.37 trillion tokens, followed by Qwen at 5.59 trillion and Meta Llama at 3.96 trillion. The market fragmented sharply after what the study labels the "Summer Inflection": DeepSeek's near-monopoly of over 50% of open-source share in early 2025 collapsed as Qwen 3, Kimi K2, and the GPT-OSS variants entered the field. Tool use and agentic workloads showed a parallel shift, with reasoning-model tokens climbing from a negligible slice in Q1 2025 to over 50% of total usage by mid-year.

Self-hosted inference emerged as a mainstream business strategy rather than a researcher pastime. Practitioners broadly distinguish two deployment paths: Ollama for prototyping and single-developer use, and vLLM for production multi-user workloads, with the latter's PagedAttention mechanism cited as delivering substantially higher throughput at concurrent load. The driver in regulated sectors - healthcare, legal, finance, government - is compliance. HIPAA, GDPR, and SOC 2 create hard boundaries around data egress that push organisations toward on-premise or private-cloud deployment of open-weight models, primarily Llama and Mistral variants. By contrast, Menlo's enterprise survey found open-source adoption actually declining from 19% to 11% year-on-year among large enterprises, suggesting that the self-hosting trend is more pronounced in mid-market and developer-led organisations than in top-tier procurement-driven accounts.

The production-deployment gap remains the defining practical problem. The Metadata Weekly Substack cited an MIT figure that 95% of enterprise AI pilots never reached production in 2025, while ZenML's LLMOps Database, which crossed 1,200 catalogued case studies by December 2025, found that successful systems consistently combined LLMs with traditional deterministic rules and classical ML rather than delegating entirely to foundation models. On cost, practitioners writing on Medium and specialist FinOps blogs in 2025 and 2026 document a structural measurement problem: LLM bills arrive as undifferentiated totals that hide per-feature and per-user attribution. The practitioner consensus, drawn from multiple independent sources, treats LLM cost as an architectural discipline imposed at design time rather than a billing line read at month-end. Token prices dropped approximately 80% between early 2025 and early 2026, yet total enterprise spend tripled, illustrating classic Jevons-paradox dynamics that independent commentators noted explicitly.


Sources

ID Title Outlet Date Significance
b1 2025 Mid-Year LLM Market Update: Foundation Model Landscape + Economics Menlo Ventures 2025-07 Primary quantitative source on enterprise LLM API market share shift, recording Anthropic overtaking OpenAI and enterprise spend doubling to $8.4 billion in six months.
b2 2025: The State of Generative AI in the Enterprise Menlo Ventures 2025-12 Year-end enterprise survey of ~500 decision-makers documenting Anthropic at 40% LLM API share, open-source decline to 11%, and $37 billion total generative AI spend in 2025.
b3 State of AI: An Empirical 100 Trillion Token Study with OpenRouter OpenRouter / a16z 2025-12 Largest observed-behaviour dataset on LLM usage patterns, covering 100 trillion tokens and documenting open-weight model growth, reasoning model surge, and tool-use concentration.
b4 State of AI: An Empirical 100 Trillion Token Study with OpenRouter (arXiv preprint) arXiv 2026-01 Peer-accessible version of the OpenRouter/a16z study, with detailed methodology including DeepSeek's 14.37 trillion tokens, Qwen at 5.59 trillion, and Llama at 3.96 trillion.
b5 OpenRouter's 100 Trillion Token Study: The Real State of AI Usage in 2025 Adam Holter (personal blog) 2025-12 Independent analysis of the OpenRouter dataset, synthesising the dual-market structure thesis and the market fragmentation after the Summer Inflection.
b6 The State of AI in Q4 2025 Substack (Pat McGuinness) 2025-12 Independent Substack synthesis of Q4 2025 AI adoption data, citing Ramp card-data showing paid AI adoption at 43.8% of US businesses and Google reporting a 50x yearly increase in monthly tokens.
b7 I think Anthropic and OpenAI have found product-market fit Simon Willison's Weblog 2026-05 Simon Willison's practitioner observation that Anthropic's Enterprise plan shifted to API-usage billing by late 2025, with companies reporting surprising LLM bill sizes, signalling genuine production-scale adoption.
b8 The last six months in LLMs in five minutes Simon Willison's Weblog 2026-05 Practitioner summary of the November 2025 inflection point in LLM capability, covering the shift to RLVR-trained coding models across OpenAI and Anthropic.
b9 LLM predictions for 2026, shared with Oxide and Friends Simon Willison's Weblog 2026-01 First-principles prediction piece from a leading practitioner blogger, explicitly invoking Jevons paradox as the mechanism explaining why falling token prices do not reduce total spend.
b10 Agentic Engineering Patterns Substack (Simon Willison) 2026-02 Willison's Substack post covering the November 2025 inflection point and the emergence of agentic engineering as a distinct discipline from earlier LLM prompt-engineering workflows.
b11 What is agentic engineering? Simon Willison's Weblog 2026-03 Practitioner definition of agentic engineering, providing the architectural framing most cited in 2025-2026 discussions of production agent deployment across GPT-5, Gemini, and Claude.
b12 [Deep LLM 2026: From the Illusion of Model Development Stagnation to Large-Scale Real-World Agent Deployment](https://fundaai.substack.com/p/deepllm-2026-from-the-illusion-of) Substack (FundaAI) 2026-01
b13 The 2026 AI Reality Check: It's the Foundations, Not the Models Substack (Metadata Weekly) 2025-12 Substack analysis citing MIT data that 95% of enterprise AI pilots failed to reach production in 2025, arguing that data and governance foundations, not model selection, determine deployment success.
b14 Why Do LLM Applications Fail in Production? Substack (The Gen Academy) 2026-05 Detailed technical Substack post documenting that agentic token consumption runs at roughly 4x chat usage and multi-agent at 15x or more, explaining why production economics differ sharply from demo economics.
b15 What 1,200 Production Deployments Reveal About LLMOps in 2025 ZenML Blog 2025-12 Practitioner analysis of 1,200 catalogued LLMOps case studies, finding that successful production systems combine LLMs with deterministic rules rather than relying on foundation models alone.
b16 The Agent Deployment Gap: Why Your LLM Loop Isn't Production-Ready ZenML Blog 2025-07 Practitioner post identifying the structural gap between agent prototyping and production deployment, with patterns drawn from real deployments as of mid-2025.
b17 The AI Agents Stack (2026 Edition) O'Reilly Radar 2026-06 Maps the six-layer infrastructure required for production agents, documenting LangGraph's emergence as the graph-orchestration standard with confirmed deployments at Uber, JPMorgan, LinkedIn, and Klarna.
b18 The Rise of the Agent Runtime Work-Bench 2026-02 Documents agentic infrastructure cost shock with a case study showing costs jumping 10x from prototyping to staging, illustrating budget risk from unoptimised RAG and agent orchestration.
b19 LLM Token Costs Benchmarked: What Engineering and FinOps Leaders Actually Need to Know Cloudchipr 2026-05 Documents an approximately 80% drop in LLM API prices between early 2025 and early 2026 and argues for per-workload cost tracking over per-token pricing as the operative FinOps metric.
b20 FinOps for AI LLM Cost Governance Rick Pollick (personal blog) 2026-06 Synthesises Stanford AI Index data on inference cost decline alongside Menlo spend figures and FinOps Foundation survey showing 98% of practitioners now managing AI spend, framing the Jevons-paradox dynamic explicitly.
b21 LLM FinOps: Per-Feature Cost Attribution and Token Budgets Zop.dev 2026-05 Practitioner post documenting the per-feature attribution problem with a concrete example of a $48,000 monthly Anthropic bill that no one could break down by feature or customer.
b22 10 ML FinOps Habits to Right-Size Models, Right-Price Tokens Medium (Nexumo) 2025-12 Medium practitioner post framing LLM budget leakage as the norm and arguing that model routing, token caps, and per-feature tagging are the core habits of mature ML FinOps.
b23 Open-Weight AI Models Are Catching Up: What It Means for Enterprise Automation MindStudio 2026-05 Practitioner analysis comparing open-weight and closed models across production task categories, finding parity on coding, classification, and extraction but a persistent closed-model edge on complex multi-step reasoning.
b24 vLLM vs Ollama vs LocalAI: Best tools for self-hosting LLMs in 2025 eMasterLabs 2026-03 Practitioner comparison articulating the compliance-driven case for self-hosted LLMs in healthcare, legal, finance, and government under HIPAA, GDPR, and SOC 2 constraints.
b25 Self-Hosted LLM Guide: Costs, Architecture and Breakeven Point Alpacked 2026-05 Documents the canonical Ollama-to-vLLM migration path and the total cost of ownership components most teams undercount when evaluating self-hosted versus API deployment.

We use analytics cookies to understand site usage and improve the service. We do not use marketing cookies.