In a field long dominated by the belief that larger AI models are superior, a significant shift is on the horizon. With rising costs tied to deploying these advanced systems, many industry players are reassessing their strategies and considering smaller, more affordable alternatives. This transformation could redefine the standards for quality in AI services.
Current economic pressures have prompted users to seek more cost-effective models, a trend gaining traction across the industry. Brian Armstrong, co-founder of Coinbase, predicts that in the next 12 to 18 months, a remarkable 80% of AI workloads could be managed by models that are up to 99% less expensive. While a minority of tasks will still require high-performance models, the paradigm of AI deployment is poised to change.
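To see what that forecast would mean for a total bill, the arithmetic can be sketched in a few lines. This is only an illustration under stated assumptions: every workload is assumed to cost the same on the expensive model, the cheaper models are assumed to hit the full 99% discount (the upper end of the forecast), and the function name `blended_cost` is made up for this example.

```python
def blended_cost(cheap_share: float, discount: float, baseline: float = 1.0) -> float:
    """Average cost per workload when `cheap_share` of workloads move to a
    model that is `discount` cheaper than the baseline model.

    Assumption (not from the article): all workloads cost the same on the
    baseline model, so shares of workloads equal shares of spend.
    """
    if not (0.0 <= cheap_share <= 1.0 and 0.0 <= discount <= 1.0):
        raise ValueError("cheap_share and discount must be between 0 and 1")
    cheap_cost = baseline * (1.0 - discount)
    return cheap_share * cheap_cost + (1.0 - cheap_share) * baseline


# 80% of workloads on a model that is 99% cheaper, 20% left on the baseline.
cost = blended_cost(0.80, 0.99)
savings = 1.0 - cost
print(f"blended cost: {cost:.3f} of baseline, savings: {savings:.1%}")
# prints: blended cost: 0.208 of baseline, savings: 79.2%
```

Under these assumptions the overall bill falls by roughly 79%, not 99%, because the 20% of workloads that stay on high-performance models then account for almost all of the remaining spend.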
Armstrong's predictions highlight a critical moment for AI companies, especially those like OpenAI and Anthropic, which are eyeing initial public offerings. If less expensive models can deliver similar performance, the financial implications could be substantial, shifting the economic balance away from leading labs. Previously, competition in AI focused on model sophistication and quality, but a shift toward efficiency over sheer power may redefine the sector.
Early trials lend support to this trend. For example, the legal AI company Harvey recently ran tests showing that by combining Claude Opus with Fireworks AI’s GLM 5.1, it could cut inference costs threefold without compromising quality. The approach not only reduced server time but also showed that smaller models can handle tasks effectively, challenging the long-held belief that bigger is better.
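A rough calculation shows how much traffic would have to move to a smaller model to reach a threefold saving. The split Harvey actually used is not reported here, so the price ratios below are purely hypothetical, as is the helper name `share_needed`, and requests are assumed to be of equal size.

```python
def share_needed(target_reduction: float, price_ratio: float) -> float:
    """Fraction of requests that must run on the small model so that total
    cost drops by a factor of `target_reduction`.

    `price_ratio` is the small model's cost divided by the large model's cost.
    Assumption (hypothetical): every request costs the same on a given model.
    Derived from: (1 - s) + s * price_ratio = 1 / target_reduction.
    """
    if target_reduction < 1.0 or not (0.0 <= price_ratio < 1.0):
        raise ValueError("need target_reduction >= 1 and 0 <= price_ratio < 1")
    share = (1.0 - 1.0 / target_reduction) / (1.0 - price_ratio)
    if share > 1.0:
        raise ValueError("target not reachable at this price ratio")
    return share


# Hypothetical price ratios: small model at 10% and at 1% of the large model's cost.
for ratio in (0.10, 0.01):
    print(f"price ratio {ratio:.2f}: {share_needed(3.0, ratio):.1%} of requests")
```

With these illustrative ratios, roughly two-thirds to three-quarters of requests would need to go to the smaller model, which suggests a threefold saving does not require moving every task off the large model.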
Gabe Pereyra, co-founder of Harvey, emphasized this evolving definition of quality: "Quality comes first, and in legal it always will. However, the definition of quality is evolving from simply using the most powerful model for everything, to using the best model that gets the right answer most efficiently." This sentiment resonates throughout the industry as more companies adopt a cost-conscious mindset.
This change is not just a competition between proprietary systems and open-weight alternatives; it indicates a deeper division between large and small models. The ability to transition from heavyweight models like GPT-5.5 to smaller options, such as DeepSeek’s V4 Flash or even GPT-5.4-mini, without sacrificing performance could reshape operational strategies for AI firms.
As companies navigate these shifts, the future of AI infrastructure will likely depend on their readiness to adapt. The traditional preference for the most powerful technologies may soon yield to a more nuanced approach that prioritizes efficiency and cost-effectiveness. If Armstrong’s forecast proves accurate, the AI industry will need to recalibrate not only its economic models but also its understanding of quality and effectiveness in artificial intelligence.
With financial pressures mounting, the AI sector stands at a crossroads, holding the potential for a seismic shift in how intelligence is delivered and consumed. The coming months will determine whether this trend toward smaller models will take root, reshaping the dynamics of AI development and deployment for years to come.
Quick answers
What is the main prediction about AI model usage?
Brian Armstrong predicts that within the next 12 to 18 months, 80% of AI workloads could be handled by models that are up to 99% cheaper.
How have initial tests with cheaper models performed?
Initial tests, such as those conducted by Harvey, show that pairing a large model with a cheaper one can cut inference costs threefold without compromising quality.
What does the shift from large to small models signify?
It indicates a change in the AI industry’s focus from purely maximizing model power to prioritizing efficiency and cost-effectiveness.