NVIDIA's latest Blackwell Ultra platform has achieved up to 50 times higher throughput per megawatt than its predecessor, the Hopper platform. This gain in efficiency expands what AI agents can do and reduces operational cost per token by 35%. The improvements are aimed squarely at inference workloads.
Demand for AI agents and coding assistants has surged: the share of AI queries related to software programming rose from 11% to nearly 50% in just one year, according to OpenRouter’s State of Inference report. That increase raises the need for low-latency performance, since complex, multi-step workflows that reason across entire codebases must still respond in real time.
New performance data from SemiAnalysis InferenceX shows that NVIDIA's ongoing innovations in software and hardware are yielding results. The GB300 NVL72 systems have achieved over ten times more tokens per watt, a performance boost that is expected to grow with further optimizations in the underlying technology stack.

NVIDIA’s focus on extreme codesign, which integrates chips, system architecture, and software, has been crucial in driving these improvements. GPU kernels have been optimized for both efficiency and low latency to make full use of the Blackwell architecture's compute. Features like NVLink Symmetric Memory enable direct GPU-to-GPU communication, further improving performance.
The efficiency of these systems has been strengthened by continuous refinements from the NVIDIA TensorRT-LLM, NVIDIA Dynamo, Mooncake, and SGLang teams. Recent updates to the TensorRT-LLM library have resulted in a fivefold increase in performance for low-latency workloads on the GB200 platform within just a few months.
As inference becomes increasingly central to AI in production, long-context performance and token efficiency matter more than ever. The Blackwell Ultra results position NVIDIA as a leader in AI infrastructure and a key player in agentic AI applications. They are also likely to reshape expectations for performance and cost-effectiveness across the wider range of industries that rely on AI.

