Perplexity AI Leverages Nvidia’s GB200 for Enhanced Model Performance

Perplexity AI has deployed its Qwen3 235B models on Nvidia's GB200 racks and reports lower latency and lower inference costs than on the previous Hopper generation.

Image by rwindr on Pixabay
By CoinSynaptic Desk · AI Infrastructure Correspondent
Published May 15, 2026 · Updated 12:30 ET

Perplexity AI has moved the serving of its post-trained Qwen3 235B mixture-of-experts (MoE) models onto Nvidia's latest GB200 NVL72 hardware. The company reports lower latency and lower inference costs than on the previous Hopper generation, results that could reshape expectations for how AI hardware is used.

Performance Enhancements with GB200

Each GB200 NVL72 rack holds 72 Blackwell GPUs. Each GPU has 180 GB of high-bandwidth memory, and all 72 are connected in a single NVLink domain that provides 1,800 GB/s of bandwidth per GPU. This setup matters for high-throughput inference on large models, not only for training.
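A back-of-the-envelope calculation helps put these figures in context. The sketch below uses the 72 GPUs and 180 GB per GPU quoted above together with the 235-billion-parameter model size. The bytes-per-parameter values are illustrative assumptions, not Perplexity's reported configuration, and the estimate covers model weights only, ignoring the KV cache and activations.

```python
import math

# Rack figures quoted in the article.
GPUS_PER_RACK = 72
MEMORY_PER_GPU_GB = 180

# Model size from the article; the precisions below are illustrative assumptions.
PARAMS_BILLION = 235
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def rack_memory_gb(gpus: int = GPUS_PER_RACK, per_gpu_gb: int = MEMORY_PER_GPU_GB) -> int:
    """Aggregate GPU memory across one rack, in GB."""
    return gpus * per_gpu_gb


def weight_footprint_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param


def min_gpus_for_weights(footprint_gb: float, per_gpu_gb: int = MEMORY_PER_GPU_GB) -> int:
    """Smallest GPU count whose combined memory holds the weights alone."""
    return math.ceil(footprint_gb / per_gpu_gb)


if __name__ == "__main__":
    print(f"Rack memory: {rack_memory_gb():,} GB")
    for label, nbytes in BYTES_PER_PARAM.items():
        footprint = weight_footprint_gb(PARAMS_BILLION, nbytes)
        gpus = min_gpus_for_weights(footprint)
        print(f"{label}: {footprint:,.1f} GB of weights, at least {gpus} GPU(s)")
```

Under these assumptions the rack totals 12,960 GB of GPU memory, and halving the weight precision halves the memory the weights occupy.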

Perplexity's technical research reports lower latency for key operations. NVLink all-reduce latency fell from 586.1 microseconds on the previous Hopper generation to 313.3 microseconds on the GB200, a reduction of about 46%. The MoE prefill combine step dropped from 730.1 microseconds to 438.5 microseconds, an improvement of about 40%. These figures indicate how the new hardware can better keep up with the communication demands of large MoE inference.

Cost Efficiency and Real-Time Inference

More striking still is the reported potential for up to 30 times faster real-time inference in certain configurations, compared with H100 baselines. Faster inference on the same hardware means a lower cost per request, which makes it more economical to deploy large MoE models at scale.

These gains come from hardware and software optimizations working together. Blackwell-native quantization lowers the precision of the model weights without compromising output quality. Prefill/decode disaggregation, the separation of prompt processing from token generation, further streamlines serving. Custom kernels written for the 235-billion-parameter MoE model complete the stack.
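To illustrate what separating prompt processing from token generation means, here is a toy, single-process sketch. It does not reflect Perplexity's implementation, which the article does not describe. Every name in it (`Request`, `PrefillWorker`, `DecodeWorker`, `serve`, `toy_next_token`) is hypothetical, the "model" is a stand-in arithmetic function, and the in-memory queue stands in for the cache handoff that a real system would perform between separate pools of GPUs.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    request_id: int
    prompt_tokens: list
    max_new_tokens: int  # assumed to be at least 1
    kv_cache: list = field(default_factory=list)       # stand-in for attention state
    output_tokens: list = field(default_factory=list)


def toy_next_token(kv_cache: list) -> int:
    """Stand-in for a model forward pass: derives a token from the cache."""
    return (sum(kv_cache) + len(kv_cache)) % 1000


class PrefillWorker:
    """Processes the whole prompt in one pass and emits the first token."""

    def run(self, request: Request) -> Request:
        request.kv_cache = list(request.prompt_tokens)
        first = toy_next_token(request.kv_cache)
        request.output_tokens.append(first)
        request.kv_cache.append(first)
        return request


class DecodeWorker:
    """Generates one further token per step from the handed-off cache."""

    def step(self, request: Request) -> None:
        token = toy_next_token(request.kv_cache)
        request.output_tokens.append(token)
        request.kv_cache.append(token)


def serve(requests: list) -> dict:
    prefill, decode = PrefillWorker(), DecodeWorker()
    handoff = deque()  # stand-in for transferring caches between worker pools

    # Stage 1: the prefill pool handles every prompt.
    for request in requests:
        handoff.append(prefill.run(request))

    # Stage 2: the decode pool advances requests one token at a time, round-robin.
    results = {}
    while handoff:
        request = handoff.popleft()
        if len(request.output_tokens) < request.max_new_tokens:
            decode.step(request)
        if len(request.output_tokens) >= request.max_new_tokens:
            results[request.request_id] = request.output_tokens
        else:
            handoff.append(request)
    return results


if __name__ == "__main__":
    print(serve([Request(0, [1, 2, 3], 3), Request(1, [10], 1)]))
```

The point of the split is that the two phases have different shapes: prefill handles many prompt tokens in one pass, while decode produces one token per step, so a serving system can size and schedule each pool independently.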

Implications for the AI Hardware Landscape

The effects of these advancements extend beyond Perplexity AI. As the AI hardware race heats up, the GB200 NVL72 setup points to how large-scale AI models may be deployed in the future. Sharply lower inference costs without a loss in output quality would set a new benchmark for both hardware manufacturers and AI developers. The results also strengthen Nvidia's position in the AI infrastructure market, as companies look for better ways to serve complex models.

As AI applications continue to expand, the demand for powerful, efficient hardware solutions like the GB200 is expected to rise. Perplexity's successful integration of this technology not only demonstrates the potential of Nvidia's advancements but also marks a crucial moment in the ongoing evolution of AI infrastructure.
