Perplexity AI Leverages Nvidia’s GB200 for Enhanced Model Performance

Perplexity AI has upgraded its model deployment by moving to Nvidia’s latest GB200 NVL72 hardware. The transition brings large performance gains, especially for the company’s post-trained Qwen3 235B mixture-of-experts (MoE) models. The gains improve operational efficiency and could reshape expectations for AI hardware utilization.

Performance Enhancements with GB200

Each GB200 NVL72 rack contains 72 GPUs with high-bandwidth memory. Each GPU has 180 GB of memory, and all 72 are interconnected via NVLink, which provides 1,800 GB/s of bandwidth per GPU. This setup supports high-throughput inference on large models, not just training.
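
As a rough back-of-the-envelope check on those figures, the sketch below totals one rack's GPU memory from the numbers quoted above and estimates the weight footprint of a 235-billion-parameter model at a few precisions. The bytes-per-parameter values are illustrative assumptions, not Perplexity's reported configuration, and the estimate ignores KV cache, activations, and runtime overhead.

```python
import math

# Figures quoted in the article.
GPUS_PER_RACK = 72
MEMORY_PER_GPU_GB = 180
PARAMS_BILLION = 235

# Assumed precisions, for illustration only (not from the article).
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def rack_memory_gb(gpus: int = GPUS_PER_RACK, per_gpu_gb: int = MEMORY_PER_GPU_GB) -> int:
    """Total GPU memory in one rack, in GB."""
    return gpus * per_gpu_gb


def weight_footprint_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight size in GB, taking 1 GB as 1e9 bytes."""
    return params_billion * bytes_per_param


def min_gpus_for_weights(footprint_gb: float, per_gpu_gb: int = MEMORY_PER_GPU_GB) -> int:
    """Smallest GPU count whose combined memory holds the weights alone."""
    return math.ceil(footprint_gb / per_gpu_gb)


if __name__ == "__main__":
    print(f"Rack memory: {rack_memory_gb():,} GB")
    for label, nbytes in BYTES_PER_PARAM.items():
        size = weight_footprint_gb(PARAMS_BILLION, nbytes)
        print(f"{label}: {size:.1f} GB of weights, at least {min_gpus_for_weights(size)} GPU(s)")
```

Even at 16-bit precision the weights occupy only a small fraction of a rack under these assumptions, which leaves most of the memory for serving state such as KV cache.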

Perplexity's technical research reports lower latency for key operations. NVLink all-reduce latency fell from 586.1 microseconds on the previous Hopper generation to 313.3 microseconds on the GB200, a reduction of roughly 46.5%. MoE prefill combine operations dropped from 730.1 microseconds to 438.5 microseconds, an improvement of about 40%. These figures indicate how the new hardware can better meet the growing demands of AI workloads.
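
The percentages follow directly from the quoted latencies. A minimal sketch of the arithmetic, using only the four numbers reported above (the function names are hypothetical):

```python
def percent_reduction(before_us: float, after_us: float) -> float:
    """Latency reduction as a percentage of the original latency."""
    return (before_us - after_us) / before_us * 100.0


def speedup(before_us: float, after_us: float) -> float:
    """How many times faster the new latency is than the old one."""
    return before_us / after_us


# (Hopper latency, GB200 latency) in microseconds, as quoted in the article.
MEASUREMENTS = {
    "NVLink all-reduce": (586.1, 313.3),
    "MoE prefill combine": (730.1, 438.5),
}

if __name__ == "__main__":
    for name, (hopper_us, gb200_us) in MEASUREMENTS.items():
        print(
            f"{name}: {percent_reduction(hopper_us, gb200_us):.1f}% lower latency, "
            f"{speedup(hopper_us, gb200_us):.2f}x faster"
        )
```

Expressed as speedups, the same figures correspond to roughly 1.87x for all-reduce and 1.66x for the prefill combine.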

Cost Efficiency and Real-Time Inference

Even more striking is the reported ability to run real-time inference up to 30 times faster than H100 baselines in certain configurations. This performance leap significantly reduces the cost of inference, making it economically viable to deploy large MoE models at scale.

The combination of advanced hardware and sophisticated software optimizations drives these gains. By implementing Blackwell-native quantization, Perplexity reduces model weight precision without compromising output quality. The separation of prompt processing from token generation, known as prefill/decode disaggregation, further streamlines model performance. Custom kernels designed for the 235-billion-parameter MoE model optimize the entire process, boosting overall efficiency.
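
To make prefill/decode disaggregation concrete, here is a toy sketch of the idea: one worker handles whole prompts and hands a cache to a second worker that generates tokens step by step. Every class and function name is hypothetical, the "KV cache" is a plain list standing in for real tensors, and the two workers run in a single loop here; in a real deployment they would typically sit on separate GPUs with the cache transferred between them. This is not Perplexity's implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    request_id: int
    prompt_tokens: list[str]
    max_new_tokens: int
    kv_cache: list[str] = field(default_factory=list)  # stand-in for real KV tensors
    output: list[str] = field(default_factory=list)


class PrefillWorker:
    """Processes the whole prompt in one pass and builds the cache."""

    def run(self, req: Request) -> Request:
        req.kv_cache = list(req.prompt_tokens)
        return req


class DecodeWorker:
    """Generates one placeholder token per step, extending the cache."""

    def step(self, req: Request) -> bool:
        token = f"tok{len(req.output)}"
        req.output.append(token)
        req.kv_cache.append(token)
        return len(req.output) >= req.max_new_tokens  # True when finished


def serve(requests: list[Request]) -> dict[int, list[str]]:
    prefill_queue = deque(requests)
    decode_queue: deque[Request] = deque()
    prefill, decode = PrefillWorker(), DecodeWorker()
    finished: dict[int, list[str]] = {}

    while prefill_queue or decode_queue:
        # Prefill side: admit one new prompt per iteration, then hand it off.
        if prefill_queue:
            decode_queue.append(prefill.run(prefill_queue.popleft()))
        # Decode side: one generation step for every in-flight request.
        for _ in range(len(decode_queue)):
            req = decode_queue.popleft()
            if decode.step(req):
                finished[req.request_id] = req.output
            else:
                decode_queue.append(req)
    return finished


if __name__ == "__main__":
    done = serve([Request(0, ["a", "b"], 2), Request(1, ["c"], 3)])
    print(done)
```

The point of the split is that the two phases have different resource profiles, so each side can be sized and batched independently instead of long prompts stalling token generation for other requests.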

Implications for the AI Hardware Landscape

The effects of these advancements extend beyond Perplexity AI. As the AI hardware race heats up, the GB200 NVL72 setup signals a shift in how large-scale AI models will be deployed. The ability to drastically lower inference costs while preserving output quality sets a new benchmark for both hardware manufacturers and AI developers. With these developments, Nvidia strengthens its position as a key player in the AI infrastructure market, as companies increasingly look for ways to serve complex models efficiently.

As AI applications continue to expand, the demand for powerful, efficient hardware solutions like the GB200 is expected to rise. Perplexity's successful integration of this technology not only demonstrates the potential of Nvidia's advancements but also marks a crucial moment in the ongoing evolution of AI infrastructure.

CoinSynaptic Desk

