Rubin GPUs
Rack-scale agentic AI supercomputer
Vera Rubin NVL72 for next-generation AI factories.
A technical guide to the NVIDIA Vera Rubin NVL72 concept: Rubin GPUs, Vera CPUs, NVLink-scale communication, accelerated networking, liquid-cooled racks, and infrastructure designed for frontier AI training and high-throughput inference.
Platform overview
Designed for the next phase of accelerated computing.
The Vera Rubin NVL72 reference design centers on rack-scale AI infrastructure. It combines compute, memory, networking, cooling, and software into a platform for large models, reasoning systems, synthetic data, inference services, and agentic AI operations.
Vera CPUs
Host processing aligned with GPU-scale performance
Coordinate data movement, system services, orchestration, and workload management inside the rack.
NVLink Fabric
Scale-up communication across the full rack
Fast GPU-to-GPU connectivity is central to keeping model-parallel workloads efficient.
AI Factory
Infrastructure for continuous model production
Support data pipelines, training, fine-tuning, evaluation, inference, and agentic application services.
Rack-scale architecture
Compute, fabric, and cooling designed as one system.
Vera Rubin NVL72-style systems are not standalone servers arranged in a cabinet. The rack is the computer: accelerated compute, CPU support, scale-up interconnect, scale-out networking, power, and thermal design are planned together to keep utilization high.
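Why scale-up bandwidth dominates rack design can be seen with a back-of-envelope estimate of collective-communication time. The sketch below uses the standard ring all-reduce traffic model; the 72-GPU domain size echoes the NVL72 name, but the per-link bandwidth and gradient size are placeholder assumptions, not published Vera Rubin specifications.

```python
# Back-of-envelope estimate of ring all-reduce time in a scale-up domain.
# Bandwidth and payload figures below are illustrative placeholders only.

def allreduce_seconds(payload_bytes: float, n_gpus: int, link_gbs: float) -> float:
    """Ring all-reduce moves 2*(n-1)/n of the payload over each link."""
    traffic = 2 * (n_gpus - 1) / n_gpus * payload_bytes
    return traffic / (link_gbs * 1e9)

# Example: 10 GB of gradients across a 72-GPU domain at an assumed
# 900 GB/s per-GPU link.
t = allreduce_seconds(10e9, 72, 900.0)
print(f"{t * 1e3:.2f} ms")  # prints 21.91 ms
```

The point of the model: communication time shrinks almost linearly with link bandwidth, which is why the interconnect is planned together with compute rather than bolted on afterward.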
Networking and system services
AI factories depend on the fabric around the rack.
The NVIDIA reference highlights technologies such as ConnectX-9, BlueField-4, and Spectrum-X as part of the broader Vera Rubin platform story. In practical terms, the network must carry model traffic, storage traffic, observability, security, and inference requests without becoming the bottleneck.
Performance themes
Built for training, inference, and reasoning at scale.
The Vera Rubin NVL72 positioning emphasizes efficiency across both model creation and model serving. The same AI factory may need to pretrain large models, fine-tune specialized systems, generate synthetic data, run evaluators, and serve agentic applications with strict latency targets.
- Large-model training with high GPU utilization and scale-up bandwidth.
- Inference fleets for language, vision, multimodal, retrieval, and reasoning services.
- Agentic AI workloads that call tools, coordinate memory, evaluate plans, and execute workflows.
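The agentic workload pattern above (call tools, coordinate memory, evaluate plans) can be sketched as a minimal loop. Every name here is a toy illustration, not an NVIDIA API; a production agent would dispatch to real model and tool services.

```python
# Toy sketch of an agentic workflow: call tools, record results in memory,
# then evaluate whether the plan reached its goal. Illustrative only.

from dataclasses import dataclass, field

@dataclass
class Agent:
    memory: list = field(default_factory=list)

    def call_tool(self, name: str, arg: float) -> float:
        # Hypothetical in-process tool registry standing in for real services.
        tools = {"square": lambda x: x * x, "double": lambda x: 2 * x}
        result = tools[name](arg)
        self.memory.append((name, arg, result))
        return result

    def evaluate_plan(self, target: float) -> bool:
        # Accept the plan if any recorded tool call produced the target value.
        return any(result == target for _, _, result in self.memory)

agent = Agent()
agent.call_tool("double", 3.0)    # 6.0
agent.call_tool("square", 6.0)    # 36.0
print(agent.evaluate_plan(36.0))  # prints True
```

Even this toy shows why such workloads stress the whole rack: each step mixes compute (the tool call), memory traffic (the transcript), and evaluation, rather than one long uniform kernel.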
Deployment path
How teams turn rack-scale hardware into an AI factory.
1. Plan power and cooling: Validate rack density, liquid-cooling loops, redundancy, facility constraints, and service access.
2. Design the fabric: Map scale-up domains, scale-out networks, storage lanes, management paths, and security zones.
3. Build the software layer: Deploy schedulers, containers, drivers, telemetry, model pipelines, evaluation suites, and inference stacks.
4. Operate continuously: Monitor utilization, latency, faults, thermal margins, costs, model quality, and agentic workflow safety.
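The "operate continuously" step can be sketched as a telemetry check that compares live samples against operating thresholds. The metric names and limits below are assumed for illustration; real deployments would wire this into their monitoring stack with site-specific targets.

```python
# Minimal sketch of a rack health check: flag telemetry samples that fall
# outside operating thresholds. All metric names and limits are assumptions.

THRESHOLDS = {
    "gpu_utilization_min": 0.70,  # flag underused capacity
    "p99_latency_ms_max": 250.0,  # inference latency target
    "coolant_temp_c_max": 45.0,   # liquid-cooling thermal margin
}

def check_rack(sample: dict) -> list:
    """Return a list of alerts for metrics outside their thresholds."""
    alerts = []
    if sample["gpu_utilization"] < THRESHOLDS["gpu_utilization_min"]:
        alerts.append("low GPU utilization")
    if sample["p99_latency_ms"] > THRESHOLDS["p99_latency_ms_max"]:
        alerts.append("latency above target")
    if sample["coolant_temp_c"] > THRESHOLDS["coolant_temp_c_max"]:
        alerts.append("thermal margin exceeded")
    return alerts

print(check_rack({"gpu_utilization": 0.62,
                  "p99_latency_ms": 180.0,
                  "coolant_temp_c": 41.0}))
# prints ['low GPU utilization']
```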
AI factory guide
Vera Rubin as a platform story, not just a chip story.
The strongest interpretation of Vera Rubin NVL72 is a full-stack system: accelerators, CPUs, NVLink, networking, DPUs, switches, software, operations, and data-center design all supporting the same goal of efficient AI production.
FAQ
Vera Rubin NVL72 questions
What is Vera Rubin NVL72?
Vera Rubin NVL72 is presented here as a rack-scale NVIDIA AI supercomputer platform concept combining Rubin GPUs, Vera CPUs, NVLink-scale communication, accelerated networking, liquid cooling, and AI factory operations.
Why does rack-scale design matter?
Rack-scale design matters because modern AI workloads are limited by communication, memory movement, thermal density, scheduling, networking, and operations as much as by individual accelerator performance.