NVIDIA recently announced that the first units of its Vera CPU, the company's first processor designed specifically for agentic AI, have been delivered to initial customers, including Anthropic, OpenAI, SpaceX AI, and Oracle Cloud Infrastructure (OCI).
The Vera CPU is NVIDIA's first fully self-designed data center CPU and the successor to the Grace processor. Unlike Grace, which was positioned primarily as a host processor paired with GPUs, Vera is engineered specifically for agentic AI workloads. Its key tasks include orchestration and scheduling, tool calling, reinforcement learning training and data analysis, agent sandboxing and isolation, and long-context state management.
The Vera CPU uses NVIDIA's next-generation custom Arm architecture, codenamed Olympus. It features 88 cores and 176 threads, supports up to 1.5 TB of system memory (three times that of Grace), and offers 1.2 TB/s of memory bandwidth and 1.8 TB/s of NVLink-C2C interconnect bandwidth, along with support for rack-level confidential computing.
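To put the headline numbers in proportion, the quoted specifications can be reduced to per-core figures. The following is a minimal sketch that uses only the values stated above. The function name and the decimal unit convention (1 TB = 1000 GB) are assumptions made for illustration, not anything NVIDIA has published.

```python
# Figures quoted in the article for a single Vera CPU (maximum configuration).
CORES = 88
THREADS = 176
MEMORY_TB = 1.5        # up to 1.5 TB of system memory
MEM_BW_TBPS = 1.2      # memory bandwidth in TB/s
C2C_BW_TBPS = 1.8      # NVLink-C2C bandwidth in TB/s
GRACE_MULTIPLE = 3     # article: Vera has three times Grace's memory


def per_core_figures():
    """Derive simple ratios from the quoted specs.

    Assumes decimal units (1 TB = 1000 GB); this is an illustrative
    convention, not a vendor statement.
    """
    return {
        "threads_per_core": THREADS / CORES,
        "memory_gb_per_core": MEMORY_TB * 1000 / CORES,
        "mem_bw_gbps_per_core": MEM_BW_TBPS * 1000 / CORES,
        "c2c_to_mem_bw_ratio": C2C_BW_TBPS / MEM_BW_TBPS,
        "implied_grace_memory_tb": MEMORY_TB / GRACE_MULTIPLE,
    }


if __name__ == "__main__":
    for name, value in per_core_figures().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, each core has two threads, roughly 17 GB of memory, and roughly 13.6 GB/s of memory bandwidth. The NVLink-C2C link is rated at about 1.5 times the memory bandwidth.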
For memory, the Vera CPU uses LPDDR5X packaged in SOCAMM modules. NVIDIA's primary reason for this choice is energy efficiency: compared with traditional DDR5, LPDDR5X delivers high bandwidth while consuming significantly less power. The company states that Vera leads the industry in performance per watt.
It is worth noting that a single Vera CPU can be configured with up to 1.5 TB of LPDDR5X memory, a substantial amount for one processor. As Vera ramps up to high-volume shipments, demand for LPDDR5X DRAM is expected to increase significantly, potentially tightening the supply chain further.
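The scale of that demand is simple multiplication, sketched below. The shipment volumes are purely hypothetical placeholders, not forecasts or reported figures. The function name is likewise illustrative, and the calculation assumes every CPU ships with the full 1.5 TB.

```python
def lpddr5x_demand_pb(cpu_units, tb_per_cpu=1.5):
    """Total LPDDR5X capacity in petabytes (1 PB = 1000 TB).

    cpu_units  -- number of Vera CPUs shipped (hypothetical input)
    tb_per_cpu -- memory per CPU; defaults to the 1.5 TB maximum quoted above
    """
    if cpu_units < 0:
        raise ValueError("cpu_units must be non-negative")
    return cpu_units * tb_per_cpu / 1000


if __name__ == "__main__":
    # Hypothetical volumes chosen only to show how the total scales.
    for units in (10_000, 100_000, 1_000_000):
        print(f"{units:>9,} CPUs -> {lpddr5x_demand_pb(units):,.0f} PB of LPDDR5X")
```

Since the total grows linearly with unit count, every additional 100,000 fully populated CPUs would add 150 PB of LPDDR5X demand under these assumptions.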
While market focus has historically been on GPU computing power, the CPU is re-emerging as a critical component in the era of agentic AI. NVIDIA points out that in long-context reasoning, tool calling, reinforcement learning sandboxes, and multi-agent workflows, a significant amount of work actually occurs on the CPU. The CPU-to-GPU ratio in AI servers is gradually shifting from past ratios like 1:4 or 1:8 towards 1:1, and in some future scenarios, the number of CPUs may even exceed that of GPUs.
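The effect of this ratio shift on CPU counts can be illustrated with a short calculation. This is a sketch only. The 64-GPU deployment size is an arbitrary example, the 2:1 case is a hypothetical stand-in for the "CPUs exceed GPUs" scenario, and the function name is made up for illustration.

```python
def cpus_needed(gpus, cpu_part, gpu_part):
    """CPUs required for `gpus` GPUs at a CPU:GPU ratio of cpu_part:gpu_part.

    Uses integer ceiling division so a partial CPU rounds up to a whole one.
    """
    if gpus < 0 or cpu_part <= 0 or gpu_part <= 0:
        raise ValueError("gpus must be >= 0 and ratio parts must be positive")
    return (gpus * cpu_part + gpu_part - 1) // gpu_part


if __name__ == "__main__":
    gpus = 64  # arbitrary example deployment size
    # (cpu_part, gpu_part) pairs: the ratios named in the text, plus a
    # hypothetical 2:1 case where CPUs outnumber GPUs.
    for cpu_part, gpu_part in ((1, 8), (1, 4), (1, 1), (2, 1)):
        count = cpus_needed(gpus, cpu_part, gpu_part)
        print(f"CPU:GPU = {cpu_part}:{gpu_part} -> {count} CPUs for {gpus} GPUs")
```

For the same 64 GPUs, moving from 1:8 to 1:1 raises the CPU count from 8 to 64, an eightfold increase in CPU demand with no change in GPU count.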
The Vera CPU will reportedly be delivered in two form factors: as a standalone LPX server, or as the host processor within the Vera Rubin NVL72 rack.