At GTC 2026 on March 16th, NVIDIA officially released the BlueField-4 STX modular storage reference architecture, designed for long-context inference in AI agents and aimed at breaking through traditional storage bottlenecks. Built on the Vera Rubin platform and driven by the BlueField-4 DPU, the architecture pairs the Vera CPU with the ConnectX-9 SuperNIC to bring context storage closer to the GPU, forming a high-performance storage layer.
Compared with traditional architectures, it delivers up to 5x the token-processing capacity per second, 4x the energy efficiency, and double the data-ingestion speed. The architecture is already supported by providers including CoreWeave, Oracle OCI, and Mistral AI, while Dell, HPE, IBM, and other vendors are using it to design their next-generation AI infrastructure; related systems are expected to be available in the second half of 2026.