Codesigned with NVIDIA GPUs, CPUs, networking and systems, and strengthened by a broad open source ecosystem, NVIDIA's full-stack inference software continuously improves the performance of the hardware it runs on. On the NVIDIA Blackwell platform, the software stack has already reduced token costs by up to 5x on the DeepSeek V4 model in just one month. The stack lowers cost per token by connecting three layers: production operation, application acceleration and infrastructure access. When these layers work as one system, individual optimizations compound. Combining these optimizations across multiple technologies can increase token throughput per GPU on the Blackwell platform by up to 20x.
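
To make the compounding claim concrete, here is a minimal arithmetic sketch in Python. Every number in it is a hypothetical placeholder (the per-layer speedups, the baseline throughput and the GPU hourly price are not NVIDIA figures), and it assumes the gains from each layer are independent and therefore multiply, which real systems only approximate.

```python
import math

def cost_per_million_tokens(gpu_hourly_cost, tokens_per_second):
    """Cost of generating one million tokens on a single GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost / tokens_per_hour * 1_000_000

# Hypothetical per-layer throughput gains (assumed values, for illustration only).
layer_speedups = {
    "production operation": 2.0,
    "application acceleration": 2.5,
    "infrastructure access": 1.5,
}

# Assumption: gains are independent, so they multiply rather than add.
combined_speedup = math.prod(layer_speedups.values())

gpu_hourly_cost = 3.0    # hypothetical price in dollars per GPU-hour
baseline_tps = 1000.0    # hypothetical baseline tokens per second per GPU

baseline_cost = cost_per_million_tokens(gpu_hourly_cost, baseline_tps)
optimized_cost = cost_per_million_tokens(
    gpu_hourly_cost, baseline_tps * combined_speedup
)

print(f"combined speedup: {combined_speedup:.2f}x")
print(f"baseline cost per 1M tokens:  ${baseline_cost:.4f}")
print(f"optimized cost per 1M tokens: ${optimized_cost:.4f}")
```

At a fixed GPU price, cost per token falls by the same factor that throughput per GPU rises, which is why a throughput gain translates directly into a lower token cost in this simplified model.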