NVIDIA has released Nemotron 3 Nano Omni, an open-source omni-modal model that continues the "Nano" line's emphasis on cost-performance and inference efficiency. The model has approximately 30 billion total parameters and supports ultra-long context windows of up to one million tokens.
The model uses a 30B-A3B mixture-of-experts architecture (roughly 3 billion parameters active per token) and interleaves Mamba layers with Transformer layers. The Mamba layers improve long-sequence processing efficiency and memory utilization, while the Transformer layers preserve inference accuracy. According to official figures, this hybrid design can improve memory and computational efficiency by up to four times.
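To see why this hybrid layout saves memory at inference time, consider what each layer type must cache: a Transformer layer's KV cache grows linearly with sequence length, while a Mamba-style state-space layer keeps a fixed-size recurrent state regardless of context length. The sketch below illustrates this with hypothetical layer counts and dimensions; none of the numbers (layer count, attention interval, head sizes) come from NVIDIA's published architecture.

```python
# Illustrative sketch, NOT Nemotron's actual configuration: compares the
# inference-cache footprint of a pure-Transformer stack against a hybrid
# stack that replaces most attention layers with Mamba-style SSM layers.
# All dimensions below are made-up round numbers for illustration.

def kv_cache_elems(seq_len, n_heads=16, head_dim=128):
    # An attention layer caches one key and one value vector per token,
    # so its cache grows linearly with sequence length.
    return 2 * seq_len * n_heads * head_dim

def ssm_state_elems(d_model=2048, state_dim=16):
    # A Mamba-style layer carries a fixed-size recurrent state that is
    # independent of how many tokens have been processed.
    return d_model * state_dim

def pure_attn_cache_elems(seq_len, n_layers=48):
    # Every layer is attention: cache scales as n_layers * seq_len.
    return n_layers * kv_cache_elems(seq_len)

def hybrid_cache_elems(seq_len, n_layers=48, attn_every=6):
    # Keep one attention layer out of every `attn_every` layers; the
    # rest are SSM layers with constant-size state.
    total = 0
    for i in range(n_layers):
        if i % attn_every == attn_every - 1:
            total += kv_cache_elems(seq_len)
        else:
            total += ssm_state_elems()
    return total

if __name__ == "__main__":
    seq_len = 1_000_000  # a million-token context
    ratio = pure_attn_cache_elems(seq_len) / hybrid_cache_elems(seq_len)
    print(f"cache reduction at {seq_len:,} tokens: {ratio:.1f}x")
```

With these toy settings (1 attention layer per 6), the SSM states are negligible at million-token lengths, so the cache shrinks by nearly the attention-thinning factor. The real multiplier depends on the actual layer mix, which is why vendors quote figures like "up to fourfold" rather than a single constant.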
According to the industry benchmark MediaPerf, Nemotron 3 Nano Omni achieves the highest throughput across all evaluated tasks and the lowest inference cost on video-level annotation tasks. Under a fixed user-interaction latency threshold, the model delivers an effective system capacity approximately 9.2 times that of other open omni-modal models on video reasoning tasks, and about 7.4 times on multi-document reasoning tasks.
AI and software companies that have already adopted Nemotron 3 Nano Omni include Aible, Applied Scientific Intelligence (ASI), Eka Care, Foxconn, H Company, Palantir, and Pyler, while Dell Technologies, DocuSign, Infosys, K-Dense, Lila, Oracle, and Zefr are currently evaluating the model.