OmniVinci

By NVIDIA
OmniVinci is an omni-modal transformer designed for unified perception across visual, auditory, and textual streams. It introduces new architectural and data-curation strategies to efficiently train a 9B-parameter model on roughly 0.2 trillion tokens while still reaching state-of-the-art performance on multi-modal benchmarks. The system supports tasks like video question answering, audio understanding, captioning, and cross-modal retrieval, targeting use cases where a single model needs to reason over complex, time-varying inputs.
Released: May 31, 2025

Overview

OmniVinci is NVIDIA’s 9B-parameter omni-modal LLM that jointly understands images, video, audio, and text, achieving strong cross-modal reasoning with only about 0.2T training tokens.

About NVIDIA

Industry: Computer Hardware Manufacturing
Company Size: 36,000 employees
Location: Santa Clara, California, US
Website: nvidia.com

Tools using OmniVinci

No tools found for this model yet.

Last updated: February 25, 2026