OmniVinci

OmniVinci is an omni-modal transformer designed for unified perception across visual, auditory, and textual streams. It introduces architectural and data-curation strategies that allow a 9B-parameter model to be trained efficiently on roughly 0.2 trillion tokens while still reaching state-of-the-art performance on multi-modal benchmarks. The system supports tasks such as video question answering, audio understanding, captioning, and cross-modal retrieval, and targets use cases in which a single model must reason over complex, time-varying inputs.

Overview

OmniVinci is NVIDIA’s 9B omni-modal LLM that jointly understands images, video, audio, and text, achieving strong cross-modal reasoning with only about 0.2T training tokens.

🖼️Image generation 📷Images 💬Chatting 🎥Videos

About NVIDIA

Industry: Computer Hardware Manufacturing

Company Size: 36000

Location: Santa Clara, California, US

Website: nvidia.com

Last updated: February 25, 2026
