MOVA

By OpenMOSS

MOVA is a diffusion-based video-audio generator designed to move beyond silent or cascaded pipelines. It synthesizes visuals and sound together, which improves audio-visual alignment and reduces error buildup across stages. The model uses an asymmetric dual-tower architecture, with one tower for video and one for audio, fused through cross-attention. It targets multilingual lip sync, sound effects, and high-fidelity output. Weights, training configs, and LoRA tools are released.
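
To make the dual-tower idea concrete, here is a minimal sketch of how two token streams of different widths could exchange information through cross-attention. It is not MOVA's code. The listing gives no layer sizes or fusion details, so the dimensions, the two-way exchange, the residual update, and every name below (`cross_attention`, `D_VIDEO`, `D_AUDIO`, and so on) are illustrative assumptions, written in plain Python with random weights.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random (rows x cols) matrix standing in for learned weights or token features."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def cross_attention(queries, context, w_q, w_k, w_v, w_o):
    """Let `queries` (one tower) attend to `context` (the other tower).

    Returns the residually updated query tokens and the attention weights.
    """
    q = matmul(queries, w_q)            # (n_q, d_attn)
    k = matmul(context, w_k)            # (n_c, d_attn)
    v = matmul(context, w_v)            # (n_c, d_attn)
    d_attn = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)             # (n_q, n_c)
    weights = [softmax([s / math.sqrt(d_attn) for s in row]) for row in scores]
    attended = matmul(weights, v)       # (n_q, d_attn)
    update = matmul(attended, w_o)      # back to the query tower's width
    fused = [[x + u for x, u in zip(q_row, u_row)]
             for q_row, u_row in zip(queries, update)]
    return fused, weights


# Hypothetical sizes: the video tower is wider and has more tokens than the audio tower.
D_VIDEO, D_AUDIO, D_ATTN = 8, 4, 4
N_VIDEO, N_AUDIO = 6, 3

rng = random.Random(0)
video = random_matrix(N_VIDEO, D_VIDEO, rng)   # stand-in video tokens
audio = random_matrix(N_AUDIO, D_AUDIO, rng)   # stand-in audio tokens

# Video tokens query the audio stream.
video_fused, video_weights = cross_attention(
    video, audio,
    random_matrix(D_VIDEO, D_ATTN, rng), random_matrix(D_AUDIO, D_ATTN, rng),
    random_matrix(D_AUDIO, D_ATTN, rng), random_matrix(D_ATTN, D_VIDEO, rng),
)

# Audio tokens query the video stream.
audio_fused, audio_weights = cross_attention(
    audio, video,
    random_matrix(D_AUDIO, D_ATTN, rng), random_matrix(D_VIDEO, D_ATTN, rng),
    random_matrix(D_VIDEO, D_ATTN, rng), random_matrix(D_ATTN, D_AUDIO, rng),
)

print(len(video_fused), len(video_fused[0]))   # video keeps its own token count and width
print(len(audio_fused), len(audio_fused[0]))   # audio keeps its own token count and width
```

Each tower keeps its own width and token count after fusion. Only the attention projections bridge the two, which is one plausible reading of "asymmetric" here.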

New Video Gen 3

Released: February 1, 2026

Overview

Open-source foundation model that jointly generates video and audio in a single pass, achieving tightly synchronized lip movements and environment-aware sound effects.

🎥Videos 🔊Text to speech 🎵Music 🎬Animations