OmniVoice

OmniVoice is a multilingual text-to-speech (TTS) model from the Xiaomi Next-gen Kaldi team, built for zero-shot speech generation across more than 600 languages. Public project materials describe it as using a diffusion language model-style architecture over discrete audio tokens and as supporting three main modes: voice cloning from reference audio, voice design from speaker attributes, and automatic voice generation without a reference clip. The project also reports fast inference, with real-time factors as low as 0.025 (about 40 times faster than real time), and an open-source release under the Apache 2.0 license.
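To make the reported speed figure concrete, here is a minimal sketch of the arithmetic, assuming the usual definition of real-time factor (time spent synthesizing divided by the duration of the audio produced). The function names are illustrative and are not part of any OmniVoice API. The 0.025 figure is the best case reported by the project, and actual speed depends on hardware and settings that this page does not state.

```python
# Assumption: RTF = synthesis time / duration of generated audio.
# These helpers are hypothetical and only illustrate the arithmetic.

REPORTED_BEST_RTF = 0.025  # lowest real-time factor cited in project materials


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return the real-time factor for one synthesis run."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def estimated_synthesis_seconds(rtf: float, audio_seconds: float) -> float:
    """Estimate how long it takes to generate audio of the given length."""
    return rtf * audio_seconds


def speedup_over_real_time(rtf: float) -> float:
    """Return how many times faster than playback speed synthesis runs."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


if __name__ == "__main__":
    minute = 60.0
    seconds = estimated_synthesis_seconds(REPORTED_BEST_RTF, minute)
    print(f"Estimated time for {minute:.0f} s of audio: {seconds:.2f} s")
    print(f"Speedup over real time: {speedup_over_real_time(REPORTED_BEST_RTF):.0f}x")
```

At the reported best case, one minute of speech would take roughly a second and a half to generate. An RTF above 1.0 would mean synthesis is slower than playback.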

Overview

OmniVoice is a multilingual zero-shot text-to-speech model for voice cloning, voice design, and general speech synthesis. It covers more than 600 languages, uses a diffusion language model-style architecture, and targets high-quality speech generation with fast inference.

🔊 Text to speech · 🎤 Voice changing · 🗣️ Voice cloning · 🌐 Multilingual communication

About Xiaomi

Xiaomi is a consumer electronics and smart device company that makes smartphones, wearables, IoT products, home appliances, smart TVs, scooters, and connected lifestyle hardware.

Industry: Consumer Electronics

Company Size: 43690

Location: Beijing, CN

Website: mi.com


Last updated: July 7, 2026
