OmniVoice

By Xiaomi
OmniVoice is a state-of-the-art multilingual TTS model from the Xiaomi Next-gen Kaldi team, built for zero-shot speech generation across more than 600 languages. Public project materials describe it as using a novel diffusion language model-style architecture over discrete audio tokens, with support for three main modes: voice cloning from reference audio, voice design from speaker attributes, and automatic voice generation without a reference clip. The project also highlights very fast inference, with reported real-time factors as low as 0.025, and an Apache 2.0 open-source release.
New Multimodal Gen 3
Released: April 2, 2026

Overview

OmniVoice is a multilingual zero-shot text-to-speech model built for voice cloning, voice design, and general speech synthesis at massive language scale. It supports more than 600 languages, uses a diffusion language model-style architecture, and is positioned for high-quality speech generation with fast inference.
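To put the reported real-time factor of 0.025 in perspective: real-time factor (RTF) is conventionally defined as synthesis time divided by the duration of the generated audio, so lower is faster. The short sketch below only does that arithmetic. The function names are illustrative and are not part of any OmniVoice API, and the hardware and measurement conditions behind the reported figure are not stated here, so the numbers are a rough guide rather than a benchmark.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def estimated_synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate generation time for a clip of the given length at a given RTF."""
    return audio_seconds * rtf


def speedup_over_real_time(rtf: float) -> float:
    """How many times faster than playback speed the generation runs."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


if __name__ == "__main__":
    reported_rtf = 0.025  # lowest figure quoted in the project description
    one_minute = 60.0     # seconds of generated speech (illustrative value)
    print(estimated_synthesis_seconds(one_minute, reported_rtf))
    print(speedup_over_real_time(reported_rtf))
```

At an RTF of 0.025, one minute of speech would take about 1.5 seconds to generate, which is roughly 40 times faster than playback.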

About Xiaomi

At Xiaomi, we believe technology’s true power lies in its ability to understand and enhance the human experience. This year, with the upgraded Xiaomi HyperOS 2 and the newly launched Xiaomi HyperAI, we are redefining connection.

Company Size: 43690
Location: Beijing, CN
Website: mi.com

Last updated: April 7, 2026