VibeVoice

VibeVoice is Microsoft’s open-source long-form text-to-speech model and framework. It can synthesize up to roughly 90 minutes of multi-speaker audio from text, using ultra-low-rate acoustic and semantic speech tokenizers to preserve audio quality as dialogues grow longer. It is available in several model sizes, including VibeVoice-1.5B, and targets podcast-style conversations, dialogue generation, and accessibility use cases. An online demo and Hugging Face deployment options are available.
Released: December 9, 2025

Overview

VibeVoice is Microsoft’s open-source frontier TTS framework. It turns long text into expressive multi-speaker conversational audio, generating podcast-style speech with natural turn-taking in English and Mandarin.

About Microsoft

Microsoft is a technology company that offers a wide range of software, cloud computing services, hardware, and artificial intelligence solutions.

Industry: Software Development
Company Size: 228,000+
Location: Redmond, Washington, US

Tools using VibeVoice

No tools found for this model yet.

Last updated: December 9, 2025