VibeVoice

VibeVoice is Microsoft’s open-source long-form text-to-speech model and framework that can synthesize up to roughly 90 minutes of multi-speaker audio from text. It uses ultra-low-rate acoustic and semantic speech tokenizers to preserve audio quality while scaling to long dialogues, and it is available in several model sizes, including VibeVoice-1.5B. It targets podcast-style conversations, dialogue generation, and accessibility use cases, with an online demo and Hugging Face deployment options.

Overview

VibeVoice is Microsoft’s open-source frontier TTS framework that turns long text into expressive, multi-speaker conversational audio, generating podcast-style speech with natural turn-taking in English and Mandarin.

🔊Text to speech 🌐Websites 📚Book writing 📚Academic writing

About Microsoft

Microsoft is a technology company that offers a wide range of software, cloud computing services, hardware, and artificial intelligence solutions.

Industry: Technology, Information and Internet

Company Size: 228,000+

Location: Redmond, Washington, US

Website: microsoft.com

Last updated: February 25, 2026
