Miso TTS 8B

MisoTTS is an 8B-parameter speech generation model from Miso Labs, announced on 03-06-2026. It generates speech from both text and audio context, so the output can be conditioned on a user’s tone rather than only on written text. Its architecture uses residual vector quantization with 32 codebook indices drawn from 2048-entry codebooks, a 7.7B-parameter temporal backbone, and a 300M-parameter depth decoder to generate expressive audio tokens. Miso Labs says the model weights are open source under a modified MIT license, with API access coming soon. Current limitations: it models individual turns and half-duplex audio, but not full conversational turn-taking or full-duplex speech.
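To make the residual vector quantization figures above concrete, here is a minimal sketch in Python. It is not Miso Labs' implementation: the function names (`bits_per_frame`, `rvq_encode`, `rvq_decode`, `make_toy_codebooks`) are hypothetical, the codebooks are random toy data, and it assumes that each of the 32 indices selects one entry from a 2048-entry codebook for a single audio frame. The listing gives no frame rate, so no bitrate is computed.

```python
import math
import random

NUM_CODEBOOKS = 32    # figure taken from the description above
CODEBOOK_SIZE = 2048  # figure taken from the description above


def bits_per_frame(num_codebooks: int, codebook_size: int) -> float:
    """Bits needed to store one index per codebook (assumed: one set per frame)."""
    return num_codebooks * math.log2(codebook_size)


def make_toy_codebooks(num_codebooks, codebook_size, dim, seed=0):
    """Random stand-in codebooks; a real model would learn these."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(codebook_size)]
        for _ in range(num_codebooks)
    ]


def rvq_encode(vector, codebooks):
    """Quantize stage by stage: each codebook encodes what the previous left over."""
    residual = list(vector)
    indices = []
    for codebook in codebooks:
        # Pick the entry closest (squared Euclidean distance) to the current residual.
        best = min(
            range(len(codebook)),
            key=lambda i: sum((r - c) ** 2 for r, c in zip(residual, codebook[i])),
        )
        indices.append(best)
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return indices, residual


def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# 32 * log2(2048) = 32 * 11 = 352 bits for one full set of indices.
print(bits_per_frame(NUM_CODEBOOKS, CODEBOOK_SIZE))

# Toy run with small sizes so it finishes instantly.
toy_codebooks = make_toy_codebooks(num_codebooks=4, codebook_size=16, dim=3)
toy_indices, toy_residual = rvq_encode([0.3, -0.2, 0.5], toy_codebooks)
toy_reconstruction = rvq_decode(toy_indices, toy_codebooks)
```

In this sketch the reconstruction plus the final residual recovers the input vector, which is the defining property of residual quantization. In a design like the one described, a backbone and a smaller depth decoder would predict these indices rather than search for them; how MisoTTS divides that work is not specified here.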

Overview

MisoTTS is Miso Labs’ open-weight 8B text-and-audio-conditioned speech generation model for expressive, context-aware, emotive TTS and dialogue voice output.

🔊Text to speech 🗣️Voice cloning 🎙️Voiceovers 🗣Dialogue generation

About Miso Labs

Miso Labs builds the most emotive foundation models for voice. Its open-source flagship, Miso One (Miso TTS 8B), is an 8-billion-parameter text-to-speech model for highly expressive speech with ~110 ms latency, one-shot voice cloning, and on-premises deployment, aimed at helping developers build natural-sounding voice agents.

Industry: Artificial Intelligence

Company Size: 2

Location: San Francisco, California, US

Website: misolabs.ai

Last updated: July 21, 2026
