TAAFT

Miso TTS 8B

MisoTTS is an 8B-parameter speech generation model from Miso Labs, announced on June 3, 2026. It generates speech from both text and audio context, so the model can condition its output on a user’s tone rather than only on written text. Its architecture uses residual vector quantization with 32 codebook indices over 2048-way codebooks, a 7.7B-parameter temporal backbone, and a 300M-parameter depth decoder to generate expressive audio tokens. Miso Labs says the model weights are open source under a modified MIT license, with API access coming soon. The current limitation is that it models individual turns and half-duplex audio, but not full conversation turn-taking or full-duplex speech.
New Audio Gen 4
Released: June 3, 2026
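
The description above mentions residual vector quantization (RVQ) with 32 codebook indices over 2048-way codebooks. As a rough illustration of what that means, the following is a minimal, generic RVQ sketch in plain Python. It is not Miso Labs' implementation: the toy latent dimension (`DIM`), the random codebooks, and the function names are all hypothetical, and real codebooks would be learned rather than sampled. Only the counts 32 and 2048 and the parameter sizes come from the description.

```python
import math
import random

NUM_CODEBOOKS = 32    # from the description: 32 codebook indices
CODEBOOK_SIZE = 2048  # from the description: 2048-way codebooks
DIM = 8               # hypothetical toy latent dimension, not from the source


def make_codebooks(num_codebooks, size, dim, seed=0):
    """Build random toy codebooks (assumption: real ones are learned).

    Later stages use a smaller scale because residuals tend to shrink.
    """
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 0.5 ** stage) for _ in range(dim)] for _ in range(size)]
        for stage in range(num_codebooks)
    ]


def encode(x, codebooks):
    """Greedy RVQ: pick the nearest code at each stage, then pass on the residual."""
    residual = list(x)
    indices = []
    for codebook in codebooks:
        best = min(
            range(len(codebook)),
            key=lambda i: sum((r - c) ** 2 for r, c in zip(residual, codebook[i])),
        )
        indices.append(best)
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return indices, residual


def decode(indices, codebooks):
    """Reconstruct a vector by summing the selected code from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for index, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[index])]
    return out


codebooks = make_codebooks(NUM_CODEBOOKS, CODEBOOK_SIZE, DIM)
rng = random.Random(1)
frame = [rng.gauss(0.0, 1.0) for _ in range(DIM)]  # one toy latent vector

indices, residual = encode(frame, codebooks)

# Each index selects one of 2048 entries, i.e. log2(2048) = 11 bits.
bits_per_frame = NUM_CODEBOOKS * int(math.log2(CODEBOOK_SIZE))

# Parameter split quoted in the description, in millions: 7.7B + 300M.
total_params_millions = 7700 + 300

print("indices for one vector:", indices)
print("bits per quantized vector:", bits_per_frame)
print("total parameters (millions):", total_params_millions)
```

Under these assumptions, each quantized vector is represented by 32 indices of 11 bits each, and the quoted backbone and depth decoder sizes add up to the headline 8B figure. How MisoTTS divides prediction of those indices between the backbone and the depth decoder is not specified in the description.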

Overview

MisoTTS is Miso Labs’ open-weight 8B text-and-audio-conditioned speech generation model for expressive, context-aware, emotive TTS and dialogue voice output.

About Miso Labs

Miso Labs describes itself as building the most emotive foundation models for voice. Its open-source flagship, Miso One (Miso TTS 8B), is an 8-billion-parameter text-to-speech model for highly expressive speech, with roughly 110 ms latency, one-shot voice cloning, and on-premises deployment, aimed at helping developers build natural-sounding voice agents.

Industry: Artificial Intelligence
Company Size: 2
Location: San Francisco, California, US
Website: misolabs.ai

Tools using Miso TTS 8B

No tools found for this model yet.

Last updated: June 4, 2026