Fish Audio S2

Fish Audio S2 is a dual-autoregressive text-to-speech (TTS) system trained on more than 10 million hours of audio across roughly 50 languages and aligned with reinforcement learning. It supports word-level, inline natural-language control tags (for example [laugh], [whispers], [super happy]) for prosody and emotion, as well as native multi-speaker, multi-turn generation. It is released together with model weights, fine-tuning code, and a production streaming inference stack based on SGLang.
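To make the idea of word-level inline tags concrete, here is a minimal Python sketch that separates bracketed tags from the surrounding text and records the word position at which each tag applies. It is not Fish Audio's code or API: the function name `split_tagged_text` and the rule that any bracketed span counts as a tag are assumptions made purely for illustration, and the model's actual input handling may differ.

```python
import re

# Assumption: a control tag is any bracketed span such as [laugh] or [super happy].
# This pattern is illustrative only, not Fish Audio S2's real tag grammar.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_tagged_text(text):
    """Split text into plain words and inline tags.

    Returns (plain_text, tags), where tags is a list of
    (word_index, tag_name) pairs. word_index is the number of plain
    words that precede the tag, i.e. where the tag takes effect.
    """
    words = []
    tags = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        # Collect the plain words between the previous tag and this one.
        words.extend(text[pos:match.start()].split())
        tags.append((len(words), match.group(1).strip()))
        pos = match.end()
    # Collect any plain words after the last tag.
    words.extend(text[pos:].split())
    return " ".join(words), tags


plain, tags = split_tagged_text(
    "I can't believe it [laugh] that was [whispers] a secret"
)
print(plain)
print(tags)
```

Tracking tags by word index rather than character offset mirrors the word-level granularity described above: each tag is tied to a specific point in the utterance, not to the sentence as a whole.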
Released: March 9, 2026

Overview

Fish Audio S2 is Fish Audio’s latest text-to-speech model, designed for natural, emotionally rich speech generation with fine-grained prosody control and native multi-speaker dialogue.

Last updated: March 11, 2026