Fish Audio S2

Fish Audio S2 is a dual-autoregressive text-to-speech (TTS) system trained on over 10 million hours of audio across roughly 50 languages and aligned with reinforcement learning. It supports inline natural-language control tags at the word level (for example [laugh], [whispers], [super happy]) to steer prosody and emotion, along with native multi-speaker, multi-turn generation. It is released together with model weights, fine-tuning code, and an SGLang-based production streaming inference stack.
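To make the inline tag format concrete, here is a minimal Python sketch that splits a script into control tags and spoken text. It only illustrates the bracketed syntax shown in the examples above; it is not Fish Audio's parser or API, and the function name `split_inline_tags` and the regular expression are assumptions made for this illustration.

```python
import re

# Assumption: a control tag is any non-empty run of characters between
# square brackets, e.g. [laugh] or [super happy]. The real model may
# accept a different or stricter grammar.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_inline_tags(script):
    """Split a script into ("tag", name) and ("text", words) segments, in order."""
    segments = []
    cursor = 0
    for match in TAG_PATTERN.finditer(script):
        # Plain text between the previous tag (or start) and this tag.
        text = script[cursor:match.start()].strip()
        if text:
            segments.append(("text", text))
        segments.append(("tag", match.group(1).strip()))
        cursor = match.end()
    # Any text after the final tag.
    tail = script[cursor:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


if __name__ == "__main__":
    for kind, value in split_inline_tags("[whispers] Don't tell anyone. [laugh] Just kidding!"):
        print(kind, value)
```

A pre-processing step like this could be used to check that a script contains only the tags you intend before sending it to a TTS service; the actual request format is not described here.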

Overview

Fish Audio S2 is Fish Audio’s latest text-to-speech model, designed for natural, emotionally rich speech generation with fine-grained prosody control and native multi-speaker dialogue.

Last updated: March 11, 2026
