StepAudio 2.5 TTS

By StepFun
StepAudio 2.5 TTS is a contextual speech synthesis model designed to make generated voice act, not just narrate. StepFun describes it as the first TTS system to integrate contextual understanding throughout the speech-generation pipeline. It offers two levels of control, Global Context and Inline Context, along with zero-shot voice cloning. The official documentation lists a maximum input of 1,000 characters per request and positions the model for richer, more expressive spoken output, with delivery style guided by natural-language instructions.
New Multimodal Gen 3
Released: April 16, 2026

Overview

StepAudio 2.5 TTS is StepFun’s contextual text-to-speech model, built for expressive, performance-style control over vocal delivery. It combines global and inline context guidance with zero-shot voice cloning, so generated speech can follow broad style instructions as well as local delivery details instead of reading the text flatly.
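
Given the documented 1,000-character maximum per request, longer scripts would need to be split before synthesis. The following is a minimal sketch of one way to do that in Python, not StepFun's own tooling: the function name `split_for_tts` is hypothetical, it makes no API calls, and it assumes the limit counts Unicode characters (code points) and that sentences end in `.`, `!`, or `?` followed by whitespace. Text that lacks such boundaries falls back to a hard cut at the limit.

```python
import re

MAX_CHARS = 1000  # per-request input limit cited in the description above


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters,
    preferring sentence boundaries (hypothetical helper)."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-cut into pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        # Try to append the sentence to the chunk being built.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    script = "This is one line of narration. " * 100
    for i, chunk in enumerate(split_for_tts(script)):
        print(i, len(chunk))
```

Each resulting chunk could then be sent as its own request. Any Global Context guidance would presumably need to be repeated with every chunk to keep the delivery consistent, though that depends on how the actual API handles context.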

About StepFun

StepFun AI is a creative AI assistant platform that offers chat, search, coding help, and multimodal generation (images, audio, video), plus research models like Step3 and NextStep-1.

Company Size: 400
Location: Shanghai, Xuhui District, CN
Website: stepfun.ai

Last updated: April 22, 2026