Fish Audio STT

Model family: Fish

Fish Audio Speech-to-Text is a speech recognition product built for practical media transcription rather than generic ASR demos. Fish Audio describes it as optimized for conversational and multi-speaker recordings such as podcasts, interviews, and live discussions, with automatic speaker detection, timestamped segments, and inline tagging for events like pauses, sighs, emphasis, and breath. The tool supports more than 100 languages, accepts 24 audio and video formats, and exports structured outputs in SRT, VTT, or JSON. Based on the linked page, the public emphasis is on usability and rich transcript structure more than on architecture details or benchmark claims.

Overview

Fish Audio Speech-to-Text is Fish Audio’s transcription model and tool for podcasts, interviews, and conversational recordings. It converts audio to text with speaker labels, timestamps, and inline emotion or paralanguage tags. It supports 100+ languages, accepts 24 audio and video formats, and exports transcripts as SRT, VTT, or JSON, all without requiring the user to write code.
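To make the relationship between speaker-labeled, timestamped segments and a subtitle export concrete, here is a minimal sketch of how such segments could be rendered as SRT cues. The Segment fields and the function names are hypothetical illustrations, not Fish Audio's actual JSON schema or API; only the SRT cue layout (index, "HH:MM:SS,mmm --> HH:MM:SS,mmm", text) follows the standard format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment; NOT Fish Audio's actual schema."""
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    speaker: str  # speaker label, e.g. "Speaker 1"
    text: str     # transcribed text for this segment


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues, prefixing each with its speaker."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    # A blank line separates consecutive cues in SRT.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "Speaker 1", "Welcome back to the show."),
        Segment(2.5, 5.0, "Speaker 2", "Thanks for having me."),
    ]
    print(to_srt(demo))
```

A VTT export would differ mainly in its "WEBVTT" header line and in using a period rather than a comma before the milliseconds.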

🗒Transcription 🌍Multilingual subtitle creation

Last updated: July 21, 2026