January 9, 2026

ElevenLabs Scribe v2 (version update, Jan 9, 2026)

Live real-time speech-to-text model — built for streaming transcription with very low latency (~150 ms) for live voice interactions.

Ultra-low latency performance — instant speech transcription ideal for conversational AI, voice agents, meetings, and live captioning. 

High accuracy across many languages — supports 90+ languages with strong real-world performance and benchmark scores. 

Predictive streaming (“negative latency”) — anticipates next words and punctuation to reduce delays. 

Automatic language detection — the model detects and switches languages mid-conversation. 

Advanced streaming controls — includes manual commit control, text conditioning, and voice activity detection (VAD). 

Broad audio format support — works with PCM (8–48 kHz) and μ-law audio for compatibility across use cases.
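
To make the μ-law item above concrete, here is a minimal sketch of the continuous μ-law companding curve applied to 16-bit PCM samples. It illustrates the general idea of the format only: the function names are hypothetical, the 256-level quantization is a simplification rather than the bit-exact G.711 segment encoding, and nothing here is taken from the ElevenLabs API.

```python
import math

MU = 255  # standard mu value used for 8-bit telephony companding


def mulaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto [-1, 1] with the continuous mu-law curve."""
    x = max(-1.0, min(1.0, x))  # clamp out-of-range input
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def pcm16_to_mulaw_levels(samples):
    """Quantize signed 16-bit PCM integers to 256 companded levels (0..255).

    Simplified illustration; not the bit-exact G.711 byte layout.
    """
    levels = []
    for s in samples:
        y = mulaw_compress(s / 32768.0)
        levels.append(int(round((y + 1.0) / 2.0 * 255)))
    return levels


if __name__ == "__main__":
    # Quiet samples get proportionally more of the 256 levels than loud ones.
    print(pcm16_to_mulaw_levels([-32768, -1000, -10, 10, 1000, 32767]))
```

The curve spends more of its levels on quiet samples, which is why μ-law audio is common on telephony links where only 8 bits per sample are available.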

Inputs: API, Video, Audio
Outputs: API, Text
The most accurate Speech to Text models
Featured alternatives
Supernormal App (202,388)
Pulse by smallest.ai (694)
Velma Transcribe by Modulate (732)
Realms of Shod (716)
Yakki (284)
VoiceAppear (814)
Link To Text | Video To Text (659)
InstantTranscriber (461)
Ownvox (143)
24,220 elevenlabs.io

Overview

ElevenLabs Speech to Text converts speech into text, with an emphasis on accuracy across different contexts and languages.

It offers two models: Scribe v2 and Scribe v2 Realtime. Scribe v2 transcribes recorded audio and video into text, which suits captions, subtitles, and editable transcripts.

Scribe v2 can transcribe specific words accurately based on context, mark sound events in the transcript, and distinguish and label each speaker in a dialogue.

Scribe v2 Realtime is designed for real-time applications such as live calls, meetings, and AI agents that need immediate transcription.

It uses a streaming-first architecture to return results in real time while maintaining accuracy. It also includes voice activity detection and precise speech segmentation for smoother live processing.
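
As a rough illustration of what voice activity detection and speech segmentation mean in general, the sketch below marks spans whose per-frame RMS energy exceeds a threshold. This is a generic textbook approach, not ElevenLabs' method; the function names, the 20 ms frame size, and the threshold value are all assumptions chosen for the example.

```python
import math


def frame_rms(frame):
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech_segments(samples, sample_rate=16000, frame_ms=20, threshold=500.0):
    """Return (start_sec, end_sec) spans whose frame RMS meets the threshold.

    Hypothetical energy-based VAD; production systems typically use
    trained models and smoothing rather than a fixed threshold.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    segments = []
    start = None  # sample index where the current active span began
    for i in range(0, len(samples), frame_len):
        active = frame_rms(samples[i:i + frame_len]) >= threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append((start / sample_rate, i / sample_rate))
            start = None
    if start is not None:  # audio ended while still active
        segments.append((start / sample_rate, len(samples) / sample_rate))
    return segments


if __name__ == "__main__":
    # 20 ms of silence, 40 ms of signal, 20 ms of silence at 16 kHz.
    audio = [0] * 320 + [1000] * 640 + [0] * 320
    print(detect_speech_segments(audio))
```

In a live pipeline, spans like these would decide when a chunk of audio is worth transcribing and where one utterance ends and the next begins.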

Both versions of Scribe support over 90 languages and can be integrated into your products through the ElevenLabs API.

Releases

ElevenLabs Scribe v2
Jan 9, 2026

Pricing

Pricing model: Freemium
Paid options from: $5/month
Billing frequency: Monthly
Refund policy: No refunds
