The most accurate Speech to Text models


January 9, 2026

2026 Rank: #2517

ElevenLabs Scribe v2

United States Eleven Labs Transcription

24,944

No ratings


Scribe v2 Realtime


API SpeechToText Transcription RealTimeTranscription AudioToText MultilingualTranscription

Free + from $5/mo


Featured alternatives

Velma Transcribe by Modulate

Link To Text | Video To Text


24,220 elevenlabs.io

Overview

ElevenLabs Speech to Text is a transcription service that converts speech into text across multiple contexts and languages, with an emphasis on accuracy.

It offers two models: Scribe v2 and Scribe v2 Realtime. The former transcribes recorded audio and video content into text, which suits captions, subtitles, and editable transcripts.

Scribe v2 stands out for its ability to transcribe specific words accurately based on context, mark sound events in transcripts, and distinguish and label every speaker in a dialogue.

The latter, Scribe v2 Realtime, is designed for real-time applications such as live calls, meetings, and AI agents that require immediate transcription.

It uses a streaming-first architecture to provide real-time results while still maintaining accuracy. It also includes features like precision speech segmentation for smoother live processing and voice activity detection.

Both versions of Scribe support over 90 languages and can be incorporated into your products using their API.
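
As a rough illustration of what API integration could look like, the sketch below builds (but does not send) a multipart upload request using only the Python standard library. The endpoint URL, the `xi-api-key` header, the `scribe_v2` model identifier, and the form field names are all assumptions that this listing does not confirm; check them against the official API reference before relying on them.

```python
import uuid
import urllib.request

# Assumed values, not confirmed by this listing.
API_URL = "https://api.elevenlabs.io/v1/speech-to-text"  # assumed endpoint
MODEL_ID = "scribe_v2"                                   # assumed model identifier


def build_transcription_request(audio_bytes, filename, api_key, boundary=None):
    """Build a multipart/form-data POST request without sending it."""
    boundary = boundary or uuid.uuid4().hex
    # Assumed form fields: model choice, speaker labels, sound-event tags.
    fields = {"model_id": MODEL_ID, "diarize": "true", "tag_audio_events": "true"}

    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The audio file itself goes in as a binary part.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + audio_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())

    headers = {
        "xi-api-key": api_key,  # assumed authentication header
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(
        API_URL, data=b"".join(parts), headers=headers, method="POST"
    )
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`; that step is left out here because the response format is not described in this listing.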


Releases

ElevenLabs Scribe v2

ElevenLabs Scribe Initial


ElevenLabs Scribe v2

Jan 9, 2026

Live real-time speech-to-text model — built for streaming transcription with very low latency (~150 ms) for live voice interactions.

Ultra-low latency performance — instant speech transcription ideal for conversational AI, voice agents, meetings, and live captioning.

High accuracy across many languages — supports 90+ languages with strong real-world performance and benchmark scores.

Predictive streaming (“negative latency”) — anticipates next words and punctuation to reduce delays.

Automatic language detection — the model detects and switches languages mid-conversation.

Advanced streaming controls — includes manual commit control, text conditioning, and voice activity detection (VAD).

Broad audio format support — works with PCM (8–48 kHz) and μ-law audio for compatibility across use cases.
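
To make the audio-format note concrete, here is a minimal sketch of converting 8-bit μ-law bytes into 16-bit linear PCM with the standard G.711 expansion. It illustrates the format only and is not code from ElevenLabs. The helper names are hypothetical, and `is_supported_pcm_rate` assumes the "8–48 kHz" figure above means a continuous range, which this listing does not spell out.

```python
import struct

BIAS = 0x84  # 132, the bias used by G.711 mu-law companding


def ulaw_byte_to_pcm16(code):
    """Expand one mu-law byte into a signed 16-bit linear sample."""
    code = ~code & 0xFF               # mu-law bytes are stored bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def ulaw_to_pcm16(data):
    """Convert a mu-law byte string to little-endian 16-bit PCM bytes."""
    samples = [ulaw_byte_to_pcm16(b) for b in data]
    return struct.pack(f"<{len(samples)}h", *samples)


def is_supported_pcm_rate(rate_hz):
    """Check a sample rate against the assumed 8-48 kHz range."""
    return 8_000 <= rate_hz <= 48_000
```

For example, `ulaw_to_pcm16(bytes([0xFF, 0x80]))` yields two samples: digital silence followed by the largest positive μ-law value.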


Organization Eleven Labs 🇺🇸 United States

Pricing

Pricing model: Freemium

Paid options from: $5/month

Billing frequency: Monthly

Refund policy: No refunds


Other tools by Eleven Labs

ElevenLabs v3

Create lifelike AI voices for compelling storytelling.

Text to speech


77,981 www.elevenlabs.io

Grzegorz Rolnik

🙏 100 karma

Aug 3, 2023

@ElevenLabs

too expensive for me, I just want to make memes, not pay that much


🇺🇸 United States
Released 4mo ago
Free + from $3/mo

162,331
748
4.1
ElevenLabs Dubbing

Dub videos into 28 languages instantly

Video dubbing


1,925 elevenlabs.io

Sayanwita Khaskel

🙏 42 karma

Oct 16, 2023

@ElevenLabs Dubbing

Seemed good UI at first. But the quality is not good at all.


🇺🇸 United States
Released 2y ago
Free + from $5/mo

7,420
55
2.7
ElevenAgents

Deploy conversational AI agents in minutes, not months.

Voice agents


🇺🇸 United States
Released 9mo ago
No pricing

183
3
2.5
ElevenCreative

AI creative platform to bring content to life

Content


John Ponder

Mar 14, 2026

@ElevenCreative

No free use because I use a VPN?? I'm out. I should have a choice to stay private if I want to and still be able to review the product without penalty. Others allow it - I will use them before buying. If you have that big of an issue with what you feel are multiple accounts using your free services, then reduce the amount of free use.


🇺🇸 United States
Released 3mo ago
No pricing

552
11
1.0

Top alternatives

Voicetype AI v1.9.41

Write 9x Faster with AI Speech to Text on all Apps

217,887 voicetype.ai


Released 4mo ago
Free + from $13.59/mo

235,704
183
5.0
AssemblyAI

Build Voice AI Apps With Insanely Accurate Speech-to-Text

162,687 www.assemblyai.com

Mery

🙏 82 karma

May 16, 2025

@AssemblyAI

One of the most accurate API's I've used for speech to text and summarization. Cost effective w/ bulk contracts too.


🇺🇸 United States
Released 8y ago
No pricing

173,496
131
4.2
Transcript LOL v3.1

Unlimited transcripts, summaries, 99.8% accuracy, speaker recognition, superfast

78,694 transcript.lol

dunn

🙏 13 karma

Aug 3, 2024

@Transcript LOL

I already have another transcription tool, but this one is much better. I love the different features such as the summary, quiz, and chapters. It does a great job of them. I've only done one transcript so far to try it out, but I'm truly impressed and am going to grab another code. A couple things that would make it even better are: - the ability to rename the files and organize them through folders. - the ability to download a copy of the other features as well as the transcript. Copying and pasting it works, but doesn't keep the format.


Released 10mo ago
#39 in Trending

144,054
1,150
4.4
TurboScribe v2.1

🎯 3 free transcripts every day. 🔥 Unlimited transcription starting at $10/mo.

50,710 turboscribe.ai

Juan Sierra

🙏 131 karma

Aug 9, 2024

@TurboScribe

No other tool quite like this, it's pretty straightforward. Needed to extract a long interview from YouTube and it extracted everything, providing it in different meaningful formats in less than two minutes. Awesome


🇺🇸 United States
Released 1y ago
Free + from $10/mo

129,602
1,114
4.3
RambleFix v3.0

⚡ Write by thinking aloud - emails, notes, articles, in your style.

59,757 ramblefix.com

Colin Fitzpatrick

🙏 36 karma

Feb 2, 2024

@RambleFix

This is my favourite, so handy and works brilliant


Released 9mo ago
From $7.5/mo

71,000
109
4.6
Paraspeech v1.6

Private dictation, cloud optional

60,363 paraspeech.com

Samik Choudhury

🙏 27 karma

Oct 13, 2025

@Paraspeech

the founder is cool, definitely a recommendation if you have personal requests and grow the app along with you.


🇩🇪 Germany
Released 1mo ago
Free + from $7.5/mo

65,191
50
5.0


Reviews

No ratings yet.


Pros and Cons

Pros

Multilingual transcription

Real-time transcription

Supports 90+ languages

API integration

High transcription accuracy

Context-based word transcription

Marked sound events in transcripts

Speaker distinguishing in dialogues

Streaming-first architecture

Precision speech segmentation

Voice activity detection

Content creation: captions, subtitles

Transcript editing

Supports recorded content

Transcript for audio/video

Live processing

Performance benchmarking

Industry-leading latency

Automated keyterm prompting

Dynamic audio tagging

Captures live speech

Enterprise-grade security

Control over data handling

Supports encrypted APIs

Granular team permissions for collaboration

Elevated support for smooth launch

Supports local and cloud configurations

Automated speaker diarization for overlapping conversations

Recognizes diverse accents

Transcribe diverse media formats: MP4, MOV, MP3, WAV

Supports offline processing

Can transcribe difficult audio conditions

Entity timestamps calculation

Effective for social media videos

Supports diverse workflows: API to agents

Supports hands-free typing

Automatic data encryption in transit and at rest

Includes editing tools and collaboration features

SOC 2, HIPAA, and GDPR compliance

Supports accessibility and content repurposing

Data handled through encrypted APIs

Sensitive information processed locally

Auto-generation of captions and subtitles

Industry-leading accuracy across 90+ languages

Sub-150 ms latency


Cons

No offline support

Doesn't support all languages

No free tier

Context-based transcription inconsistencies

Possibly high latency

Accuracy varies by language

Complex API integration


Q&A

What is the main functionality of ElevenLabs Speech to Text Scribe?

The main function of ElevenLabs Speech to Text Scribe is to convert speech into text accurately across multiple contexts and languages. It offers two primary models: Scribe v2 for transcribing recorded audio and video content, and Scribe v2 Realtime for immediate transcription in live applications.

What is the difference between Scribe v2 and Scribe v2 Realtime?

Scribe v2 focuses on transcribing recorded audio and video content into text. It is suited to creating captions, subtitles, and editable transcripts, and it can label speakers and mark sound events. Scribe v2 Realtime, by contrast, is designed for real-time applications such as live calls, meetings, and AI agents that require immediate transcription. It employs a streaming-first architecture to return results with very low latency (about 150 ms, per the release notes).

How accurate is the transcription offered by the Scribe models?

The Scribe models offer exceptional transcription accuracy. Scribe v2 has been benchmarked as achieving industry-leading precision, outperforming other models in challenging audio conditions and across diverse accents. Scribe v2 Realtime delivers real-time results with the same high level of accuracy.

How does Scribe handle multiple speakers in a conversation?

Scribe features speaker distinguishing functionality that allows it to accurately identify and label every speaker in a dialogue. This feature works even in situations where there are multiple overlapping speakers, making Scribe highly suited for group conversations and discussions.
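
A small sketch of how speaker labels might be used downstream: it collapses word-level output into speaker turns. The word list and its field names (`text`, `speaker`) are hypothetical and only stand in for whatever structure the API actually returns.

```python
from itertools import groupby

# Hypothetical word-level output; field names are illustrative only.
words = [
    {"text": "Hello", "speaker": "speaker_0"},
    {"text": "there.", "speaker": "speaker_0"},
    {"text": "Hi,", "speaker": "speaker_1"},
    {"text": "how", "speaker": "speaker_1"},
    {"text": "are", "speaker": "speaker_1"},
    {"text": "you?", "speaker": "speaker_1"},
    {"text": "Good.", "speaker": "speaker_0"},
]


def to_speaker_turns(words):
    """Merge consecutive words from the same speaker into one turn."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        turns.append((speaker, " ".join(w["text"] for w in group)))
    return turns


for speaker, text in to_speaker_turns(words):
    print(f"{speaker}: {text}")
```

Grouping only consecutive words keeps the conversation order intact, so a speaker who talks twice appears as two separate turns.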

How many languages does ElevenLabs Speech to Text Scribe support?

ElevenLabs Speech to Text Scribe supports over 90 languages, including English, German, French, Japanese, Russian, Korean, and Chinese. This makes it a versatile tool for applications requiring multilingual transcription.

Can Scribe be integrated into my products?

Yes, both versions of Scribe can be incorporated into your products through the provided API. This allows you to fully integrate Scribe’s functionalities into your workflows and procedures for a seamless user experience.


How does Scribe handle real-time applications?

Scribe v2 Realtime handles real-time applications by leveraging a streaming-first architecture. This allows it to provide instant transcription while maintaining high levels of accuracy. Scribe v2 Realtime is specifically designed for live applications such as meetings, live calls, or AI agents requiring immediate transcription.

What is the 'streaming-first' architecture?

The 'streaming-first' architecture refers to the system architecture employed by Scribe v2 Realtime. It processes speech data as it is streamed, enabling it to provide instantaneous transcription. This real-time processing is particularly valuable in live applications such as calls or meetings.
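
One way to picture streaming output is as a sequence of partial hypotheses that are periodically committed, in the spirit of the "manual commit control" mentioned in the release notes. The class below is a toy model under that assumption; its name and methods are hypothetical and do not mirror any real client library.

```python
class StreamingTranscript:
    """Toy model of partial vs. committed text in a streaming session."""

    def __init__(self):
        self.committed = []  # finalized segments, never revised
        self.partial = ""    # latest in-progress hypothesis

    def on_partial(self, text):
        # A newer partial hypothesis replaces the previous one.
        self.partial = text

    def commit(self):
        # Freeze the current hypothesis so later audio cannot change it.
        if self.partial:
            self.committed.append(self.partial)
            self.partial = ""

    def text(self):
        # Full transcript so far: committed segments plus any live partial.
        live = [self.partial] if self.partial else []
        return " ".join(self.committed + live)


session = StreamingTranscript()
session.on_partial("hel")
session.on_partial("hello world")
session.commit()
session.on_partial("how are")
print(session.text())  # hello world how are
```

Separating the two states lets a user interface show text immediately while still signaling which parts may change.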

What is Scribe's precision speech segmentation feature?

Precision speech segmentation is an advanced feature of Scribe that allows smoother processing of live speech data. By detecting when speech starts and stops, it divides continuous speech into segmented blocks for more accurate and effective transcription.
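
The idea of dividing continuous speech at start and stop points can be sketched as follows. This is a generic illustration, not Scribe's actual algorithm: it takes per-frame speech flags (from any detector) and assumes a hypothetical 20 ms frame length.

```python
def frames_to_segments(is_speech, frame_ms=20):
    """Turn per-frame speech flags into (start_ms, end_ms) segments."""
    segments = []
    start = None
    for i, flag in enumerate(is_speech):
        if flag and start is None:
            start = i  # speech begins at this frame
        elif not flag and start is not None:
            segments.append((start * frame_ms, i * frame_ms))  # speech ended
            start = None
    if start is not None:
        # The audio ended while speech was still active.
        segments.append((start * frame_ms, len(is_speech) * frame_ms))
    return segments


print(frames_to_segments([False, True, True, False, True]))
# [(20, 60), (80, 100)]
```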

Can Scribe distinguish and label different speakers?

Yes, one of the most beneficial features of Scribe is its ability to distinguish and label different speakers in a conversation. This comes in handy in situations like meetings, discussions, or dialogues involving multiple speakers.

What is voice activity detection in the context of Scribe?

Voice activity detection is a feature in Scribe that separates speech from non-speech segments of audio, so that only relevant audio is transcribed.
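
For intuition, the simplest form of voice activity detection is an energy threshold per frame. The sketch below uses that swapped-in technique purely as an illustration; production systems, presumably including Scribe, use more robust methods. The frame size and threshold are arbitrary assumptions.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of a list of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_voice_activity(samples, frame_size=160, threshold=500.0):
    """Flag each frame as speech (True) when its RMS reaches the threshold."""
    flags = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        flags.append(rms(frame) >= threshold)
    return flags


silence = [0] * 160
loud = [1000, -1000] * 80  # 160 samples with an RMS of 1000
print(detect_voice_activity(silence + loud + silence))
# [False, True, False]
```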

How does Scribe handle the transcription of specific words based on context?

Scribe uses surrounding context to transcribe specific words accurately. This helps in situations where certain words have different meanings in different settings.

What is the relevance of the marked sound events feature?

The marked sound events feature refers to Scribe's ability to tag sound events in a transcript. These tags add context that helps readers interpret the original audio.
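
As a sketch of how tagged sound events could be consumed, the snippet below separates spoken words from event tags. The token list, the `type` values, and the parenthesized tag style are assumptions for illustration; this listing does not document the real output format.

```python
# Hypothetical transcript tokens; the "type" values are assumptions.
tokens = [
    {"text": "Welcome", "type": "word"},
    {"text": "everyone", "type": "word"},
    {"text": "(applause)", "type": "audio_event"},
    {"text": "thank", "type": "word"},
    {"text": "you", "type": "word"},
    {"text": "(laughter)", "type": "audio_event"},
]


def split_speech_and_events(tokens):
    """Return the spoken text and the list of tagged sound events."""
    speech = " ".join(t["text"] for t in tokens if t["type"] == "word")
    events = [t["text"] for t in tokens if t["type"] == "audio_event"]
    return speech, events


speech, events = split_speech_and_events(tokens)
print(speech)   # Welcome everyone thank you
print(events)   # ['(applause)', '(laughter)']
```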

Can Scribe be used for creating subtitles and captions?

Yes, Scribe can create subtitles and captions for video content, which helps producers make their content more accessible and reach a larger audience. It transcribes in multiple languages and uses context to transcribe specific words accurately.
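
To show how timed transcript segments become subtitles, here is a minimal sketch that renders them in the widely used SubRip (SRT) layout. The input tuples of `(start_seconds, end_seconds, text)` are a hypothetical shape; real transcription output would need to be mapped into it first.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start, end, text) segments as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)  # a blank line separates consecutive cues


print(to_srt([(0.0, 1.5, "Hello."), (1.5, 3.0, "Welcome back.")]))
```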

What types of recorded content can be transcribed using Scribe?

Scribe can transcribe various forms of recorded content. This can be any form of audio or video, like podcasts, videos, interviews, etc. It is particularly handy in generating editable transcripts, captions, and subtitles, making Scribe very suitable for content creators and service providers.

What aspects allow Scribe to maintain accuracy?

Scribe maintains its accuracy through a combination of features: context-based transcription, precision speech segmentation, and dynamic audio tagging, which together improve its understanding and rendering of spoken content. In addition, voice activity detection helps it recognize and transcribe only relevant speech.

What are the use-cases for Scribe v2 Realtime?

Scribe v2 Realtime is ideal for use-cases that require immediate understanding and response. Live calls, meetings, and AI agents that need to comprehend and act on spoken inputs in real-time can significantly benefit from using Scribe v2 Realtime.

What is the role of API in using Scribe?

The API is the main way to use Scribe in your own products. Through it, you can integrate Scribe's features into your existing workflows and product architecture.

How does Scribe handle multilingual transcription?

Scribe handles multilingual transcription by supporting over 90 languages. It is designed to stay accurate across accents, dialects, and recording conditions, which helps multilingual content reach a wider audience.

How can Scribe assist in real-time applications?

In real-time applications, Scribe v2 Realtime provides immediate transcription, which is valuable wherever live speech has to be converted into text right away. Its ability to detect voice activity, segment and process live speech, and return results in real time makes it well suited to live calls, meetings, and webinars.
