ElevenLabs Scribe: The World's Most Accurate Speech-to-Text Model

ElevenLabs has unveiled Scribe, a speech-to-text model that the company says outperforms Google's Gemini 2.0 Flash and OpenAI's Whisper v3 on transcription accuracy. Scribe supports 99 languages and achieves over 95% accuracy in more than 25 of them, including English, Italian, and Spanish.

What sets Scribe apart is its handling of low-resource languages such as Serbian, Cantonese, and Malayalam, which other transcription models have often overlooked. This makes high-quality transcription available to a wider global audience.

Scribe also offers multi-speaker labeling, word-level timestamps, and detection of non-verbal audio cues such as laughter or music. These capabilities suit applications such as accurate subtitles, searchable podcast archives, and, once the planned low-latency version is available, real-time transcription.
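To illustrate why word-level timestamps and speaker labels matter for subtitling, here is a minimal Python sketch that groups timed words into SRT subtitle cues. The Word structure, its field names, and the words_to_srt helper are hypothetical and chosen for illustration; they are not Scribe's actual response format or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical shape for one transcribed word; not Scribe's real schema.
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    speaker: str  # speaker label, e.g. "speaker_0"


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 8, max_gap: float = 0.8) -> str:
    """Group words into cues, starting a new cue when the speaker changes,
    the pause between words exceeds max_gap, or the cue reaches max_words."""
    cues: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        if current and (
            len(current) >= max_words
            or word.speaker != current[-1].speaker
            or word.start - current[-1].end > max_gap
        ):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        timing = f"{format_srt_time(cue[0].start)} --> {format_srt_time(cue[-1].end)}"
        blocks.append(f"{index}\n{timing}\n[{cue[0].speaker}] {text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [
        Word("Hello", 0.0, 0.4, "speaker_0"),
        Word("there", 0.5, 0.9, "speaker_0"),
        Word("Hi", 1.2, 1.5, "speaker_1"),
    ]
    print(words_to_srt(sample))
```

The grouping thresholds (eight words per cue, a 0.8-second pause) are arbitrary defaults; a real subtitle pipeline would likely tune them for reading speed and line length.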

At $0.40 per hour of pre-recorded audio, Scribe is an affordable option for businesses and creators. A low-latency version for real-time applications is also in development, which would further extend its usability.
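As a rough budgeting aid, the quoted rate can be turned into a cost estimate for an audio archive. The sketch below assumes billing is strictly proportional to audio duration at $0.40 per hour, with no minimum charge, tiering, or per-file rounding; those billing details are assumptions, not something the announcement specifies, and estimate_cost is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

# Quoted rate for pre-recorded audio, in US dollars per hour.
RATE_PER_HOUR = Decimal("0.40")


def estimate_cost(total_seconds: int, rate_per_hour: Decimal = RATE_PER_HOUR) -> Decimal:
    """Estimate transcription cost, assuming purely duration-proportional billing."""
    hours = Decimal(total_seconds) / Decimal(3600)
    return (hours * rate_per_hour).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # A 45-minute episode and a 250-hour back catalogue.
    print(estimate_cost(45 * 60))      # 0.30
    print(estimate_cost(250 * 3600))   # 100.00
```

Under these assumptions, a 250-hour podcast archive would cost about $100 to transcribe.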

With its accuracy and its focus on real-world audio challenges, Scribe could raise the bar for the speech-to-text industry, offering high-quality transcription for both widely spoken and underrepresented languages.