ElevenLabs Launches Scribe, a New Speech-to-Text Model

ElevenLabs has launched Scribe, its first stand-alone speech-to-text model, which supports 99 languages. According to a social media post by ElevenLabs, the model is intended to compete with existing solutions such as OpenAI's Whisper and Google's Gemini 2.0 Flash. ElevenLabs says Scribe is built for high accuracy, with more than 25 languages achieving a word error rate (WER) below 5%.
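Word error rate is the standard accuracy metric for transcription. It counts the word substitutions (S), deletions (D), and insertions (I) needed to turn a model's output into a reference transcript, then divides by the number of words in the reference (N): WER = (S + D + I) / N. A WER below 5% therefore means roughly fewer than one error per 20 reference words. The snippet below is a minimal Python sketch of that textbook calculation, for illustration only. It is not ElevenLabs' evaluation code, the function name `word_error_rate` is hypothetical, and it assumes plain whitespace tokenization with no text normalization (no lowercasing or punctuation stripping).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: (substitutions + deletions + insertions) / reference words.

    Assumes simple whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the word-level edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution (cost 1) or exact match (cost 0)
            )
        prev = curr

    # Minimum total edits divided by the number of reference words.
    return prev[-1] / len(ref)
```

For example, with the reference "the cat sat on the mat" and the hypothesis "the cat sit on mat", one substitution and one deletion give a WER of 2/6, or about 33%.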

The model supports speaker diarization and character-level timestamps, and it can automatically tag non-speech audio events such as laughter. Scribe currently works only with pre-recorded audio, with a real-time version expected soon. ElevenLabs prices Scribe at $0.40 per hour of transcribed audio and is offering a 50% discount for the next six weeks.

Scribe marks ElevenLabs' expansion from audio generation into the speech recognition market, applying the company's audio expertise to transcription. ElevenLabs plans to develop the model further for real-time applications, extending its use beyond pre-recorded audio.
