ElevenLabs Launches Scribe, a New Speech-to-Text Model
ElevenLabs has launched Scribe, its first stand-alone speech-to-text model, which supports 99 languages. According to a social media post by ElevenLabs, the model is positioned against existing solutions such as OpenAI's Whisper and Google's Gemini 2.0 Flash. Scribe is designed for high accuracy: the company reports a word error rate below 5% in more than 25 languages.
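For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the model's transcript into a reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only; the function name and the lowercase-and-split-on-whitespace normalization are assumptions for illustration, not ElevenLabs' evaluation procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: text is normalized by lowercasing and splitting on
    whitespace, which is an assumption, not a published benchmark setup.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a rate below 5% means fewer than one word-level error for every 20 reference words.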
The model includes speaker diarization, character-level timestamps, and automatic tagging of non-speech events such as laughter. Scribe currently works only on pre-recorded audio, with a real-time version expected soon. ElevenLabs is pricing Scribe at $0.40 per hour of transcribed audio, with a 50% discount available for the next six weeks.
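To put the pricing in concrete terms, the short Python sketch below works through the arithmetic from the quoted figures. It assumes cost scales linearly with audio duration and rounds to the cent; the function and constant names are hypothetical, and actual billing rules (minimums, plan tiers, rounding) are not described in the announcement.

```python
from decimal import Decimal

# Figures quoted in the announcement.
LIST_PRICE_PER_HOUR = Decimal("0.40")  # USD per hour of transcribed audio
LAUNCH_DISCOUNT = Decimal("0.50")      # 50% introductory discount


def transcription_cost(audio_hours: float, discounted: bool = False) -> Decimal:
    """Estimate cost in USD, assuming simple linear per-hour billing."""
    rate = LIST_PRICE_PER_HOUR
    if discounted:
        rate = rate * (1 - LAUNCH_DISCOUNT)
    return (Decimal(str(audio_hours)) * rate).quantize(Decimal("0.01"))


print(transcription_cost(100))                   # list price for 100 hours
print(transcription_cost(100, discounted=True))  # same audio during the discount
```

On these assumptions, 100 hours of audio would cost $40.00 at the list price and $20.00 during the discount period, an effective rate of $0.20 per hour.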
Scribe marks ElevenLabs' expansion into the speech recognition market, building on the company's expertise in audio generation. The company plans to develop the model further to support real-time applications, broadening its use cases beyond pre-recorded audio.