NVIDIA Releases Open-Source Parakeet-TDT-0.6B-v2 for Fast Speech Recognition

NVIDIA has launched Parakeet-TDT-0.6B-v2, an open-source automatic speech recognition model available on Hugging Face, capable of transcribing an hour of audio in just one second.

The automatic speech recognition (ASR) model has 600 million parameters and can transcribe 60 minutes of audio in one second, or roughly 3,600 times faster than real time, setting a new speed benchmark for speech recognition models.
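To put the speed figure in context, throughput for ASR models is often reported as an inverse real-time factor (sometimes written RTFx): seconds of audio processed per second of compute. The short sketch below only restates the article's numbers as arithmetic; the function name is illustrative and not part of any NVIDIA tooling.

```python
def inverse_real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# The article's figure: 60 minutes of audio transcribed in one second.
speedup = inverse_real_time_factor(audio_seconds=60 * 60, processing_seconds=1.0)
print(f"{speedup:.0f}x real time")  # prints "3600x real time"
```

Measured throughput depends on the GPU, batch size, and audio length, so a figure like this is best read as a best-case number rather than a guarantee.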

Parakeet-TDT-0.6B-v2 is released under the Creative Commons CC-BY-4.0 license, which permits free use, including commercial use, as long as attribution is given. The model pairs a Fast Conformer encoder with a TDT (Token-and-Duration Transducer) decoder, is optimized for NVIDIA GPUs, and supports hardware such as the A100, H100, T4, and V100. It can also run on systems with as little as 2GB of RAM, which broadens the range of environments where it can be deployed.

The model achieves an average Word Error Rate (WER) of 6.05% on the Hugging Face Open ASR Leaderboard, outperforming many existing open ASR models. It also produces punctuation, capitalization, and word-level timestamps, making it suitable for applications such as transcription services, voice assistants, and conversational AI platforms.
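For readers unfamiliar with the metric, WER is the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the leaderboard's scoring code; public leaderboards typically normalize text (casing, punctuation) before scoring, which this example does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{example:.2%}")  # prints "33.33%"
```

On this scale, a WER of 6.05% corresponds to roughly six word errors per hundred reference words, averaged over the leaderboard's test sets.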

Developers can access Parakeet-TDT-0.6B-v2 on Hugging Face or through NVIDIA's NeMo toolkit, where installation instructions and integration guidance are available for experimentation and deployment.
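As a rough illustration of the NeMo route, the sketch below follows the usage pattern shown on the model's Hugging Face card at the time of writing. It assumes NeMo's ASR collection is installed (for example via `pip install -U "nemo_toolkit[asr]"`) and that the input is a 16 kHz mono audio file. The helper names `format_segments` and `transcribe_with_timestamps` are hypothetical, and the exact NeMo calls and output fields should be checked against the current NeMo documentation.

```python
from typing import Dict, List

MODEL_NAME = "nvidia/parakeet-tdt-0.6b-v2"  # Hugging Face model ID


def format_segments(segments: List[Dict]) -> List[str]:
    """Render segment timestamps as 'start - end : text' lines."""
    return [
        f"{seg['start']:.2f}s - {seg['end']:.2f}s : {seg['segment']}"
        for seg in segments
    ]


def transcribe_with_timestamps(audio_path: str) -> List[str]:
    """Transcribe one audio file and return formatted segment timestamps."""
    # Imported lazily so format_segments can be used without NeMo installed.
    import nemo.collections.asr as nemo_asr

    # Downloads the checkpoint on first use (assumes network access).
    asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name=MODEL_NAME)

    # timestamps=True asks NeMo to attach timing information to each result;
    # the 'segment' key and its fields are assumed from the model card.
    output = asr_model.transcribe([audio_path], timestamps=True)
    print(output[0].text)  # punctuated, capitalized transcript
    return format_segments(output[0].timestamp["segment"])
```

Calling `transcribe_with_timestamps("sample.wav")` on a local recording (the filename is a placeholder) would print the transcript and return one formatted line per segment. Word-level timings are exposed the same way under a `word` key, according to the model card.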
