Google Releases WAXAL Open Dataset for African Speech Technology
Google has introduced WAXAL, an open dataset designed to advance speech technology for African languages. The collection includes more than 11,000 hours of audio across 21 languages, including Yoruba, Hausa, Luganda, and Swahili. It also provides 1,250 hours of transcribed recordings for speech recognition and over 20 hours of studio-quality audio for text-to-speech applications.
Developed over three years, the project was led by Google Research Africa in collaboration with regional partners. Makerere University and the University of Ghana oversaw data collection for 13 languages, while Digital Umuganda in Rwanda coordinated efforts for five others. Local media and academic organizations contributed to recording and validating the data.
The dataset was built using community-driven methods: speakers described pictures in their native languages to capture natural speech, and professional studio recordings were made as well. Each partner retains ownership of its collected data under the project’s collaborative framework.
WAXAL is available under an open license and can be accessed on Hugging Face. The dataset is intended to support research and the development of inclusive voice technologies while helping preserve linguistic diversity across Africa.