Yandex Unveils World's Largest Dataset for Recommender Systems

Yandex has released Yambda, the world's largest open dataset for recommender systems, featuring nearly 5 billion anonymized user interactions from its music streaming service, Yandex Music.

Yandex announced the dataset in a press release. The interactions were collected from the Yandex Music streaming service over a span of 10 months.

The Yambda dataset is designed to advance research and development in recommender systems by giving researchers a large benchmark for testing new algorithms. It includes anonymized audio embeddings, interaction flags, and precise timestamps, which make it possible to analyze real-world user behavior in detail. The dataset is available in three sizes (roughly 5 billion, 500 million, and 50 million events), so teams can pick the scale that fits their research needs and computing resources.

Yambda is evaluated with a Global Temporal Split (GTS): interactions are divided into training and test sets at a single point in time, which preserves event sequences and tests models under realistic conditions, with no information from the future leaking into training. The dataset also comes with baseline algorithms such as MostPop, DecayPop, and ItemKNN, which serve as reference points for new work in the field.
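To make the idea concrete, here is a minimal Python sketch of a global temporal split together with two popularity baselines in the spirit of MostPop and DecayPop. It is an illustration under stated assumptions, not Yandex's implementation: the `Event` fields, the function names, the sample events, and the exponential half-life decay are all hypothetical, and the press release does not specify how DecayPop weights older events.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    # Hypothetical record layout; the real Yambda schema may differ.
    user_id: int
    item_id: int
    timestamp: int  # seconds since an arbitrary start


def global_temporal_split(events, cutoff):
    """Split at one global cutoff: earlier events train, later ones test."""
    train = [e for e in events if e.timestamp < cutoff]
    test = [e for e in events if e.timestamp >= cutoff]
    return train, test


def most_pop(train, k):
    """Recommend the k items with the most training interactions."""
    counts = Counter(e.item_id for e in train)
    return [item for item, _ in counts.most_common(k)]


def decay_pop(train, cutoff, k, half_life):
    """Popularity where each event's weight halves every `half_life` seconds.

    The exponential form is an assumption made for this sketch.
    """
    scores = defaultdict(float)
    for e in train:
        age = cutoff - e.timestamp
        scores[e.item_id] += 0.5 ** (age / half_life)
    # Sort by descending score, breaking ties by item id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


# Made-up sample: item 10 was popular long ago, items 20 and 30 recently.
events = [
    Event(1, 10, 0), Event(2, 10, 1), Event(3, 10, 2),
    Event(1, 20, 90), Event(2, 20, 95), Event(3, 30, 99),
    Event(1, 20, 105), Event(2, 30, 110),
]
train, test = global_temporal_split(events, cutoff=100)
top_overall = most_pop(train, k=2)
top_recent = decay_pop(train, cutoff=100, k=2, half_life=10)
```

Because every user shares the same cutoff, no training event is later than any test event, which is what separates this scheme from per-user splits such as leave-one-out. In this sample the two baselines disagree: raw counts favor the older item 10, while decayed counts favor the recently played items.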

The dataset is available on Hugging Face, where researchers and startups can use it to develop and test recommender systems for their own business needs.
