AI Crawlers Drive 50% Surge in Wikimedia Commons Bandwidth
The Wikimedia Foundation has reported a 50% increase in bandwidth consumption on Wikimedia Commons since January 2024. According to a recent blog post by the organization, the surge is driven not by human users but by automated AI crawlers scraping data to train AI models.
Wikimedia Commons, a repository of freely accessible multimedia files, sees two-thirds of its most resource-intensive traffic come from bots, even though bots account for only 35% of overall pageviews. The disparity arises because bots often request rarely visited content that must be served from the core data center, which is far more costly than serving popular, cached content.
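To see why a minority of pageviews can account for a majority of costly traffic, consider a rough back-of-the-envelope model. The pageview shares below come from the article; the cache-hit rates are purely hypothetical assumptions chosen to illustrate the mechanism, not Wikimedia figures.

```python
# Illustrative model of why bot traffic dominates expensive traffic.
# Pageview shares are from the article; the cache-hit rates are
# hypothetical assumptions used only to show the mechanism.

bot_share = 0.35        # bots: 35% of overall pageviews (from the article)
human_share = 0.65      # humans: the remaining 65%

bot_hit_rate = 0.30     # assumption: bots crawl long-tail pages, so few cache hits
human_hit_rate = 0.90   # assumption: humans mostly view popular, well-cached pages

# Requests that miss the cache fall through to the core data center,
# which is the costly serving path the article describes.
bot_misses = bot_share * (1 - bot_hit_rate)        # 0.245
human_misses = human_share * (1 - human_hit_rate)  # 0.065

bot_fraction_of_expensive = bot_misses / (bot_misses + human_misses)
print(f"Bots' share of core-data-center traffic: {bot_fraction_of_expensive:.0%}")
```

Under these assumed hit rates, bots would generate roughly 79% of the cache-miss traffic reaching the core data center, which shows how a 35% pageview share can easily translate into two-thirds or more of the most expensive traffic.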
The Wikimedia Foundation's site reliability team is actively working to block these crawlers to prevent disruption for regular users. The unprecedented scraper traffic is also driving up the organization's cloud costs, highlighting a growing challenge for open-source infrastructure.
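The article does not detail how Wikimedia blocks these crawlers, but a common first line of defense is filtering requests by User-Agent. The following is a minimal, generic sketch of that approach as a WSGI application; the crawler tokens listed are examples of publicly documented AI crawler user agents, and none of this reflects Wikimedia's actual implementation.

```python
# Minimal sketch of User-Agent-based crawler filtering.
# A generic illustration only, not Wikimedia's actual setup.
from wsgiref.simple_server import make_server

# Example tokens from publicly documented AI crawler user agents.
AI_CRAWLER_TOKENS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider")

def app(environ, start_response):
    ua = environ.get("HTTP_USER_AGENT", "")
    if any(token in ua for token in AI_CRAWLER_TOKENS):
        # Reject known AI scrapers before they reach the expensive backend.
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Automated scraping is not permitted.\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, human visitor!\n"]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```

In practice, User-Agent filtering is easily evaded by crawlers that spoof browser headers, which is why site reliability teams typically combine it with rate limiting and IP-level blocking.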