ARC-AGI-2 Test Challenges AI Models with New Benchmarks
The Arc Prize Foundation has launched ARC-AGI-2, a new benchmark aimed at evaluating the general intelligence of AI models. Announced on its website, the test presents a series of puzzle-like problems that require AI to identify visual patterns and produce the correct output, challenging models to adapt to problems they have never encountered before.
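To make the format concrete, ARC-style tasks are typically distributed as JSON-like structures of small integer grids: a few input/output training pairs plus a test input whose output the solver must predict. The sketch below is a deliberately trivial, hypothetical example (far simpler than a real ARC-AGI-2 puzzle) showing the general shape of such a task; the hidden rule here is just "recolor every 1 to 2":

```python
# Illustrative sketch of an ARC-style task (NOT an actual ARC-AGI-2 puzzle).
# A task provides a few input/output grid pairs; the solver must infer the
# transformation rule from them and apply it to a fresh test input.

task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[0, 2], [2, 0]]},
        {"input": [[1, 1], [0, 0]], "output": [[2, 2], [0, 0]]},
    ],
    "test": [{"input": [[0, 0], [1, 1]]}],
}

def apply_inferred_rule(grid):
    """Apply the rule a solver might infer from the train pairs:
    every cell colored 1 becomes 2; everything else is unchanged."""
    return [[2 if cell == 1 else cell for cell in row] for row in grid]

prediction = apply_inferred_rule(task["test"][0]["input"])
print(prediction)  # [[0, 0], [2, 2]]
```

Real ARC-AGI-2 tasks use larger grids and far subtler rules, which is precisely what makes memorization-based approaches ineffective.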
Current reasoning models, including OpenAI's o1-pro and DeepSeek's R1, have scored between 1% and 1.3% on the ARC-AGI-2 test, while non-reasoning models such as GPT-4.5 and Claude 3.7 Sonnet scored around 1%. In contrast, human participants averaged a 60% success rate, highlighting how difficult the new benchmark is for today's AI systems.
ARC-AGI-2 aims to address the limitations of its predecessor, ARC-AGI-1, by introducing a new efficiency metric and requiring models to interpret patterns on the fly rather than rely on memorization. The test is designed to measure not only whether AI systems can solve tasks but also how efficiently and cost-effectively they do so.
The Arc Prize Foundation has also announced the ARC Prize 2025 contest, challenging developers to achieve 85% accuracy on the ARC-AGI-2 test at a cost of $0.42 per task. This initiative aims to drive open-source progress toward highly efficient, general AI systems.