
Vector Institute Evaluates Leading AI Models with New Benchmarks
The Vector Institute has released the results of its evaluation of 11 leading large language models (LLMs) against 16 performance benchmarks, the institute announced in a press release. The study marks the first time both open- and closed-source models have been evaluated against such an extensive suite of benchmarks, providing insight into the strengths and limitations of top AI models.
The evaluation includes models such as DeepSeek-R1, Cohere's Command R+, OpenAI's GPT-4o, and Google's Gemini 1.5. The benchmarks cover areas such as general knowledge, coding, and cyber-safety, offering a detailed look at model performance in terms of accuracy, reliability, and fairness. The Vector Institute has released the results, benchmarks, and underlying code as open source to promote transparency and collaboration in AI development.
Deval Pandya, Vector's Vice President of AI Engineering, emphasized the importance of independent evaluations in understanding AI model performance. The open-source, interactive leaderboard allows researchers, developers, and end-users to verify results and compare model performance, fostering accountability and innovation in AI technology.