Pearl Finds AI Models Match Expert Judgment Only 70% of the Time

May 14, 2026
Pearl Enterprise evaluated 25 AI models from OpenAI, Anthropic, Google DeepMind, Microsoft, and others, finding that even top systems align with expert judgment only about 70% of the time, with performance dropping to as low as 20% in some domains.

Leading AI models from OpenAI, Anthropic, Google DeepMind, Microsoft, and other developers align with expert judgment only about 70% of the time, according to a press release from Pearl Enterprise. The company evaluated 25 models using more than 500 professional questions across five domains: business, health, law, pets, and technology.

Pearl’s Expert Alignment Leaderboard found that OpenAI’s GPT 5.5 led with 72.7% expert alignment, followed closely by GPT 5 at 72.5%, GPT 5.1 at 72.0%, and Anthropic’s Claude Opus 4.7 at 71.9%. No model exceeded 73% overall alignment, indicating that current systems may be converging below expert-level performance.

Performance varied sharply by domain. Top scores reached 80.9% in business, but some widely used models dropped to around 20% in areas such as law and health. Pearl also noted that increasing reasoning depth improved results by at most 2.6 percentage points and in some cases reduced response quality.

Each model received identical prompts and was scored on correctness, completeness, prioritization, and professional judgment. Pearl stated that the dataset used for evaluation was not previously available to model developers. The full leaderboard is available on Pearl’s website.

