Pearl Finds AI Models Match Expert Judgment Only 70% of the Time

May 14, 2026

Pearl Enterprise evaluated 25 AI models from OpenAI, Anthropic, Google DeepMind, Microsoft, and others, finding that even top systems align with expert judgment only about 70% of the time, with performance dropping to as low as 20% in some domains.

Leading AI models from OpenAI, Anthropic, Google DeepMind, Microsoft, and other developers align with expert judgment only about 70% of the time, according to a press release from Pearl Enterprise. The company evaluated 25 models using more than 500 professional questions across five domains: business, health, law, pets, and technology.

Pearl’s Expert Alignment Leaderboard found that OpenAI’s GPT 5.5 led with 72.7% expert alignment, followed closely by GPT 5 at 72.5%, GPT 5.1 at 72.0%, and Anthropic’s Claude Opus 4.7 at 71.9%. No model exceeded 73% overall alignment, indicating that current systems may be converging below expert-level performance.

Performance varied sharply by domain. Top scores reached 80.9% in business, but some widely used models dropped to around 20% in areas such as law and health. Pearl also noted that increasing reasoning depth improved results by at most 2.6 percentage points and in some cases reduced response quality.

Each model received identical prompts and was scored on correctness, completeness, prioritization, and professional judgment. Pearl stated that the dataset used for evaluation was not previously available to model developers. The full leaderboard is available on Pearl’s website.
