OpenAI's New AI Models Show Increased Hallucination Rates

April 19, 2025

OpenAI's latest reasoning AI models, o3 and o4-mini, exhibit higher rates of hallucination compared to their predecessors, according to internal tests and third-party evaluations.

Image: Jernej Furman

OpenAI's newly launched reasoning models, o3 and o4-mini, are state-of-the-art, yet they hallucinate more often than earlier models such as o1 and o3-mini. Hallucination, in which an AI model generates false or misleading information, also shows up in third-party testing: Transluce, a nonprofit AI research lab, found that o3 often fabricates actions it claims to have taken, such as running code on a laptop that does not exist.

OpenAI's internal tests show that o3 hallucinated in response to 33% of questions on PersonQA, the company's benchmark for measuring how accurately a model answers questions about people. That is roughly double the rates of o1 and o3-mini, which scored 16% and 14.8%, respectively. The smaller o4-mini performed even worse, with a 48% hallucination rate.
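To put those percentages side by side, the short Python sketch below computes how many times higher each new model's PersonQA hallucination rate is than each predecessor's. It uses only the four figures quoted above; the names `RATES` and `compare` are illustrative and do not come from OpenAI or the benchmark itself.

```python
# PersonQA hallucination rates quoted in the article (percent of questions).
RATES = {"o1": 16.0, "o3-mini": 14.8, "o3": 33.0, "o4-mini": 48.0}


def compare(rates, baselines, candidates):
    """Return {(candidate, baseline): ratio} where ratio = candidate rate / baseline rate."""
    return {
        (cand, base): round(rates[cand] / rates[base], 2)
        for cand in candidates
        for base in baselines
    }


if __name__ == "__main__":
    ratios = compare(RATES, baselines=["o1", "o3-mini"], candidates=["o3", "o4-mini"])
    for (cand, base), ratio in ratios.items():
        print(f"{cand} vs {base}: {ratio}x the hallucination rate")
```

By this measure o3 hallucinates a little more than twice as often as either predecessor on this benchmark, and o4-mini about three times as often.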

The higher hallucination rates are concerning because these models are meant to improve on their predecessors' reasoning capabilities. OpenAI acknowledges the issue and says more research is needed to understand why hallucinations are getting worse as reasoning models scale up. The company is exploring solutions, such as integrating web search capabilities to improve accuracy.

Despite these challenges, o3 has drawn praise for its performance on coding and math tasks, although users testing it in real-world applications have seen it hallucinate broken website links.



