Meta's Maverick AI Model Falls Short in Benchmark Rankings

Meta's Llama 4 Maverick AI model ranks below competitors on the LM Arena benchmark, following controversy over the use of an experimental version.

Meta Platforms, Inc. has faced scrutiny after its Llama 4 Maverick AI model ranked below competitors on the LM Arena benchmark. The result follows a controversy in which Meta used an experimental version of the model to achieve a high score, prompting LM Arena to revise its policies and evaluate the unmodified version instead. The vanilla Maverick model, 'Llama-4-Maverick-17B-128E-Instruct,' placed below models such as OpenAI's GPT-4o and Google's Gemini 1.5 Pro.

Meta's experimental version, 'Llama-4-Maverick-03-26-Experimental,' was optimized for conversational tasks, which initially earned it a high ranking. The unmodified version proved less competitive, however, placing 32nd on the benchmark. A Meta spokesperson said the company experiments with a variety of custom variants and is eager to see how developers will use the open-source version of Llama 4.

