Anthropic Study Finds Just 250 Documents Can Backdoor Large Language Models

October 11, 2025

Anthropic, in collaboration with the UK AI Security Institute and the Alan Turing Institute, found that injecting as few as 250 malicious documents into a model’s training data can create a backdoor vulnerability, regardless of model size.

Anthropic, in collaboration with the UK Government's AI Security Institute and the Alan Turing Institute, has found that large language models can be backdoored with a surprisingly small amount of poisoned data, according to a research paper published on Anthropic’s website.

The study shows that adding just 250 malicious documents—roughly 0.00016% of total training data—can trigger backdoor behaviors in models ranging from 600 million to 13 billion parameters. The attack used a trigger phrase, “,” which caused the affected models to output gibberish text. Researchers observed that model size and total training data volume had no effect on the attack’s success.

The team trained 72 models across different configurations to confirm that poisoning success depends on the absolute number of poisoned samples rather than the proportion of the dataset. Even models trained on twenty times more clean data were equally vulnerable once they encountered the same number of malicious documents.

Anthropic’s researchers said the findings challenge common assumptions about data poisoning, suggesting that attackers may not need large-scale data access to compromise models. The team shared the results to encourage further research into scalable defenses against such vulnerabilities.

We hope you enjoyed this article.

Consider subscribing to one of our newsletters like AI Policy Brief, Cybersecurity AI Weekly or Daily AI Brief.

Also, consider following us on social media:

AI Safety & Regulation Cybersecurity AI AI Brief AI Brief (X)

More from: AI Safety

11/06

First Key Update of International AI Safety Report Released

10/29

OpenAI Releases Open-Weight Safety Reasoning Models for Developers

10/26

OpenAI Backs Valthos in $30 Million Biosecurity Funding Round

10/20

ESMO Issues First Guidelines for Safe Use of AI Language Models in Oncology

10/20

NeuralTrust Reports First Signs of Self-Fixing AI Behavior

More from: Cybersecurity

12/01

Intezer Launches Forensic AI SOC for Enterprise Security Operations

12/01

Aramco Ventures to Open Paris Office for European AI Investments

11/24

Nudge Security Raises $22.5 Million Series A to Expand AI and SaaS Security Platform

11/24

Crimson Vista Partners with PointerTech to Integrate Vista Tier-XSM into MSP Offerings

11/24

Twenty Raises $38 Million to Expand Cyber Warfare Systems

Subscribe to AI Policy Brief

Weekly report on AI regulations, safety standards, government policies, and compliance requirements worldwide.

Market report

2025 State of Data Security Report: Quantifying AI’s Impact on Data Risk

Varonis Systems, Inc.

The 2025 State of Data Security Report by Varonis analyzes the impact of AI on data security across 1,000 IT environments. It highlights critical vulnerabilities such as exposed sensitive cloud data, ghost users, and unsanctioned AI applications. The report emphasizes the need for robust data governance and security measures to mitigate AI-related risks.

Categories

Companies

Resources

Anthropic Study Finds Just 250 Documents Can Backdoor Large Language Models

We hope you enjoyed this article.

More from: AI Safety

More from: Cybersecurity

Subscribe to AI Policy Brief

Market report

2025 State of Data Security Report: Quantifying AI’s Impact on Data Risk

You May Also Like

Anthropic Reports First AI-Driven Cyber Espionage Campaign

Microsoft Reveals 'Whisper Leak' Attack That Exposes AI Chat Topics

Medint Study Finds AI Misses Key Nuances in Complex Clinical Decisions

UpGuard Report Finds 68% of Security Leaders Use Unauthorized AI Tools

First Key Update of International AI Safety Report Released

Anthropic Commits $50 Billion to U.S. AI Infrastructure Expansion

Lemony Launches Cascadeflow to Cut AI Model Costs by Up to 85%

Judge Orders OpenAI to Hand Over 20 Million ChatGPT Logs in Copyright Case

Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking Multimodal AI Model

Mistral AI Releases Mistral 3 Family of Open-Weight Models

Google in Early Talks to Increase Investment in Anthropic

Elsevier Survey Finds Researchers Increasingly Use AI but Lack Governance and Training