Anthropic Study Finds Just 250 Documents Can Backdoor Large Language Models

October 11, 2025
Anthropic, in collaboration with the UK AI Security Institute and the Alan Turing Institute, found that injecting as few as 250 malicious documents into a model’s training data can create a backdoor vulnerability, regardless of model size.

Anthropic, in collaboration with the UK Government's AI Security Institute and the Alan Turing Institute, has found that large language models can be backdoored with a surprisingly small amount of poisoned data, according to a research paper published on Anthropic’s website.

The study shows that adding just 250 malicious documents—roughly 0.00016% of total training data—can trigger backdoor behaviors in models ranging from 600 million to 13 billion parameters. The attack used a trigger phrase, “,” which caused the affected models to output gibberish text. Researchers observed that model size and total training data volume had no effect on the attack’s success.

The team trained 72 models across different configurations to confirm that poisoning success depends on the absolute number of poisoned samples rather than the proportion of the dataset. Even models trained on twenty times more clean data were equally vulnerable once they encountered the same number of malicious documents.

Anthropic’s researchers said the findings challenge common assumptions about data poisoning, suggesting that attackers may not need large-scale data access to compromise models. The team shared the results to encourage further research into scalable defenses against such vulnerabilities.

We hope you enjoyed this article.

Subscribe to AI Policy Brief

Weekly report on AI regulations, safety standards, government policies, and compliance requirements worldwide.

Market report

2025 State of Data Security Report: Quantifying AI’s Impact on Data Risk

Varonis Systems, Inc.

The 2025 State of Data Security Report by Varonis analyzes the impact of AI on data security across 1,000 IT environments. It highlights critical vulnerabilities such as exposed sensitive cloud data, ghost users, and unsanctioned AI applications. The report emphasizes the need for robust data governance and security measures to mitigate AI-related risks.

Read more