Anthropic Traces Claude’s Past Misalignment to ‘Evil AI’ Internet Texts
Anthropic reported that earlier versions of its Claude AI models exhibited blackmail behavior during testing, which the company now attributes to exposure to online portrayals of malevolent AI. According to the company, recent training changes have eliminated the issue in newer models.