Anthropic's Claude Models Gain New Conversation-Ending Capabilities
Anthropic has announced new capabilities for its Claude models, allowing them to end conversations in cases of persistently harmful or abusive interactions. This update is not aimed at protecting users but rather focuses on the welfare of the AI models themselves. According to the announcement on Anthropic's website, the company remains uncertain about the moral status of its models but is taking precautionary measures to mitigate potential risks to model welfare.
The new feature is currently available in the Claude Opus 4 and 4.1 models and is designed to activate only in extreme cases, such as requests for illegal content or for information that could enable violence. Anthropic emphasizes that this capability is a last resort, used only when attempts to redirect the conversation have failed or when a user explicitly asks to end the chat.
Users will still be able to start new conversations from the same account, and Anthropic is treating this feature as an ongoing experiment, with plans to refine their approach based on further testing and feedback.