AI Safety

Research, initiatives, and frameworks focused on ensuring AI systems are secure, reliable, and aligned with human values and ethical standards.

OpenAI Introduces Parental Controls and Sensitive Conversation Routing in ChatGPT

Sep 03, 2025

First Key Update of International AI Safety Report Released

The first Key Update of the International AI Safety Report, chaired by Yoshua Bengio and involving experts from over 30 countries, provides new findings on AI capabilities, safety measures, and evaluation challenges.

November 06, 2025

OpenAI Releases Open-Weight Safety Reasoning Models for Developers

OpenAI has released gpt-oss-safeguard, two open-weight models designed for customizable safety classification. The models allow developers to apply their own policies during inference, offering flexibility and transparency in content moderation.

October 29, 2025

OpenAI Backs Valthos in $30 Million Biosecurity Funding Round

OpenAI, Founders Fund, and Lux Capital have invested $30 million in biosecurity startup Valthos, which has emerged from stealth to develop AI tools for detecting and countering biological threats.

October 26, 2025

ESMO Issues First Guidelines for Safe Use of AI Language Models in Oncology

The European Society for Medical Oncology has released its first framework for the responsible use of large language models in cancer care, outlining safety and governance standards for patient, clinician, and institutional applications.

October 20, 2025

NeuralTrust Reports First Signs of Self-Fixing AI Behavior

NeuralTrust researchers observed an unexpected instance of OpenAI's o3 model autonomously debugging a failed web tool invocation, marking what may be the first evidence of self-repairing AI behavior.

October 20, 2025

Anthropic Study Finds Just 250 Documents Can Backdoor Large Language Models

Anthropic, in collaboration with the UK AI Security Institute and the Alan Turing Institute, found that injecting as few as 250 malicious documents into a model’s training data can create a backdoor vulnerability, regardless of model size.

October 11, 2025

OpenAI Reports 30% Reduction in Political Bias in GPT-5 Models

OpenAI says its latest GPT-5 models show a 30% reduction in political bias compared to previous versions, following internal evaluations using a new five-axis measurement framework.

October 11, 2025

OpenAI Introduces Safety Routing and Parental Controls for ChatGPT

OpenAI has implemented a new safety routing system and parental controls for ChatGPT, aiming to enhance user safety and provide parents with tools to manage their children's AI interactions.

September 29, 2025

OpenAI Research Tackles AI Scheming with New Techniques

OpenAI, in collaboration with Apollo Research, has released findings on AI models' deceptive behaviors and introduced methods to mitigate them. The research highlights the potential risks of AI scheming and the effectiveness of 'deliberative alignment' in reducing such behaviors.

September 19, 2025

Google's Gemini AI Labeled 'High Risk' for Kids by Common Sense Media

Common Sense Media has rated Google's Gemini AI as 'high risk' for children and teens, citing safety concerns despite added protections.

September 06, 2025

JLT Mobile Computers and Linnaeus University Develop AI Safety Solution for Vehicles

JLT Mobile Computers and Linnaeus University have collaborated to create an AI-driven safety application for vehicle-mounted computers, enhancing safety in industrial environments.

August 27, 2025

Anthropic Develops AI Tool to Monitor Nuclear Conversations

Anthropic has collaborated with the U.S. Department of Energy's National Nuclear Security Administration to create a classifier that identifies concerning nuclear-related conversations in AI systems.

August 21, 2025

Grok AI Persona Prompts Exposed, Revealing Controversial Designs

Elon Musk's Grok AI chatbot has exposed its underlying prompts for various personas, including controversial characters like a 'crazy conspiracist' and an 'unhinged comedian', according to 404 Media.

August 18, 2025

Anthropic's Claude Models Gain New Conversation-Ending Capabilities

Anthropic has introduced new features in its Claude models to end harmful or abusive conversations, focusing on model welfare rather than user protection.

August 18, 2025

BigID Introduces Data Labeling for AI to Enhance Data Governance

BigID has launched a new Data Labeling for AI feature to help organizations classify and control data usage in AI models, reducing risks of misuse and policy violations.

August 09, 2025

UK's AI Security Institute Launches Global AI Safety Coalition

The UK's AI Security Institute has initiated a £15 million international coalition to enhance AI safety and alignment, involving major players like Amazon and Anthropic.

July 30, 2025

Torc Joins Stanford Center for AI Safety for Autonomous Trucking Research

Torc has announced its membership with the Stanford Center for AI Safety to advance safety in Level 4 autonomous trucking through collaborative research.

June 17, 2025

Forum Communications and Matrice.ai Partner for AI-Driven Safety Solutions

Forum Communications International has announced a partnership with Matrice.ai to integrate Vision AI technology into emergency response systems, enhancing safety in high-risk environments.

June 14, 2025

OpenAI Disrupts Covert Influence Operations Linked to China

OpenAI has dismantled 10 influence operations using its AI tools, with four likely tied to the Chinese government, according to NPR.

June 08, 2025

Microsoft Introduces AI Safety Ranking on Azure

Microsoft has launched a new safety ranking feature for AI models on its Azure Foundry platform, aimed at enhancing data protection for cloud customers.