AI Safety

Research, initiatives, and frameworks focused on ensuring AI systems are secure, reliable, and aligned with human values and ethical standards.

Astrix Security Releases OpenClaw Scanner to Detect AI Agent Deployments

Astrix Security has launched the OpenClaw Scanner, a free tool designed to detect instances of the open-source AI assistant OpenClaw across enterprise environments, addressing security concerns over autonomous AI agents.

February 13, 2026

Eve Security Files Patent for 'Interrogation-as-a-Service' to Manage AI Agent Risks

Eve Security has filed a patent for its new 'Interrogation-as-a-Service' technology, designed to control and audit AI agent actions in real time. The system introduces a reasoning-before-execution approach to enhance safety and compliance in enterprise AI operations.

February 13, 2026

Alice Launches Caterpillar to Detect Malicious OpenClaw Skills

Alice has released Caterpillar, a free open-source security tool that identifies malicious behaviors in AI agent skills for OpenClaw. The release follows an incident in which several harmful skills were found active among more than 6,000 users.

February 08, 2026

Bounteous and Anthropic Launch Claude Code Lab Series for Enterprise AI Adoption

Bounteous has announced a new series of Claude Code Labs in partnership with Anthropic, offering hands-on workshops for enterprise teams to integrate Claude Code responsibly into their environments.

February 08, 2026

Aura to Acquire Qoria and List on ASX

Aura has announced plans to acquire Qoria through an Australian scheme of arrangement, with the combined company set to trade on the ASX under the ticker AXQ. The deal, expected to close in the second quarter of 2026, will create a global leader in online safety and wellbeing solutions.

February 04, 2026

DC Capital Partners Takes Majority Stake in Knexus to Expand Government AI Services

DC Capital Partners has acquired a majority stake in Knexus, an applied AI company serving U.S. government agencies. The partnership aims to accelerate Knexus's growth and expand its AI solutions for defense and civilian missions.

February 04, 2026

2026 International AI Safety Report Highlights Rapid Advances and Rising Risks

The 2026 International AI Safety Report, chaired by Yoshua Bengio, details major advances in general-purpose AI capabilities and growing safety concerns, including misuse in cybersecurity and biological research.

February 04, 2026

Perplexity AI Offers Free Public Safety Platform for Law Enforcement

Perplexity AI has introduced a new initiative providing law enforcement agencies with free access to its Enterprise Pro platform for one year, enabling officers to use multimodal AI tools for field and administrative tasks.

January 16, 2026

Indonesia Blocks xAI’s Grok Over Non-Consensual Deepfakes

Indonesia has temporarily blocked access to xAI’s Grok chatbot after reports that it generated sexualized AI images, including depictions of minors. The government called the content a violation of human rights and summoned X officials to address the issue.

January 10, 2026

OpenAI Seeks Head of Preparedness to Oversee AI Safety and Risk Mitigation

OpenAI CEO Sam Altman announced that the company is hiring a Head of Preparedness, a new executive role dedicated to managing AI safety and risk mitigation as its models grow more capable. The position will pay $555,000 annually and focus on evaluating and mitigating cybersecurity and biological risks associated with advanced AI systems.

December 28, 2025

Anthropic Releases Bloom, an Open-Source Framework for AI Behavior Evaluation

Anthropic has introduced Bloom, an open-source tool designed to automate behavioral evaluations of large AI models. The system generates and scores scenarios to measure behaviors like bias and self-preservation across multiple models.

December 22, 2025

Anthropic Publishes Compliance Framework Ahead of California’s Frontier AI Law

Anthropic has released its Frontier Compliance Framework to meet the requirements of California's Transparency in Frontier AI Act (SB 53), which takes effect on January 1. The document outlines how the company assesses and mitigates catastrophic risks from frontier AI systems.

December 22, 2025

First Key Update of International AI Safety Report Released

The first Key Update of the International AI Safety Report, chaired by Yoshua Bengio and involving experts from over 30 countries, provides new findings on AI capabilities, safety measures, and evaluation challenges.

November 06, 2025

OpenAI Releases Open-Weight Safety Reasoning Models for Developers

OpenAI has released gpt-oss-safeguard, two open-weight models designed for customizable safety classification. The models allow developers to apply their own policies during inference, offering flexibility and transparency in content moderation.

October 29, 2025

OpenAI Backs Valthos in $30 Million Biosecurity Funding Round

OpenAI, Founders Fund, and Lux Capital have invested $30 million in biosecurity startup Valthos, which has emerged from stealth to develop AI tools for detecting and countering biological threats.

October 26, 2025

ESMO Issues First Guidelines for Safe Use of AI Language Models in Oncology

The European Society for Medical Oncology has released its first framework for the responsible use of large language models in cancer care, outlining safety and governance standards for patient, clinician, and institutional applications.

October 20, 2025

NeuralTrust Reports First Signs of Self-Fixing AI Behavior

NeuralTrust researchers observed an unexpected instance of OpenAI's o3 model autonomously debugging a failed web tool invocation, marking what may be the first evidence of self-repairing AI behavior.

October 20, 2025

Anthropic Study Finds Just 250 Documents Can Backdoor Large Language Models

Anthropic, in collaboration with the UK AI Security Institute and the Alan Turing Institute, found that injecting as few as 250 malicious documents into a model’s training data can create a backdoor vulnerability, regardless of model size.

October 11, 2025

OpenAI Reports 30% Reduction in Political Bias in GPT-5 Models

OpenAI says its latest GPT-5 models show a 30% reduction in political bias compared to previous versions, following internal evaluations using a new five-axis measurement framework.

October 11, 2025

OpenAI Introduces Safety Routing and Parental Controls for ChatGPT

OpenAI has implemented a new safety routing system and parental controls for ChatGPT, aiming to enhance user safety and provide parents with tools to manage their children's AI interactions.

September 29, 2025
