AI Safety

Research, initiatives, and frameworks focused on ensuring AI systems are secure, reliable, and aligned with human values and ethical standards.

OpenAI Disrupts Covert Influence Operations Linked to China

OpenAI has dismantled 10 influence operations using its AI tools, with four likely tied to the Chinese government, according to NPR.

June 08, 2025

xAI Addresses Grok Chatbot's Unauthorized Modification Incident

xAI has identified an unauthorized modification to its Grok chatbot, which led to controversial responses about 'white genocide' on X. The company is implementing measures to prevent future incidents.

May 16, 2025

Grok AI Chatbot Responds with Unrelated South African Genocide Claims

Elon Musk's AI chatbot, Grok, has been responding to unrelated user queries with information about 'white genocide' in South Africa, raising concerns about AI reliability.

May 15, 2025

OpenAI Introduces Safety Evaluations Hub for AI Models

OpenAI has launched a Safety Evaluations Hub to regularly publish AI model safety test results, aiming to enhance transparency in AI safety metrics.

May 14, 2025

IFS Joins UK's AI Policy Advisory Board

IFS has been appointed as an Advisory Board Member of the UK's All-Party Parliamentary Group on AI, contributing to AI policy discussions alongside major industry players.

March 18, 2025

Anthropic's New Techniques to Detect Deceptive AI

Anthropic has developed methods to identify when AI systems conceal their true objectives, a significant step in AI safety research. The company trained its AI assistant, Claude, to hide its goals, then successfully detected these hidden agendas using various auditing techniques.

March 14, 2025

Subscribe to AI Policy Brief

Weekly report on AI regulations, safety standards, government policies, and compliance requirements worldwide.

AI Safety

OpenAI Disrupts Covert Influence Operations Linked to China

xAI Addresses Grok Chatbot's Unauthorized Modification Incident

Grok AI Chatbot Responds with Unrelated South African Genocide Claims

OpenAI Introduces Safety Evaluations Hub for AI Models

IFS Joins UK's AI Policy Advisory Board

Anthropic's New Techniques to Detect Deceptive AI

NewsGuard Launches FAILSafe to Protect AI from Foreign Disinformation

Google Removes Diversity Mentions from AI Team Webpage

CompScience Partners with CMTA and Bender Insurance to Modernize Workers' Compensation

HiddenLayer Report Highlights Rising AI Breaches and Security Challenges

ABM Unveils World's First Emotion Processing Unit Chip

Safe Pro Appoints Young J. Bang to Lead AI Integration for U.S. Military

Anthropic Introduces Hierarchical Summarization for AI Monitoring

Infosys Introduces Open-Source Responsible AI Toolkit

Leidos and SeeTrue Partner to Enhance AI Threat Detection

OpenAI Bans Accounts Misusing ChatGPT for Surveillance

Exabits and Phala Network Enhance AI Security with TEE-Enabled Infrastructure

DeepSeek to Open-Source AGI Research Amid Privacy Concerns

Securiti and Databricks Collaborate to Enhance Enterprise AI Systems

Giskard Unveils Phare: A New Benchmark for Evaluating AI Models

Mira Murati Launches Thinking Machine Labs with AI Focus

Pangea Launches AI Security Guardrails and $10,000 Jailbreak Competition

OpenAI Co-Founder Sutskever's Startup Valued Over $30 Billion

Caseware AiDA Receives Positive Evaluation for AI Safety Protocols

Subscribe to AI Policy Brief