OpenAI Releases Open-Weight Safety Reasoning Models for Developers
OpenAI has announced a research preview of gpt-oss-safeguard, a pair of open-weight reasoning models for safety classification tasks. The models—gpt-oss-safeguard-120b and gpt-oss-safeguard-20b—are fine-tuned versions of OpenAI's gpt-oss open models and are distributed under the Apache 2.0 license, permitting free use and modification.
The gpt-oss-safeguard models use reasoning to interpret developer-defined safety policies at inference time, classifying user messages and chat content based on those policies. This allows developers to adjust or replace policies without retraining the model. The models also provide a chain-of-thought output, enabling developers to review the reasoning behind each classification.
According to OpenAI, this approach differs from traditional safety classifiers that rely on large labeled datasets. Instead, developers supply their own policy text, and the model generalizes from it to produce explainable results. The models are designed for use cases where safety policies must evolve quickly or where labeled data is limited.
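To make the workflow concrete, the sketch below shows one plausible way a developer might package a policy and a message for a policy-conditioned classifier, and parse a chain-of-thought response into reasoning plus a label. The message structure, the `Verdict:` line convention, and both helper functions are illustrative assumptions, not the official gpt-oss-safeguard interface.

```python
# Hedged sketch of policy-conditioned safety classification.
# Assumptions (not from the official docs): the policy is passed as the
# system message, the content to classify as the user message, and the
# model's reply ends with a line like "Verdict: violation".

def build_classification_messages(policy: str, content: str) -> list[dict]:
    """Pack a developer-defined policy and the content to classify
    into a chat-style message list. Because the policy travels in the
    prompt, it can be edited or swapped without retraining the model."""
    return [
        {"role": "system", "content": policy},
        {"role": "user", "content": content},
    ]

def parse_verdict(model_output: str) -> tuple[str, str]:
    """Split a chain-of-thought response into (reasoning, label),
    assuming the final line carries the classification."""
    reasoning, _, last_line = model_output.strip().rpartition("\n")
    label = last_line.split(":", 1)[1].strip() if ":" in last_line else last_line.strip()
    return reasoning.strip(), label

# Example: a hypothetical policy and message.
policy = "Flag any message that shares a personal phone number."
messages = build_classification_messages(policy, "Call me at 555-0100!")
# `messages` would then be fed to the model (e.g. via a chat template
# and generate call, not shown); parse_verdict handles the reply:
reasoning, label = parse_verdict(
    "The message discloses a personal phone number.\nVerdict: violation"
)
```

Keeping the policy in the prompt rather than in training data is what lets the same weights enforce different rules per deployment, and returning the reasoning alongside the label is what makes each decision reviewable.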
The release was developed in collaboration with ROOST, which is launching a model community to support open safety research. The models can be downloaded from Hugging Face, and OpenAI has also published a technical report detailing performance evaluations against internal and external benchmarks. Early testing with partners such as SafetyKit, ROOST, and Discord helped refine the tools for community use.
With gpt-oss-safeguard, OpenAI extends its internal Safety Reasoner framework to the public, offering a flexible, reasoning-based method for developers to define and enforce their own safety boundaries.