OpenAI Releases Open-Weight Safety Reasoning Models for Developers
OpenAI has announced a research preview of gpt-oss-safeguard, a pair of open-weight reasoning models for safety classification tasks. The models—gpt-oss-safeguard-120b and gpt-oss-safeguard-20b—are fine-tuned versions of OpenAI's gpt-oss open models and are distributed under the Apache 2.0 license, permitting free use and modification.
The gpt-oss-safeguard models use reasoning to interpret developer-defined safety policies at inference time, classifying user messages and chat content based on those policies. This allows developers to adjust or replace policies without retraining the model. The models also provide a chain-of-thought output, enabling developers to review the reasoning behind each classification.
According to OpenAI, this approach differs from traditional safety classifiers that rely on large labeled datasets. Instead, developers supply their own policy text, and the model generalizes from it to produce explainable results. The models are designed for use cases where safety policies must evolve quickly or where labeled data is limited.
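To make the workflow concrete, the sketch below shows one plausible way a developer might package a policy and a message for a policy-conditioned classifier, and parse a chain-of-thought response into reasoning plus a label. The message structure, the `Verdict:` line convention, and both helper functions are illustrative assumptions, not the official gpt-oss-safeguard interface.

```python
# Hedged sketch of policy-conditioned safety classification.
# Assumptions (not from the official docs): the policy is passed as the
# system message, the content to classify as the user message, and the
# model's reply ends with a line like "Verdict: violation".

def build_classification_messages(policy: str, content: str) -> list[dict]:
    """Pack a developer-defined policy and the content to classify
    into a chat-style message list. Because the policy travels in the
    prompt, it can be edited or swapped without retraining the model."""
    return [
        {"role": "system", "content": policy},
        {"role": "user", "content": content},
    ]

def parse_verdict(model_output: str) -> tuple[str, str]:
    """Split a chain-of-thought response into (reasoning, label),
    assuming the final line carries the classification."""
    reasoning, _, last_line = model_output.strip().rpartition("\n")
    label = last_line.split(":", 1)[1].strip() if ":" in last_line else last_line.strip()
    return reasoning.strip(), label

# Example: a hypothetical policy and message.
policy = "Flag any message that shares a personal phone number."
messages = build_classification_messages(policy, "Call me at 555-0100!")
# `messages` would then be fed to the model (e.g. via a chat template
# and generate call, not shown); parse_verdict handles the reply:
reasoning, label = parse_verdict(
    "The message discloses a personal phone number.\nVerdict: violation"
)
```

Keeping the policy in the prompt rather than in training data is what lets the same weights enforce different rules per deployment, and returning the reasoning alongside the label is what makes each decision reviewable.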
The release was developed in collaboration with ROOST, which is launching a model community to support open safety research. The models can be downloaded from Hugging Face, and OpenAI has also published a technical report detailing performance evaluations against internal and external benchmarks. Early testing with partners such as SafetyKit, ROOST, and Discord helped refine the tools for community use.
With gpt-oss-safeguard, OpenAI extends its internal Safety Reasoner framework to the public, offering a flexible, reasoning-based method for developers to define and enforce their own safety boundaries.