OpenAI Allegedly Used Paywalled O'Reilly Books for AI Training

April 02, 2025

A recent paper by the AI Disclosures Project suggests that OpenAI's GPT-4o model was trained on paywalled O'Reilly Media books without a licensing agreement.

OpenAI has been accused of training its AI models on copyrighted content without permission. A new paper from the AI Disclosures Project suggests that the company used paywalled books from O'Reilly Media to train its GPT-4o model. The paper, authored by Tim O'Reilly, Ilan Strauss, and Sruly Rosenblat, indicates that GPT-4o recognizes non-public O'Reilly book content far more readily than earlier models such as GPT-3.5 Turbo.

The researchers used a method known as DE-COP, which is designed to detect copyrighted content in the training data of language models. The results suggest that GPT-4o likely has prior knowledge of many non-public O'Reilly books published before its training cutoff date. The authors say the findings highlight the need for greater transparency about the sources of AI training data.
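To illustrate the idea behind this kind of test: DE-COP-style probing shows a model a multiple-choice question in which one option is a verbatim passage from a book and the others are paraphrases, then checks whether the model picks the verbatim passage more often than chance. The Python sketch below is a minimal illustration under those assumptions, not the paper's implementation; the function names, the four-option format, and the sample trial data are hypothetical.

```python
import random


def build_quiz(verbatim, paraphrases, rng):
    """Shuffle one verbatim passage among its paraphrases.

    Returns the shuffled options and the index of the verbatim passage.
    Assumes no paraphrase is identical to the verbatim text.
    """
    options = [verbatim] + list(paraphrases)
    rng.shuffle(options)
    return options, options.index(verbatim)


def guess_rate(trials):
    """Fraction of trials in which the model chose the verbatim passage.

    Each trial is a (chosen_index, verbatim_index) pair.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    hits = sum(1 for chosen, correct in trials if chosen == correct)
    return hits / len(trials)


def chance_rate(num_options):
    """Expected guess rate for a model that picks an option at random."""
    return 1 / num_options


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the shuffle is repeatable
    options, answer = build_quiz(
        "original passage", ["paraphrase A", "paraphrase B", "paraphrase C"], rng
    )
    print(len(options), options[answer])

    # Hypothetical trial results: (index the model chose, index of the verbatim text)
    trials = [(0, 0), (1, 1), (2, 0), (3, 3)]
    print(guess_rate(trials), chance_rate(4))  # 0.75 0.25
```

A guess rate well above `chance_rate(4)` on books published before the training cutoff, and close to chance on books published after it, is the kind of signal such a test looks for.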

The AI Disclosures Project, co-founded by Tim O'Reilly and Ilan Strauss, aims to address the societal impacts of AI's commercialization by advocating for better corporate transparency. The paper's findings suggest that OpenAI, despite having some licensing agreements, may have used unlicensed paywalled content to enhance its AI models.
