
OpenAI Allegedly Used Paywalled O'Reilly Books for AI Training
OpenAI has been accused of training its AI models on copyrighted content without permission. A new paper from the AI Disclosures Project suggests that the company used paywalled books from O'Reilly Media to train its GPT-4o model. The paper, authored by Tim O'Reilly, Ilan Strauss, and Sruly Rosenblat, reports that GPT-4o shows stronger recognition of non-public O'Reilly book content than earlier models such as GPT-3.5 Turbo.
The researchers used DE-COP, a method for detecting whether copyrighted content was part of a language model's training data. The results suggest that GPT-4o likely has prior knowledge of many non-public O'Reilly books published before its training cutoff date. The findings highlight the need for greater transparency about the data sources used to train AI models.
The AI Disclosures Project, co-founded by Tim O'Reilly and Ilan Strauss, aims to address the societal impacts of AI's commercialization by advocating for better corporate transparency. The paper's findings suggest that OpenAI, despite having some licensing agreements, may have used unlicensed paywalled content to enhance its AI models.