
Study Reveals OpenAI Models Memorize Copyrighted Content
OpenAI is facing new scrutiny as a study suggests its models have memorized copyrighted content. The study, co-authored by researchers from the University of Washington, the University of Copenhagen, and Stanford, introduces a method for identifying training data memorized by models such as GPT-4 and GPT-3.5. The method relies on 'high-surprisal' words, meaning words that are statistically unlikely in their context. The researchers removed such words from text snippets and asked the models to guess them; a model that fills in the missing word correctly has likely memorized the passage during training.
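To make the probing idea concrete, here is a minimal Python sketch. It is not the study's implementation: it assumes a simple word-frequency estimate of surprisal, and the names `mask_highest_surprisal`, `probe`, and `model_guess` are hypothetical. `model_guess` stands in for whatever call would query the model under test.

```python
import math
from collections import Counter


def surprisal(word, counts, total):
    """Surprisal in bits under a unigram model with add-one smoothing.

    Assumption: a unigram model is a crude stand-in for illustration;
    it is not how the study estimates surprisal.
    """
    vocab_size = len(counts) + 1  # +1 leaves room for unseen words
    prob = (counts.get(word.lower(), 0) + 1) / (total + vocab_size)
    return -math.log2(prob)


def mask_highest_surprisal(passage, reference_text):
    """Replace the least expected word in `passage` with [MASK].

    Returns the masked passage and the hidden word.
    """
    words = passage.split()
    counts = Counter(reference_text.lower().split())
    total = sum(counts.values())
    scores = [surprisal(w, counts, total) for w in words]
    idx = max(range(len(words)), key=scores.__getitem__)
    masked = words.copy()
    masked[idx] = "[MASK]"
    return " ".join(masked), words[idx]


def probe(model_guess, passage, reference_text):
    """Return True if the (hypothetical) model recovers the masked word."""
    masked, answer = mask_highest_surprisal(passage, reference_text)
    return model_guess(masked).strip().lower() == answer.lower()
```

In this sketch, a passage counts as a hit when the model returns the exact hidden word. A real audit would repeat the probe over many passages and compare the hit rate against text the model could not have seen.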
The study found that GPT-4 showed signs of having memorized portions of popular fiction books and New York Times articles. The finding comes as OpenAI faces legal challenges from authors and other rights-holders who accuse the company of using their works without permission. OpenAI has defended its practices under the fair use doctrine, but the study's findings highlight the need for greater transparency about AI training data.
The researchers emphasize that large language models must be open to probing and auditing if they are to be trusted, and they call for more transparency about the data used to train them. OpenAI, for its part, has advocated for looser restrictions on using copyrighted data for AI development.