Anthropic's Controversial Book Scanning for AI Training

Recent court documents reveal that Anthropic destroyed millions of print books to train its AI model, Claude, in an operation that a court found to be fair use. The company hired Tom Turvey, formerly of Google Books, to lead the acquisition and digitization of these books, aiming to replicate Google's successful book digitization strategy.

The process involved purchasing used books in bulk, removing their bindings, and scanning the pages into digital formats, after which the physical copies were discarded. Judge William Alsup deemed this method transformative and ruled that the operation qualified as fair use because Anthropic had legally purchased the books and did not distribute the digital copies.

The decision highlights the AI industry's demand for high-quality text data, which is crucial for training large language models like Claude. These models rely on vast amounts of well-edited text to improve their capabilities, making the acquisition of such data a competitive necessity. Despite the legality of the process, Anthropic's earlier use of pirated materials had initially complicated its legal standing.
