Adobe Faces Class-Action Lawsuit Over Alleged Use of Pirated Books in AI Training
Adobe Inc., a leader in digital media solutions, faces a proposed class-action lawsuit filed by Oregon-based author Elizabeth Lyon. The lawsuit alleges that Adobe used pirated versions of numerous books, including Lyon’s own works, to train its artificial intelligence (AI) model, SlimLM.
Background on SlimLM and Training Data
SlimLM is a series of small language models developed by Adobe, optimized for document assistance tasks on mobile devices. According to Adobe, SlimLM was pre-trained on SlimPajama-627B, a deduplicated, multi-corpora, open-source dataset released by Cerebras in June 2023. However, Lyon’s lawsuit contends that SlimPajama is a derivative of the RedPajama dataset, which includes the controversial Books3 dataset, a collection of approximately 191,000 books, many of which are alleged to be pirated copies.
Details of the Allegations
Lyon, an author of several guidebooks on non-fiction writing, claims that her copyrighted works were included in the SlimPajama dataset without her consent. The lawsuit asserts that the SlimPajama dataset was created by copying and manipulating the RedPajama dataset, which in turn incorporated the Books3 dataset. On that basis, the complaint alleges that SlimPajama contains the copyrighted works of Lyon and other class members, and that Adobe’s use of it to train SlimLM amounts to copyright infringement.
Broader Context of AI Training and Copyright Issues
The use of large datasets for training AI models has become a common practice among tech companies. However, the inclusion of copyrighted materials without proper authorization has led to a series of legal challenges. The Books3 dataset, in particular, has been at the center of multiple lawsuits. For instance, in September 2025, a lawsuit against Apple claimed that the company used copyrighted material from the RedPajama dataset to train its Apple Intelligence model without consent, credit, or compensation. Similarly, in October 2025, Salesforce faced a lawsuit alleging the use of RedPajama for training purposes.
Precedents in AI and Copyright Litigation
This lawsuit against Adobe is part of a growing trend of legal actions addressing the use of copyrighted materials in AI training. Notably, in September 2025, AI company Anthropic agreed to pay $1.5 billion to settle a class-action lawsuit filed by authors who alleged that Anthropic used pirated versions of their works to train its chatbot, Claude. This settlement is considered one of the largest in the ongoing legal battles over copyrighted material in AI training data.
Implications for the Tech Industry
The outcome of this lawsuit could have significant implications for the tech industry, particularly for how companies source and use training data for AI models. As AI is integrated into more applications, the legal frameworks governing the use of copyrighted materials in AI training may become more stringent. Companies may need to adopt more transparent and ethical practices in sourcing training data to avoid legal repercussions and maintain public trust.
Conclusion
Adobe’s alleged use of pirated books to train its SlimLM AI model underscores the complex intersection of technology and intellectual property rights. As the legal landscape continues to evolve, this case may serve as a pivotal moment in defining the boundaries of fair use and copyright infringement in the realm of AI development.