Britannica and Merriam-Webster Sue OpenAI for Alleged Copyright Violations in AI Training

Encyclopedia Britannica and its subsidiary, Merriam-Webster, have sued OpenAI, alleging extensive copyright violations. The complaint, filed in a New York federal court, accuses OpenAI of using nearly 100,000 online articles without authorization to train its large language models (LLMs), including those behind ChatGPT.

The plaintiffs contend that OpenAI’s practices constitute massive copyright infringement, as the AI company allegedly scraped content from their websites without obtaining proper authorization. This unauthorized use, they argue, not only infringes upon their intellectual property rights but also diverts web traffic and revenue away from their platforms.

The lawsuit also raises concerns about the accuracy of information generated by OpenAI’s models. The plaintiffs allege that ChatGPT produces outputs containing full or partial verbatim reproductions of their content and, in some instances, generates false or misleading information (so-called hallucinations) that is then incorrectly attributed to Britannica or Merriam-Webster. Such misattributions, they argue, could damage their reputations and erode public trust in their brands.

The legal action also addresses OpenAI’s use of Retrieval-Augmented Generation (RAG) workflows, which involve scanning the web or other databases for updated information to respond to user queries. The plaintiffs claim that this process further exacerbates the unauthorized use of their content, as it involves accessing and utilizing their proprietary materials without consent.

This lawsuit is part of a broader trend of media and publishing companies challenging AI firms over the use of copyrighted content. Britannica and Merriam-Webster previously filed a similar lawsuit against Perplexity AI, alleging that the company’s answer engine unlawfully reproduced their content and generated false information attributed to them. Other notable plaintiffs that have sued OpenAI over similar copyright concerns include The New York Times, Ziff Davis (owner of Mashable, CNET, IGN, PC Mag, and others), and more than a dozen news outlets across the U.S. and Canada, such as the Chicago Tribune, the Denver Post, the Sun Sentinel, the Toronto Star, and the Canadian Broadcasting Corporation.

The outcome of these lawsuits could have significant implications for the AI industry, particularly concerning the use of copyrighted materials in training data. There is no strong legal precedent establishing whether using copyrighted content to train an LLM constitutes copyright infringement, but some cases have begun to address the issue. In one, Anthropic convinced federal judge William Alsup that using the content as training data was transformative enough to be legal. However, Alsup also found that Anthropic violated the law by illegally downloading millions of books rather than paying for them, which led to a $1.5 billion class action settlement for affected writers.

As the legal landscape evolves, these cases will likely influence how AI companies approach the use of existing content in developing their models and may lead to new standards and practices within the industry.