IP Helpdesk
News blog | 4 July 2023 | European Innovation Council and SMEs Executive Agency

OpenAI sued for copyright infringement – Lana Del Rey settles plagiarism dispute



OpenAI sued again for allegedly infringing copyright in thousands of works

On Wednesday 28 June, US authors Paul Tremblay and Mona Awad (the plaintiffs) filed a proposed class action complaint against OpenAI in the federal court in San Francisco, alleging that the company infringed copyright when training its generative artificial intelligence system, ChatGPT. Brought on behalf of the plaintiffs and all others similarly situated, the complaint alleges copyright infringement, violations of the Digital Millennium Copyright Act, unjust enrichment and negligence, among other claims.

As we already know, ChatGPT is a generative chatbot that extracts data from different sources and then processes it using Natural Language Processing (NLP). Since its launch, there has been much discussion about its relationship with intellectual property, and specifically with copyright: when is its output merely inspired by existing works, and when does it actually infringe them? We discussed this question in a previous post.

Generative AI companies are facing a barrage of legal actions. Earlier this year, Getty Images sued Stability AI for training its AI model on millions of Getty's pictures without consent. The proposed class action filed in the San Francisco federal court last Wednesday claims that OpenAI infringed copyright at two points: first, when it illegally downloaded copies of novels to train its artificial intelligence system, and second, because ChatGPT's responses (its output) themselves infringe the rights in those works.

As to the first issue.

The plaintiffs alleged that much of the material in OpenAI's training datasets comes from copyrighted works, including books, which OpenAI copied without consent, credit or compensation. Books have long been a key ingredient in the training datasets of large language models, as they provide the best examples of high-quality, long-form writing. In its July 2020 GPT-3 paper, OpenAI revealed that 15% of GPT-3's huge training dataset came from "two internet-based book corpora", which can be estimated to contain around 300,000 titles. The plaintiffs claimed that the only internet-based book corpora that have ever offered so much material are the notorious "shadow library" websites, which are blatantly illegal.

As evidence of infringement, the plaintiffs argued that when ChatGPT was asked to summarise each of their books, it generated very accurate summaries, and that it could do so only because OpenAI had copied the books and the language model had ingested them as part of its training data. The two authors alleged that OpenAI copied their books without permission while training its language models. They therefore sought damages and restitution of profits.

As to the second issue.

The plaintiffs argued that, because the output of OpenAI's language models is based on expressive information extracted from the plaintiffs' works, every such output is an infringing derivative work, produced without the authors' permission and in violation of their exclusive rights. They alleged that OpenAI has benefited economically from these infringing outputs, and that each result generated by the chatbot constitutes an act of contributory copyright infringement. On this ground, too, they sought damages and restitution of profits.

The class action concerns around 300,000 books that may have been copied without authorisation, and it seeks to represent the hundreds of thousands of US authors whose copyrights may have been infringed, in many cases through websites that offer this content illegally.

Also on Wednesday, Clarkson, a public-interest law firm, filed another class action against OpenAI in a California federal court on behalf of anonymous clients. The suit accuses OpenAI of stealing and misappropriating vast swathes of personal data from the Internet.


Lana Del Rey settles out of court with two Spanish authors, avoiding a plagiarism trial

Representatives of the famous US singer Lana Del Rey have reached an out-of-court settlement with the Spanish musician Lucas Bolaño and filmmaker Julio Drove, avoiding a trial.

Bolaño and Drove sued Lana Del Rey in 2022 for using, without permission, a 17-second excerpt from their short film “Sky” in her 2012 music video for “Summertime Sadness”. They claimed that she had copied not only the images from “Sky” but also its audio, which comes from “Strange Dumpling Cheeks”, an album Lucas Bolaño released in 2008 under a Creative Commons 3.0 licence. This licence allowed the work to be copied and modified as long as it was used for non-commercial purposes and its authorship was acknowledged. Bolaño’s lawyers claimed that the video had generated millions of dollars, that the plaintiffs had not seen a penny of these profits, and that they had never been credited for their work.

The singer’s lawyers asked the judge to dismiss the lawsuit, arguing that Bolaño had waited too long to sue, since he filed his claim ten years after the release of the “Summertime Sadness” music video. Bolaño, however, stated that he did not realise the plagiarism had occurred until 2021.

Last May, the judge refused to dismiss the lawsuit against Lana Del Rey and sent the case to a jury trial. The lawyers for both parties then agreed on and signed an out-of-court settlement, the specific terms of which were not disclosed.