The Global Reckoning: How Copyright Lawsuits Are Defining AI’s Legal Boundaries

Introduction

Several authors of copyrighted books have filed class action lawsuits against major tech market players, alleging systematic copyright infringement. The allegations stem from tech companies’ unauthorised use of copyrighted content to develop large language models (“LLMs”), which provide these tech companies a competitive edge in the Artificial Intelligence (“AI”) market.

Asian News International (“ANI”) Media has filed a similar suit[1] in the Delhi High Court against OpenAI Inc. alleging copyright infringement and claiming that OpenAI scraped and used its media content without proper permission or consent to train its LLMs, including the AI chatbot, ChatGPT.

While these lawsuits are pending adjudication, fundamental questions arise about intellectual property rights in the digital age, the boundaries of fair use in the context of machine learning, and whether the authors of copyrighted works are entitled to compensation when their materials are used in the development of AI systems.

This article explores how these parallel lawsuits could shape the AI ecosystem and addresses the potential obligation for tech companies to seek content licensing and permissions across the globe.

Background of the Lawsuits

Class action lawsuit filed against tech companies

    The plaintiffs allege that the tech companies train LLMs by harvesting content from the plaintiffs’ copyrighted books without seeking proper authorisation or providing any compensation. The training process allegedly relies on shadow libraries (online digital repositories of pirated books) to gain access to authors’ copyrighted works and various other copyrighted material. Copyright holders have also asserted that the tech companies have used various software programs to download data from sources known to include copyrighted material. To meet their growing need for data, these tech companies have also allegedly created their own web crawlers to scrape copyrighted works, including books, and construct training datasets from them.

    Plaintiffs have also alleged the use of BitTorrent to amass an extensive collection of unauthorised literary works, potentially constituting direct copyright infringement. They have also accused these tech companies of functioning as distributors of these pirated works, thereby exacerbating copyright violations throughout the digital ecosystem. Plaintiffs have further claimed that these companies designed specialised software to strip copyright management information (“CMI”) from the literary works without authorisation, compounding the alleged intellectual property violations.

    The authors have claimed relief and sought statutory damages, actual damages, restitution of profits, and other remedies provided in law on the following grounds:

    • The act of sourcing content from copyrighted works to train the LLMs constituted direct copyright infringement. Additionally, by acting as distributors of pirated works, tech companies have facilitated further copyright infringement.
    • The removal of CMI to conceal the infringement caused injury to the authors, as tech companies had reasonable grounds to believe that this act would reduce authors’ chances of discovering that the tech companies had copied their works.
    • The tech companies used the copyrighted data with the intention of depriving the authors of their intellectual property and causing them economic harm.

    In response, the tech companies defended their actions by asserting “fair use”, claiming that the use of the authors’ copyrighted works to train their LLMs was transformative in nature and did not directly compare with the original material. They also argued that the use of “publicly available” content to train the LLMs was shielded by the fair use doctrine.

    ANI Media’s Case Against OpenAI in Delhi High Court

    ANI Media’s lawsuit against OpenAI in Delhi High Court underscores concerns over the unauthorised use of copyrighted content to train AI models for commercial usage. ANI Media has alleged that OpenAI used its copyrighted news content without permission to train the AI chatbot, ChatGPT, which resulted in the generated responses resembling ANI’s original work. Furthermore, ANI Media has asserted that it proposed a licensing agreement that could have legitimised the use of its content; however, OpenAI ignored the request.

    OpenAI has denied any wrongdoing, arguing that consent is not required because it trains its models on publicly available data. OpenAI has also taken the stance that using such data does not constitute copyright infringement, as it falls within the “fair use” exception under Section 52 of the Copyright Act, 1957.

    While the Court is yet to opine on the legal issues, it has framed the key issues[2] for its consideration, including (i) whether the defendants’ storage of the plaintiff’s data (in the nature of news and claimed to be protected under the Copyright Act, 1957) for training its software, i.e., ChatGPT, amounts to infringement of the plaintiff’s copyright; (ii) whether the defendants’ use of the plaintiff’s copyrighted data to generate responses for its users amounts to infringement of the plaintiff’s copyright; and (iii) whether the defendants’ use of the plaintiff’s copyrighted data qualifies as “fair use” under Section 52 of the Copyright Act, 1957.

    Will the defense of “fair use” be sustained in Courts?

    How the Courts will determine and interpret the issue remains to be seen; however, copyright law experts view the defense of “fair use” for transformative work as weak and untenable.

    These experts argue that the output is not transformative, as AI developers claim, considering the intent behind the creation of these works is to compete commercially with the original content. Moreover, these creations neither add new meaning nor alter the meaning, message, or expression of the original work.

    Experts also consider weak the tech companies’ assertion that they use authors’ copyrighted works for a different purpose than originally intended. The Courts have upheld “fair use” where the copying “served a new and different function” from the purpose for which these copyrighted works were originally created.

    Another AI developer has argued[3] that while plaintiffs’ works were created for standalone entertainment value, generative AI repurposes those works for “helping computer programs learn the patterns inherent in human-generated media”. Experts dismissed the assertion, citing inaccuracies in tech companies’ description of the plaintiffs’ works and their purpose in engaging in alleged copyright infringement. They argued that using plaintiffs’ creative works serves the same fundamental purpose as the original, rather than transforming them for a different function or objective. Since the main purpose of the original works includes educational aspects, such as conveying knowledge and imparting skills, this similarity in purpose does not justify broad exemptions from copyright protections.

    Similarly, experts in India have highlighted that OpenAI’s claim of transformative use under “fair dealing” may not be successful, because ChatGPT is a commercial product, even though OpenAI doesn’t copy content verbatim and instead reorganises information based on user prompts to generate responses.

    Recommendations for stakeholders

    • For AI Companies

      Until legal issues are determined globally and to avoid litigation, companies developing AI should adopt best practices for data collection and documentation. Some options for AI companies to ensure compliance and stay out of legal trouble include developing “opt-out mechanisms” to enable authors to exclude their works from training datasets or exploring licensing agreements with copyright holders.

    • For Copyright Holders

      Copyright holders should adopt proactive measures to protect their content by implementing regular searches or using monitoring services to identify potential infringement by AI systems. Adding copyright information and including usage restrictions in file metadata may also assist in early detection. 

    • For Policymakers

      Balancing technological innovation with creators’/authors’ rights is key to addressing AI-related copyright challenges. Policymakers can achieve this by establishing clear guidelines on “fair use” for AI, working towards international standards to address the global nature of AI development, developing statutory licensing schemes, and creating safe harbours with conditions to ensure compliance with copyright laws.

      Conclusion

      These cases represent a watershed moment in the evolving relationship between AI and intellectual property law. Given that copyright holders have substantial leverage in such disputes, it is imperative that content creators and rights holders assert their intellectual property rights through appropriate channels by taking proactive measures against potential infringement. Simultaneously, tech companies must approach content acquisition with extreme caution, as copyright violations could expose them to both civil liability and criminal prosecution, with penalties extending up to three (3) years of imprisonment and fines of up to INR 2 lakh under the Copyright Act, 1957. This deterrent should prompt tech entities to implement rigorous compliance protocols when using copyrighted materials.

      As these legal issues unfold, their outcomes are likely to shape the AI landscape in several ways. Cases such as these will not only establish crucial precedents but may also pave the way for the development of robust AI regulation in India and globally. The rulings will likely influence how companies approach data collection and training methods, how authors protect and enforce their copyright, and how governments regulate technology and innovation. The outcomes of these disputes may determine whether we are headed towards adversarial relationships between AI companies and content creators or collaborative frameworks that benefit all stakeholders.


      [1] ANI Media Pvt Ltd v OpenAI Inc & Anr, CS(COMM) 1028/2024.

      [2] Order dated November 19, 2024, in ANI Media Pvt Ltd v OpenAI Inc & Anr, CS(COMM) 1028/2024.

      [3] Comments of OpenAI on U.S. Copyright Office’s Notice of Inquiry and Request for Comment [Docket No. 2023-06], at 11–12 (October 30, 2023).

      Faraz Alam Sagar

      Partner in the Disputes, Regulatory, Advocacy and Policy Practice at the Mumbai office of Cyril Amarchand Mangaldas. Faraz has significant experience in the areas of commercial litigation and investment dispute arbitrations. He regularly advises multinational corporations and financial institutions in a wide range of contentious disputes including investigations, litigation and regulatory enforcement proceedings in India. Faraz also has considerable expertise in telecom disputes, white-collar, forensic and corporate espionage investigations. He can be reached at faraz.sagar@cyrilshroff.com

      Pallavi Choudhary

      Principal Associate designate in the Dispute Resolution & White Collar Crimes practice at the Mumbai office of Cyril Amarchand Mangaldas. Pallavi focuses on cross border issues and litigation, involving enforcement of petitions and advisory pertaining to Hague Evidence Convention, regulatory compliance, and international enforcement proceedings in India. She can be reached at pallavi.choudhary@cyrilshroff.com.