Copyright Law: The Legal Speed Bump to Generative AI Development

By: Luke Kim, LAW ‘25

Generative AI excites the mind. With a few words, I can generate text, audio, and images limited only by the model. Yet not all is well in the world of generative AI. The development of generative AI has outpaced the creation of legal safeguards and, more importantly, any meaningful debate on generative AI’s role in society. Outside of contract negotiations and the right of publicity, those without bargaining power have few legal remedies regarding generative AI, mainly because the law has not caught up to the technology. Scarlett Johansson, for example, resorted to the court of public opinion to prevent the emulation of her voice by OpenAI because of the absence of generative AI legislation. No effective cause of action currently exists to slow the development, commercialization, and widespread use of generative AI, with one notable exception: copyright law.

In the United States, a copyright is infringed when a copyright holder’s exclusive rights (reproduction, distribution, derivative works, public performance, public display, and digital audio transmission) are violated through the unauthorized use of their original work. For example, if I decided to write a fantasy novel called Adventures of Gimli and Legolas based on The Lord of the Rings and post it for free online, my work infringes upon the Tolkien Estate’s exclusive right to prepare derivative works. 

However, I can claim fair use as an affirmative defense. In the United States, fair use is a legal doctrine that allows the use of copyright-protected works without the permission of the copyright holder. Copyright law provides several categories of permissible fair use of copyrighted material, including criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, and research. Regardless of the form, courts must balance the four factors found in 17 U.S.C. § 107 when determining fair use: 

  • The purpose and character of the use focuses on whether the use is commercial or noncommercial and whether it is transformative (adds new meaning or purpose). Noncommercial and transformative uses favor fair use. Note that the definition of transformative use is still developing and open to interpretation.
  • The nature of the copyrighted work considers the work’s originality and creativity. Factual works (e.g., news) favor fair use, while creative works (e.g., movies) do not. Using unpublished works generally weighs against fair use.
  • The amount and substantiality of the portion used examines how much of the original work is used. Smaller portions favor fair use, provided they do not take the heart of the work. Using the entire work may still be fair depending on the other factors.
  • The effect of the use upon the potential market for or value of the copyrighted work evaluates whether the use impacts the original work’s market. Uses that could harm the potential market for or value of the original work weigh against fair use.

My hypothetical self-published novel Adventures of Gimli and Legolas would likely be fair use because it is noncommercial (anyone can read it without any monetary cost) and its effect on the potential market for J.R.R. Tolkien’s work is limited. Readers who wish to read my work most likely read the original trilogy before scouring the internet for more content, and there are currently no announced plans for sequel stories featuring Legolas and Gimli. 

The analysis above is the foundation of what is occurring in a host of generative AI training lawsuits. Generative AI programs are trained on massive amounts of data, such as images, text, or music, often taken from the internet without permission. Most generative AI programs use a large language model (LLM), which can generate content based on a prompt. The materials used to train generative AI are called the “inputs,” and the products produced by the program are called the “outputs.” Focusing on the inputs, plaintiffs in generative AI training lawsuits allege that defendants engage in copyright infringement through the unauthorized use of copyrighted material to train the defendants’ AI models. In response, defendants claim fair use. In Authors Guild v. OpenAI (now Alter v. OpenAI), for example, the plaintiffs argue that the defendants’ use of their copyrighted works to train their models infringed upon their exclusive rights. The defendants, Microsoft and OpenAI, assert a fair use defense, with OpenAI in particular arguing that its use is transformative. As formulated in the 2023 Supreme Court case Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, when determining transformative use, courts should look to the degree of difference in purpose and character between the original work and the allegedly infringing use. The greater the difference, the more likely that the use is transformative. 

The case most analogous to generative AI copyright lawsuits is the 2015 Second Circuit Court of Appeals decision in Authors Guild v. Google, Inc., in which authors sued Google for displaying digital copies of their books through its Google Books and Library Project services. The federal appellate court sided with Google, accepting the company’s fair use defense to copyright infringement because the “purpose of the copying [was] highly transformative, the public display of text [was] limited, and the revelations [did] not provide a significant market substitute for the protected aspects of the originals.” But key facts distinguish this case from more recent copyright claims challenging the use of copyrighted material for generative AI training, one of which is the impact on the market. In Authors Guild, the court disagreed with the plaintiffs’ argument that a licensing market existed for Google’s particular use. This finding reduced the weight the court gave to the overall impact of Google’s online libraries on the market. In generative AI copyright lawsuits, plaintiffs may argue that a licensing market does in fact exist for the particular use at issue, so defendants’ continued use of unlicensed material would deprive plaintiffs of licensing revenue. 

In May 2025, the U.S. Copyright Office released, in pre-publication form, the third part of its report on Copyright and Artificial Intelligence, focusing on generative AI training. In the report, the Office stated that when analyzing fair use, “[i]t is for the courts to weigh the statutory factors together.” But it “observe[s], however, that the first and fourth factors can be expected to assume considerable weight in the analysis.” 

It is unclear where the courts will land. However, we do have one case, Thomson Reuters Enter. Centre GmbH et al v. Ross Intelligence Inc., yielding early insights. Thomson Reuters alleged that Ross Intelligence (Ross) used Westlaw’s headnotes to train its AI-powered legal research engine. On summary judgment, the Delaware federal district court held that Ross infringed Thomson Reuters’ copyright by using content copied from Westlaw to train its AI. When determining fair use, the court found that the first and fourth factors weighed against fair use, and the second and third factors did not sway the analysis in favor of Ross. Relying on Warhol, the court determined that Ross’s use was commercial and not transformative because “it does not have a further purpose or different character” and because Ross itself attested that it stood to benefit from the copyrighted material. While Ross does not involve generative AI or LLM training, it is the first decision to rule on the use of copyrighted inputs to train an AI model and will be a key case in generative AI infringement litigation.

Infringement lawsuits have more immediate effects beyond evolving copyright law to meet the new challenges posed by commercial generative AI products. As seen in Concord Music Group, Inc. v. Anthropic PBC, plaintiffs may seek preliminary injunctive relief at the outset of litigation, directly slowing the development of generative AI. If granted, a preliminary injunction can require a party to discontinue or modify its conduct until the court reaches a final judgment. Successful copyright infringement litigation could also hamper generative AI development by forcing developers to pay for licenses to use copyrighted material. Even if plaintiffs ultimately lose in court, litigation forces owners of AI models to divert time and resources to legal defense, which in turn slows AI development. Additionally, Congress may respond to the outcome of these lawsuits with measures even as simple as dataset disclosure requirements.  

I am not saying that the development of generative AI should stop. That question is ultimately moot; the cat is already out of the bag. However, developers should still carefully consider how this technology should be used. Despite its widespread use, people distrust generative AI and want transparency. Certain standards, like the NIST AI Risk Management Framework, exist to help guide development. The absence of legislation should not be a signal to develop generative AI at an unrelenting pace and without consideration of disruptive impacts on the economy and society. After all, we don’t want to accidentally develop Skynet, the overarching AI antagonist of the Terminator series. And until the passage of generative AI legislation, one of our best tools against Skynet is copyright law. 

Luke Kim is a third year J.D. candidate at Temple University Beasley School of Law. His interests include IP, international law, and litigation. 

This blog is a part of iLIT’s student blog series. Each year, iLIT hosts a summer research assistant program, during which students may author a blog post on a topic of their choosing at the intersection of law, policy, and technology. You can read more student blog posts and other iLIT publications here.
