15 March 2024

AI and copyright infringement: the state of play

How AI systems may infringe copyright

AI systems (such as ‘large language models’ or ‘LLMs’) operate by processing vast amounts of data. The material used to train a system may originate from the intellectual and financial investments of others, and in that scenario questions arise as to who owns the product of the system and whether the system infringes the rights of others.

Understanding Infringement

Copyright law serves to prohibit the unauthorised use of specific categories of creative works, including literary or artistic creations. While the nuances of what constitutes infringement vary across different legal jurisdictions, establishing copyright infringement typically requires demonstrating:

  1. The performance of a relevant action concerning a work (referred to as the junior work), such as reproduction, publication, or electronic transmission.
  2. The existence of objective similarity between the junior work and the copyrighted work (referred to as the senior work), or a significant portion thereof.
  3. The demonstration that this objective similarity arises from the copying of the senior work.

Various exceptions or defences to infringement exist, which vary depending on the jurisdiction. For example:

  • In the UK, exceptions to copyright infringement include fair dealing for certain purposes like research, criticism, and parody.
  • In the EU, multiple exceptions and limitations are available, though they are not uniform across member states. However, the EU Copyright Directive introduced mandatory exceptions, such as text and data mining for research purposes, that must be available in every member state.
  • In Australia, the use of copyrighted material may not amount to infringement if it qualifies as ‘fair dealing’ for specified purposes such as research, criticism, news reporting, or parody. However, these exceptions are narrow and necessitate a fair assessment considering factors like the purpose of the use and its impact on the market for the original work.
  • In China, exceptions to copyright infringement exist for personal study, research, news reporting, and educational purposes.

It is worth noting that none of these jurisdictions has a doctrine equivalent to the broad and flexible ‘fair use’ doctrine in the US. Given that the use of AI systems may take place across multiple jurisdictions, it is challenging to fully understand the issues everywhere.

Copyright Infringement through Training

The training of AI systems often involves the use of large datasets, which may comprise original works protected by copyright, such as artworks or code passages. Consequently, the process of training AI or machine learning systems potentially involves using or creating copies of these copyrighted works, even if solely for internal use within the system.

This copying forms the basis of allegations in cases such as Getty Images v Stability AI and Authors Guild v OpenAI, where companies are accused of using copyrighted materials to train AI systems without permission or compensation. These cases underscore the contentious issues surrounding the use of copyrighted material in AI training, raising questions about the liability of both AI developers and users of AI systems for infringement.

Copyright Infringement through Outputs

Beyond potential infringement during training, AI systems may produce outputs that arguably infringe copyright, presenting a new frontier for the law to address. For example, it will be necessary to identify the relevant works, both junior and senior, and to understand how they are used. Infringement then requires demonstrating objective similarity between an output and an original work, which presents its own set of challenges, including identifying the infringer and establishing causality between input and output.

Practical Challenges

While copyright infringement may seem straightforward in principle, practical challenges can make it difficult to establish. One issue is proving that specific copyrighted works were part of the training data, which can be difficult given the often-undisclosed nature of training sets. It may also be necessary to consider the law of multiple jurisdictions, depending on where the training took place or where the system is used, and there may be forum questions relating to where the act of infringement allegedly occurred.

Government Responses

Governments worldwide are grappling with these challenges through legislation and regulatory guidance. For instance, the EU has passed an AI Act which requires providers of general-purpose AI models to publish a summary of the content used for training, intended to help rights holders identify whether copyrighted materials were used.

Getty in the UK

In January 2023, Getty Images initiated legal proceedings against Stability AI, claiming that the ‘Stable Diffusion’ system developed by Stability AI, an automated image generation system, infringed its IP rights. Getty’s allegations included the use of its images as data inputs for training Stable Diffusion and the generation of synthetic images by Stable Diffusion, which, according to Getty, replicated its copyrighted works or featured Getty brand markings.

Stability AI sought to have two aspects of Getty’s claims dismissed before a full trial[1]: the training and development claim and the secondary copyright infringement claim. The training and development claim alleges that, during the development and training of Stable Diffusion, works including the Claimants’ copyright works were downloaded onto servers and/or computers in the United Kingdom.[2] The secondary infringement claim alleges secondary infringement of copyright arising from the importation of an ‘article’, namely the pre-trained Stable Diffusion software, into the UK.[3]

In the High Court, Mrs. Justice Smith ruled in December 2023 that the legal positions on both issues were contested and unclear, requiring substantive consideration at trial rather than summary dismissal. Regarding the training and development claim, Mrs. Justice Smith highlighted the territorial nature of copyright and the central issue of where the training of Stable Diffusion occurred. If conducted in the UK, Stability AI might be liable for copyright infringement, but if it happened outside the UK, there would be no infringement under the UK legislation. The judge noted conflicting views on the ‘location issue’ between Getty, inferring UK-based infringement, and Stability AI’s evidence suggesting development in the US.[4] She emphasised the need for a full trial to examine evidence and resolve inconsistencies.

In addressing Getty’s secondary infringement claim, Mrs. Justice Smith emphasised that the interpretation of the term ‘article’ in ss. 22, 23, and 27 of the Copyright, Designs and Patents Act 1988 was crucial to the case. She deemed it necessary to determine whether ‘article’ covers intangibles like software made available online, leaving this question for trial rather than a summary judgment based on assumed facts.[5] Getty also successfully applied to amend its Particulars of Claim to include an additional argument for trial. This argument contends that Stability AI reproduces substantial parts of Getty’s images when users of Stable Diffusion upload Getty-owned images (with or without a text prompt) and request the system to generate synthetic images closely matching the originals using the ‘image strength’ slider functionality. The outcome of that case will give English courts their first opportunity to provide guidance on copyright issues in the AI context.

Getty in the USA

Prior to the case in the UK, Getty Images also brought a claim against Stability AI in Delaware, alleging that Stability AI misused over 12 million copyrighted images and their metadata without permission. However, Stability AI has contended that the Delaware court lacks jurisdiction over its UK entity and applied to transfer the case to the Northern District of California, where another lawsuit against the company is pending. Stability AI suggests that, even if the court establishes jurisdiction, the case should be transferred to California due to substantial overlaps in allegations and legal issues. Notably, Stability AI’s response does not directly address the allegations made by Getty Images.

Emad Mostaque, Stability AI’s founder and CEO, has argued that generative AI technology, such as that used by Stability AI, is protected by the concept of ‘fair use’ under US law. Fair use can permit creators to use other people’s material where the use is transformative, changing the nature of the original material. Mostaque believes that the transformative nature of generative AI technology shields it from claims of copyright misuse. If this argument succeeds, the US may begin to establish a new direction in the law that other jurisdictions will be interested in monitoring.

Stability AI is known for its involvement in funding the open-source text-to-image AI model called Stable Diffusion. The technology is commercialised through a product named DreamStudio. Stable Diffusion is trained on a 100TB dataset containing 2 billion images, including those copyrighted by Getty Images. Christoph Schuhmann, the founder of LAION, the German AI non-profit responsible for building the dataset, asserts that companies using the dataset commercially bear the responsibility for any potential copyright infringements.

Amidst the legal battle, Getty Images has launched a new service called Generative AI, developed in collaboration with Nvidia. This service allows customers to create novel images trained on Getty’s extensive library of human-taken photos. Getty claims that the new service is commercially viable and distinct from other AI-generated image services, as it was not trained on the open internet.  Getty’s CEO, Craig Peters, highlights that Generative AI by Getty Images ensures full indemnification for commercial use and addresses intellectual property concerns that have deterred businesses from using generative AI tools. The service involves compensating contributors for their images included in the training set, following a royalty-sharing model.

As the legal battle unfolds, the clash between traditional copyright norms and the transformative capabilities of generative AI remains at the forefront. The outcome of these lawsuits could set important precedents for the use of AI technology in creative industries, shaping the boundaries of fair use and copyright protection in the digital age. The development of services like Generative AI by Getty Images also indicates a growing effort to balance the opportunities presented by AI with the protection of intellectual property rights.


[1] Getty Images (US) Inc & Ors v Stability AI Ltd [2023] EWHC 3090 (Ch).

[2] Ibid at 11(a)(i).

[3] Ibid at 11(a)(ii).

[4] Ibid at 60.

[5] Ibid at 95.