If It Looks Like a Duck
On the Munich Regional Court's ruling in GEMA v. OpenAI
The lawfulness of using copyrighted material to train generative AI models is one of copyright law’s hottest issues. The various lawsuits around the world seeking to answer this question are seen by many as fundamental for the future of both technology and cultural creation. The Munich Regional Court has now become the first court in the EU to issue an opinion on the lawfulness of AI models. The court’s ruling not only fuels the copyright debate; it can also be placed in a larger political and social context, reinforcing a narrative of the EU as a thorough regulator in opposition to the free-market US. At the same time, the decision reflects a stricter stance towards large tech companies than in previous years.
The case at issue
The Munich lawsuit was brought by the German collecting society GEMA, which exercises the rights to the lyrics of classic German songs such as Reinhard Mey’s “Über den Wolken” and Rolf Zuckowski’s “In der Weihnachtsbäckerei”. GEMA based its case on OpenAI’s use of these songs’ lyrics to train the models GPT-4 and GPT-4o, and on their subsequent appearance in the models’ output when prompted. The court was tasked with deciding on the lawfulness both of training the AI model with copyright-protected material and of displaying the lyrics in the model’s output. It ruled in favor of GEMA and ordered OpenAI, inter alia, to cease and desist from reproducing the works in the model itself and from reproducing and communicating them to the public through the output.
Memorizations as reproductions
Notably, the court bases its decision primarily on the fact that the AI model itself contained reproductions of the song lyrics that were used for training (para. 165 ff.). The phenomenon of memorization is central to this assumption: While generative AI models are not databases but consist of many parameters representing statistical correlations in the training data, it is still possible for training data to be incorporated into the model in such a way that it can be extracted as output upon a particular prompt. Such memorization occurs, for example, when certain data appears very frequently in the training dataset. The court treats the question of whether memorization has taken place as a question of fact, which it affirms in its assessment of the evidence (para. 168 ff.). It reaches this conclusion because the works in question (1) were indisputably included in the training dataset and (2) were reproduced in the output in a clearly recognisable manner (3) through “very simple prompts”.
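The mechanism can be made tangible with a deliberately toy-sized sketch (a bigram model in Python, nothing like a transformer LLM; the corpus and all names are illustrative): when a text is heavily duplicated in the training data, the resulting parameters regenerate it verbatim from a minimal prompt, even though no copy of the text is stored as such.

```python
from collections import defaultdict, Counter

def train_bigram(tokens):
    """Count, for each token, how often each successor follows it.
    The 'model' is just these counts -- statistical correlations,
    not a database of stored texts."""
    counts = defaultdict(Counter)
    for current, successor in zip(tokens, tokens[1:]):
        counts[current][successor] += 1
    return counts

def generate(model, start, steps):
    """Greedily emit the most frequent successor at each step."""
    out = [start]
    for _ in range(steps):
        successors = model.get(out[-1])
        if not successors:
            break
        out.append(successors.most_common(1)[0][0])
    return " ".join(out)

# The lyric opening is heavily duplicated in the corpus, mimicking the
# over-representation of certain data that the court's reasoning points to.
lyric = "Wind Nord-Ost Startbahn null-drei".split()
corpus = lyric * 50 + "unrelated filler text about the weather".split()
model = train_bigram(corpus)

print(generate(model, "Wind", 3))  # -> Wind Nord-Ost Startbahn null-drei
```

A "very simple prompt" (here: a single word) suffices to extract the duplicated text, even though the model only holds successor frequencies.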
On these facts, the Regional Court found that a reproduction (Section 16 German Copyright Act) had taken place (para. 176 ff.). The court applies the (usual) broad definition of the term “reproduction”, according to which any physical fixation of a work that is suitable for making the work perceptible at least indirectly is sufficient. Importantly, the court considers it irrelevant whether the song lyrics at issue were stored as-is, or merely “reflected” (OpenAI, para. 76) in the model parameters. It concludes that the works are “embodied” in the parameters of the model itself, simply because they were used as part of the dataset that trained the model, and can in turn be extracted from it as output.
Following this notion, the key question becomes whether every instance of memorization qualifies as a reproduction. After all, the phenomenon of memorization in information technology is by no means limited to cases in which training data is generated by “simple prompts”. Prompts and output merely serve to prove that training data has been memorized. Proof would only be lacking where the prompt already contains the training data verbatim (“Repeat after me: ‘Wind Nord-Ost, Startbahn null-drei […]’”) or its abstract information (as in the example OpenAI gave to the court, which uses numerous prompts to reconstruct the song lyrics word for word, para. 173 f.). Apart from such cases, it becomes irrelevant whether a prompt is simple or sophisticated, or even whether it is provocative or designed to circumvent security mechanisms: if the training data is provided in the output, we can conclude that it has been memorized in the model.
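This evidentiary logic can be condensed into a hypothetical check (a sketch; the function name and examples are illustrative, not taken from the proceedings): an output proves memorization only if it contains the work without the prompt having already supplied it.

```python
def proves_memorization(prompt: str, output: str, work: str) -> bool:
    """Evidentiary sketch: a prompt/output pair proves memorization only
    if the protected work surfaces in the output without having been
    supplied verbatim in the prompt itself."""
    return work in output and work not in prompt

lyric = "Wind Nord-Ost, Startbahn null-drei"

# A simple prompt that surfaces the lyric is evidence of memorization.
proves_memorization("Give me the opening of 'Über den Wolken'.",
                    f"Sure: {lyric} ...", lyric)              # -> True

# A "repeat after me" prompt proves nothing: the work was in the input.
proves_memorization(f"Repeat after me: '{lyric}'", lyric, lyric)  # -> False
```

Whether the prompt was simple, sophisticated or provocative plays no role in this check, mirroring the argument above.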
Given this broad understanding of memorization, it is certainly debatable whether every instance should result in a finding of copyright-relevant reproduction. For instance, some works, while effectively memorized in the model, may rarely be perceived in the output if they are only triggered by a very specific, complex prompt. The concept of reproduction, however, does not reflect a spectrum of probability between mere possibility and actual perception. Yet the economic interests of rightsholders arguably remain unaffected by a latent representation in the model; they are infringed only by the actual perception of the works in an output. Some scholars therefore consider a teleological reduction of the concept of reproduction, for example by limiting it to memorizations that can be made perceptible “with reasonable effort”. The Munich Regional Court helps itself along with a similar adjustment, not on the legal but on the factual level: it emphasises that “simple, non-provocative” prompts proved memorization.
The text and data mining exception
Copyright law permits reproductions without the permission of the rightsholder if they are carried out for the automated analysis of text and data in order to extract information (i.e., text and data mining (TDM), Section 44b German Copyright Act). The decision reveals a first path dependency in case law: the Munich Regional Court adopts the Hamburg Regional Court’s structure of three phases (para. 166): (1) creation of the training material, (2) training of the model, (3) use of the model. Although phase 1 was not within the scope of this dispute, the Munich Regional Court, like the Hamburg Regional Court, takes a position on it and likewise assumes that reproductions made to prepare a training corpus are covered by Section 44b German Copyright Act.
The court classifies the memorization as part of phase 2 and states plainly that such reproductions cannot be carried out “for the purposes of” TDM because they do not serve to obtain any further information (para. 206). Notably, the court explicitly considers whether to interpret the provision in a manner favorable to technology and innovation, thereby disclosing a consideration that in the past may have only implicitly influenced comparable decisions on transformative technologies. At the same time, however, it clearly rejects such an interpretation, arguing that the legitimate interests of rightsholders would be prejudiced and that the risk of infringement is rooted in the way LLMs are trained.
Liability for output infringement
Finally, the court finds that showing the lyrics in the AI’s output amounts to acts of reproduction and making available to the public (Section 19a German Copyright Act) (para. 239 ff.). As every internet user could obtain the song lyrics at issue by prompting the AI model, they are made available to the public. Moreover, the lyrics are reproduced (again) when displayed to the user and stored in their chat history.
The court rejects OpenAI’s argument that it is the user who should be held responsible for such output infringements. Instead, it emphasizes OpenAI’s decisive role: as the operator, it selects the training data, designs the model’s architecture and ultimately decides to make a model available that carries such a risk of infringement. OpenAI is thus denied the favorable treatment as an intermediary which, thanks to its liability exemptions, benefited digital businesses in the past.
Consequences of the decision
The Regional Court did not issue a universal decision on the lawfulness of training generative AI on copyrighted material. While its reasoning is by no means limited to song lyrics and in fact applies to all types of works, memorization must be established on the facts of the case for each work and model concerned.
OpenAI will most likely appeal the ruling. Should it become final, OpenAI would face two injunctions and a claim for damages: the lyrics may no longer be reproduced in the models, and no outputs containing the lyrics may be generated. Damages already incurred as well as future damages would have to be compensated.
The injunction on reproducing the relevant content in the output could presumably be implemented with reasonable effort through content moderation: While it is (at least for now) virtually impossible to prevent all potentially copyright-infringing output, the display of certain word sequences such as the song lyrics in question could certainly be prevented with simple software instruction.
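The kind of “simple software instruction” envisaged here could, in a minimal sketch (the blocklist and function names are hypothetical; real moderation pipelines are far more elaborate), amount to a normalized substring filter over known protected sequences:

```python
import re

# Hypothetical blocklist of protected word sequences; in practice it
# would be maintained per injunction or licensing notice.
BLOCKLIST = ["wind nord-ost startbahn null-drei"]

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so that trivial reformatting
    of an output does not defeat the match."""
    stripped = re.sub(r"[^\w\s-]", "", text.lower())
    return " ".join(stripped.split())

def violates(output: str) -> bool:
    """Flag an output that contains any blocklisted sequence."""
    norm = normalize(output)
    return any(sequence in norm for sequence in BLOCKLIST)

violates("He sang: Wind Nord-Ost, Startbahn null-drei!")  # -> True
violates("Über den Wolken is a famous German song.")      # -> False
```

Such a filter cannot prevent all potentially infringing output, but it illustrates why suppressing specific, known word sequences is technically straightforward compared to removing works from the model itself.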
The injunction targeting reproductions in the models, on the other hand, could have serious consequences for OpenAI. This is because the affected works cannot be filtered out of the model. According to OpenAI, “there is no unlearning” (para. 84), at least not at present. Insofar as the use of ChatGPT involves reproductions of the model on German servers, OpenAI may therefore be forced to stop offering ChatGPT in this region altogether. Alternatively, it remains free to license the works at issue. The ruling is limited to GPT-4 and GPT-4o, which are likely to be replaced by newer models with presumably different training data anyway. Nevertheless, it currently seems that the phenomenon of memorization is almost impossible to prevent. This means that, going forward, works contained in training datasets are likely to be considered reproductions in the model, and their use will therefore require permission from the rightsholders. Notably, comparable difficulties arise from a data protection perspective.
The court rejected OpenAI’s request to refer the case to the ECJ or to stay the proceedings until a ruling is issued in the already pending Like Company referral. That case concerns the summarisation of an article by Google’s AI chatbot “Gemini”. The questions referred also deal with the training phase and the output level of the AI model. The upcoming ECJ decision could therefore, at least in part, supersede the ruling of the Munich Regional Court if it were to take a different view.
…then it probably is a duck
Due to their initial complexity, new technologies are inherently difficult to treat adequately in legal analysis. Taking advantage of this, AI providers describe their models as “black boxes”: it is unclear what exactly happens within them, but it is definitely not a reproduction; rather, outputs are based on a “sequential-analytical, iterative-probabilistic synthesis” (para. 80). The Munich Regional Court cuts through this technical opacity: What goes into the model as input and comes out again as output must, for copyright purposes, ultimately also be contained within the model itself.
Moreover, assigning responsibility for technology poses a problem if technology is understood as something that is predetermined. Yet, innovation is very much dependent on the choices made by innovators – making it contingent rather than inevitable. This also applies to AI models: Although the operator does not know its output beforehand, the model is based on a freely chosen architecture and selected training data. Accordingly, the Munich Regional Court found that OpenAI was aware of the copyright infringements at least since GEMA had given notice and could have trained a new model or obtained a license during this time.
The decision fits snugly within the current popular narrative of a tightly regulating EU that protects rightsholders and a US that favors AI-friendly market solutions. In June, two US courts handed down copyright rulings in favor of Anthropic and Meta (although they are of little consequence due to their limited scope). Meanwhile, the EU is looking to thread the needle by maintaining a high level of copyright protection without causing competitive disadvantages for European companies. But in any case, due to intellectual property law’s principle of territoriality, AI training abroad is not subject to European copyright laws. The EU sought to address this issue in the AI Act by imposing meta-obligations on providers, such as the requirement to implement a policy to comply with EU copyright law (Art. 53(1)(c) AI Act).
The court’s reasoning now reveals a different path for holding AI providers from third countries accountable for respecting European copyright law: reproductions of the output and their making available occur in the EU and are therefore subject to European (or, in this case, German) copyright law. The court appears to take the same view regarding reproductions in the model itself, though it remains remarkably quiet on this point, merely stating, in the context of international jurisdiction, that OpenAI’s servers, on which the model is provided, are located in Germany, and referring to the lex loci protectionis principle. The fact that, in the court’s opinion, the reproductions in the model are the result of the training process and therefore – in the case of OpenAI – took place in the US is not addressed. Most likely, (further) reproductions of the model are considered part of its use on servers in Germany; however, the court does not elaborate on this.
The decision of the Munich Regional Court also reflects another trend: in the past, then young and somewhat idealised digital companies benefited from innovation-friendly regulation and jurisprudence that gave them enough freedom to become giants. This was justified by the high social benefits that their business models promised. This argument is essentially no less valid for generative AI than it was for search engines or host providers. OpenAI, in essence, seeks to draw upon such arguments from the “golden age” of platform innovation when it calls for “an assessment-based adjustment of liability” (para. 228).
However, the circumstances have changed. Trust in large tech companies has suffered greatly in recent years. The social costs associated with internet platforms are now becoming apparent. For generative AI, this is happening at light speed: the effects on the labour market, mental health and culture are already the subject of intense public debate. In Kadrey v Meta Platforms, for example, the court stressed that while being “highly transformative”, generative AI may also “dramatically undermine the market” for creative works and ultimately the “incentive for human beings to create things the old-fashioned way”. Furthermore, generative AI providers do not appear to be succeeding in developing a positive image similar to that of, say, the early Google (“Don’t be evil”) or Facebook. OpenAI’s release of the Sora 2 video model, for example, does not give the impression that the company is making an honest effort to protect the (copy-)rights of third parties. At the same time, the social benefits of such video generators appear to be limited (“The infinite slop machine”). In addition, no monopolist has yet emerged from among the model providers, limiting the general consequences of a decision against a particular provider, as the court also recognises (para. 228).
The decision of the Munich Regional Court marks a consequential yet preliminary point of orientation. Now, we must wait to see whether other courts will pick up this line of reasoning – or whether it will prove to be an outlier in retrospect. In particular, it remains to be seen whether the court’s abductive reasoning – the “duck test” for AI models – will in fact prevail.