17 November 2025

Lawful Access as a Gatekeeper for TDM in the EU

An Analysis of the Scope of the Lawful Access Requirement under EU Copyright Exceptions

In today’s digital research environment, knowledge is increasingly extracted not through traditional reading, but through computational methods capable of processing massive amounts of text and data. Text and Data Mining (TDM) has become indispensable across disciplines: from medicine, where mining scientific articles can reveal patterns for new drug discoveries, to the humanities, where algorithms explore centuries of literature at once. Recognising this transformative potential, the EU legislator embedded mandatory TDM exceptions into its 2019 Copyright in the Digital Single Market (CDSM) Directive.

Yet, like many areas of copyright law, this enabling measure is hedged with conditions. Chief among them is the requirement that TDM can only be carried out on works to which researchers have “lawful access”. At first glance, the condition may seem straightforward – surely researchers are either permitted to access content, or they are not. But the concept of lawfulness in relation to the copyright user’s acts is anything but clear under EU copyright law.

This lack of clarity matters. For research organisations and cultural heritage institutions, lawful access is not a technical detail but a gateway requirement: it determines the scope of what can legally be mined. If interpreted too narrowly, the exception risks being hollow, leaving academic institutions encumbered by contractual gatekeeping, legal uncertainty, and disproportionate compliance burdens. If interpreted too broadly, it risks clashing with established copyright principles.

This post examines the origins of the “lawfulness” condition in EU copyright law, its manifestation in the CDSM Directive’s TDM framework, and the potential difficulties it creates for research institutions. Ultimately, it argues that only a more coherent and flexible understanding of “lawfulness” can safeguard the promise of TDM for European science and innovation.

The origins of the lawfulness criterion in EU copyright law

Variations in terminology

The EU copyright acquis reveals a proliferation of terms relating to lawful use. Directive 2009/24 on the legal protection of computer programs refers to the “lawful acquirer of a computer program.” Directive 96/9 on the legal protection of databases speaks of the “lawful user of a database”.

Meanwhile, Article 5(1) of the InfoSoc Directive (2001/29) introduces the concept of “lawful use” in the context of the temporary copy exception. Recital 33 of the InfoSoc Directive defines “lawful use” as any use authorised by the rightsholder or not restricted by law. Under this reading, lawfulness should be understood as a flexible condition, tied either to the rightsholder’s consent (via contract, licence, or implied authorisation) or to a statutory authorisation (an exception or limitation).

Expansion through CJEU jurisprudence

The Court of Justice of the European Union (CJEU) has gradually expanded the scope of lawfulness beyond legislative texts. In ACI Adam (C-435/12), the Court held that the private copying exception applies only to reproductions made from a lawful source. The unlawfulness of the source – not the user’s knowledge – was decisive. This reasoning was reaffirmed in Vereniging Openbare Bibliotheken v. Stichting Leenrecht (C-174/15), where the Court ruled that public lending exceptions cannot extend to digital copies obtained from unlawful sources.

The Copydan case (C-463/12) further elaborated the notion of a lawful source, emphasising the rightsholder’s consent as the central criterion. This was arguably a restrictive interpretation compared to Recital 33 of the InfoSoc Directive, which suggested a broader scope of lawful use.

Taken together, this jurisprudence established a strong linkage between the condition of lawfulness and the legitimacy of the source or access point.

Lawful access as a criterion for TDM in the EU

Article 3 of the CDSM Directive introduces a mandatory exception allowing research organisations and cultural heritage institutions to carry out TDM, provided they have “lawful access” to the content. Yet, the Directive itself does not define lawful access. Recital 14 offers interpretative guidance, outlining three categories:

  1. Access through the rightsholder’s consent – for example, open access policies, subscriptions, or contractual arrangements with research organisations.
  2. Other lawful means – a broad, catch-all category echoing the “not restricted by law” formula in Recital 33 of the InfoSoc Directive.
  3. Freely available online content – a novel addition, extending lawful access to works available online without technical restrictions, seemingly on the assumption of an “objectified consent” by the rightsholder.

The concept of lawful access bears a clear resemblance to earlier constructs. It shares with “lawful user” the emphasis on access based on authorisation or legal entitlement. At the same time, it evokes the “lawful source” doctrine of ACI Adam and Copydan. However, the inclusion of freely available online content as a lawful basis marks a potential departure. Whereas the “lawful source” doctrine required affirmative rightsholder consent, the lawful access standard seems to presume authorisation when works are freely available, unless explicitly restricted.

This is a conceptual evolution that reflects the CJEU’s reasoning in Svensson (C-466/12) and VG Bild-Kunst (C-392/19), which recognised that making works freely available online without technical restrictions amounts to authorising access for the general public.

The problems for academic and research institutions

Ambiguity of “freely available online”

For researchers, the most relevant and potentially problematic element of lawful access is the reference to freely available online content. A literal reading would allow TDM of any work available online without paywalls or technological restrictions. However, this interpretation conflicts with the CJEU’s “lawful source” case law. If a database of copyright-protected e-books is made freely accessible on the dark web, it is hardly a lawful source, even if no technical restrictions exist.

This tension creates uncertainty for research institutions. Must they verify the provenance of all content, even if freely accessible? If so, how far must due diligence go? Without clarification, researchers risk exposure to liability if they unknowingly mine from unlawful sources.

Contractual gatekeeping

Another challenge lies in the role of contractual arrangements. Many publishers condition access through restrictive licences, potentially excluding TDM uses. Article 7 of the CDSM Directive renders contractual clauses contrary to Article 3 unenforceable, but this safeguard does not extend to Article 4. Article 4 extends the TDM exception beyond non-commercial research, covering all users including commercial actors, but introduces a significant limitation – rightsholders may expressly reserve their works from being mined by “opting out”. Recital 18 of the CDSM Directive envisages such opt-outs being made in an appropriate manner, such as “machine-readable means including metadata and terms and conditions of a website or a service”. Once validly made, an opt-out removes the possibility of relying on the Article 4 exception. This added criterion significantly narrows the scope of lawful access in commercial contexts.

This dynamic places academic institutions in a vulnerable position. They may hold expensive subscriptions but still face contractual ambiguity over their TDM rights. Smaller research organisations or those in less wealthy jurisdictions are disproportionately affected, as their bargaining power with publishers is minimal.

The user’s knowledge question

CJEU jurisprudence in ACI Adam emphasised that lawfulness depends on the source, not the user’s knowledge. While this ensures a uniform standard, it also creates practical difficulties. Researchers cannot always ascertain whether content was uploaded with the rightsholder’s consent. Open repositories, for example, may contain infringing uploads. If mining such content is later deemed unlawful, the institution could be at risk despite acting in good faith.

TDM in the context of AI

The tension has become even more acute in the context of AI training datasets. The Second General-Purpose AI Code of Practice requires that providers of generative AI models respect copyright, making “reasonable and proportionate efforts to ensure” that their training data has been obtained through “lawful access”. At first glance, this stipulation seems straightforward – datasets compiled from open-access works, appropriately licensed databases, or works which are freely available online and not subject to contractual restrictions would seem to comply. However, most large-scale AI models are trained on data gathered through web scraping, which inevitably ingests unauthorised works. Consequently, relying on such datasets may fall outside the scope of the EU’s TDM exceptions.

CJEU rulings reinforce this restrictive interpretation. In Svensson (C-466/12), the Court held that hyperlinking is only lawful if the original content was posted with the rightsholder’s consent. Hence, AI training datasets built through indiscriminate web scraping may include works which are publicly accessible but not “lawfully accessed”, pushing their use outside the protective scope of the EU TDM exceptions.

The recent German case of Kneschke v LAION (Hamburg Regional Court, 27 September 2024) exemplifies these tensions. The Court held that building a dataset of images was permitted under the German equivalent of Article 3 of the CDSM Directive, accepting dataset creation as a preparatory research step, regardless of downstream commercial use. However, concerns were raised over commercial exploitation under Article 4, suggesting that rightsholder opt-outs may further narrow the scope of “lawful access”.

Arguably, rightsholders’ ability to opt out, made possible by Article 4 of the CDSM Directive, transforms lawful access into a two-step requirement – accessing the work legitimately and also ensuring that no opt-out has been applied. This creates new layers of uncertainty for researchers and AI developers working with large-scale datasets. According to the European Copyright Society, “publishers might price TDM into their subscription fees”. This would widen the disparity between well-funded institutions that are able to absorb these costs and less-resourced institutions that may be priced out of data access, deepening existing inequalities in research and innovation.

Taking a synoptic view, the Kneschke v LAION ruling has the potential to provide a narrow safe harbour for non-commercial TDM, whilst leaving commercial datasets and AI training in a state of uncertainty. Instead of providing clarity, the lawful access requirement continues to operate as a significant barrier within the context of AI, where datasets are built at large scale from uncontrolled online sources.

Towards a coherent concept of lawfulness in European copyright law

The proliferation of terms – lawful user, lawful use, lawful source, lawful access – suggests the need for a more coherent and autonomous EU concept. Currently, the inconsistent terminology risks producing fragmentation, uncertainty, and over-restriction of exceptions that were designed to promote research and innovation.

Given the importance of TDM research activities, the concept of lawful access as a condition for the application of the TDM research exception should be interpreted flexibly. That means that the CJEU’s strict findings on lawfulness cannot be applied verbatim. On the other hand, there will certainly be cases of obvious unlawfulness that even the noble and meaningful objectives of enabling research and promoting innovation cannot justify.

In that context, a balanced approach could be that, in principle, the mining of freely available content should by default be considered lawful, while the mining of freely accessible but obviously illegal content should not.

Conclusion

The lawful access requirement in the CDSM Directive represents both continuity and innovation in EU copyright law. It continues the tradition of conditioning exceptions on lawfulness but extends the concept by recognising freely available online content as a potential lawful basis. While this development could facilitate TDM for research, it also creates uncertainty, particularly when contrasted with the CJEU’s stricter “lawful source” jurisprudence.

For academic and research institutions, the stakes are high. The ambiguity surrounding lawful access risks chilling legitimate TDM activities, especially in an era where AI and data-driven research depend on large-scale mining. The solution lies in clarifying and consolidating the lawfulness criterion into a coherent and flexible concept, ensuring that copyright law supports rather than hinders scientific progress.


SUGGESTED CITATION  Synodinou, Tatiana-Eleni; Vrakas, Giorgos: Lawful Access as a Gatekeeper for TDM in the EU: An Analysis of the Scope of the Lawful Access Requirement under EU Copyright Exceptions, VerfBlog, 2025/11/17, https://verfassungsblog.de/lawful-access-gatekeeper/.
