14 August 2024

In the Shadows

Data Brokers and the Limits of the GDPR

Recent investigations by Netzpolitik and the German public service broadcaster Bayerischer Rundfunk into the company Datarade have shed light on a part of the digital economy that has so far operated mainly in the background: data trading. The key players in this sector are data brokers, whose business model is to trade in (non-)personal data. Data trading is a multi-billion-dollar component of the global digital economy and not a new phenomenon. However, its fundamental problems have recently, and rightly, come to the fore. The business model of most data brokers undermines data protection as well as the privacy and autonomy of individuals, and it creates structural and ethical problems for democratic societies. This article outlines the legal implications of data trading in the context of the GDPR, the DSA and the AI Act.

Key Players

In general, data trading refers to the commercial exchange of personal and non-personal data for money, goods, or products. Data brokers are companies that earn their principal revenue by supplying data, particularly about individuals, with this information primarily sourced from entities other than the data subjects themselves. Data brokers obtain data from a variety of sources: they purchase it from other traders, extract it themselves using web crawlers, or source it from publicly available information. The subsequent sales represent secondary uses of data and do not involve the data subjects. Since consumers are affected but not directly involved, many people are unaware of the existence of data brokers. Well-known companies like Google or Meta do not fit the core definition of data brokers. Although search engines and social networks also sell data, they mostly collect it directly from their users. In contrast, data brokers operate more indirectly as intermediaries; they do not have users themselves, but customers who purchase data rather than generate it.

Nevertheless, the previously clear understanding of data brokers (such as address brokers like national postal services) is becoming increasingly complex. Many companies nowadays not only trade in data but also offer a range of other services. For example, Experian, one of the best-known data brokers and credit agencies, also provides ‘Big Data solutions.’ Demand for fresh data is rising due to the proliferation of highly data-intensive technologies such as Large Language Models (LLMs). Accordingly, there is already speculation about the future of the data trading market once the sources of effectively public data are exhausted.

The data ecosystem

Data brokers operate at the intersection of various stakeholders and actors within the data ecosystem, serving as largely invisible intermediaries. The brokers themselves are typically indifferent to who uses the data they sell or for what purposes. For example, data collected for marketing purposes may be sold to credit agencies that use it for credit scoring.

Data trading is a thriving sector of the global digital economy and just one variety of the ongoing datafication of all aspects of life. Consequently, data brokers are deeply embedded in the structures of the global data economy, which heavily relies on online tracking in the context of digital communication technologies. Regardless of whether data is legally classified as property, there is no denying that data functions as a significant economic asset. All globally successful digital private companies engage in aggressive data extraction from their users. In particular, the business model of centrally operated social networks is based on financing through online advertising. As the ECJ rightly acknowledged, the specific content of apps and websites is of secondary importance; the primary goal is to keep users engaged on the platform for as long as possible to extract and generate personal data. This objective is further optimized through real-time bidding, where advertisers participate in automated auctions for the opportunity to target specific internet users with ads.

Problems of data trading

Despite the opaque and difficult-to-track data flows and business models, legally and ethically problematic transactions have repeatedly come to light. For example, a U.S. company with ties to the government sold location data from areas surrounding abortion clinics; the company LiveRamp created a ‘private population registry’ based on categories such as ‘depression’ and ‘breast cancer’; Datastream offered journalists 3.6 billion location data points from Germany, free of charge and as an incentive to sign up for a subscription.

Above all, data trading catalyses the massive informational power asymmetry in the digital sphere: on the one hand, certain companies are able to accumulate vast amounts of data and exclusively use it for their own economic purposes. Consumers, on the other hand, cannot commercially use their own data in a comparable way, nor does the vast majority know who is selling their data and for what purposes. Unregulated data trading exacerbates these problems through opaque transaction chains that obscure responsibilities, undermine the purpose limitation of data, and jeopardize data security.

The risk of abuse from the concentration of extremely large datasets about entire populations in the hands of individual actors – whether state or private – is obvious. Combined with predictive analytics, every detail can be used to group individuals and assign them characteristics they may not even know themselves. Whether these inferences are accurate or not, they result in a significant loss of control over self-representation and an interference with individual autonomy. Discriminatory practices are well documented, and the targeted spread of misinformation to selected groups of people poisons public discourse. Additionally, the discussion about the threat to national security is intensifying. Unsurprisingly, these vast data troves enable predictions and inferences about almost any individual, including employees of governments, intelligence agencies, or other security-relevant state authorities.

Data protection law and its enforcement deficit

Based on what is known about the practices of many data brokers, much of what is common practice today appears to be simply unlawful. This applies in particular to consent, legal bases for data processing and purpose limitation. Furthermore, the compatibility with fundamental principles of data protection law, such as data minimisation, is also highly questionable.

The GDPR applies to the processing of personal data. Even the ‘gifting’ of data, as in the case of the Datarade investigation, constitutes legally relevant data processing. Under the GDPR, personal data is defined as any information relating to an identified or identifiable natural person. In the vast majority of cases, data brokers process personal data, as it is particularly valuable for profiling, advertising and prediction purposes. For the classification as personal data under Article 4(1) of the GDPR, it does not matter whether the attributions, such as the assignment to a specific age group, sexual orientation, or predisposition to certain diseases, are factually correct or not.

The location data of millions of individuals that was transmitted in the Datarade investigation is personal data under Article 4(1) of the GDPR, which states: ‘an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, […].’ (as also confirmed in the TC-String ruling by the ECJ, Case C-604/22). Correspondingly, the investigation also showed that individuals could be identified based on their movement patterns.

The GDPR requires a legal basis for each processing of personal data (Article 6(1)). This legal basis, as well as any additional requirements under Article 9(2) of the GDPR for the processing of special categories of personal data, is questionable in many cases of data trading. The responsibility for providing evidence and justification lies with the respective data broker under Article 5(2) of the GDPR.

Consent as a legal basis for data trading under Articles 6(1)(a) and 7 of the GDPR is not obtained in practice, as the data subjects involved are not part of the transactions. The consent given to the controller who directly collects the data from the data subjects, such as website operators, does not extend to the unlimited further sale of this data to third parties who are not identifiable at the time of collection (for fundamental issues regarding consent in the online context, see here).

As a consequence, the remaining legal basis is legitimate interest under Article 6(1)(f) of the GDPR. According to this provision, the processing must be necessary to protect the legitimate interests of the controller, and the interests of the data subject must not override these. The required balancing of interests must consider, among other things, the economic interest in the processing, the reasonable expectations of the data subject, and the principles of data protection.

The regulatory structure of the GDPR and the case law of the ECJ contradict the assumption that data subjects are left unprotected simply because ubiquitous data protection violations occur in the online world. Rather, data subjects do not have to expect that their data, collected elsewhere, will be resold to an unmanageable number of third parties for unknown purposes. Although data made public by the data subjects themselves is less protected, the profiles and inferences created from it, in particular, were not made public by the data subjects themselves. This leads to the general problem of the GDPR and AI, as it becomes practically impossible to delineate special categories of personal data under Article 9(1) of the GDPR when predictive models can derive attributions of any kind from non-sensitive data (Article 9(1) GDPR refers to ‘revealing’).

Likewise, the practices of data trading raise serious concerns regarding the purpose limitation principle of Article 5(1)(b) of the GDPR, which is effectively undermined by the repeated sale or further processing of data. With respect to address trading, various German data protection authorities appear to take the view that it can no longer be justified based on legitimate interest. The extreme opacity of the data broker industry, combined with the structural difficulties data protection law has with data-driven technologies, leads to significant enforcement problems, as too little is known about the actors involved and their trading flows and relations.

The regulatory gap of data marketplaces

The Datarade case is not only politically sensitive from a German point of view because the company received investments from the ‘High-Tech Gründerfonds’, which is more than 50% funded by the Federal Ministry for Economic Affairs and Climate Action, but also reveals a regulatory gap that goes beyond the enforcement deficits of GDPR violations. Datarade claims that it does not process the traded data itself but acts only as an intermediary between two parties who wish to engage in a data trading transaction. As a result, its actions fall outside the scope of the GDPR. Thus, Datarade functions as a data marketplace where transactions can be initiated and, like many other platforms, is not held responsible for the specific content. This highlights, on the one hand, the need to regulate infrastructures rather than just individual data processing activities, and on the other hand, the challenges in implementing such regulation concretely. Providing marketplaces is legitimate, and it is inherent in the nature of intermediaries that they cannot control every single piece of content or transaction. Nevertheless, there is a general lack of basic obligations for data marketplaces to address obvious or structural legal violations.

An interesting question is whether data marketplaces fall under the Digital Services Act (DSA) and are therefore subject to certain obligations for intermediaries. Regardless of whether data trading platforms can be considered intermediary services under Article 3(g) of the DSA, the obligations of the DSA primarily aim at content moderation to protect consumers, rather than at mediating transactions in B2B relationships. Additionally, there is no general obligation to monitor illegal content under Article 8 of the DSA.

The AI Act unsurprisingly does not address the issue of data trading but instead only regulates AI systems from the point of market entry (with the exception of the provisions on AI regulatory sandboxes in Article 57 et seq.) and not the acquisition of the necessary training data beforehand. Nevertheless, it is important to note that the AI Act defines high-risk use contexts in Annex III, such as credit scoring or the allocation of public services, with the aim of preventing discrimination risks. However, the quality requirements for the output of AI systems do not resolve the upstream issues of data trading concerning privacy and data protection.

The California way?

In California, the ‘Data Broker Registration Law’ imposes extensive obligations on data brokers. Data brokers are defined as ‘a business that knowingly collects and sells to third parties the personal information of a consumer with whom the business does not have a direct relationship.’ These companies must register with the California Privacy Protection Agency, and a one-stop mechanism has been introduced, allowing consumers to request the deletion of all their personal data with a single request.

The requirement for data brokers to register is the first step toward creating sufficient information and transparency, which in turn facilitates enforcement in the next step. It would also clarify issues of delineation. From the consumer’s perspective, enforcement is significantly simplified by the one-stop mechanism, relieving them from the burdensome task of enforcing individual rights under the GDPR, which can be particularly challenging due to lack of awareness and rational apathy.

Conclusion

In the context of Datarade, there are now calls for more regulation of the data market. This approach seems promising, but it is also very complex to implement. Beyond the creation of new laws, political will is necessary to invest more resources in enforcement structures (see the Irish Data Protection Authority). The legal and policy debate also has a discourse-analytical dimension: the unprecedented lobbying surrounding the AI Act has clearly shown that data-intensive digital technologies continue to be too uncritically equated with innovation—an innovation that is expected to drive progress, prosperity, and growth, and without which Western democracies are perceived to risk falling behind. However, these are not merely technical developments that benefit society as a whole, but rather socio-technical power shifts that are inherently tied to the business models of the actors involved. No legitimate economic interest justifies the immense data power concentrated in the hands of a few global companies that structurally evade regulatory oversight. The narrative of a thoroughly digitalised data society is defined by certain technologies, such as generative AI, and the actors behind them, leading to a near-global dependence on the products and infrastructures of individual companies, as the CrowdStrike fiasco demonstrated a few weeks ago. A critical and informed debate is therefore also needed regarding data trading, focusing on how to design a data economy that balances private and public, democratic and communal interests.