23 February 2024

A2D for Researchers in Digital Platforms

Bridging the Transatlantic Divide

Over the past decade, access to data (A2D) in digital platforms has emerged as a significant challenge within the research community. Researchers seeking to explore data hosted on these platforms encounter growing obstacles. Public policy concerning such access must navigate conflicting interests among various stakeholders, including platforms, their users, competitors, the scientific community, and the public at large. While legal policies in the US have generally focused on establishing safeguards for researchers against the restrictions on access imposed by private ordering, the recent EU Digital Services Act (DSA) introduces a legal framework that enables researchers to compel platforms to provide data access. These complementary legal strategies may prove instrumental in facilitating A2D for research purposes.

A2D in Digital Platforms

Data constitutes the fundamental business asset of digital platforms. These platforms collect data on users’ online behaviour and generate income by using the resulting profiles for targeted advertising, as well as for creating additional data-driven products and services. Platforms also worry about the potential disclosure of sensitive data, which could breach users’ privacy. Data leaks may trigger legal liability and damage a platform’s public reputation.

At the same time, however, strong public interests advocate for ensuring A2D for scientific purposes. Platforms often provide a unique access point to data, which can be indispensable for basic research. For example, it may be essential for detecting early indicators of imminent natural disasters, identifying markers for infectious disease outbreaks, or developing new research methodologies employing Artificial Intelligence. A2D in digital platforms also plays a critical role in exploring the digital transformation. As societal, economic, and political activities migrate to digital spaces, A2D becomes imperative for mapping and analyzing the social implications of this shift. This includes investigating issues such as discrimination in labor markets, bias in short-term rentals, or the impact of political advertising on elections.

Furthermore, as platforms continue to grow in dominance and significance, infiltrating the social, economic and political arenas, there is a stronger imperative to bolster their accountability by advancing transparency and oversight. Enabling independent scientific research into these issues by granting scientists access to platform data can provide unbiased evidence to guide public oversight and complement investigative efforts undertaken by public authorities.

Occasionally, digital platforms have chosen to voluntarily share data with academic researchers. For example, recent papers published in Science and in Nature saw 17 researchers collaborating with Meta, concluding that there was no evidence of social media platforms, like Facebook and Instagram, polarizing voters during the 2020 US Elections. However, concerns have been raised by some scholars that these findings may have been influenced by Meta’s involvement in the research collaboration and could align with its business interests.

Ensuring A2D for independent researchers who are not affiliated with these platforms has the potential to diversify the research agenda. It can foster studies driven by pure scientific curiosity and intellectual freedom, rather than profit-driven motives. Moreover, it can empower researchers to challenge conclusions drawn in other studies through independent data analysis. Overall, safeguarding A2D for research is of utmost importance in preserving the social and political role of academic research as an unbiased and independent source of reliable knowledge.

Private Ordering and its Limits

Despite its significant public implications, decisions regarding whether to permit A2D have so far rested solely with digital platforms. Because users’ content, personal data and activities reside predominantly on their facilities, platforms have the technical capability to block data access. Platforms have exercised their physical control over users’ data to prevent researchers from conducting studies. Incidents such as the Cambridge Analytica scandal, where personal data of millions of Facebook users was misused, led platforms like Facebook and Instagram to block Application Programming Interface (API) access. More recently, X (formerly Twitter) announced its decision to restrict free API access for research purposes. Additionally, platforms have on multiple occasions prevented the scraping of publicly available data and obstructed other efforts to explore their operation from the outside. One notable example is the NYU Ad Observatory, which was established to analyze political advertisements on social media. Through a browser extension (‘Ad Observer’), users were able to donate ad data scraped from Facebook to the Observatory, helping to verify and supplement some missing data in Facebook’s own Ad Library. However, in August 2021 Facebook suspended the accounts of researchers involved in this initiative.

Platforms also employ contractual claims as a means to restrict undesired research activities. For instance, X Corp. has recently filed a lawsuit against the Center for Countering Digital Hate (CCDH), a non-profit organization that conducted research on the dissemination of hateful content on social media. X alleged that CCDH had intentionally and unlawfully scraped data from Twitter, thereby violating its terms of service (ToS). TikTok has taken a more stringent approach by imposing additional contractual requirements in its Research API ToS, requiring academics to provide advance notice of their forthcoming research, subject their work to pre-publication review, and delete certain data once it has been used.

US and EU Legal Strategies Compared – the Shield and the Sword

While self-help measures aimed at restricting A2D often serve the legitimate interests of platforms, policymakers must also ensure proper access to platform data for independent scientific purposes. Striking this balance presents a significant challenge.

The U.S. and Europe have adopted distinct legal approaches to address this challenge. In the U.S., the emphasis has been on defensive strategies designed to protect researchers from liability stemming from breach of contract and potential criminal liability related to the unauthorized scraping of platform data. In contrast, Europe has recently established a proactive framework, granting researchers a legal right to acquire data that is essential for their research endeavours. These strategies are further discussed below.

Research Shield: The US Approach

Platforms’ ToS typically impose restrictions on unauthorized data collection, including for research purposes. This exposes researchers to the risk of civil liability for breaching contractual agreements. Moreover, under US law, unauthorized access to platforms’ computational services may allegedly trigger criminal liability under the Computer Fraud and Abuse Act (CFAA). These risks can significantly deter independent research conducted on platforms.

However, recent court decisions have adopted a narrow interpretation of the CFAA, thereby reducing the risks faced by researchers who are studying platforms without prior authorization. For example, in the case of Sandvig v. Barr, the DC District Court examined whether researchers investigating race and gender discrimination in employment websites would violate the CFAA. The researchers planned to create multiple fake accounts, contravening the websites’ ToS, which prohibited misrepresentation. The court held that the CFAA does not criminalize mere violations of ToS on consumer websites. In another case, hiQ Labs v. LinkedIn, which was a commercial legal dispute, the Ninth Circuit determined that the CFAA does not apply to the scraping of publicly available data. Accessing such data, the court held, cannot be considered ‘unauthorized’ under the CFAA.

When A2D is carried out in violation of the ToS, it may also lead to civil liability for breaching a contract, along with the associated legal remedies. However, it is worth noting that restrictive provisions on A2D may not be enforceable if they are preempted under the preemption doctrine set forth in section 301(a) of the US 1976 Copyright Act. The preemption doctrine is designed to uphold the Copyright Act’s exclusivity in governing copyright matters. It invalidates any rules that offer copyright-like protection (e.g., restrictions on reproduction) to non-copyrightable subject matter, such as unoriginal data. Back in the mid-90s, in the case of ProCD v. Zeidenberg, the plaintiff attempted to protect uncopyrightable digitized telephone listings using a shrink-wrap license. The Court of Appeals for the 7th Cir. held that such contracts only affect the parties involved and cannot establish rights in rem equivalent to copyright. Consequently, on this reasoning, contractual restrictions are not preempted. Note, however, that restrictions on A2D in platforms’ ToS lack privity. They are boilerplate contracts that apply to anyone accessing the platform. Therefore, if these restrictions are deemed enforceable, they effectively create de facto rights against the world.

Arguably, restrictions on A2D for research purposes run counter to the objectives of copyright law. These restrictions aim to prohibit the reproduction of data, a subject matter that was intentionally excluded from copyright protection to guarantee its availability for everyone to use as building blocks of additional creative works. Moreover, these limitations also appear to undermine the right to research, a right safeguarded under fair use provisions, which serves the overarching goals of copyright law – namely, fostering learning, generating new knowledge and upholding the principles of freedom of expression.

Despite extensive criticism from legal scholars regarding ProCD’s narrow interpretation of the preemption doctrine, most courts have adopted this approach in the past decades and have rejected the preemption of contractual restrictions. However, in a recent decision the 2nd Cir. upheld a preemption claim in the scraping lawsuit of Genius v. Google. The Supreme Court’s decision to deny review of that ruling may indicate that preemption claims in boilerplate contracts and platform ToS might gain more traction in the future.

EU: From Shield to Sword

Responding to mounting pressure from researchers and civil society organizations advocating for greater oversight of digital platforms through independent studies, the EU has adopted a proactive approach. This approach delegates decisions regarding access to platform data to a regulatory agency, which exercises its discretion within a set of explicit objective standards. The DSA establishes an institutional framework, aiming to streamline A2D for research in the public interest while also addressing the legitimate interests of platforms and their users.

The DSA introduces a novel type of regulatory body, the Digital Services Coordinator (DSC, see Arts. 49 to 51), tasked, inter alia, with the management of data access authorizations. This transfer of authority shifts the decision-making power regarding A2D from profit-driven platforms to an administrative agency entrusted with upholding the public interest.

Furthermore, the DSA establishes a structured procedure for obtaining A2D for research purposes, including a filing procedure and eligibility criteria for researchers and their proposed research projects.

Most notably, the DSA obliges very large online platforms and search engines (VLOPs and VLOSEs) to provide data to “vetted researchers” (see Art. 40(4) and (8)) “for the sole purpose of conducting research that contributes to the detection, identification and understanding of systemic risks in the Union, as set out pursuant to Article 34(1), and to the assessment of the adequacy, efficiency and impacts of the risk mitigation measures pursuant to Article 35.” (see Art. 40(4)). Through this obligation, the DSA effectively establishes a (limited) right to conduct academic research on systemic risk involving digital platforms in the EU. This right encompasses the ability to request data collection using APIs or other means of automatic extraction. It is critical for conducting research in the digital era and could have proven invaluable in a case such as that of the NYU research team, which was cut off from Facebook’s API.

Recently, the EU Commission launched a call for evidence on the DSA related to data access for research purposes, intended to inform the implementation of Article 40 DSA. Respondents to this call have stressed the need to provide standard procedures and eligibility criteria for vetted researchers, to establish an independent advisory body with professional expertise, and to address liability for potential data breaches. They also stressed the need to facilitate exploratory research and enable automated API-based exploration. Based on the contributions received, the Commission is scheduled to prepare a delegated act on Article 40 to be adopted in 2024.

A way forward

Science is a global collaborative endeavor that relies on cooperative efforts, peer review, and the free exchange of information and knowledge across national boundaries and disciplines. Digital platforms, where A2D is essential, also operate on a global scale. However, there exists a fundamental disparity in the legal approaches to A2D for researchers between the U.S. and the EU. This divergence has the potential to disrupt collaborative scientific initiatives and could shape where and how scientific research is conducted.

While the DSA may still have some imperfections, it marks a significant stride towards establishing a legal right for researchers to request A2D and putting in place an institutional framework to facilitate the exercise of this right. The U.S. currently lacks a comparable framework, although there are several bills, such as the Platform Accountability and Transparency Act and the Digital Consumer Protection Commission Act, that propose mandating digital platforms to provide certain types of data for research purposes. However, as of now, these bills have not been enacted into law.

Meanwhile, in the EU, data protection laws and more robust intellectual property protections for data may create significant barriers to unauthorized data scraping for research purposes.

Bridging the divide between the approaches of the US and EU presents a formidable challenge, raising a multitude of complex issues, including the legitimate rights of digital platforms, freedom of contract, freedom of expression, privacy and data protection.

A potentially more effective strategy for fostering ongoing scientific collaboration could involve coordinating research initiatives that leverage the legal safeguards available for unauthorized research in the US and the right to request A2D guaranteed by the EU’s new digital strategy.