03 June 2024

Deepfakes, the Weaponisation of AI Against Women and Possible Solutions

In January 2024, social media platforms were flooded with intimate images of pop icon Taylor Swift, quickly reaching millions of users. However, the abusive content was not real; the images were deepfakes – synthetic media generated by artificial intelligence (AI) to depict a person’s likeness. But the threat goes beyond celebrities. Virtually anyone (with women being disproportionately targeted) can be a victim of non-consensual intimate deepfakes (NCID) – an abusive type of content commonly and mistakenly known as “deepfake porn”. The ease of access to AI tools like ClothOff fuels the rise of NCID. The problem takes on new and harmful dimensions when content is shared online and “goes viral”, and it requires specific legal and policy responses from the various players involved in the creation and dissemination of deepfakes.

While efforts to combat misinformation deepfakes, with proposals for watermarking and labelling synthetic content, are ongoing, NCID is a specific type of harmful deepfake that presents unique harms and demands altogether different measures from social media platforms and AI companies. Although most agree that companies must be held accountable for disseminating potentially extremely harmful content like NCID, effective legal responsibility mechanisms remain elusive. This is particularly worrisome as phenomena like NCID highlight a blind spot in many of the debates on content moderation, AI governance and the fight against misinformation: misogyny. Most AI-generated image-based sexual abuse targets women. Unfortunately, most platform regulations, the UK Online Safety Act in particular, offer little remedy for such gender-specific online harms. Therefore, this article proposes concrete changes to content moderation rules and enhanced liability for AI providers that enable such abusive content in the first place.

What are the harms of non-consensual intimate deepfakes?

First, a few words on non-consensual intimate deepfakes (NCID). The harms of NCID are no different from the well-established harms of non-synthetic image-based sexual abuse, a pervasive issue that existed long before AI. The fact that NCID depicts synthetic imagery (in contrast to actual photos) is irrelevant. Individuals are still subject to the same or comparable violations of privacy, dignity, sexual expression, and mental and physical well-being – including high levels of stress, anxiety, depression, low self-esteem and insecurity. NCID can also cause the social, collective harms associated with other forms of image-based sexual abuse, including the risks of normalising non-consensual sexual activity and contributing to a culture that accepts rather than reprimands creating and/or distributing private sexual images without consent. When women politicians are targeted, there is the additional harm of discouraging women from running for public office.

Indeed, like other forms of image-based abuse, NCID disproportionately targets women. An industry report based on the analysis of 14,678 deepfake online videos indicates that 96% of them were non-consensual intimate content and that 100% of the examined content on the top five ‘deepfake pornography websites’ targeted women. This disproportionate targeting raises concerns for other vulnerable groups as well. Evidence shows that online hate and harassment are a particularly pervasive and growing plight for LGBTQ+ people, suggesting they could also be especially vulnerable to NCID.

While the harms of NCID are not necessarily new, measures developed in response to non-synthetic image-based abuse might not be sufficient to tackle this new phenomenon. The wide availability of user-friendly interfaces makes it possible for almost anyone to produce harmful synthetic content and makes virtually anyone a potential target, with no need for perpetrators to get their hands on any form of real intimate content. This, combined with the ease with which such content can be shared and reshared, means that more effective prevention and mitigation measures are required from internet platforms and AI companies.

The dissemination of NCID: platforms’ ambiguous content policies and inefficient enforcement mechanisms

While online safety legislation in some jurisdictions – such as in the UK and in the EU – is making progress towards individual accountability, criminal liability alone is insufficient. Platforms also need to implement effective measures in response to NCID. As I have argued, NCID causes the same harms as abusive images created without AI, and the distinction between real and synthetic content becomes irrelevant when considering the harm inflicted. In essence, NCID simply rehashes longstanding content moderation challenges. Yet, surprisingly, an analysis of the content policies of the three major social media platforms reveals that they are inadequate to tackle NCID.

Unlike their clear and comprehensive policies for child sexual abuse material (CSAM), which explicitly encompass artificial content, current adult content policies are ambiguous. This ambiguity creates loopholes, which in turn undermine the effective protection of victim-survivors. For example, TikTok’s “Safety and Civility Policy” prohibits language or behaviour that “harasses, humiliates, threatens, or doxxes anyone”. This could conceivably encompass the use of NCID for harassment, but the policy does not explicitly mention harassment via intimate content or artificially generated content, leading to ambiguity in its application.

To effectively counter NCID, changes in platforms’ policies are required on two fronts. First, they should revise their policies to include an explicit prohibition on all forms of image-based sexual abuse. Specifically, platforms should prohibit the posting of unwanted intimate media depicting the likeness of an individual, whether real or synthetic. Second, all existing platform policies should be revised to ensure that they clearly also apply to AI-generated or manipulated content. This is not to say that platforms should have standalone policies solely focused on synthetic content. Rather, platforms need strong, unambiguous substantive policies that apply to content regardless of origin or creation method. This approach eliminates loopholes based on content type and avoids the difficult task of consistently differentiating real from synthetic content.

Meta’s recently announced approach offers a promising model. The company has committed to “remove content, regardless of whether it is created by AI or a person” if it violates any of its existing policies, including on bullying and harassment, and on violence and incitement. Additionally, for content that doesn’t violate existing policies, Meta plans to label AI-generated video, audio, and images based on either automated detection or user self-disclosure. This is a welcome change, replacing the company’s previous standalone policy on “Manipulated Media”, which was heavily criticised for being overly narrow.

Alongside policy changes, robust reporting mechanisms and swift platform responses are essential – the bread and butter of effective content moderation. Violative NCID content must be accurately identified and removed, potentially with priority channels for reporting and review offered to victim-survivors. This is especially important as AI tools are making it easier and faster to create NCID, meaning existing systems are likely inadequate to keep up with the increase in the volume of non-consensual intimate content.

Recent examples suggest that existing systems have indeed been falling short. Twitter’s delayed response to the widely disseminated NCID targeting Taylor Swift highlights these limitations. The cases recently selected by the Oversight Board for review further illustrate the concerns. Both cases involved AI-generated nude images of public figures – one in India and another in the US. Meta’s response in the first case, where the report was automatically closed after 48 hours without review, leaving the content available, exemplifies the limitations of reactive tools in offering victim protection. While the removal of the content in the second case demonstrates a better response, it exposes inconsistencies across cases. The differences in Meta’s responses also raise concerns about platforms’ resource allocation and enforcement capacity, particularly between Global North and Global South jurisdictions.

The creation of NCID: AI firms and the lack of appropriate safeguards

While improving the ways online platforms prevent the dissemination of NCID is key, the ease with which such content can be distributed suggests the need for a complementary approach: addressing the harm at its source – the content creation stage – by banning the creation of all forms of intimate deepfakes.

Relying solely on voluntary commitments by AI providers to implement this ban would be insufficient. Current definitions within online safety legislation such as the UK Online Safety Act and the EU Digital Services Act likely exclude generative AI tools from scope. This creates a gap that legislation can, and should, bridge to effectively address NCID. Crucially, developers of generative AI and those who make these tools available to the public should be legally required to actively reduce the risk of harm by preventing their models from generating NCID from the outset. That is, AI regulations should mandate robust safety measures from developers and distributors of generative AI tools.

These measures should, at the very least, mirror the private content governance systems employed by major social media platforms. This is critical because usage policies for some major tools are currently significantly less detailed than those of social media platforms. OpenAI, for example, merely asks users to avoid compromising “the privacy of others”, engaging in “regulated activity without complying with applicable regulations” or promoting/engaging in “any illegal activity”. In addition, there are no details on how such policies are enforced. Vague policies are unlikely to offer robust protections or effective mechanisms of redress.

In addition, regulation should promote “safety by design”, meaning AI tools should be built with mechanisms to prevent the generation of content that violates their terms of service, which should prohibit the production of intimate deepfakes. While using technology to detect synthetic image abuse could lead to a technological arms race, the severity of the challenges posed by NCID warrants a legal requirement for the industry to prioritise safety alongside innovation.

However, these issues are currently absent from ongoing debates surrounding AI regulation. In the EU, the AI Act requires the labelling of deepfakes and introduces minimum standards for foundation models, but omits any content moderation requirements for generative AI tools. Similarly, in the UK, the government’s response to the AI white paper currently lacks specific legal safeguards to tackle the creation of NCID. On deepfakes, the report primarily focuses on ‘AI-related risks to trust in information’, proposing a call for evidence on this type of risk. While it acknowledges image-based sexual abuse, it only explains how existing issues are handled under the Online Safety Act, which offers limited protection for victim-survivors. Notably absent from the report is any mention of further measures to be included in future AI regulation or a call for evidence specifically addressing the risks posed by NCID.

A significant challenge lies with open-source products, whose open nature allows dissemination, download, and deployment without the model developer’s knowledge or approval (for example, via GitHub repositories). This makes it difficult to establish clear lines of responsibility for implementing content moderation systems and requires governments to build expertise to ensure the safety of the models on the market.

Countering NCID

In conclusion, non-consensual intimate deepfakes pose a real and growing threat. As a new form of image-based sexual abuse, NCID causes both individual and society-wide harms and, as such, requires effective responses from the corporate actors involved in the distribution and creation of this content. While changes in criminal law can make it easier to hold individual perpetrators liable, improvements are necessary across the regulatory landscape to require corporate actors – especially platforms and AI firms – to do more.

From the perspective of content dissemination, NCID reignites discussions around the importance of robust systems of content moderation. Platforms should be incentivised to put in place specific policies concerning all forms of image-based sexual abuse (applicable to NCID but also to non-synthetic content). More broadly, all existing platform policies (including on bullying and harassment) should unambiguously treat real and synthetic or manipulated content symmetrically – synthetic harmful content should be subject to exactly the same rules that apply to non-synthetic harmful content. In addition, to effectively empower and protect NCID victim-survivors, online safety regulators should champion the development of robust enforcement and redress mechanisms. This could include, for example, priority channels for content reporting and review.

Moreover, the ease with which NCID can be disseminated suggests the need for a complementary approach: addressing the harm at its source – the content creation stage. Given that AI regulation is a high priority for many governments, future legislation should include provisions requiring providers of generative AI tools to put in place and enforce safeguards to prevent the creation of synthetic or manipulated intimate content.