Deepfakes, deception, and distrust
Epistemic and social concerns
Between March 9 and 10, 2022, thousands of netizens, a number of influential journalists, and, notably, Bernice King, a daughter of Dr. Martin Luther King Jr. and a Christian minister herself, lambasted Prince William, the second in line to the throne of the United Kingdom, for allegedly being both a shameless racist and a “deeply offensive” ignoramus.
The critical outburst was ignited after several British news outlets, citing a PA Media report as their source,¹ publicized details of the seemingly benign visit of the Duke of Cambridge and his wife, Catherine, the Duchess of Cambridge, to the Ukrainian Cultural Centre in London amid the global shock spawned by the Russian Armed Forces’ full-scale attack on Ukraine. The specific cause of the scandal was the following quote included in the PA Media report: “William, 39, said Britons were more used to seeing conflict in Africa and Asia. ‘It’s very alien to see this in Europe. We are all behind you,’ he said.”
William was denounced as a racist because he had “normalised war and death in Africa and Asia,” implicitly suggesting that they were incompatible with Europe, his home continent. The evidence of his ignorance, meanwhile, was found in the fact that, when NATO bombed Belgrade, the capital of Serbia, in 1999, he was close to the age of majority and achieving high grades in History and Geography in his final year at Eton College, one of the most prestigious high schools in the world, often referred to as the “nurse of England’s statesmen.”
William, in other words, appeared unaware of the historical magnitude of the refugee crisis triggered by the Kosovo War, something that had not been seen on European soil since World War II. What is more, the teenage years of the well-traveled Prince, who holds a degree in Geography from the University of St Andrews and the rank of Flight Lieutenant in the Royal Air Force, should have been permeated by a relentless flow of news coming not just from the Kosovan chapter of the wars in the Western Balkans but also from the front lines of all the bloody military confrontations that tore apart the former Yugoslavia.
Remarkably, many of the accusations of racism raised against William, though not those pointing out his presumed blatant ignorance of recent European history, were retracted a few hours after their rapid dissemination online, as soon as a royal producer at ITV, a British television network, released a short video documenting part of the conversations the Duke of Cambridge had with volunteers and officials at the cultural center. The clip was considered enlightening because no mention of Africa or Asia was registered in it. Richard Palmer, the only royal correspondent who covered William’s visit and who was thus responsible for the quote included in the PA Media report, apologized publicly and said that a “remark [William] made was misheard.”
The way in which this scandal was swiftly brought to an end in favor of William’s reputation is epistemically significant. A video recorded by what seems to be a trembling hand lacking professional gear, and offering a poor visual angle on the events – the camera operator captures William’s back, not his face – was enough to settle the dispute and placate the growing moral outrage. Indeed, it was argued that the “video speaks for itself”: its evident absence of any mention of Africa or Asia demonstrated that the original PA Media report was inaccurate and that William had never compared these continents with Europe in terms of armed conflict, death, and violence. Such is the epistemic authority of video and audio recordings – typically, a tremendous one.
According to conventional wisdom, reporters’ memories might be wrong and are, consequently, not entirely trustworthy, while recordings, even amateurish ones like the footage released to rebut the accusations of racism raised against William, offer much more reliable depictions of world events. A reporter’s word is only testimonial evidence and so an easy target of epistemic suspicion, while recordings are considered perceptual evidence and so a very strong source of justified beliefs and knowledge – videos are series of pictures and, in the words of the aesthetician Kendall Walton (1984), pictures are “transparent” because they enable literal perception. Thus the difference is assumed to be stark: reporters tell one what happened; recordings allow one to see and hear what happened.
For these and similar epistemic reasons, it is not surprising that, when it came to deciding which was right – the written report of William’s remarks or a video recording of the same event – folks and pundits did not hesitate for a second to choose the latter; even the author of the report was apparently forced by the standard epistemic norms governing social exchange to admit that he was wrong and had “misheard” William’s comments. Admittedly, for some skeptics it will always be possible to say that this case is not a real example of a reporter changing his beliefs in a responsible and virtuous fashion when presented with newly available empirical evidence, and that it instead indicates a good deal of work by P.R. specialists to clean up a disgraceful remark, push a new version of the story, and shape public perception. Still, as argued before, the scandal’s “happy ending” illustrates well the privileged epistemic status of recordings in the contemporary world: they are more reliable than written and oral testimonies.
This privileged status should not be taken for granted. Indeed, unless something major stops current technical and social trends in the generation of synthetic media by artificial intelligence, the epistemic supremacy of recordings over written and oral reports is likely to vanish soon and forever.
Furthermore, the state of machine learning techniques, the fast pace of improvement in the area, and the increasingly easy availability of software employing this technology around the globe make it reasonable to entertain the idea that already now, in 2022, video recordings are no longer obvious windows onto the hard facts of the real world. The existence of deepfakes (extremely realistic falsified videos produced by means of artificial intelligence) justifies thinking that, when in conflict, recordings will not always – and perhaps not even most of the time – trump our own memories and the testimonies of other subjects.
More worryingly, deepfakes jeopardize one of the main reasons we have to avoid lying when talking about facts. As Catherine Kerner and Mathias Risse put it in a recent paper, “[u]ntil the arrival of deepfakes, videos were trusted media: they offered an ‘epistemic backstop’ in conversation around otherwise contested testimony” (Kerner & Risse, 2021, p. 99). That is to say, one of the main reasons for not lying about the occurrence of certain events – what in practice constitutes an “epistemic backstop” – is the “background awareness” that such an event could have been recorded (Rini, 2020). The existence of deepfakes and their potential proliferation, by contrast, provides the perfect alibi for occasional and regular liars alike, as it allows them to convincingly cast doubt on any recording that shows them lying.
To better understand the epistemically game-changing nature of deepfakes, contrast them with the bogus media nowadays labelled “shallowfakes.” In principle, shallowfakes are media manually doctored by human individuals. The videos of Nancy Pelosi, the Speaker of the U.S. House of Representatives, seemingly stammering and drunk are examples of shallowfakes. The audio recording of the “last speech” of John F. Kennedy, the one he was supposed to deliver the afternoon he was fatally shot in Dallas, Texas, in 1963, is a more sophisticated example of a shallowfake. These media are the direct result of human work, ingenuity, and skill, and no matter how credible they are to viewers and listeners, they will always fall under the concept of shallowfake insofar as they are not created by deep learning techniques.
Conversely, and by definition, the bogus media generated by deep learning techniques fall under the concept of “deepfakes.” These counterfeits are purely synthetic visual and audio media, as they are produced by artificial intelligence alone. Generative adversarial networks (GANs) are trained and put into action with the deliberate purpose of creating media – the job of the “generator” – whose fraudulent character can only be detected – the job of the “discriminator” – by even more complex artificial intelligence (Chesney & Citron, 2019; Goodfellow et al., 2014).
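For readers who want a concrete picture of this adversarial setup, here is a minimal, purely illustrative sketch in Python (assuming the PyTorch library); the toy data, network sizes, and training loop are hypothetical simplifications, not the pipeline of any actual deepfake tool:

```python
# Illustrative GAN sketch: a generator learns to produce fake samples that a
# discriminator can no longer tell apart from real ones.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64  # hypothetical sizes, chosen only for the demo

# Generator: maps random noise to a synthetic sample (the "counterfeiter").
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
# Discriminator: estimates the probability that a sample is real (the "detective").
D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def real_batch(n=32):
    # Stand-in for genuine training media (e.g., face images flattened to vectors).
    return torch.randn(n, data_dim) * 0.5 + 1.0

for step in range(1000):
    real = real_batch()
    fake = G(torch.randn(real.size(0), latent_dim))

    # 1) Train the discriminator to label real samples 1 and generated samples 0.
    d_loss = bce(D(real), torch.ones(real.size(0), 1)) + \
             bce(D(fake.detach()), torch.zeros(real.size(0), 1))
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # 2) Train the generator to make the discriminator output 1 on its fakes.
    g_loss = bce(D(fake), torch.ones(real.size(0), 1))
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
```

In real deepfake systems the two networks are deep models trained on large collections of images or voice samples, but the logic of the duel is the same: each side’s loss is the other side’s success.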
But, technically, this arms race cannot last forever. A point worth making is that the “game drives both teams to improve their methods until the counterfeits are indistinguishable from the genuine articles” (Goodfellow et al., 2014, p. 1). As framed, the adversarial process between generative models and discriminative models will eventually create perfect deepfakes. The “cat-and-mouse game” will become a “cat-and-cat game,” and no automated detection will be able to stay ahead of the generator’s work (Engler, 2019). At that point, appealing to the naked human senses to discriminate originals from fakes will be obsolete and completely naïve. Deepfakes, to this extent, represent a quantum leap in the old craft of forgery.
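For reference, the “game” Goodfellow and colleagues describe can be written as the following minimax objective, in which the discriminator D tries to maximize its accuracy on real samples x and generated samples G(z), while the generator G tries to minimize it:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

At the theoretical optimum of this game the discriminator outputs 1/2 for every input, that is, it can do no better than guessing whether a sample is real or generated (Goodfellow et al., 2014). That is exactly the “indistinguishable” end state the quotation above anticipates.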
At present, deepfake technology is overwhelmingly employed to produce pornography – the cyber-security company Deeptrace has reported that pornographic deepfake videos account for 96% of the total number of deepfake videos online (Ajder et al., 2019). Multiple adult online platforms offer deepfake videos falsely showing hundreds of celebrities from different countries engaging in all kinds of sexual encounters. A particularly disturbing aspect of this use of deep learning techniques to forge media is that celebrities are not the only victims of sophisticated “face swap” porn. As it happens, there are dozens of websites that offer free and premium services to create astonishingly convincing deepfake videos based on pictorial data uploaded by anyone with access to the Internet. Certainly, “AI-assisted fake porn is here and we are all f*cked” (Cole, 2017).
Audio generated by artificial intelligence (speech synthesis) is equally eerie. Adobe Voco, also known as the “Photoshop for voice,” allows users to upload actual voice recordings in order to create hyper-realistic fake audio. Not without pride, a representative of Adobe said in front of a full auditorium: “We have already revolutionized photo editing. Now it’s time for us to do the audio stuff.”
Try listening to this demonstration:
Voco can take someone’s voice recording and generate from it audio of what seems to be the original speaker uttering sentences that the person never actually said. Tellingly, Adobe decided not to release the software to the market after receiving a wave of criticism centered on the security threats its potential misuse would probably cause.²
Now, the main epistemic concern in light of the potential ubiquity of deepfakes is not that we are going to be massively deceived. Such a scenario is not likely. And, to be sure, not everything about the use of deepfakes is negative. There are numerous conceivable beneficial uses. For instance, individuals suffering from permanent loss of speech will be able to create deepfake audio using their original voices. It will also be possible to create educational deepfake videos using pictures of people who died decades ago.
The main worry comes from the fact that only a few deepfakes eventually making news could not just motivate, but ultimately also justify, a general distrust in video and audio recordings – among other imaginable reasons, deepfakes could make news because they caused politicians to lose elections or innocent people to be convicted, fired from their jobs, or killed. In this scenario, videos will no longer be more reliable than mere written words and drawings depicting an event. They won’t allow one to see and hear what happened. They won’t be “transparent.” As a result, the epistemic authority of recorded media will be fatally eroded.
Global distrust and not global deception could be the ultimate consequence of deepfakes. And, of course, if a new royal scandal were to take place in that not implausible epistemic scenario, no videos would save the name of the shamed protagonist.
References
Ajder, H., Patrini, G., Cavalli, F., & Cullen, L. (2019). The state of deepfakes: Landscapes, threats, and impact. Deeptrace. https://regmedia.co.uk/2019/10/08/deepfake_report.pdf
Chesney, R., & Citron, D. (2019). Deepfakes and the new disinformation war. Foreign Affairs, 98(1), 147-155.
Cole, S. (2017, December 12). AI-assisted fake porn is here and we’re all fucked. Vice. https://www.vice.com/en/article/gydydm/gal-gadot-fake-ai-porn
Engler, A. (2019, November 14). Fighting deepfakes when detection fails. Brookings Institution. https://www.brookings.edu/research/fighting-deepfakes-when-detection-fails/
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. C., & Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems 27 (NIPS 2014).
Kerner, C., & Risse, M. (2021). Beyond porn and discreditation: Epistemic promises and perils of deepfake technology in digital lifeworlds. Moral Philosophy and Politics, 8(1), 81-108.
Rini, R. (2020). Deepfakes and the epistemic backstop. Philosopher’s Imprint, 20(24), 1-16.
Walton, K. (1984). Transparent pictures: On the nature of photographic realism. Critical Inquiry, 11(2), 246-277.
Notes

1. PA Media (formerly the Press Association) is the only news agency with access to all the daily engagements, high-profile state occasions, and trips of the British Royal Family. See: https://pa.media/royal-family-collection/
2. I also recommend that you listen to this audio, generated by the AI lab Dessa using a text-to-speech deep learning system called “RealTalk.” The fake voice of the entertainer Joe Rogan seems to be indistinguishable from the real one. This combination of audio and video deepfakes is worth watching, too.