Adversarial Attacks on Deepfake Detectors: A Challenge in the Era of AI-Generated Media (AADD-2025)







Sebastiano Battiato1, Mirko Casu1, Francesco Guarnera1, Luca Guarnera1, Giovanni Puglisi2, Orazio Pontorno1, Claudio Vittorio Ragaglia1, Zahid Akhtar3
1 Department of Mathematics and Computer Science, University of Catania, Italy
2 Department of Mathematics and Computer Science, University of Cagliari, Italy
3 Department of Electrical and Computer Engineering, State University of New York Polytechnic Institute Utica, New York, USA
sebastiano.battiato@unict.it, mirko.casu@phd.unict.it, francesco.guarnera@unict.it, luca.guarnera@unict.it, puglisi@unica.it, orazio.pontorno@phd.unict.it, claudio.ragaglia@phd.unict.it, akhtarz@sunypoly.edu

ACM International Conference on Multimedia (MM '25)












ABSTRACT


The proliferation of AI-generated media has heightened the risks of misinformation, driving the need for robust deepfake detection systems. However, adversarial attacks—subtle perturbations designed to evade detection—remain a critical vulnerability. To address this, we organized the AADD-2025 challenge, inviting participants to develop attacks that fool diverse classifiers (e.g., ResNet, DenseNet, blind models) while preserving visual fidelity. The dataset included 16 subsets of high- and low-quality deepfakes generated by GANs and diffusion models (e.g., StableDiffusion, StyleGAN3). Teams were evaluated on structural similarity (SSIM) and attack success rates across classifiers. Thirteen teams proposed innovative solutions leveraging latent-space manipulation, ensemble gradients, surrogate modeling, and frequency-domain perturbations. The challenge's top performers—MR-CAS (1st, score: 2740), Safe AI (2nd, 2709), and RoMa (3rd, 2679)—achieved high SSIM (0.74–0.93) while evading classifiers. MR-CAS's latent diffusion inversion and Safe AI's gradient ensemble framework demonstrated superior transferability, even against Vision Transformers. Key insights revealed that latent-space attacks outperform pixel-level methods, that ensemble strategies enhance cross-model robustness, and that hybrid CNN-transformer attacks are most effective. Despite this progress, challenges persist in generalizing attacks across heterogeneous models and maintaining perceptual quality. The AADD-2025 challenge underscores the urgency of developing adaptive defenses and hybrid detection systems to counter evolving adversarial threats in AI-generated media. To facilitate reproducibility and further research, the complete dataset is available for download in the challenge GitHub repository: https://github.com/mfs-iplab/aadd-2025
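The abstract notes that submissions were scored on structural similarity (SSIM) together with attack success rates across classifiers. The sketch below illustrates this kind of two-term evaluation with a simplified single-window SSIM in NumPy (real evaluations use a sliding Gaussian window, as in scikit-image's `structural_similarity`); the combined `challenge_score` function and its weights are hypothetical and do not reproduce the actual AADD-2025 scoring formula.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM computed over the whole image as a single window.
    Illustrative only: standard SSIM averages a local windowed statistic."""
    c1 = (0.01 * data_range) ** 2  # stabilizer for the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizer for the contrast term
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

def challenge_score(ssim_vals, fooled_flags, w_quality=1000.0, w_attack=1000.0):
    """Hypothetical combined score: rewards high mean visual fidelity (SSIM)
    and a high fraction of classifiers fooled by the adversarial images."""
    return w_quality * float(np.mean(ssim_vals)) + w_attack * float(np.mean(fooled_flags))

# Toy example: a clean image vs. a lightly perturbed adversarial copy.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
adv = np.clip(img + rng.normal(0.0, 0.02, img.shape), 0.0, 1.0)
print(global_ssim(img, adv))                      # close to 1 for a small perturbation
print(challenge_score([global_ssim(img, adv)], [1, 1, 0]))  # fidelity + success terms
```

An identical image pair yields an SSIM of exactly 1, so the quality term penalizes only visible perturbations, while the second term grows with the fraction of detectors fooled.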







Cite:
@inproceedings{battiato2025adversarial,
   title={Adversarial Attacks on Deepfake Detectors: A Challenge in the Era of AI-Generated Media (AADD-2025)},
   author={Battiato, Sebastiano and Casu, Mirko and Guarnera, Francesco and Guarnera, Luca and Puglisi, Giovanni and Pontorno, Orazio and Ragaglia, Claudio Vittorio and Akhtar, Zahid},
   booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
   pages={13714--13719},
   year={2025}
}




