Adversarial Attacks on Deepfake Detectors: A Challenge in the Era of AI-Generated Media (AADD-2025)







Sebastiano Battiato1, Mirko Casu1, Francesco Guarnera1, Luca Guarnera1, Giovanni Puglisi2, Orazio Pontorno1, Claudio Vittorio Ragaglia1, Zahid Akhtar3
1 Department of Mathematics and Computer Science, University of Catania, Italy
2 Department of Mathematics and Computer Science, University of Cagliari, Italy
3 Department of Electrical and Computer Engineering, State University of New York Polytechnic Institute Utica, New York, USA
sebastiano.battiato@unict.it, mirko.casu@phd.unict.it, francesco.guarnera@unict.it, luca.guarnera@unict.it, puglisi@unica.it, orazio.pontorno@phd.unict.it, claudio.ragaglia@phd.unict.it, akhtarz@sunypoly.edu

ACM International Conference on Multimedia (MM '25)












ABSTRACT


The proliferation of AI-generated media has heightened the risks of misinformation, driving the need for robust deepfake detection systems. However, adversarial attacks—subtle perturbations designed to evade detection—remain a critical vulnerability. To address this, we organized the AADD-2025 challenge, inviting participants to develop attacks that fool diverse classifiers (e.g., ResNet, DenseNet, blind models) while preserving visual fidelity. The dataset included 16 subsets of high- and low-quality deepfakes generated by GANs and diffusion models (e.g., StableDiffusion, StyleGAN3). Teams were evaluated on structural similarity (SSIM) and attack success rates across classifiers. Thirteen teams proposed innovative solutions leveraging latent-space manipulation, ensemble gradients, surrogate modeling, and frequency-domain perturbations. The challenge's top performers—MR-CAS (1st, score: 2740), Safe AI (2nd, 2709), and RoMa (3rd, 2679)—achieved high SSIM (0.74–0.93) while evading classifiers. MR-CAS's latent diffusion inversion and Safe AI's gradient ensemble framework demonstrated superior transferability, even against Vision Transformers. Key insights revealed that latent-space attacks outperform pixel-level methods, that ensemble strategies enhance cross-model robustness, and that hybrid CNN-transformer attacks are most effective. Despite this progress, challenges persist in generalizing attacks across heterogeneous models and maintaining perceptual quality. The AADD-2025 challenge underscores the urgency of developing adaptive defenses and hybrid detection systems to counter evolving adversarial threats in AI-generated media. To facilitate reproducibility and further research, the complete dataset is available for download in the challenge GitHub repository: https://github.com/mfs-iplab/aadd-2025
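The abstract notes that submissions were scored on structural similarity (SSIM) together with attack success rates across classifiers. The sketch below illustrates this kind of two-term evaluation with a simplified single-window SSIM in NumPy (real evaluations use a sliding Gaussian window, as in scikit-image's `structural_similarity`); the combined `challenge_score` function and its weights are hypothetical and do not reproduce the actual AADD-2025 scoring formula.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM computed over the whole image as a single window.
    Illustrative only: standard SSIM averages a local windowed statistic."""
    c1 = (0.01 * data_range) ** 2  # stabilizer for the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizer for the contrast term
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

def challenge_score(ssim_vals, fooled_flags, w_quality=1000.0, w_attack=1000.0):
    """Hypothetical combined score: rewards high mean visual fidelity (SSIM)
    and a high fraction of classifiers fooled by the adversarial images."""
    return w_quality * float(np.mean(ssim_vals)) + w_attack * float(np.mean(fooled_flags))

# Toy example: a clean image vs. a lightly perturbed adversarial copy.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
adv = np.clip(img + rng.normal(0.0, 0.02, img.shape), 0.0, 1.0)
print(global_ssim(img, adv))                      # close to 1 for a small perturbation
print(challenge_score([global_ssim(img, adv)], [1, 1, 0]))  # fidelity + success terms
```

An identical image pair yields an SSIM of exactly 1, so the quality term penalizes only visible perturbations, while the second term grows with the fraction of detectors fooled.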







Cite:
@inproceedings{battiato2025adversarial,
   title={Adversarial Attacks on Deepfake Detectors: A Challenge in the Era of AI-Generated Media (AADD-2025)},
   author={Battiato, Sebastiano and Casu, Mirko and Guarnera, Francesco and Guarnera, Luca and Puglisi, Giovanni and Pontorno, Orazio and Ragaglia, Claudio Vittorio and Akhtar, Zahid},
   booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
   pages={13714--13719},
   year={2025}
}




