DFF-2025 - Workshop Program

Keynote Talk 1

9:15 – 10:00

Facial Deepfake Detection: Five Years of Research Efforts and the FF4ALL-SERICS Experience

Gianluca Marcialis, University of Cagliari

Over the last five years, facial deepfake detection has become one of the most dynamic and challenging areas in multimedia forensics. The scientific community has progressively advanced from early detection attempts to sophisticated models capable of capturing subtle artifacts and inconsistencies in manipulated content. Despite this progress, deepfake forensics remains an arms race: detectors must generalize across unseen generators, remain effective under compression, and resist adversarial manipulations such as morphing and social-media beauty filters. Recent surveys confirm both the maturity of the field and the urgency of addressing robustness, attribution, and lifelong authentication of media. Within this evolving landscape, the sAIfer Lab’s Biometric Unit (University of Cagliari) has contributed a coherent body of research, consolidated within the FF4ALL project. Our studies have introduced approaches for artifact decomposition, high-frequency enhancement for compressed content, and tensor-based modeling for scaled and compressed images. We further analyzed fusion rules at score level and generalized detection based on inner/outer face inconsistencies. More recently, we proposed quality-based artifact modeling in videos, evaluated the robustness of forensic tools under morphing and compression, and studied the impact of beauty filters on detection systems. This keynote will reflect on five years of research efforts, combining a broad overview of the state of the art with the experience of our laboratory, and discussing how multidisciplinary collaboration—spanning biometrics, AI, and forensics—can guide the next generation of trustworthy and explainable solutions for deepfake media authentication.

Session 1: Advances in Deepfake Detection and Attribution

10:00 – 10:15 Morphing Resilient Face Recognition by Informed Frequency Selection Oral

Marco Huber, Anh Thi Luu, Naser Damer

10:15 – 10:30 AnimeDL-2M: Million-Scale AI-Generated Anime Image Detection and Localization in Diffusion Era Oral

Chenyang Zhu, Xing Zhang, Sun Yuyang, Ching-Chun Chang, Isao Echizen

11:00 – 11:15 What if Retrieval Could Work Before Decoding? The case of JPEG AI Latents for Deepfake Source Attribution Oral

Claudio Vittorio Ragaglia, Lorenzo Catania, Francesco Guarnera, Dario Allegra, Sebastiano Battiato

11:15 – 11:30 Bridging the Gap: A Framework for Real-World Video Deepfake Detection via Social Network Compression Emulation Oral

Andrea Montibeller, Dasara Shullani, Daniele Baracchi, Alessandro Piva, Giulia Boato

11:30 – 11:45 No Detector to Rule them All Oral

Alan Perotti, Marco Nurisso, Mirko Zaffaroni

11:45 – 12:00 How Well Do Simple Features Detect Fake Faces? A Comparison with Deep Learning Oral

Giuseppe Mazzola, Liliana Lo Presti, Marco La Cascia

Keynote Talk 2

14:00 – 14:45

Multimodal Deepfake Detection Across Cultures and Languages

Abhinav Dhall, Monash University

The phenomenal growth of generative AI methods has made high-quality multimodal synthetic content generation possible. While synthetic data generation has many valuable applications, it has also resulted in deepfakes that are increasingly used to spread misinformation and disinformation. To address this challenge, we need deepfake detectors that can be deployed at scale in real-world settings and across different use cases. Such detectors should be trained on large and diverse datasets, generalize well to data generated from unseen methods, and provide explainable results. Furthermore, they must be effective across different languages and cultural contexts to ensure broad inclusivity. The 1 Million Deepfakes Detection Challenge is designed to provide a large-scale benchmark for detecting and localizing deepfakes, with a dataset that currently includes more than two million samples. Results from the participating methods highlight the limitations of existing approaches, particularly in accurately localizing manipulated segments. In many societies, people mix languages and dialects during conversation, making deepfakes harder for observers to detect. Most deepfake datasets are monolingual, contain clean audio, and are focused on Western languages. We therefore need multilingual, code-switching, and dialect-diverse audio and video datasets with realistic artifacts. The ArEnAV dataset is one such resource, consisting of Arabic–English code-switching across multiple dialects, totaling over 765 hours and 387,072 clips generated using four TTS and two lip-sync models. Furthermore, most existing datasets focus on simple face swaps or object changes and miss more complex edits that alter the meaning of an image. To address this, we propose MultiFakeVerse, a large dataset comprising deepfakes generated using language-based reasoning. A vision-language model first identifies the main person in an image, decides on a realistic manipulation, and then generates the edited image with a diffusion model. When tested on MultiFakeVerse, current detection methods that work well on traditional deepfakes perform much worse on higher-level, meaning-altering manipulations.

Session 2: Audio Forensics and Deepfake Attribution

14:45 – 15:00 Learning to Fuse: A Gated Multi-Stream Framework for Generalized Audio Deepfake Detection Oral

Taiba Majid Wani, Irene Amerini

15:00 – 15:15 Visual Quality Improved Watermarking based on Dual-Reference Loss for Deepfake Attribution Oral

Qiushi Li, Stefano Berretti, Roberto Caldelli

15:15 – 15:30 Towards Reliable Audio Deepfake Attribution and Model Recognition: A Multi-Level Autoencoder-Based Framework Oral

Andrea Di Pierno, Luca Guarnera, Dario Allegra, Sebastiano Battiato

Session 3: Multimedia Deepfake Forensics

16:00 – 16:10 CLIP-Flow: A Universal Discriminator for AI-Generated Images Inspired by Anomaly Detection Remote

Yuanzhipeng, Kai Wang, Weize Quan, Dong-ming Yan, Tieru Wu

16:10 – 16:20 REVEAL: A Retrieval-Augmented Generation Approach for Contextual Identification of Synthetic Visual Content Remote

Mamadou Keita, Wassim Hamidouche, Bougueffa Eutamene Hessen, Abdelmailk Taleb-Ahmed, Abdenour Hadid

16:20 – 16:30 SASDN: A Generalizable and Minimal-Intervention LLM-Integrated Framework for Continual Adaptation in Spoofed Speech Detection Remote

Utkarsh Venaik, Akash Kushwaha, Nabeel Koya A, Rajiv Ratn Shah

16:30 – 16:40 VAD-Lip: Visual and Audio Deepfake Detection via Lip Features Remote

JinYu Wang, Xin Jin, Huaye Wang, Longteng Jiang

16:40 – 16:50 Adversarial Reality for Frame-based Deepfake Detectors Remote

Umur A. Ciftci, Nicholas Solar, Emily Greene, Anthony Rhodes, Ilke Demir

16:50 – 17:00 Multimodal Large Models for Image Tampering Detection and Explanation: From Detection to Reasoning Remote

Chenfeng Du, Yuchen Li, Heng Huang, Xin Jin, Xianglong Zeng

17:00 – 17:10 MGGA: Universal Perturbations against Deepfake via Multiple Model-based Gradient-Guided Feature Layer Attack Remote

Zhi Cao, Zhongyuan Wang, Run Wang, Yuhong Yang, Feng Tian, Gang Wu, Atsushi Suzuki