End-to-end Audio Deepfake Detection from RAW Waveforms: a RawNet-Based Approach with Cross-Dataset Evaluation







Andrea Di Pierno1,2, Luca Guarnera1, Dario Allegra1, Sebastiano Battiato1
1Department of Mathematics and Computer Science, University of Catania, Italy
2IMT School of Advanced Studies Lucca, Lucca, Italy
andrea.dipierno@phd.unict.it, {luca.guarnera,dario.allegra,sebastiano.battiato}@unict.it

VERIMEDIA: International Workshop on Media Verification and Integrity - IJCNN









[RELATED WORKS]





Architecture of the proposed RawNetLite model for audio deepfake detection.



ABSTRACT


Audio deepfakes are a growing threat to digital security and trust: advanced generative models can now produce synthetic speech that closely mimics real human voices. Detecting such manipulations is especially challenging under open-world conditions, where the spoofing methods encountered at test time may differ from those seen during training. In this work, we propose an end-to-end deep learning framework for audio deepfake detection that operates directly on raw waveforms. Our model, RawNetLite, is a lightweight convolutional-recurrent architecture designed to capture both spectral and temporal features without handcrafted preprocessing. To enhance robustness, we introduce a training strategy that combines data from multiple domains and adopts Focal Loss to emphasize difficult or ambiguous samples. We further demonstrate that incorporating codec-based manipulations and applying waveform-level audio augmentations (e.g., pitch shifting, noise, and time stretching) significantly improves generalization under realistic acoustic conditions. The proposed model achieves over 99.7% F1 and 0.25% EER on in-domain data (FakeOrReal), and up to 83.4% F1 with 16.4% EER on a challenging out-of-distribution test set (ASVspoof2021 + CodecFake). These findings highlight the importance of diverse training data, tailored objective functions, and audio augmentations in building resilient and generalizable audio forgery detectors.
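The abstract names Focal Loss as the objective used to emphasize difficult or ambiguous samples. The sketch below illustrates the standard binary focal loss in plain Python. It is not the authors' training code: the function names are hypothetical, and the values `gamma=2.0` and `alpha=0.25` are the commonly used defaults from the original focal loss formulation, assumed here because the abstract does not state the paper's settings.

```python
import math


def binary_focal_loss(p_fake, label, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for a single audio clip (illustrative sketch).

    p_fake: predicted probability that the clip is fake (class 1).
    label:  1 for fake, 0 for real.
    gamma, alpha: assumed defaults, not values taken from the paper.
    """
    # Clamp the probability so that log() never receives 0.
    p_fake = min(max(p_fake, eps), 1.0 - eps)
    # p_t is the probability the model assigns to the true class.
    p_t = p_fake if label == 1 else 1.0 - p_fake
    # alpha_t balances the two classes.
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over a batch of predictions."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # A confidently correct prediction versus an ambiguous one, both for fake clips.
    print("easy sample:", binary_focal_loss(0.95, 1))
    print("hard sample:", binary_focal_loss(0.30, 1))
```

With `gamma=0` and `alpha=0.5` the expression reduces to half of the ordinary binary cross-entropy, which makes the role of the modulating factor easy to check: raising `gamma` lowers the contribution of confidently classified clips far more than that of ambiguous ones.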







Cite:
@inproceedings{dipierno2025end,
   title={End-to-end Audio Deepfake Detection from RAW Waveforms: a RawNet-Based Approach with Cross-Dataset Evaluation},
   author={Di Pierno, Andrea and Guarnera, Luca and Allegra, Dario and Battiato, Sebastiano},
}




