Two groups of ten images generated by the closed-set generators with the same text prompt. The generators are identified by the codes described in the main paper. The prompts used to generate the images are: (top, a-j) prompt 439, ``Image of an Italian young man who is wearing a pullover. He has a pointy face and full lips. The subject is looking into the camera in happiness. The image is taken with a city in the background''; (bottom, k-t) prompt 266, ``A proud British woman in her 20s with hooded brown eyes and a small nose is looking into the camera. She is wearing a sky blue fedora and a necklace. The subject is portrayed with a golf course in the background''.
ABSTRACT
Synthetic image source attribution is an open challenge, with an increasing number of image generators being released every year. The complexity and sheer number of available generative techniques, as well as the scarcity of diverse, high-quality open-source datasets for this task, make training and benchmarking synthetic image source attribution models very challenging. WILD\footnote{The dataset is available at: https://www.kaggle.com/datasets/pietrob92/wild-in-the-wild-image-linkage-dataset} is a new in-the-Wild Image Linkage Dataset designed to provide a powerful training and benchmarking tool for synthetic image attribution models. The dataset is built from a closed set of 10 popular commercial generators, which constitutes the training base of attribution models, and an open set of 10 additional generators, simulating a real-world in-the-wild scenario. Each generator is represented by 1,000 images, for a total of 10,000 images in the closed set and 10,000 images in the open set. Half of the images are post-processed with a wide range of operators. WILD allows benchmarking attribution models on a wide range of tasks, including closed- and open-set identification and verification, as well as attribution that is robust to post-processing and adversarial attacks. Models trained on WILD are expected to benefit from the challenging, realistic conditions the dataset represents. Moreover, we present an assessment of seven baseline methods on closed- and open-set attribution, including robustness tests with respect to post-processing.
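The dataset composition stated above (10 closed-set plus 10 open-set generators, 1,000 images each, half of them post-processed) can be sketched as follows. This is a minimal illustrative sketch, not the dataset's actual layout: the generator names, file names, and the pristine/post-processed split used here are placeholders, not taken from the paper or the Kaggle release.

```python
# Hypothetical sketch of the WILD composition described in the abstract.
# Generator codes and image file names below are invented placeholders;
# the real dataset uses the codes defined in the main paper.

closed_set = {
    f"closed_gen_{i:02d}": [f"img_{j:04d}.png" for j in range(1000)]
    for i in range(10)
}
open_set = {
    f"open_gen_{i:02d}": [f"img_{j:04d}.png" for j in range(1000)]
    for i in range(10)
}

def split_postprocessed(images):
    """Illustrative half/half split into pristine and post-processed images.

    The abstract only says half the images are post-processed; which half
    is an assumption made here for demonstration purposes.
    """
    half = len(images) // 2
    return images[:half], images[half:]

total_closed = sum(len(v) for v in closed_set.values())  # 10 x 1,000 = 10,000
total_open = sum(len(v) for v in open_set.values())      # 10 x 1,000 = 10,000
```

Closed-set identification would then be trained only on `closed_set` generators, while open-set evaluation additionally draws images from `open_set` generators unseen during training.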