An Unsupervised Domain Adaptation Scheme for Single-Stage Artwork Recognition in Cultural Sites

Department of Mathematics and Computer Science, University of Catania, Italy

CUTGANA, University of Catania, Italy

ICAR-CNR, National Research Council, Palermo, Italy

G. Pasqualino, A. Furnari, G. Signorello, G. M. Farinella

Recognizing artworks in a cultural site using images acquired from the user's point of view (First Person Vision) makes it possible to build useful applications for both visitors and site managers. However, current object detection algorithms working in fully supervised settings need to be trained with large quantities of labeled data to achieve good performance, and collecting such data is time-consuming and costly. Using synthetic data generated from the 3D model of the cultural site to train the algorithms can reduce these costs. On the other hand, when these models are tested on real images, a significant drop in performance is observed due to the differences between real and synthetic images. In this study we consider the problem of Unsupervised Domain Adaptation for object detection in cultural sites. To address this problem, we created a new dataset containing both synthetic and real images of 16 different artworks. We then investigated different domain adaptation techniques based on one-stage and two-stage object detectors, image-to-image translation and feature alignment. Based on the observation that single-stage detectors are more robust to the domain shift in the considered settings, we proposed a new method based on RetinaNet and feature alignment, which we called DA-RetinaNet. The proposed approach achieves better results than the compared methods.

Dataset

We propose a dataset of synthetic and real images of 16 artworks on display in the "Galleria Regionale Palazzo Bellomo" in Siracusa, Italy. The dataset contains two sets of images, synthetic and real, which are divided as follows:

Synthetic Dataset

Training set: 51284 images

Validation set: 24525 images

Test set: 23960 images

Real Dataset

Training set: 1502 images

Test set: 688 images
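The split sizes above can be collected in a small lookup table to check totals and proportions. The following is a minimal sketch: the dictionary layout and the function names `domain_totals` and `split_fractions` are illustrative assumptions and are not part of the released dataset or code.

```python
# Split sizes copied from the lists above.
# The dictionary layout and key names are hypothetical, chosen for illustration.
SPLITS = {
    "synthetic": {"train": 51284, "val": 24525, "test": 23960},
    "real": {"train": 1502, "test": 688},
}


def domain_totals(splits):
    """Return the total number of images per domain."""
    return {domain: sum(parts.values()) for domain, parts in splits.items()}


def split_fractions(splits, domain):
    """Return the fraction of a domain's images that falls in each split."""
    total = sum(splits[domain].values())
    return {name: count / total for name, count in splits[domain].items()}


if __name__ == "__main__":
    for domain, total in domain_totals(SPLITS).items():
        print(domain, total)
        for name, fraction in split_fractions(SPLITS, domain).items():
            print(f"  {name}: {fraction:.3f}")
```

The counts show how strongly the dataset leans towards synthetic images, which is the setting that motivates adapting a detector trained on synthetic data to real images.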

You can download the whole dataset and annotations at this link

Methods

We explore the following methods:
1) baseline approaches without adaptation;
2) domain adaptation through image-to-image translation;
3) domain adaptation through feature alignment;
4) a new unsupervised domain adaptation method (code is available here);
5) domain adaptation combining feature alignment and image-to-image translation.
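To give an intuition for feature alignment, the sketch below shows one common way of realizing it: adversarial training with a reversed gradient. This is an assumption made for illustration, since this page does not specify how alignment is implemented, and the sketch is a scalar toy model rather than the DA-RetinaNet code. A one-parameter "feature extractor" `w` and a one-parameter "domain classifier" `v` are hypothetical stand-ins for the real networks: the classifier descends the domain loss, while the feature extractor receives the same gradient with its sign flipped, which pushes synthetic and real features closer together.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss(w, v, samples):
    """Mean binary cross-entropy of the toy domain classifier.

    samples is a list of (x, d) pairs, with d = 0 for a synthetic (source)
    input and d = 1 for a real (target) input. The feature is f = w * x and
    the predicted probability of the real domain is sigmoid(v * f).
    """
    total = 0.0
    for x, d in samples:
        p = sigmoid(v * (w * x))
        total -= d * math.log(p) + (1 - d) * math.log(1 - p)
    return total / len(samples)


def domain_gradients(w, v, samples):
    """Analytic gradients of the domain loss with respect to w and v."""
    grad_w = 0.0
    grad_v = 0.0
    for x, d in samples:
        f = w * x
        err = sigmoid(v * f) - d  # derivative of the loss w.r.t. the logit
        grad_v += err * f
        grad_w += err * v * x
    n = len(samples)
    return grad_w / n, grad_v / n


def adversarial_step(w, v, samples, lr=0.1, lam=1.0):
    """One update: the classifier descends, the features get a reversed gradient."""
    grad_w, grad_v = domain_gradients(w, v, samples)
    v_new = v - lr * grad_v           # classifier learns to tell domains apart
    w_new = w - lr * (-lam * grad_w)  # sign flip: features learn to confuse it
    return w_new, v_new


if __name__ == "__main__":
    # Toy data: synthetic inputs are negative, real inputs are positive.
    samples = [(-1.0, 0), (-2.0, 0), (1.0, 1), (2.0, 1)]
    w, v = 1.0, 1.0
    w_new, v_new = adversarial_step(w, v, samples)
    print("domain loss before step:", domain_loss(w, v, samples))
    print("after feature update only:", domain_loss(w_new, v, samples))
    print("after classifier update only:", domain_loss(w, v_new, samples))
```

In a detector, the same sign flip would be applied to gradients flowing from a domain classifier back into the backbone features, alongside the usual detection loss computed on labeled synthetic images. The weight `lam` (a hypothetical name) balances the two objectives.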

Architecture of the proposed DA-RetinaNet.

Qualitative Results

Video

Quantitative Results

Paper

Giovanni Pasqualino, Antonino Furnari, Giovanni Signorello, Giovanni Maria Farinella, An Unsupervised Domain Adaptation Scheme for Single-Stage Artwork Recognition in Cultural Sites, Image and Vision Computing, 2021 Paper

@article{PASQUALINO2021104098,
    title={An Unsupervised Domain Adaptation Scheme for Single-Stage Artwork Recognition in Cultural Sites},
    journal={Image and Vision Computing},
    pages={104098},
    year={2021},
    issn={0262-8856},
    doi={https://doi.org/10.1016/j.imavis.2021.104098},
    author={Giovanni Pasqualino and Antonino Furnari and Giovanni Signorello and Giovanni Maria Farinella},
}

Acknowledgement

This research is supported by the project VALUE - Visual Analysis for Localization and Understanding of Environments (N. 08CT6209090207 - CUP G69J18001060007) - PO FESR 2014/2020 - Azione 1.1.5., by Piano di incentivi per la ricerca di Ateneo 2020/2022 (Pia.ce.ri.) Linea 2 - University of Catania, and by MIUR AIM - Attrazione e Mobilità Internazionale Linea 1 - AIM1893589 - CUP E64118002540007.

People

Giovanni Pasqualino

Antonino Furnari

Giovanni Signorello

Giovanni Maria Farinella