Semantic Object Segmentation in Cultural Sites using Real and Synthetic Data

F. Ragusa1,2, D. DiMauro1, A. Palermo1, A. Furnari1, G. M. Farinella1

1FPV@IPLab, Department of Mathematics and Computer Science - University of Catania, IT
2Xenia Gestione Documentale s.r.l. - Xenia Progetti s.r.l., Acicastello, Catania, IT

We consider the problem of object segmentation in cultural sites. Since collecting and labeling large datasets of real images is challenging, we investigate whether synthetic images can help achieve good segmentation performance on real data. To perform the study, we collected a new dataset comprising both real and synthetic images of 24 artworks in a cultural site. The experimental results show that synthetic data improves the performance of segmentation algorithms when tested on real images. Satisfactory performance is achieved by combining semantic segmentation with image-to-image translation and including a small amount of real data during training. The contributions of this work are the following:

  • We propose a novel dataset comprising both synthetic and real images of 24 artworks in a cultural site. The images of the considered artworks have been labeled with semantic masks. To the best of our knowledge, this dataset is the first of its kind. We release it publicly to encourage research on this topic;
  • We present an experimental analysis that assesses how useful synthetic data is for improving the performance of semantic segmentation on real data. The analysis also provides baseline results on the proposed dataset.


1. Real domain

We consider the cultural site Palazzo Bellomo of the EGO-CH dataset. The dataset was acquired using a head-mounted Microsoft HoloLens device. We manually annotated 24 objects from 11 environments with semantic masks. In particular, we annotated 4740 images from the training set of the EGO-CH dataset and 848 images from the test set.
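The results table on this page reports performance for different percentages of real training data. A minimal sketch of how such subsets could be drawn is given here, assuming that the percentages refer to fractions of the 4740 annotated real training images and that images are sampled uniformly at random with a fixed seed. The function name `subsample`, the seed, and the integer placeholder identifiers are hypothetical and are not taken from the original work.

```python
import random

NUM_REAL_TRAIN = 4740  # annotated real training images reported above


def subsample(items, fraction, seed=0):
    """Return a reproducible random subset with round(fraction * len(items)) items.

    Uniform sampling with a fixed seed is an assumption made for this sketch.
    """
    k = round(fraction * len(items))
    rng = random.Random(seed)
    return rng.sample(items, k)


# Placeholder identifiers standing in for the real image file names.
image_ids = list(range(NUM_REAL_TRAIN))

for fraction in (0.05, 0.10, 0.25, 0.50, 1.00):
    subset = subsample(image_ids, fraction)
    print(f"{fraction:.0%} of the real training set: {len(subset)} images")
```

Under these assumptions, the 5% setting corresponds to only a few hundred labeled real images, which is the scale at which reducing manual annotation effort matters most.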

2. Synthetic domain

We developed a framework that generates synthetic data, automatically annotated with semantic masks, from the 3D model of the considered environment. We generated 12000 training images, 1200 validation images, and 10800 test images.

The dataset will be released at conference time at this link.


Real Training Data   Accuracy (%)   Class Accuracy (%)   Mean IoU (%)   FWAVACC (%)

5%                   70.94          44.59                29.34          56.65
10%                  76.46          50.10                33.53          63.48
25%                  80.22          60.57                43.76          68.19
50%                  83.51          63.71                50.21          72.41
100%                 83.72          64.47                49.49          72.62

0%                   58.32           8.45                 5.50          35.60
5%                   71.41          47.50                31.41          56.06
10%                  80.39          64.92                43.76          68.67
25%                  83.02          62.04                47.61          71.34
50%                  84.23          69.84                51.66          74.31
100%                 84.78          63.37                50.53          73.78

0%                   55.99           7.65                 3.77          31.90
5%                   84.17          75.54                55.91          74.78
10%                  89.07          78.26                63.61          81.17
25%                  89.87          79.94                62.57          82.30
50%                  89.95          76.68                67.14          82.06
100%                 91.06          81.07                66.93          83.98
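To make the four reported measures concrete, the following sketch computes them from a pixel-level confusion matrix. It assumes the definitions commonly used in FCN-style evaluation code, and in particular that FWAVACC denotes the frequency-weighted IoU, which this page does not state. The function name `segmentation_metrics` and the toy confusion matrix are hypothetical and are not taken from the original evaluation code.

```python
def segmentation_metrics(conf):
    """Compute segmentation metrics from a square confusion matrix.

    conf[i][j] is the number of pixels whose ground-truth class is i
    and whose predicted class is j.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    gt = [sum(conf[i]) for i in range(n)]                          # pixels per ground-truth class
    pred = [sum(conf[i][j] for i in range(n)) for j in range(n)]   # pixels per predicted class
    tp = [conf[i][i] for i in range(n)]                            # correctly labeled pixels

    # Overall pixel accuracy.
    accuracy = sum(tp) / total

    # Per-class accuracy, averaged over classes present in the ground truth.
    per_class_acc = [tp[i] / gt[i] for i in range(n) if gt[i] > 0]
    class_accuracy = sum(per_class_acc) / len(per_class_acc)

    # Intersection over union per class, skipping classes with an empty union.
    iou = {}
    for i in range(n):
        union = gt[i] + pred[i] - tp[i]
        if union > 0:
            iou[i] = tp[i] / union
    mean_iou = sum(iou.values()) / len(iou)

    # Frequency-weighted IoU (assumed meaning of FWAVACC).
    fwavacc = sum((gt[i] / total) * iou[i] for i in iou if gt[i] > 0)

    return {
        "accuracy": accuracy,
        "class_accuracy": class_accuracy,
        "mean_iou": mean_iou,
        "fwavacc": fwavacc,
    }


# Toy two-class example (background vs. one artwork), values are illustrative only.
toy_conf = [[50, 10],
            [5, 35]]
metrics = segmentation_metrics(toy_conf)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Because mean IoU averages over classes regardless of their size, it penalizes errors on small or rare objects more than overall accuracy does, which is consistent with the gap between the Accuracy and Mean IoU columns in the table.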


F. Ragusa, D. DiMauro, A. Palermo, A. Furnari, G. M. Farinella. Synthetic vs Real. Objects Segmentation in Cultural Heritage. In International Conference on Pattern Recognition (ICPR), 2020.


This research is supported by MIUR - Programma Operativo Nazionale Ricerca e Innovazione 2014-2020 - Dottorati Innovativi a Caratterizzazione Industriale XXXIII CICLO, by the project VALUE - Visual Analysis for Localization and Understanding of Environments (N. 08CT6209090207) - PO FESR 2014/2020 - Azione 1.1.5. - “Sostegno all’avanzamento tecnologico delle imprese attraverso il finanziamento di linee pilota e azioni di validazione precoce dei prodotti e di dimostrazioni su larga scala”, and by Piano della Ricerca 2016-2018 linea di Intervento 2 of DMI, University of Catania. The authors would like to thank Regione Siciliana Assessorato dei Beni Culturali dell'Identità Siciliana - Dipartimento dei Beni Culturali e dell'Identità Siciliana and Polo regionale di Siracusa per i siti culturali - Galleria Regionale di Palazzo Bellomo.