Semantic Object Segmentation in Cultural Sites using Real and Synthetic Data

We consider the problem of object segmentation in cultural sites. Since collecting and labeling large datasets of real images is challenging, we investigate whether synthetic images can help achieve good segmentation performance on real data. To perform the study, we collected a new dataset comprising both real and synthetic images of 24 artworks in a cultural site. The experimental results show that synthetic data helps to improve the performance of segmentation algorithms when they are tested on real images. Satisfactory performance is achieved by combining semantic segmentation with image-to-image translation and including a small amount of real data during training. The contributions of this work are the following:

We propose a novel dataset comprising both synthetic and real images of 24 artworks in a cultural site. The images of the considered artworks have been labeled with semantic masks. To the best of our knowledge, this dataset is the first of its kind. We release it publicly to encourage research on this topic;
We present an experimental analysis assessing whether synthetic data improves the performance of semantic segmentation on real data. The analysis also provides baseline results on the proposed dataset.

Dataset

1. Real domain

We consider the cultural site Palazzo Bellomo of the EGO-CH dataset. The dataset was acquired using a head-mounted Microsoft HoloLens device. We manually annotated 24 objects from 11 environments with semantic masks. In particular, we annotated 4740 images from the training set of the EGO-CH dataset and 848 images from its test set.

2. Synthetic domain

We developed a framework that generates synthetic data, automatically annotated with semantic masks, from the 3D model of the considered environment. We generated 12000 training images, 1200 validation images and 10800 test images.

The dataset will be released at conference time at this link.

Results

Method	Real Training Data	Accuracy (%)	Class Accuracy (%)	Mean IoU (%)	FWAVACC (%)
PSPNet_R	5%	70.94	44.59	29.34	56.65
	10%	76.46	50.10	33.53	63.48
	25%	80.22	60.57	43.76	68.19
	50%	83.51	63.71	50.21	72.41
	100%	83.72	64.47	49.49	72.62
PSPNet_S+R	0%	58.32	8.45	5.50	35.60
	5%	71.41	47.50	31.41	56.06
	10%	80.39	64.92	43.76	68.67
	25%	83.02	62.04	47.61	71.34
	50%	84.23	69.84	51.66	74.31
	100%	84.78	63.37	50.53	73.78
PSPNet+CycleGAN	0%	55.99	7.65	3.77	31.90
	5%	84.17	75.54	55.91	74.78
	10%	89.07	78.26	63.61	81.17
	25%	89.87	79.94	62.57	82.30
	50%	89.95	76.68	67.14	82.06
	100%	91.06	81.07	66.93	83.98
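This page does not define the four measures reported in the table, so the following is only a minimal sketch of how such scores are commonly computed from a confusion matrix, not the evaluation code used for the results above. It assumes the usual definitions from the semantic segmentation literature: pixel accuracy, mean per-class accuracy, mean intersection over union, and a frequency-weighted score, which is assumed here to be the frequency-weighted IoU often labeled "fwavacc" in segmentation toolkits. The function names `confusion_matrix` and `segmentation_scores` are hypothetical.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes):
    """Count pixels per (ground-truth class, predicted class) pair.

    Rows index ground-truth classes, columns index predicted classes.
    """
    gt = np.asarray(gt).ravel()
    pred = np.asarray(pred).ravel()
    flat_index = gt * num_classes + pred
    counts = np.bincount(flat_index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def segmentation_scores(cm):
    """Return the four scores as fractions in [0, 1].

    Assumed definitions (not confirmed by the source page):
    classes that never occur in the ground truth are ignored.
    """
    cm = cm.astype(np.float64)
    true_pos = np.diag(cm)
    gt_pixels = cm.sum(axis=1)    # pixels of each class in the ground truth
    pred_pixels = cm.sum(axis=0)  # pixels assigned to each class
    present = gt_pixels > 0       # skip classes absent from the ground truth

    class_acc = true_pos[present] / gt_pixels[present]
    union = gt_pixels[present] + pred_pixels[present] - true_pos[present]
    iou = true_pos[present] / union
    freq = gt_pixels[present] / cm.sum()

    return {
        "accuracy": true_pos.sum() / cm.sum(),
        "class_accuracy": class_acc.mean(),
        "mean_iou": iou.mean(),
        "fwavacc": (freq * iou).sum(),
    }


# Toy example: two 2x3 label maps with 3 classes (illustrative values only).
gt_mask = [[0, 0, 1], [1, 2, 2]]
pred_mask = [[0, 1, 1], [1, 2, 0]]
scores = segmentation_scores(confusion_matrix(gt_mask, pred_mask, num_classes=3))
```

Under these assumptions, multiplying each returned value by 100 gives percentages comparable in form to the table columns. For a whole test set, the confusion matrices of the individual images would be summed before calling `segmentation_scores`.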

Paper

F. Ragusa, D. DiMauro, A. Palermo, A. Furnari, G. M. Farinella. Synthetic vs Real. Objects Segmentation in Cultural Heritage. In International Conference on Pattern Recognition (ICPR), 2020. Download the paper.

Acknowledgement

This research is supported by MIUR - Programma Operativo Nazionale Ricerca e Innovazione 2014-2020 - Dottorati Innovativi a Caratterizzazione Industriale XXXIII CICLO, by the project VALUE - Visual Analysis for Localization and Understanding of Environments (N. 08CT6209090207) - PO FESR 2014/2020 - Azione 1.1.5. - “Sostegno all’avanzamento tecnologico delle imprese attraverso il finanziamento di linee pilota e azioni di validazione precoce dei prodotti e di dimostrazioni su larga scala”, and by Piano della Ricerca 2016-2018 linea di Intervento 2 of DMI, University of Catania. The authors would like to thank Regione Siciliana Assessorato dei Beni Culturali dell'Identità Siciliana - Dipartimento dei Beni Culturali e dell'Identità Siciliana and Polo regionale di Siracusa per i siti culturali - Galleria Regionale di Palazzo Bellomo.

Related Work

This work is related to the following publications:

F. Ragusa, A. Furnari, S. Battiato, G. Signorello, G. M. Farinella. EGO-CH: Dataset and Fundamental Tasks for Visitors Behavioral Understanding using Egocentric Vision. Pattern Recognition Letters - Special Issue on Pattern Recognition and Artificial Intelligence Techniques for Cultural Heritage, 2020. Web Page.

S. Orlando, A. Furnari, G. M. Farinella. Egocentric Visitor Localization and Artwork Detection in Cultural Sites Using Synthetic Data. Pattern Recognition Letters - Special Issue on Pattern Recognition and Artificial Intelligence Techniques for Cultural Heritage, 2020. Web Page.

G. M. Farinella, G. Signorello, S. Battiato, A. Furnari, F. Ragusa, R. Leonardi, E. Ragusa, E. Scuderi, A. Lopes, L. Santo, M. Samarotto. VEDI: Vision Exploitation for Data Interpretation. In 20th International Conference on Image Analysis and Processing (ICIAP), 2019.

F. Ragusa, A. Furnari, S. Battiato, G. Signorello, G. M. Farinella. Egocentric Visitors Localization in Cultural Sites. In Journal on Computing and Cultural Heritage (JOCCH), 2019. Web Page.

F. Ragusa, A. Furnari, S. Battiato, G. Signorello, G. M. Farinella. Egocentric Point of Interest Recognition in Cultural Sites. In 14th International Conference on Computer Vision Theory and Applications (VISAPP), Prague, Czech Republic, February 25-27, 2019. Web Page.

G. M. Farinella, G. Signorello, A. Furnari, S. Battiato, E. Scuderi, A. Lopes, L. Santo, M. Samarotto, G. P. A. Distefano, D. G. Marano, "Integrated Method With Wearable Kit For Behavioural Analysis And Augmented Vision" (Italian Version: "Metodo Integrato con Kit Indossabile per Analisi Comportamentale e Visione Aumentata"), Patent Application number 102018000009545, filing date: 17/10/2018, Università degli Studi di Catania, Xenia Gestione Documentale S.R.L., IMC Service S.R.L.

You can also visit our page dedicated to First Person Vision Research @ IPLAB.