Santi Andrea Orlando, Antonino Furnari and Giovanni Maria Farinella
The dataset has been generated using the tool proposed in the paper:
S. A. Orlando, A. Furnari, G. M. Farinella - Egocentric Visitor Localization and Artwork Detection in Cultural Sites Using Synthetic Data.
Submitted to Pattern Recognition Letters, Special Issue on Pattern Recognition and Artificial Intelligence Techniques for Cultural Heritage, 2020.
The dataset has been generated from the "Galleria Regionale Palazzo Bellomo" 3D model scanned with Matterport, and includes
4 simulated egocentric navigations.
Each frame has been labeled with its 3DOF pose and the room in which the virtual agent is located at acquisition time.
The museum comprises 11 contexts. The figure below shows the map of the museum.
Fig. 2: Map of Palazzo Bellomo with marked contexts.
Navigations | 1 | 2 | 3 | 4 | Overall images |
---|---|---|---|---|---|
# of Frames | 24,525 | 25,003 | 26,281 | 23,960 | 99,769 |
Table 1: Dataset details of Palazzo Bellomo.
We used the frames of the 2nd and 3rd navigations as the training set.
We cast the Image Based Localization (IBL) as an image retrieval problem and use a Triplet Network to learn a suitable representation space for images.
The network is trained using triplets, each comprising: 1) an anchor frame I; 2) a similar image I+; and 3) a
dissimilar image I-. The triplets have been generated using: 1) a threshold of 0.5 m on the Euclidean distance, and 2) a threshold of 45° on the orientation
distance. We trained the network for 100 epochs, one model for each subset.
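The triplet-selection rule above can be sketched as follows. This is an illustrative reconstruction only: the function names and the (x, y, heading) pose layout are assumptions, not the authors' actual tooling; only the two thresholds (0.5 m, 45°) come from the text.

```python
import math

# Thresholds stated in the text.
POS_THRESHOLD_M = 0.5
ORI_THRESHOLD_DEG = 45.0

def orientation_distance(a_deg, b_deg):
    """Smallest absolute difference between two headings, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def is_similar(anchor_pose, other_pose):
    """Pose = (x, y, heading_deg), i.e. the 3DOF label attached to each frame.
    A frame qualifies as the positive I+ only if BOTH the positional and the
    angular distances to the anchor fall below the thresholds; otherwise it
    can serve as the negative I-."""
    dx = anchor_pose[0] - other_pose[0]
    dy = anchor_pose[1] - other_pose[1]
    pos_ok = math.hypot(dx, dy) <= POS_THRESHOLD_M
    ori_ok = orientation_distance(anchor_pose[2], other_pose[2]) <= ORI_THRESHOLD_DEG
    return pos_ok and ori_ok

# Example: a nearby frame facing roughly the same way vs. a distant one.
print(is_similar((1.0, 2.0, 10.0), (1.2, 2.1, 40.0)))  # True
print(is_similar((1.0, 2.0, 10.0), (3.0, 2.0, 10.0)))  # False
```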
We used the frames of the 1st navigation as the test set.
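At test time, the image-retrieval formulation localizes a query frame by finding its nearest neighbor in the learned representation space and inheriting that frame's pose label. A minimal sketch with toy embeddings (all names and values here are illustrative assumptions):

```python
import math

def nearest_neighbor(query, database):
    """Return the index of the database embedding closest to the query
    under Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(database)), key=lambda i: dist(query, database[i]))

# Toy representation space: embeddings of training frames and their 3DOF poses.
train_embeddings = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
train_poses = [(0.0, 0.0, 0.0), (1.2, 0.8, 90.0), (4.9, 5.1, 180.0)]

# A test frame is localized with the pose of its closest training frame.
idx = nearest_neighbor((0.9, 1.1), train_embeddings)
print(train_poses[idx])  # (1.2, 0.8, 90.0)
```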
We also trained the network with different mid-level representations as a domain adaptation technique.
Another domain adaptation method we studied is unpaired image-to-image translation using CycleGAN and ToDayGAN.
Below, we describe the dataset used as the real domain and then show examples of the extracted mid-level representations.
Fig. 3: Examples of mid-level representations extracted with the corresponding RGB images. We report two pairs of virtual and real images.
This dataset has been collected with a Microsoft HoloLens worn by volunteers during their visits to the museum.
The dataset has been published in the following work:
F. Ragusa, A. Furnari, S. Battiato, G. Signorello, G. M. Farinella. EGO-CH: Dataset and Fundamental Tasks for Visitors Behavioral
Understanding using Egocentric Vision. Pattern Recognition Letters - Special Issue on Pattern Recognition and Artificial Intelligence Techniques
for Cultural Heritage, 2020.
A subset of the dataset is used to benchmark Virtual-to-Real domain adaptation; in particular, we used the 10 test videos.
We extracted frames at 5 fps and excluded the frames of the ground floor, which is not available in the virtual dataset.
In total, we extracted 12,008 images, labeled with the 11 rooms of the museum.
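The 5 fps subsampling can be sketched as a simple index-selection rule. The native frame rate used below (30 fps) is an assumption for illustration; the text only states the 5 fps target.

```python
def frames_to_keep(total_frames, native_fps, target_fps=5):
    """Return the indices of frames kept when subsampling a video
    from native_fps down to target_fps."""
    step = native_fps / target_fps  # keep one frame every `step` indices
    kept, next_keep = [], 0.0
    for i in range(total_frames):
        if i >= next_keep:
            kept.append(i)
            next_keep += step
    return kept

# One second of (assumed) 30 fps video yields 5 retained frames.
print(frames_to_keep(30, 30))  # [0, 6, 12, 18, 24]
```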
We split the dataset into training, validation and test sets as follows:
| Training set | Validation set | Test set | Overall images |
---|---|---|---|---|
# of Frames | 24,357 | 10,288 | 12,008 | 46,653 |
This research is supported by XENIA Progetti - DWORD, by the project VALUE - Visual Analysis for Localization and Understanding of Environments (N. 08CT6209090207, CUP G69J18001060007) granted by PO FESR 2014/2020 - Azione 1.1.5 - "Sostegno all'avanzamento tecnologico delle imprese attraverso il finanziamento di linee pilota e azioni di validazione precoce dei prodotti e di dimostrazioni su larga scala", and by Piano della Ricerca 2016-2018 linea di Intervento 2 of DMI, University of Catania. The authors would like to thank Regione Siciliana Assessorato dei Beni Culturali dell'Identità Siciliana - Dipartimento dei Beni Culturali e dell'Identità Siciliana and Polo regionale di Siracusa per i siti culturali - Galleria Regionale di Palazzo Bellomo.