Santi Andrea Orlando1, 2, **, Antonino Furnari1, ** and Giovanni Maria Farinella1, 3, **
The dataset has been generated using the NEW tool for Unity 3D proposed in our paper:
S. A. Orlando, A. Furnari, G. M. Farinella - Egocentric Visitor Localization and Artwork Detection in Cultural Sites Using Synthetic Data.
Submitted to Pattern Recognition Letters, special issue on Pattern Recognition and Artificial Intelligence Techniques for Cultural Heritage, 2020.
The tool used to generate the dataset is available at this link.
The dataset has been generated using the "Galleria Regionale Palazzo Bellomo" 3D model scanned with Matterport.
By using this model, we simulated 4 egocentric navigations, performed alternately in clockwise and counterclockwise order according to the room layout of the museum.
In each room, the virtual agent visits 5 observation points located in front of the artworks of the museum.
We acquired 99,769 images at 5 fps.
The tool automatically labels each frame according to the 6 Degrees of Freedom (6DoF) of the camera: 1) camera position (x, y, z) and 2) camera orientation in quaternions (w, p, q, r).
Specifically, we converted the 6DoF camera pose into a 3DoF format, taking into consideration:
| Navigation | 1 | 2 | 3 | 4 | Overall |
|---|---|---|---|---|---|
| # of Frames | 24,525 | 25,003 | 26,281 | 23,960 | 99,769 |
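As a minimal sketch of the 6DoF-to-3DoF reduction described above, the planar position (x, z) can be kept and a single yaw angle extracted from the quaternion (w, p, q, r). The function name is hypothetical, and the formula assumes y is the vertical axis, as in Unity's left-handed coordinate system.

```python
import math

def pose_6dof_to_3dof(x, y, z, w, p, q, r):
    """Reduce a 6DoF camera pose to 3DoF: planar position (x, z)
    plus a single yaw angle (rotation about the vertical y axis).

    Quaternion components follow the (w, p, q, r) order used in the
    dataset labels; the vertical-axis convention is an assumption.
    """
    # Yaw about the y axis, recovered from the quaternion components.
    yaw = math.atan2(2.0 * (w * q + p * r), 1.0 - 2.0 * (q * q + p * p))
    return x, z, yaw
```

For a pure rotation of 90° about the vertical axis, the function returns the unchanged planar position together with a yaw of π/2.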
We used the frames of the 2nd and 3rd navigations as training set. The training set has been divided into subsets containing 25%, 50%, 75%, and 100% of the frames.

We cast Image Based Localization (IBL) as an image retrieval problem and used a Triplet Network to learn a suitable representation space for images. The network is trained using triplets, each comprising: 1) an anchor frame I; 2) a similar image I+; and 3) a dissimilar image I-. The triplets have been generated using a threshold of 0.5 m on the Euclidean distance between camera positions and a threshold of 45° on the orientation distance. We trained the network for 100 epochs, one model for each subset.

We used the frames of the 1st navigation as test set. We investigated temporal smoothing techniques by applying three different filters: mean, median, and 25% trimmed mean.
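The similarity criterion used to mine triplets can be sketched as follows, using the two thresholds stated above (0.5 m on position, 45° on orientation). The function and constant names are hypothetical, introduced here only for illustration.

```python
import math

# Thresholds from the text: 0.5 m Euclidean distance, 45 deg orientation.
POS_THRESHOLD_M = 0.5
ORI_THRESHOLD_DEG = 45.0

def is_positive_pair(pos_a, pos_b, yaw_a_deg, yaw_b_deg):
    """Return True if frame b is a valid positive (I+) for anchor a,
    i.e. both the position and the orientation distances fall below
    their thresholds. Frames failing either test can serve as I-."""
    dist = math.dist(pos_a, pos_b)  # Euclidean distance in metres
    # Wrapped angular difference in [0, 180] degrees.
    dyaw = abs((yaw_a_deg - yaw_b_deg + 180.0) % 360.0 - 180.0)
    return dist <= POS_THRESHOLD_M and dyaw <= ORI_THRESHOLD_DEG
```

The angle difference is wrapped so that, for instance, 350° and 10° are correctly treated as 20° apart rather than 340°.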
The dataset has been generated using the tool for Unity 3D proposed in our paper:
S. A. Orlando, A. Furnari, S. Battiato, G. M. Farinella - Image Based Localization with Simulated Egocentric Navigations.
In International Conference on Computer Vision Theory and Applications (VISAPP), 2019.
The tool is available at this link.
The dataset has been generated using the "Area 3" 3D model from the S3DIS Dataset. By using this model, we simulated 90 egocentric navigation paths, 30 paths for each height of the virtual agent performing the navigations (150 cm, 160 cm, 170 cm).
We acquired 886,823 images at 30 fps.
The tool automatically labels each frame according to the 6 Degrees of Freedom (6DoF) of the camera: 1) camera position (x, y, z) and 2) camera orientation in quaternions (w, p, q, r).
We converted the 6DoF camera pose into a 3DoF format, taking into consideration:
| Agent height | 1.5 m | 1.6 m | 1.7 m | Overall |
|---|---|---|---|---|
| # of Frames | 301,757 | 296,164 | 288,902 | 886,823 |
We split our dataset into three parts in order to train a CNN: