Image-Based Localization with Simulated Egocentric Navigations


Santi Andrea Orlando1,2, Antonino Furnari1, Sebastiano Battiato1 and Giovanni Maria Farinella1

1Department of Mathematics and Computer Science, University of Catania, IT
2DWORD - Xenia Progetti S.r.l., Acicastello, Catania, IT



We present a tool to generate simulated egocentric data starting from a 3D model of a real building, useful for studying the problem of Image-Based Localization (IBL). In particular, our focus is on localizing users wearing an egocentric camera. The contributions of this work are:

  • a general methodology to create synthetic datasets automatically labelled with the 6 Degrees of Freedom (6DoF) pose of the camera, to study the IBL problem;
  • a large dataset of Simulated Egocentric Navigations collected considering a 3D model from the S3DIS Dataset (Area 3);
  • a benchmark study of 3DoF indoor localization to assess the effectiveness of our tool, considering an IBL pipeline based on metric learning with a triplet network and an image-retrieval procedure.

Results and Demo


A video example of our localization system in action is shown below. We performed this localization considering a path from the test set.



Table 1: Results related to the selection of the two thresholds Thxz and Thθ, when 10,000 training samples are considered to build the embedding space with the triplet network. For each considered value of Thθ, we marked in bold the best result. Each cell reports position error followed by orientation error.

Thxz \ Thθ   60°                               45°                               30°
2 m          0.73 m ± 1.79 m; 11.93° ± 22.06°  0.71 m ± 1.73 m; 12.17° ± 22.47°  0.71 m ± 1.82 m; 11.16° ± 21.72°
1 m          0.68 m ± 1.81 m; 11.52° ± 20.54°  0.72 m ± 1.88 m; 11.57° ± 21.93°  0.67 m ± 1.79 m; 10.34° ± 19.88°
0.75 m       0.64 m ± 1.68 m; 11.42° ± 20.97°  0.72 m ± 1.94 m; 11.59° ± 22.33°  0.67 m ± 1.72 m; 10.89° ± 21.21°
0.5 m        0.74 m ± 2.09 m; 11.59° ± 21.62°  0.69 m ± 1.86 m; 11.61° ± 21.98°  0.71 m ± 1.88 m; 11.65° ± 22.41°
0.25 m       0.64 m ± 1.71 m; 11.41° ± 21.62°  0.71 m ± 1.87 m; 11.65° ± 22.41°  0.76 m ± 2.06 m; 11.68° ± 22.65°


Table 2: Results obtained with training sets of different sizes.

Thθ   Thxz     Training Samples   Position Error     Orientation Error
60°   0.75 m   10000              0.64 m ± 1.68 m    11.42° ± 20.97°
               20000              0.50 m ± 1.37 m     8.83° ± 16.52°
               30000              0.46 m ± 1.21 m     8.71° ± 17.45°
               40000              0.42 m ± 1.21 m     7.85° ± 15.35°
45°   0.5 m    10000              0.69 m ± 1.86 m    11.61° ± 21.98°
               20000              0.47 m ± 1.27 m     8.58° ± 17.07°
               30000              0.44 m ± 1.13 m     8.05° ± 15.49°
               40000              0.40 m ± 1.13 m     7.72° ± 15.65°
30°   1 m      10000              0.67 m ± 1.79 m    10.34° ± 19.88°
               20000              0.56 m ± 1.68 m     8.86° ± 17.67°
               30000              0.48 m ± 1.37 m     8.01° ± 16.61°
               40000              0.43 m ± 1.25 m     7.49° ± 16.31°


Table 3: Percentage of test images with position error below 0.5 m and orientation error below 30°, for varying numbers of triplets used to train the network. The last row reports the average training time.

Thxz; Thθ \ Training Samples   10000    20000    30000    40000
0.75 m; 60°                    72.34%   78.92%   80.22%   82.20%
0.5 m; 45°                     72.26%   79.56%   81.03%   83.26%
1 m; 30°                       71.75%   76.70%   80.34%   82.47%
Avg training time (hours)       8.59    16.45    23.98    33.85
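The statistics reported in the tables above can be reproduced with a few lines of NumPy. This is our own sketch of the evaluation protocol, not the released code: position error is the Euclidean distance on the ground plane, and orientation error is the smallest absolute angular difference.

```python
import numpy as np

def angular_error(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in degrees."""
    d = np.abs(pred_deg - true_deg) % 360.0
    return np.minimum(d, 360.0 - d)

def evaluate(pred_xz, true_xz, pred_theta, true_theta,
             pos_thresh=0.5, ang_thresh=30.0):
    """Mean/std of position and orientation errors, plus the percentage
    of samples within the (pos_thresh, ang_thresh) accuracy criterion."""
    pos_err = np.linalg.norm(pred_xz - true_xz, axis=1)   # metres
    ang_err = angular_error(pred_theta, true_theta)       # degrees
    within = np.mean((pos_err < pos_thresh) & (ang_err < ang_thresh))
    return (pos_err.mean(), pos_err.std(),
            ang_err.mean(), ang_err.std(), 100.0 * within)
```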


Dataset







We collected a large dataset of Simulated Egocentric Navigations using the tool we developed in Unity 3D. The dataset comprises 90 paths generated by simulating three virtual agents with different heights (1.5 m, 1.6 m, 1.7 m), 30 paths for each agent. During the navigation, we acquired data at 30 frames per second and, for each frame, we saved the RGB image, the depth map and the 6DoF pose of the camera. In this way, we obtained a large dataset without any manual labelling. We provide a 3DoF version of the dataset comprising labels and RGB images.

Each of the 90 random paths (30 per agent height: 150 cm, 160 cm, 170 cm) is performed by letting the agent reach 21 target points. For more details about our dataset, go to this page.
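As an illustration only, per-frame pose labels could be parsed as follows; the CSV layout (frame, x, y, z, rx, ry, rz) is an assumption of ours, not necessarily the released format.

```python
import csv
from typing import List, NamedTuple

class Pose(NamedTuple):
    frame: int
    x: float; y: float; z: float      # camera position (m)
    rx: float; ry: float; rz: float   # camera rotation (degrees)

def load_poses(path: str) -> List[Pose]:
    """Read one 6DoF camera pose per line from a CSV file."""
    with open(path, newline="") as f:
        return [Pose(int(r[0]), *map(float, r[1:7]))
                for r in csv.reader(f)]
```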


You can download the dataset at this link.


Method





We propose a tool for Unity 3D to generate synthetic datasets to study the IBL problem. The proposed pipeline comprises two main steps:
  1. Virtualization of the environment
  2. Simulation and generation of the dataset
We propose to acquire a 3D model of a real indoor environment and use it to simulate navigations inside the virtual environment.

We benchmarked 3DoF localization considering an image-retrieval pipeline and using a Triplet Network with an Inception V3 backbone to learn useful representations. We trained the model to learn a representation space and performed a K-Nearest Neighbor search to assign a 3DoF label to a new query image. At training time, we fed the model with triplets generated by sampling 10000, 20000, 30000 and 40000 images from the training set.
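As a sketch of how the two thresholds Thxz and Thθ can drive triplet sampling (our reconstruction, assuming a frame counts as a positive of the anchor when its ground-plane position is within Thxz and its orientation within Thθ):

```python
import numpy as np

def mine_triplet(xz, theta, th_xz=0.75, th_theta=60.0, rng=None):
    """Return (anchor, positive, negative) indices, or None when the
    sampled anchor has no valid positive or negative."""
    rng = rng or np.random.default_rng()
    n = len(xz)
    a = rng.integers(n)
    # Distance on the ground plane and smallest angular difference
    d_pos = np.linalg.norm(xz - xz[a], axis=1)
    d_ang = np.abs((theta - theta[a] + 180.0) % 360.0 - 180.0)
    close = (d_pos <= th_xz) & (d_ang <= th_theta)
    pos = np.flatnonzero(close & (np.arange(n) != a))
    neg = np.flatnonzero(~close)
    if len(pos) == 0 or len(neg) == 0:
        return None
    return a, rng.choice(pos), rng.choice(neg)
```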

At test time, we used the images sampled from the test set to extract the representations learned during training and, for each image, we assigned the label of the nearest neighbor image in the training set.
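The retrieval step can be sketched as follows (function and variable names are ours; the embeddings are abstracted as precomputed vectors produced by the trained network):

```python
import numpy as np

def knn_localize(query_emb, train_emb, train_labels):
    """Assign each query the 3DoF label of its nearest training image
    in the learned embedding space (1-NN, Euclidean distance)."""
    # Pairwise distances: (num_queries, num_train)
    d = np.linalg.norm(query_emb[:, None, :] - train_emb[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    return train_labels[nearest]
```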



Tool

The tool for the generation of the dataset is described and available at this link.


Code

An example of how to use a learned model is available at this link.


Paper

S. A. Orlando, A. Furnari, S. Battiato, G. M. Farinella - Image Based Localization with Simulated Egocentric Navigations. Submitted to the International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP), 2019.

Acknowledgement

This research is supported by DWORD - Xenia Progetti s.r.l. and Piano della Ricerca 2016-2018 linea di Intervento 2 of DMI, University of Catania.


Related Work

Please visit our page dedicated to First Person Vision Research @ IPLAB.

People