Egocentric Visitors Localization in Cultural Sites

Francesco Ragusa1, Antonino Furnari1, Sebastiano Battiato1, Giovanni Signorello2, Giovanni Maria Farinella1,2

1IPLab, Department of Mathematics and Computer Science - University of Catania, IT
2CUTGANA - University of Catania, IT

You can download the dataset:
Microsoft HoloLens
GoPro Hero 4

VEDI Dataset

The dataset has been acquired using two wearable devices: a Microsoft Hololens and a chest-mounted GoPro Hero4. The two devices have been used simultaneously to acquire the whole dataset.

Each frame of the videos has been labelled according to two levels: 1) the location of the visitor (one of the 9 environments of interest) and 2) the "point of interest", i.e., the cultural good currently observed by the visitor, if any. Both labelling levels allow for a "negative" class (i.e., frames containing visual information which is not of interest). In the case of location labels, a frame is negative when the visitor is not in any of the considered environments. This happens, for instance, when the visitor is walking through a corridor which has not been included in the training set. In the case of object-level annotations, a frame is negative when the visitor is not looking at any of the considered points of interest. We considered a total of 9 environments and 57 points of interest.
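The two-level labelling described above can be sketched as a small per-frame record. This is only an illustration: the class name `FrameLabel`, the zero-based integer ids, and the use of `None` for the negative class are assumptions made here, not the dataset's actual annotation format.

```python
from dataclasses import dataclass
from typing import Optional

NUM_ENVIRONMENTS = 9          # stated in the text
NUM_POINTS_OF_INTEREST = 57   # stated in the text


def _check_range(value: Optional[int], upper: int, name: str) -> None:
    """Accept None (negative class) or an id in [0, upper)."""
    if value is not None and not 0 <= value < upper:
        raise ValueError(f"{name} must be None or in [0, {upper}), got {value}")


@dataclass(frozen=True)
class FrameLabel:
    """Hypothetical per-frame annotation; None encodes the negative class."""
    environment: Optional[int] = None        # assumed ids 0..8
    point_of_interest: Optional[int] = None  # assumed ids 0..56

    def __post_init__(self) -> None:
        _check_range(self.environment, NUM_ENVIRONMENTS, "environment")
        _check_range(self.point_of_interest, NUM_POINTS_OF_INTEREST,
                     "point_of_interest")

    @property
    def is_location_negative(self) -> bool:
        # The visitor is not in any of the considered environments.
        return self.environment is None

    @property
    def is_object_negative(self) -> bool:
        # The visitor is not looking at any considered point of interest.
        return self.point_of_interest is None


# A frame inside environment 3 in which no point of interest is observed.
label = FrameLabel(environment=3)
print(label.is_location_negative, label.is_object_negative)
```

Keeping the two levels as independent optional fields mirrors the text: a frame can be positive for location and still negative at the object level.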

Negative Class

Some environments do not belong to any of the previously defined classes. These elements represent the negative class.


For Training, we have acquired 160 videos (80 with Hololens and 80 with GoPro). The table summarizes the details of the acquired videos according to the device used.

            Points of interest              Environments
            #video    time(s)    MB         #video    time(s)    MB
Hololens    68        104        103        12        190        196
GoPro       68        93         233        12        186        455
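As a quick consistency check, the video counts in the table can be summed per device and compared with the totals stated in the text (80 per device, 160 overall). The dictionary layout and variable names are illustrative assumptions. Only the #video columns are used, since the page does not say how the time(s) and MB columns are aggregated.

```python
# Video counts copied from the #video columns of the table above.
TRAINING_VIDEOS = {
    "Hololens": {"points_of_interest": 68, "environments": 12},
    "GoPro": {"points_of_interest": 68, "environments": 12},
}

# Sum the two groups for each device, then over both devices.
per_device = {device: sum(counts.values())
              for device, counts in TRAINING_VIDEOS.items()}
total = sum(per_device.values())

print(per_device, total)
```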

The following bar charts compare the number of frames of the Training videos of environments captured using Hololens and GoPro.

Points of Interest captured by Hololens

Points of Interest captured by GoPro


In the final phase of acquisition, we recorded video tours to obtain test videos on which to try our proposed method. We acquired 14 Test videos (7 recorded with Hololens and 7 with GoPro). Each of the 7 Hololens videos is split into one or more fragments because of the device's recording time limit.

Test Video
            #Video    Time(s)    MB
Hololens    7         780        730
GoPro       7         828        2062

Some Test videos acquired with Hololens match Test videos acquired with GoPro, since several videos were recorded at the same time using both devices. The matches between Test videos are listed in the following table.

Hololens                                       GoPro           Note
Test3.0, Test3.1, Test3.2, Test3.3, Test3.4    Test4
Test4.0, Test4.1, Test4.2, Test4.3, Test4.4    Test5, Test6
Test5.0, Test5.1, Test5.2, Test5.3             Test7           Test7 covers the Test5.x Hololens videos but contains additional environments
Test7.0, Test7.1                               Test2
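When pairing the two streams programmatically, the matches in the table can be stored as a lookup from a Hololens tour to its GoPro counterparts. This is a minimal sketch: the helper `gopro_matches` is a hypothetical name, and it assumes that the part of a Hololens fragment name before the dot (for example `Test4` in `Test4.2`) identifies the tour, as the table suggests.

```python
from typing import Dict, List

# Matches copied from the table; keys are Hololens tours, values GoPro videos.
HOLOLENS_TO_GOPRO: Dict[str, List[str]] = {
    "Test3": ["Test4"],
    "Test4": ["Test5", "Test6"],
    "Test5": ["Test7"],  # Test7 also contains additional environments
    "Test7": ["Test2"],
}


def gopro_matches(hololens_fragment: str) -> List[str]:
    """Return the GoPro Test videos matched to a Hololens fragment name."""
    # Assumed naming: "TestN.M", where "TestN" is the tour and M the fragment.
    tour = hololens_fragment.split(".", 1)[0]
    # Return a copy so callers cannot modify the lookup table.
    return list(HOLOLENS_TO_GOPRO.get(tour, []))


print(gopro_matches("Test4.2"))
```

Hololens tours that do not appear in the table return an empty list, which keeps "no listed match" distinct from an error.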