Egocentric Point of Interest Recognition in Cultural Sites


Francesco Ragusa1,2, Antonino Furnari1, Sebastiano Battiato1, Giovanni Signorello3, Giovanni Maria Farinella1,3

1IPLab, Department of Mathematics and Computer Science - University of Catania, IT
2Xenia Gestione Documentale s.r.l. - Xenia Progetti s.r.l., Acicastello, Catania, IT
3CUTGANA - University of Catania, IT



We consider the problem of detecting and recognizing points of interest in cultural sites. We observe that a "point of interest" in a cultural site may be either an object or an environment, and we highlight that using an object detector is beneficial to recognize points of interest which occupy only a small part of the frame. The contributions of this work are the following:

  • The observation of the dual nature of points of interest in a cultural site, which can be either objects or environments;
  • The extension of the UNICT-VEDI dataset with bounding box annotations;
  • A comparison of approaches based on whole scene processing with respect to object detection to recognize points of interest in cultural sites.


Results



A video example of our object-based approach in action is shown below. Quantitative results are summarized in the following tables (see the paper for more details).



Comparison of the three temporal approaches and the object-based method (per-class F1 scores). The last column reports the maximum value in each row.
Class 57-POI 57-POI-N 9-Classifiers object-based Per-row Max
1.1 Ingresso 0.70 0.68 0.68 0.50 0.70
2.1 RampaS.Nicola 0.58 0.57 0.64 0.47 0.64
2.2 RampaS.Benedetto 0.29 0.28 0.55 0.54 0.55
3.1 SimboloTreBiglie 0.00 0.00 0.00 0.00 0.00
3.2 ChiostroLevante / / / / /
3.3 Plastico / / / / /
3.4 Affresco 0.48 0.49 0.50 0.45 0.50
3.5 Finestra_ChiostroLevante 0.00 0.00 0.00 0.02 0.02
3.6 PortaCorodiNotte 0.73 0.70 0.76 0.64 0.76
3.7 TracciaPortone 0.00 0.00 0.93 0.80 0.93
3.8 StanzaAbate / / / / /
3.9 CorridoioDiLevante 0.60 0.49 0.81 0.23 0.81
3.10 CorridoioCorodiNotte 0.76 0.88 0.92 0.78 0.92
3.11 CorridoioOrologio 0.67 0.67 0.81 0.44 0.81
4.1 Quadro 0.91 0.92 0.79 0.92 0.92
4.2 PavimentoOriginaleAltare 0.44 0.64 0.46 0.69 0.69
4.3 BalconeChiesa 0.87 0.82 0.86 0.68 0.87
5.1 PortaAulaS.Mazzarino 0.46 0.59 0.48 0.75 0.75
5.2 PortaIngressoMuseoFabbrica 0.37 0.42 0.91 0.53 0.91
5.3 PortaAntirefettorio 0.00 0.00 0.40 0.79 0.79
5.4 PortaIngressoRef.Piccolo 0.00 0.00 0.00 0.86 0.86
5.5 Cupola 0.91 0.49 0.87 0.99 0.99
5.6 AperturaPavimento 0.95 0.94 0.94 0.97 0.97
5.7 S.Agata 0.97 0.97 0.97 1.00 1.00
5.8 S.Scolastica 0.96 0.99 0.85 0.92 0.99
5.9 ArcoconFirma 0.72 0.83 0.77 0.77 0.83
5.10 BustoVaccarini 0.87 0.94 0.88 0.90 0.94
6.1 QuadroSantoMazzarino 0.96 0.81 0.68 0.81 0.96
6.2 Affresco 0.89 0.89 0.96 0.97 0.97
6.3 PavimentoOriginale 0.92 0.89 0.96 0.98 0.98
6.4 PavimentoRestaurato 0.48 0.60 0.74 0.33 0.74
6.5 BassorilieviMancanti 0.77 0.61 0.88 0.77 0.88
6.6 LavamaniSx 0.82 0.81 0.99 0.97 0.99
6.7 LavamaniDx 0.00 0.00 0.98 0.95 0.98
6.8 TavoloRelatori 0.88 0.69 / 0.75 0.88
6.9 Poltrone 0.56 0.87 0.47 0.28 0.87
7.1 Edicola 0.70 0.77 0.86 0.85 0.86
7.2 PavimentoA 0.00 0.00 0.42 0.58 0.58
7.3 PavimentoB 0.00 0.00 0.00 0.29 0.29
7.4 PassavivandePavimentoOriginale 0.57 0.58 0.68 0.80 0.80
7.5 AperturaPavimento 0.83 0.82 0.80 0.73 0.83
7.6 Scala 0.59 0.68 0.86 0.91 0.91
7.7 SalaMetereologica 0.76 0.75 0.98 0.82 0.98
8.1 Doccione 0.79 0.80 0.86 0.72 0.86
8.2 VanoRaccoltaCenere 0.35 0.40 0.47 0.44 0.47
8.3 SalaRossa 0.73 0.81 0.84 0.57 0.84
8.4 ScalaCucina 0.68 0.72 0.60 0.62 0.72
8.5 CucinaProvv. 0.66 0.62 0.81 0.83 0.83
8.6 Ghiacciaia 0.43 0.95 0.69 0.40 0.95
8.7 Latrina 0.98 0.98 0.99 0.75 0.99
8.8 OssaeScarti 0.64 0.77 0.72 0.69 0.77
8.9 Pozzo 0.41 0.90 0.94 0.87 0.94
8.10 Cisterna 0.13 0.00 0.00 0.44 0.44
8.11 BustoPietroTacchini 0.95 0.97 0.99 0.85 0.99
9.1 NicchiaePavimento 0.73 0.75 0.95 0.65 0.95
9.2 TraccePalestra 0.79 0.91 0.28 0.88 0.91
9.3 PergolatoNovizi 0.75 0.69 / 0.72 0.75
Negatives 0.46 0.62 0.60 0.55 0.62
mF1 0.59 0.62 0.66 0.68 0.75
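
The per-class values above are F1 scores, and mF1 is their mean over all evaluated classes. As a reference for how such numbers can be obtained, below is a minimal Python sketch (not the authors' evaluation code) that computes per-class F1 and mF1 from frame-level ground-truth and predicted labels; the function name and label encoding are illustrative assumptions.

    def per_class_f1(gt_labels, pred_labels, classes):
        """Per-class F1 and mean F1 (mF1) over frame-level labels (illustrative sketch)."""
        scores = {}
        for c in classes:
            tp = sum(1 for g, p in zip(gt_labels, pred_labels) if g == c and p == c)
            fp = sum(1 for g, p in zip(gt_labels, pred_labels) if g != c and p == c)
            fn = sum(1 for g, p in zip(gt_labels, pred_labels) if g == c and p != c)
            precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
            recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
            scores[c] = (2 * precision * recall / (precision + recall)
                         if (precision + recall) > 0 else 0.0)
        mf1 = sum(scores.values()) / len(scores)  # mean F1 over the evaluated classes
        return scores, mf1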


Mean Average Precision (mAP) of the proposed object-based approach on the 7 test videos. AP scores are reported for each point of interest (POI) using a threshold of 0.35.
Class Test1 Test2 Test3 Test4 Test5 Test6 Test7 AVG
1.1 Ingresso 73.61% 40.00% 27.27% 53.85% 0.00% 37.50% 35.29% 38.22%
2.1 RampaS.Nicola 0.00% 56.25% 24.62% 19.44% 80.52% 27.47% 0.00% 29.76%
2.2 RampaS.Benedetto 55.81% / 12.50% 40.08% 0.00% 6.67% 63.21% 29.71%
3.1 SimboloTreBiglie 0.00% / 0.00% 0.00% 66.67% 0.00% 0.00% 11.11%
3.2 ChiostroLevante 0.00% / 0.00% 0.00% 35.14% 0.00% 0.00% 5.86%
3.3 Plastico / / / / 50.00% / / 50.00%
3.4 Affresco 0.00% / 22.73% 6.12% 36.84% 18.46% 0.00% 14.03%
3.5 Fin._ChiostroLev. 0.00% 0.00% / 0.00% 0.00% / / 0.00%
3.6 PortaCorodiNotte 8.89% 16.67% 15.91% 15.79% 7.50% 15.91% 35.90% 16.65%
3.7 TracciaPortone 0.00% / / 27.27% 50.00% 57.14% 14.29% 29.74%
3.8 StanzaAbate / / / / / / / /
3.9 Corr.DiLevante 12.50% 11.96% 2.86% 12.33% 0.00% 14.29% 12.12% 9.44%
3.10 Corr.CorodiNotte 58.93% 55.32% 61.08% 59.46% 35.77% 72.29% 64.58% 58.20%
3.11 Corr.Orologio 78.00% / 25.74% 22.33% 10.17% 23.64% 8.75% 28.11%
4.1 Quadro 80.65% 80.00% 47.62% 46.15% 66.67% / / 64.22%
4.2 Pav.OriginaleA. 49.06% 55.41% 75.29% 66.33% 64.29% / / 62.08%
4.3 BalconeChiesa 40.91% 52.94% 61.82% / 65.38% / / 55.26%
5.1 PortaAulaS.Mazz. 55.41% / 29.07% 36.36% 20.00% / / 35.21%
5.2 PortaIngr.MuseoF. 0.00% / 33.33% 36.67% 62.50% / / 33.13%
5.3 PortaAntirefettorio 0.00% / 40.91% 9.09% 0.00% / / 12.50%
5.4 PortaIng.Ref.Pic. 0.00% / 66.67% / / / / 33.34%
5.5 Cupola / / 100.00% 100.00% 100.00% / / 100.00%
5.6 AperturaPav. 88.89% / 100.00% 50.00% / / / 79.63%
5.7 S.Agata 100.00% / 45.83% 50.00% 88.89% / / 71.18%
5.8 S.Scolastica 0.00% / 25.00% 88.89% 97.62% / / 52.88%
5.9 ArcoconFirma / / 79.69% 100.00% 50.00% / 49.16% 69.71%
5.10 BustoVaccarini / / 81.82% 71.43% / / 91.67% 81.64%
6.1 QuadroS.Mazz. 90.00% / 76.92% / 92.31% / / 86.41%
6.2 Affresco 100.00% / 79.67% / 94.74% / / 91.47%
6.3 Pav.Originale 56.00% / 55.56% / 54.55% / / 55.37%
6.4 Pav.Restaurato 13.33% / 4.17% / 0.00% / / 5.83%
6.5 Bass.Mancanti 13.64% / 42.01% / 11.11% / / 22.25%
6.6 LavamaniSx 71.43% / 38.89% / 0.00% / / 36.77%
6.7 LavamaniDx 0.00% / 38.89% / 54.44% / / 31.11%
6.8 TavoloRelatori 0.00% / 62.02% / 0.00% / / 20.67%
6.9 Poltrone 39.25% / 15.54% / 25.00% / / 26.60%
7.1 Edicola / / 73.73% 53.85% 65.31% / / 64.30%
7.2 PavimentoA / / 7.84% 0.00% 15.38% / / 7.74%
7.3 PavimentoB / / 0.00% 0.00% 37.50% / / 12.50%
7.4 Passaviv.Pav.O. / / 53.57% 49.12% 43.59% / / 48.76%
7.5 AperturaPav. / / 28.57% 40.62% 44.74% / / 37.98%
7.6 Scala / / 70.00% / 60.00% / / 65.00%
7.7 SalaMetereologica / / 70.37% 86.21% 26.67% / / 61.08%
8.1 Doccione / / 23.53% 33.33% 42.59% / / 33.15%
8.2 VanoRacc.Cenere / / 87.50% / 100.00% / / 93.75%
8.3 SalaRossa / / 42.50% 45.24% 61.54% / / 49.76%
8.4 ScalaCucina / / 61.25% 42.11% 50.76% / / 51.37%
8.5 CucinaProvv. / / / 73.33% 82.61% / / 77.97%
8.6 Ghiacciaia / / 100.00% / 66.67% / / 83.34%
8.7 Latrina / / / 100.00% 50.00% / / 75.00%
8.8 OssaeScarti / / 68.33% 54.55% 63.16% / / 62.01%
8.9 Pozzo / / 80.00% 52.08% 85.71% / / 72.60%
8.10 Cisterna / / 13.89% 53.32% 25.00% / / 30.74%
8.11 BustoPietroT. / / 67.78% 70.59% 100.00% / / 79.46%
9.1 NicchiaePavimento / / 45.83% 31.94% 0.00% / / 25.92%
9.2 TraccePalestra / / 62.50% 70.59% 92.31% / / 75.13%
9.3 PergolatoNovizi / / / 60.05% 0.00% / / 30.03%
(m)AP 35.04% 40.95% 47.01% 44.60% 45.92% 24.85% 28.84% 38.17%
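
For reference, the sketch below shows a standard (non-interpolated) Average Precision computation for a single point of interest, given detections already matched against the ground truth. This is an illustrative assumption about the metric, not the authors' evaluation code, and the matching criterion (e.g., the 0.35 threshold mentioned in the caption) is left abstract.

    def average_precision(detections, num_gt):
        """detections: list of (confidence, is_correct) pairs for one POI;
        num_gt: number of ground-truth instances of that POI (illustrative sketch)."""
        ranked = sorted(detections, key=lambda d: d[0], reverse=True)
        tp, precisions = 0, []
        for rank, (_, is_correct) in enumerate(ranked, start=1):
            if is_correct:
                tp += 1
                precisions.append(tp / rank)  # precision at each correct detection
        return sum(precisions) / num_gt if num_gt > 0 else 0.0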




Dataset







We extended the UNICT-VEDI dataset proposed in Ragusa et al. [1] by annotating the presence of 57 different points of interest with bounding boxes. We only considered data acquired using the head-mounted Microsoft HoloLens device. The UNICT-VEDI dataset comprises a set of training videos (at least one per point of interest), plus 7 test videos acquired by subjects visiting a cultural site. Each video of the dataset has been temporally labeled to indicate the environment in which the visitor is moving (9 different environments are labeled) and the point of interest observed by the visitor (57 points of interest have been labeled). For each of the 57 points of interest, we annotated approximately 1,000 frames from the provided training videos, for a total of 54,248 frames.

We considered a total of 9 environments and 57 points of interest. For more details about our dataset, see this page.

You can download the dataset annotated with bounding boxes at this link.


Methods



Recognizing the points of interest observed by visitors in a cultural site is the natural next step after visitor localization (Ragusa et al. [1]). We consider three different variants of the pipeline presented in Ragusa et al. [1] and an approach based on an object detector.

57-POI: the state-of-the-art method proposed in Ragusa et al. [1]. The discrimination component of the method is trained to discriminate between the 57 points of interest. No "negative" frames are used for training; the rejection of negatives is performed by the rejection component of Ragusa et al. [1];

57-POI-N: similar to the 57-POI method, with the addition of a negative class. The discrimination component of the method in Ragusa et al. [1] is trained to discriminate between the 57 points of interest plus the "negative" class, so negative frames are explicitly used for training. The rejection component of Ragusa et al. [1] is further used to detect and reject more negatives;

9-Classifiers: nine context-specific instances of the method in Ragusa et al. [1] are trained to recognize the points of interest related to the nine different contexts of the UNICT-VEDI dataset (i.e., one classifier per context). Similarly to 57-POI, no negatives are used for training;

Object-based: a YOLOv3 object detector is used to detect and recognize each of the 57 points of interest. At test time, YOLOv3 returns, for each frame, a set of bounding box coordinates with the related class scores. If no bounding box is predicted in a given frame, we reject the frame and assign it to the "negative" class. If multiple bounding boxes are found in a frame, we choose the one with the highest class score and assign its class to the frame. We chose YOLOv3 because it is a state-of-the-art real-time object detector.
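
The following is a minimal sketch of the frame-level decision rule described above. The detector interface is a hypothetical detector(frame) returning (box, class_id, score) tuples, as a YOLOv3-style detector would; it is not the authors' implementation.

    def label_frame(frame, detector, negative_label="negative"):
        """Assign a frame to the class of its most confident detection, or reject it."""
        detections = detector(frame)  # hypothetical: list of (box, class_id, score)
        if not detections:
            return negative_label     # no boxes predicted: reject as "negative"
        box, class_id, score = max(detections, key=lambda d: d[2])
        return class_id               # class of the highest-scoring bounding box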



Paper

F. Ragusa, A. Furnari, S. Battiato, G. Signorello, G. M. Farinella. Egocentric Point of Interest Recognition in Cultural Sites. In International Conference on Computer Vision Theory and Applications, 2019. Download the paper here.



Acknowledgement

This research is supported by PON MISE – Horizon 2020, Project VEDI - Vision Exploitation for Data Interpretation, Prog. n. F/050457/02/X32 - CUP: B68I17000800008 - COR: 128032, and Piano della Ricerca 2016-2018 linea di Intervento 2 of DMI of the University of Catania.




People