D. Damen, H. Doughty, G. M. Farinella, S. Fidler, A. Furnari, E. Kazakos, D. Moltisanti, J. Munro, T. Perrett, W. Price, M. Wray, Scaling Egocentric Vision: The EPIC-KITCHENS Dataset, arXiv preprint arXiv:1804.02748, 2018. Web Page

We introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments. Our videos depict nonscripted daily activities. Recording took place in 4 cities (in North America and Europe) by participants belonging to 10 different nationalities, resulting in highly diverse kitchen habits and cooking styles. Our dataset features 55 hours of video consisting of 11.5M frames, which we densely labeled for a total of 39.6K action segments and 454.2K object bounding boxes. We describe our object, action and anticipation challenges, and evaluate several baselines over two test splits, seen and unseen kitchens.

Egocentric Shopping Cart Localization

E. Spera, A. Furnari, S. Battiato, G. M. Farinella. Egocentric Shopping Cart Localization. In International Conference on Pattern Recognition (ICPR). 2018. Web Page

We investigate the new problem of egocentric shopping cart localization in retail stores. We propose a novel large-scale dataset for image-based egocentric shopping cart localization. The dataset has been collected using cameras placed on shopping carts in a large retail store. It contains a total of 19,531 image frames, each labelled with its six Degrees Of Freedom pose. We study the localization problem by analysing how cart locations should be represented and estimated, and how to assess the localization results. We benchmark two families of algorithms: classic methods based on image retrieval and emerging methods based on regression.
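The retrieval family mentioned above can be illustrated with a minimal sketch. It assumes each database image has already been reduced to a feature vector and is stored with its pose; the function name `retrieve_pose`, the toy features, and the pose layout (x, y, z, roll, pitch, yaw) are hypothetical choices for illustration and are not taken from the paper.

```python
import math


def retrieve_pose(query_feature, database):
    """Return the pose of the database image whose feature is closest to the query.

    `database` is a list of (feature, pose) pairs. Features are equal-length
    lists of floats; a pose is any object describing the cart (here a tuple).
    This is a plain 1-nearest-neighbour lookup, assumed for illustration.
    """
    if not database:
        raise ValueError("database must contain at least one (feature, pose) pair")
    # Pick the entry with the smallest Euclidean distance to the query feature.
    _, best_pose = min(
        database, key=lambda entry: math.dist(query_feature, entry[0])
    )
    return best_pose


# Toy database: two images with 2-D features and hypothetical 6DoF poses.
toy_database = [
    ([0.0, 0.0], (1.0, 2.0, 0.0, 0.0, 0.0, 0.0)),
    ([1.0, 1.0], (5.0, 6.0, 0.0, 0.0, 0.0, 90.0)),
]
estimated_pose = retrieve_pose([0.9, 1.2], toy_database)
```

A regression-based method would instead learn a function mapping the image directly to the pose, with no database lookup at test time.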

Vision For Autonomous Navigation

Organizing Egocentric Videos of Daily Living Activities

A. Ortis, G. M. Farinella, V. D’Amico, L. Addesso, G. Torrisi, S. Battiato. Organizing egocentric videos of daily living activities. Pattern Recognition, 72(Supplement C), pp. 207-218. 2017. Web Page

We propose a system for the automatic organization of the egocentric videos acquired by a user over different days. The system performs an unsupervised segmentation of each egocentric video into chapters by considering its visual content. The video segments from the different days are then linked to produce graphs which are coherent with respect to the context in which the user acts.

Evaluation of Egocentric Action Recognition

A. Furnari, S. Battiato, G. M. Farinella, "How Shall We Evaluate Egocentric Action Recognition?", In International Workshop on Egocentric Perception, Interaction and Computing (EPIC) in conjunction with ICCV, 2017. Web Page

We propose a set of measures for quantitatively and qualitatively assessing the performance of egocentric action recognition methods. To make current action classification methods more usable in the recognition scenario, we investigate how frame-wise predictions can be turned into action-based temporal video segmentations. Experiments on both synthetic and real data show that the proposed set of measures can help to improve evaluation and to drive the design of egocentric action recognition methods.
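As a rough illustration of turning frame-wise predictions into temporal segments, the sketch below merges runs of consecutive identical labels into (label, first frame, last frame) triples. This is the simplest possible conversion and is only an assumption; the paper's actual procedure is not described here, and the function name `frames_to_segments` is hypothetical.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Merge consecutive identical frame labels into (label, start, end) segments.

    `start` and `end` are inclusive 0-based frame indices.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of frames in this run
        segments.append((label, start, start + length - 1))
        start += length
    return segments


# Toy per-frame predictions (labels invented for illustration).
predictions = ["open", "open", "cut", "cut", "cut", "open"]
segments = frames_to_segments(predictions)
```

On the toy input the result is three segments: "open" over frames 0-1, "cut" over frames 2-4, and "open" over frame 5. A practical system would likely smooth the predictions first, since a single misclassified frame splits a segment in two.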

Next Active Object Prediction from Egocentric Video

A. Furnari, S. Battiato, K. Grauman, G. M. Farinella, Next-Active-Object Prediction from Egocentric Videos, Journal of Visual Communication and Image Representation, Volume 49, November 2017, Pages 401-411. Web Page

We address the problem of recognizing next-active-objects from egocentric videos. Although this task is not trivial, the First Person Vision paradigm can provide important cues useful to address this challenge. Specifically, we propose to exploit the dynamics of the scene to recognize next-active-objects before an object interaction actually begins. Next-active-object prediction is performed by analyzing fixed-length trajectory segments within a sliding window. We investigate which properties of egocentric object motion are most discriminative for the task and evaluate the temporal support over which such motion should be considered.
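The sliding-window step can be sketched as follows, assuming an object trajectory is a list of per-frame (x, y) positions. The function name `sliding_windows`, the `stride` parameter, and the toy trajectory are hypothetical; the classifier that would score each segment is not shown.

```python
def sliding_windows(trajectory, window_length, stride=1):
    """Split a trajectory into fixed-length, possibly overlapping segments.

    Returns a list of segments, each a list of `window_length` consecutive
    points. A trajectory shorter than `window_length` yields no segments.
    """
    if window_length <= 0 or stride <= 0:
        raise ValueError("window_length and stride must be positive")
    last_start = len(trajectory) - window_length
    return [
        trajectory[start:start + window_length]
        for start in range(0, last_start + 1, stride)
    ]


# Toy trajectory of an object tracked over five frames.
toy_trajectory = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)]
windows = sliding_windows(toy_trajectory, window_length=3)
```

The window length corresponds to the temporal support discussed above: a longer window gives each segment more motion context but delays the first prediction.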

Visual Market Basket Analysis

V. Santarcangelo, G. M. Farinella, S. Battiato. Egocentric Vision for Visual Market Basket Analysis. Web Page

We introduce a new application scenario for egocentric vision: Visual Market Basket Analysis (VMBA). The main goal in the proposed application domain is to understand customer behavior in retail stores from videos acquired with cameras mounted on shopping carts (which we call narrative carts). To properly study the problem and to set the first VMBA challenge, we introduce the VMBA15 dataset. The dataset is composed of 15 different egocentric videos acquired with narrative carts while users shop in a retail store.

Location-Based Temporal Segmentation of Egocentric Videos

A. Furnari, S. Battiato, G. M. Farinella, Personal-Location-Based Temporal Segmentation of Egocentric Video for Lifelogging Applications, submitted to Journal of Visual Communication and Image Representation. Web Page

We propose a method to segment egocentric videos on the basis of the locations visited by the user. To account for negative locations (i.e., locations not specified by the user), we propose an effective negative rejection method which leverages the continuous nature of egocentric videos and does not require any negative samples at training time.
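One way temporal continuity could support rejection without negative training samples is sketched below. The assumption, which is not confirmed by the abstract, is that a classifier trained only on the user's locations gives stable predictions inside a known location and unstable ones elsewhere. The function name `reject_unstable`, the window size, and the agreement threshold are all hypothetical.

```python
from collections import Counter


def reject_unstable(frame_labels, window=5, min_agreement=0.8, negative="negative"):
    """Relabel frames whose temporal neighbourhood disagrees as negatives.

    For each frame, look at the labels in a centred window (truncated at the
    video boundaries). If the most common label covers at least
    `min_agreement` of the window, emit that majority label; otherwise emit
    `negative`. Note that this also smooths isolated label flips.
    """
    half = window // 2
    output = []
    for i in range(len(frame_labels)):
        neighbourhood = frame_labels[max(0, i - half): i + half + 1]
        label, count = Counter(neighbourhood).most_common(1)[0]
        agreement = count / len(neighbourhood)
        output.append(label if agreement >= min_agreement else negative)
    return output


# Stable predictions are kept; rapidly changing ones are rejected.
stable = reject_unstable(["kitchen"] * 6)
unstable = reject_unstable(["office", "car", "kitchen", "office", "car"])
```

The threshold trades off the two error types: a higher `min_agreement` rejects more true negatives but also discards more frames from known locations near segment boundaries.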

Recognizing Personal Locations from Egocentric Videos

A. Furnari, G. M. Farinella, S. Battiato, Recognition of Personal Locations from Egocentric Videos, IEEE Transactions on Human-Machine Systems. PDF Web Page

We study how personal locations arising from the user’s daily activities can be recognized from egocentric videos. We assume that few training samples are available for learning purposes. Considering the diversity of the devices available on the market, we introduce a benchmark dataset containing egocentric videos of 8 personal locations acquired by a user with 4 different wearable cameras. To make our analysis useful in real-world scenarios, we propose a method to reject negative locations.