F. Ragusa, A. Furnari, G. M. Farinella
Wearable cameras make it possible to acquire images and videos from the user's perspective. These data can be processed to understand human behavior. Although human behavior analysis has been thoroughly investigated in third-person vision, it is still understudied in egocentric settings, and in particular in industrial scenarios. To encourage research in this field, we present MECCANO, a multimodal dataset of egocentric videos for studying human behavior understanding in industrial-like settings. The multimodality is characterized by gaze signals, depth maps, and RGB videos acquired simultaneously with a custom headset. The dataset has been explicitly labeled for fundamental tasks in the context of human behavior understanding from a first-person view, such as recognizing and anticipating human–object interactions. With the MECCANO dataset, we explored six different tasks: (1) Action Recognition, (2) Active Object Detection and Recognition, (3) Egocentric Human–Object Interaction Detection, (4) Egocentric Gaze Estimation, (5) Action Anticipation, and (6) Next-Active Object Detection. We propose a benchmark aimed at studying human behavior in the considered industrial-like scenario, which demonstrates that the investigated tasks and the considered scenario are challenging for state-of-the-art algorithms.
The MECCANO dataset comprises multimodal egocentric data acquired in an industrial-like domain in which subjects built a toy model of a motorbike. The multimodality is characterized by gaze signals, depth maps, and RGB videos acquired simultaneously. We considered 20 object classes, which include the 16 classes categorizing the 49 components, the two tools (screwdriver and wrench), the instructions booklet, and a partial_model class.
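As a concrete illustration, the snippet below sketches how the three synchronized streams (RGB, depth, gaze) could be read for a single frame of a video. The folder layout, file names, and CSV gaze format are assumptions made for this sketch, not an official MECCANO loading API.

```python
# Minimal sketch for reading one synchronized RGB/depth/gaze sample.
# Paths and file naming are hypothetical; adapt them to the actual
# dataset layout after downloading MECCANO.
import csv
from pathlib import Path

import cv2  # OpenCV, used here to read RGB and depth frames

DATASET_ROOT = Path("MECCANO")  # hypothetical local root folder


def load_frame(video_id: str, frame_idx: int):
    """Return (rgb, depth, gaze) for one frame; gaze is None if missing."""
    rgb = cv2.imread(
        str(DATASET_ROOT / "RGB_frames" / video_id / f"{frame_idx:05d}.jpg")
    )
    depth = cv2.imread(
        str(DATASET_ROOT / "Depth_frames" / video_id / f"{frame_idx:05d}.png"),
        cv2.IMREAD_UNCHANGED,  # preserve raw depth values (e.g., 16-bit)
    )
    gaze = None
    gaze_file = DATASET_ROOT / "gaze" / f"{video_id}.csv"  # assumed layout
    if gaze_file.exists():
        with gaze_file.open() as f:
            for row in csv.DictReader(f):  # assumed columns: frame, x, y
                if int(row["frame"]) == frame_idx:
                    gaze = (float(row["x"]), float(row["y"]))
                    break
    return rgb, depth, gaze
```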
Additional details related to the MECCANO dataset are available at https://iplab.dmi.unict.it/MECCANO/.
F. Ragusa, A. Furnari, G. M. Farinella. MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain. Computer Vision and Image Understanding (CVIU), 2023.
Cite our paper: CVIU or arXiv.
@article{ragusa_MECCANO_2023,
  author  = {Francesco Ragusa and Antonino Furnari and Giovanni Maria Farinella},
  title   = {MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain},
  journal = {Computer Vision and Image Understanding (CVIU)},
  year    = {2023},
  doi     = {10.1016/j.cviu.2023.103764},
  url     = {https://iplab.dmi.unict.it/MECCANO/},
}
@InProceedings{Ragusa_2021_WACV,
  author    = {Ragusa, Francesco and Furnari, Antonino and Livatino, Salvatore and Farinella, Giovanni Maria},
  title     = {The MECCANO Dataset: Understanding Human-Object Interactions From Egocentric Videos in an Industrial-Like Domain},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  month     = {January},
  year      = {2021},
  pages     = {1569-1578}
}