F. Ragusa1,2, A. Furnari1,2, G. M. Farinella1,2
Understanding workers’ behaviour in industrial environments is an underexplored topic due to the lack of public benchmark datasets. One of the most useful pieces of information about users is which actions they are performing. In this context, we propose a competition on MECCANO, a multimodal egocentric dataset acquired in an industrial-like domain in which subjects assemble a toy model of a motorbike. Each signal provides additional information about the observed environment and the camera wearer: semantic information (RGB), 3D information about the environment and the objects (depth), and the user’s attention (gaze), all of which can be exploited to recognize human actions.
Time | Event | Authors |
---|---|---|
14:00-14:15 | Opening Session | |
14:15-14:50 | Keynote | Oswald Lanz (Free University of Bozen-Bolzano & Covision Lab) |
14:50-15:05 | Report on the challenge and announcement of the winners | |
 | Accepted Reports Presentation – Session 1 | |
15:05-15:15 | A Unified Multimodal De- and Re-coupling Framework for RGB-D Motion Recognition | Benjia Zhou, Yang Zhao, Jun Wan (Macau University of Science and Technology, Xiamen University, Institute of Automation, Chinese Academy of Sciences - CASIA) (Remotely) |
15:15-15:25 | Action Recognition on the MECCANO Dataset with Gate-Shift-Fuse Networks | Edoardo Bianchi, Oswald Lanz (Free University of Bozen-Bolzano) (In person) |
15:25-16:00 | Coffee Break | |
 | Accepted Reports Presentation – Session 2 | |
16:00-16:10 | Ensemble Modeling for Multimodal Visual Action Recognition | Jyoti Kini, Sarah Fleischer, Ishan Dave, Mubarak Shah (Center for Research in Computer Vision, University of Central Florida) (Remotely) |
16:10-16:20 | A novel UniFormer architecture for action recognition on the MECCANO benchmark | Yaxin Hu, Erhardt Barth (University of Lübeck, Pattern Recognition Company GmbH) (In person) |
16:20-16:30 | GADDCCANet for Multimodal Action Recognition | Kai Liu, Lei Gao, Ling Guan (Toronto Metropolitan University) (Remotely) |
16:30-16:45 | Closing Session | |
Methods are expected to take RGB, depth, and gaze signals as input and predict an action. Algorithms may also opt to process only a subset of these signals. The following figure shows the architecture of the baseline used to address this task using the RGB, depth, and gaze signals.
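As a rough illustration only (this is not the official baseline; the encoder sizes, the late-fusion strategy, and names such as `MultimodalActionNet` are assumptions), the sketch below shows one way a model can consume the three signals in PyTorch:

```python
import torch
import torch.nn as nn

class MultimodalActionNet(nn.Module):
    """Hypothetical late-fusion model: one encoder per modality,
    concatenated features, and a linear action classifier."""

    def __init__(self, num_actions: int, feat_dim: int = 256):
        super().__init__()
        # Per-modality encoders (placeholders; real methods typically
        # use pretrained video backbones for the RGB and depth streams).
        self.rgb_enc = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(16, feat_dim))
        self.depth_enc = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(16, feat_dim))
        # Gaze modeled as a (T, 2) sequence of normalized fixation coordinates.
        self.gaze_enc = nn.GRU(input_size=2, hidden_size=feat_dim,
                               batch_first=True)
        self.classifier = nn.Linear(3 * feat_dim, num_actions)

    def forward(self, rgb, depth, gaze):
        # rgb: (B, 3, T, H, W), depth: (B, 1, T, H, W), gaze: (B, T, 2)
        f_rgb = self.rgb_enc(rgb)
        f_depth = self.depth_enc(depth)
        _, h = self.gaze_enc(gaze)           # h: (1, B, feat_dim)
        fused = torch.cat([f_rgb, f_depth, h.squeeze(0)], dim=1)
        return self.classifier(fused)        # (B, num_actions) action logits
```

A method that processes only a subset of the signals would simply drop the unused encoders and shrink the classifier input accordingly.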
Participants will compete on the proposed task to obtain the best score on the provided test split of the MECCANO dataset, available on the challenge webpage. Participants must submit their results to the organizers via email in the form of a technical report (see emails below). Reports should evaluate action recognition using Top-1 and Top-5 accuracy computed on the whole test set. Submissions will be ranked by Top-1 accuracy on the test split. All technical reports, together with their results, will be published online on the challenge webpage. The authors of the top-3 results will be asked to release the code of their methods to ensure the reproducibility of the experiments. Authors must use the ICIAP format for their technical reports. The maximum length is 4 pages, excluding references.
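For reference, the two metrics can be computed as in the following sketch (an illustrative NumPy implementation, not the official evaluation script; the function name `topk_accuracy` is our own):

```python
import numpy as np

def topk_accuracy(logits: np.ndarray, labels: np.ndarray, k: int = 1) -> float:
    """Fraction of samples whose true label is among the k highest-scoring
    predictions. logits: (N, num_classes), labels: (N,)."""
    # Indices of the k largest scores per sample (order within the top k
    # does not matter for the metric).
    topk = np.argpartition(logits, -k, axis=1)[:, -k:]
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Top-1 and Top-5 accuracy over the whole test set:
# top1 = topk_accuracy(logits, labels, k=1)
# top5 = topk_accuracy(logits, labels, k=5)
```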
Technical reports to participate in the challenge, as well as any questions, must be submitted via email to the organizers: {francesco.ragusa, antonino.furnari, giovanni.farinella}@unict.it.
MECCANO Multimodal comprises multimodal egocentric data acquired in an industrial-like domain in which subjects built a toy model of a motorbike. Its multimodality is characterized by gaze signals, depth maps, and RGB videos acquired simultaneously. We considered 20 object classes, which include the 16 classes categorizing the 49 components, the two tools (screwdriver and wrench), the instructions booklet, and a partial_model class.
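As an illustration of how the synchronized modalities line up, one could represent a single annotated clip as below (field names and shapes are assumptions for exposition, not the dataset's actual schema):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MeccanoSample:
    """Hypothetical container for one temporally aligned MECCANO clip."""
    rgb: np.ndarray      # (T, H, W, 3) RGB frames
    depth: np.ndarray    # (T, H, W) depth maps aligned to the RGB frames
    gaze: np.ndarray     # (T, 2) normalized (x, y) fixation coordinates
    action_label: int    # index of the annotated action class
```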
Additional details on the MECCANO dataset can be found in the references below.
Event | Date |
---|---|
Competition opening, Training/Test data available | May 20, 2023 |
Test results submission | July 31, 2023 |
Notification of accepted reports | August 7, 2023 |
Announcement of the winners | During the conference |
F. Ragusa, A. Furnari, G. M. Farinella. MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain. Computer Vision and Image Understanding (CVIU), 2023.
F. Ragusa, A. Furnari, S. Livatino, G. M. Farinella. The MECCANO Dataset: Understanding Human-Object Interactions From Egocentric Videos in an Industrial-Like Domain. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021.