Abstract

Exploiting Multimodal Synthetic Data for Egocentric Human-Object Interaction Detection in an Industrial Scenario

In this paper, we tackle the problem of Egocentric Human-Object Interaction (EHOI) detection in an industrial domain. To overcome the lack of public datasets in this context, we propose a pipeline and a tool to generate synthetic images of EHOIs paired with several annotations and data signals. Using the proposed pipeline, we present EgoISM-HOI, a new multimodal dataset composed of synthetic EHOI images in an industrial environment with rich annotations of hands and objects. To demonstrate the utility of synthetic data, we designed an EHOI detection method that exploits the different multimodal signals available in our dataset. Our study shows that using synthetic data to pre-train the proposed system significantly improves performance when it is tested on real-world data. Additional experiments show that the proposed approach outperforms classic baselines based on state-of-the-art class-agnostic methods.

Code | Data

EHOI Generation Pipeline

Collecting and labeling a large amount of data can be very time-consuming and costly. Motivated by this, we investigated how training models with synthetic data in first-person vision can reduce the need for labeled, domain-specific real data. We propose a pipeline for generating and labeling synthetic human-object interactions from a first-person point of view using 3D models of the target environment and objects, which can be cheaply collected using commercial scanners.
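As a rough illustration of how such a generation loop can be structured, the Python sketch below samples an egocentric camera pose, renders the multimodal signals, and writes COCO-style annotations. It is a sketch only: the renderer is stubbed, and the category names and file layout are assumptions, not the actual tool.

# Hypothetical sketch of a synthetic EHOI generation loop.
# render_frame() is a stub standing in for the actual 3D engine;
# CATEGORIES is an invented subset, not the dataset's real taxonomy.
import json
import random

CATEGORIES = ["socket", "screwdriver", "power_supply"]  # hypothetical

def render_frame(camera_pose):
    # Stub: a real renderer would return the RGB image, depth map,
    # and instance segmentation masks seen from camera_pose.
    return {"rgb": None, "depth": None, "masks": None}

def sample_camera_pose():
    # Egocentric viewpoints: randomize head position and orientation.
    return {
        "position": [random.uniform(-1.0, 1.0) for _ in range(3)],
        "yaw_pitch_roll": [random.uniform(-30.0, 30.0) for _ in range(3)],
    }

annotations = {
    "images": [],
    "annotations": [],
    "categories": [{"id": i, "name": n} for i, n in enumerate(CATEGORIES)],
}

for frame_id in range(10):
    pose = sample_camera_pose()
    signals = render_frame(pose)  # RGB + depth + instance masks
    annotations["images"].append(
        {"id": frame_id, "file_name": f"{frame_id:06d}.png"}
    )
    # In the real pipeline, boxes, hand side, contact state, and the
    # active object are derived automatically from the 3D scene state.

with open("synthetic_annotations.json", "w") as f:
    json.dump(annotations, f, indent=2)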


Dataset

EgoISM-HOI

We present EgoISM-HOI, a new multimodal synthetic-real dataset of egocentric human-object interactions. It contains a total of 39,304 RGB images; 23,356 depth maps and instance segmentation masks; 59,860 hand annotations; 237,985 object instances across 19 object categories; and 35,416 egocentric human-object interactions.

Download

RGB images: 39,304
Hand annotations: 59,860
Object annotations: 237,985
EHOI annotations: 35,416
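As a minimal loading sketch, the snippet below counts object instances per category, assuming the annotations are distributed as COCO-style JSON; the file name and field names are assumptions, not the released schema.

# Minimal loading sketch, assuming COCO-style JSON annotations.
# "egoism_hoi_annotations.json" is a hypothetical file name.
import json
from collections import Counter

with open("egoism_hoi_annotations.json") as f:
    data = json.load(f)

id_to_category = {c["id"]: c["name"] for c in data["categories"]}
counts = Counter(id_to_category[a["category_id"]] for a in data["annotations"])

for name, count in counts.most_common():
    print(f"{name}: {count} instances")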


Proposed Approach
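The approach is pre-trained on the synthetic split and then fine-tuned on real images. The sketch below illustrates that training schedule only, using a plain torchvision Faster R-CNN as a stand-in detector: the actual method additionally exploits multimodal signals (depth, segmentation) and hand-state predictions, which are omitted here. The data loader names are assumptions.

# Sketch of the synthetic pre-training / real fine-tuning schedule.
# The detector is a generic stand-in, not the paper's full model.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 1 + 19  # background + the 19 EgoISM-HOI object categories

def build_detector():
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    # Replace the classification head to match the dataset's categories.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)
    return model

def train(model, loader, epochs, lr):
    # loader yields (images, targets) in the torchvision detection format.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            loss_dict = model(images, targets)
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

model = build_detector()
# 1) Pre-train on the synthetic split (synthetic_loader is assumed to exist):
# train(model, synthetic_loader, epochs=10, lr=0.005)
# 2) Fine-tune on the smaller real-world split (real_loader is assumed):
# train(model, real_loader, epochs=5, lr=0.0005)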

Paper

If you find the code, pre-trained models, or the EgoISM-HOI dataset useful for your research, please cite the following paper:

@article{leonardi2024synthdata,
    title = {Exploiting multimodal synthetic data for egocentric human-object interaction detection in an industrial scenario},
    author = {Rosario Leonardi and Francesco Ragusa and Antonino Furnari and Giovanni Maria Farinella},
    journal = {Computer Vision and Image Understanding},
    volume = {242},
    pages = {103984},
    year = {2024},
    issn = {1077-3142},
    doi = {10.1016/j.cviu.2024.103984},
}

Additionally, consider citing the original conference paper:

@inproceedings{leonardi2022egocentric,
    title = {Egocentric Human-Object Interaction Detection Exploiting Synthetic Data},
    author = {Leonardi, Rosario and Ragusa, Francesco and Furnari, Antonino and Farinella, Giovanni Maria},
    booktitle = {Image Analysis and Processing -- ICIAP 2022},
    pages = {237--248},
    year = {2022},
}

Visit our page dedicated to First Person Vision Research for other related publications.


People
Rosario Leonardi
FPV@IPLAB
Next Vision s.r.l.
Francesco Ragusa
FPV@IPLAB
Next Vision s.r.l.
Antonino Furnari
FPV@IPLAB
Next Vision s.r.l.
Giovanni Maria Farinella
FPV@IPLAB
Next Vision s.r.l.