Egocentric Human-Object Interaction Detection Exploiting Synthetic Data

R. Leonardi1, F. Ragusa1,2, A. Furnari1,2, G. M. Farinella1,2

1FPV@IPLAB, DMI - University of Catania, Italy
2Next Vision s.r.l. - Spinoff of the University of Catania, Italy


We consider the problem of detecting Egocentric Human-Object Interactions (EHOIs) in industrial contexts. Since collecting and labeling large amounts of real images is challenging, we propose a pipeline and a tool to generate photo-realistic synthetic First Person Vision (FPV) images automatically labeled for EHOI detection in a specific industrial scenario. To tackle the problem of EHOI detection, we propose a method that detects the hands and the objects in the scene, and determines which objects are currently involved in an interaction. We compare the performance of our method with a set of state-of-the-art baselines. Results show that using a synthetic dataset improves the performance of an EHOI detection system, especially when only limited real data are available.
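The core idea of detecting hands and objects and then determining which objects are involved in an interaction can be sketched as a simple association step. The code below is an illustrative sketch, not the authors' implementation: it assumes hand detections carry a contact state, and associates each in-contact hand with its nearest object as the "active" object. All names (`Detection`, `match_active_objects`) are hypothetical.

```python
# Illustrative sketch (not the paper's actual method): associate each hand
# that is in contact with its nearest detected object ("active object").
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    label: str
    in_contact: bool = False  # only meaningful for hand detections


def center(box):
    """Center of a bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def match_active_objects(hands: List[Detection],
                         objects: List[Detection]) -> List[Optional[Detection]]:
    """For each hand, return the nearest object if the hand is in contact,
    otherwise None. Distance between box centers is used as a heuristic."""
    active = []
    for hand in hands:
        if not hand.in_contact or not objects:
            active.append(None)
            continue
        hx, hy = center(hand.box)
        nearest = min(
            objects,
            key=lambda o: (center(o.box)[0] - hx) ** 2 + (center(o.box)[1] - hy) ** 2,
        )
        active.append(nearest)
    return active
```

A nearest-center heuristic is only one possible association strategy; a learned model could instead predict the active object directly from image features.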

Synthetic Dataset Generation Pipeline

We developed a tool in Blender which takes as input the 3D models of the objects and of the environment and generates synthetic EHOIs along with several types of automatically computed annotations, including:
  • Photo-realistic RGB images
  • Depth maps
  • Semantic segmentation masks
  • Object bounding boxes and categories
  • Hand bounding boxes and attributes
    • Hand side (Left/Right)
    • Contact state (In contact with an object/No contact)
  • Distance between hands and objects in the 3D space
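The annotations listed above can be pictured as one record per synthetic frame. The snippet below is a hypothetical example of such a record and of two small accessors; the field names and values are assumptions for illustration, not the dataset's actual schema.

```python
# Hypothetical per-frame annotation record; field names and values are
# illustrative and do NOT reflect the dataset's real file format.
frame = {
    "image": "synthetic_000123.png",
    "objects": [
        {"category": "screwdriver", "bbox": [410, 220, 530, 310], "active": True},
        {"category": "socket", "bbox": [60, 500, 140, 570], "active": False},
    ],
    "hands": [
        {"side": "right", "bbox": [380, 200, 560, 360],
         "contact_state": "in_contact", "distance_to_nearest_object_m": 0.04},
    ],
}


def interacting_hands(record):
    """Hands currently in contact with an object."""
    return [h for h in record["hands"] if h["contact_state"] == "in_contact"]


def active_objects(record):
    """Objects currently involved in an interaction."""
    return [o for o in record["objects"] if o["active"]]
```

Counting interacting hands and active object instances over all such records would yield dataset statistics like the ones reported below.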

Download Synthetic Dataset

Statistics of the Synthetic Dataset
  • 20,000 images
  • 29,034 hands
    • 14,589 hands involved in an interaction
  • 123,827 object instances
    • 14,589 active object instances
  • 19 object categories

Download Real Dataset

Statistics of the Real Dataset
  • 8 videos
  • 7 subjects
  • Average duration: 28.37 minutes
  • 3 hours and 47 minutes of video recordings
  • Video acquisition: 2272x1278 pixels at 30 fps
  • 19 object categories
  • 3,056 labeled frames
  • 4,503 hands
    • 3,331 hands involved in an interaction
  • 17,598 object instances
    • 2,872 active object instances


R. Leonardi, F. Ragusa, A. Furnari, G. M. Farinella. Egocentric Human-Object Interaction Detection Exploiting Synthetic Data, International Conference on Image Analysis and Processing (ICIAP) 2021. Download the paper.



This research has been supported by Next Vision s.r.l., by the project MISE - PON I&C 2014-2020 - Progetto ENIGMA - Prog n. F/190050/02/X44 – CUP: B61B19000520008, and by Research Program Pia.ce.ri. 2020/2022 Linea 2 - University of Catania.