Abstract

ENIGMA-51: Towards a Fine-Grained Understanding of Human Behavior in Industrial Scenarios

ENIGMA-51 is a new egocentric dataset acquired in an industrial scenario by 19 subjects who followed instructions to complete the repair of electrical boards using industrial tools (e.g., an electric screwdriver) and equipment (e.g., an oscilloscope). The 51 egocentric video sequences are densely annotated with a rich set of labels that enable the systematic study of human behavior in the industrial domain. We provide benchmarks on four tasks related to human behavior: 1) untrimmed temporal detection of human-object interactions, 2) egocentric human-object interaction detection, 3) short-term object interaction anticipation, and 4) natural language understanding of intents and entities. Baseline results show that ENIGMA-51 is a challenging benchmark for studying human behavior in industrial scenarios.

Code Data

The ENIGMA-51 Dataset

ENIGMA-51 is a new dataset composed of 51 egocentric videos acquired in an environment that simulates a real industrial laboratory. The dataset was acquired by 19 subjects who wore a Microsoft HoloLens 2 headset and followed audio and AR instructions provided by the device to complete repair procedures on electrical boards. While executing the steps of a specific procedure, the subjects interact with industrial tools, such as an electric screwdriver and pliers, and with electronic instruments, such as a power supply and an oscilloscope. ENIGMA-51 comes with a rich set of annotations that support the study of a large variety of tasks, especially tasks related to human-object interactions.


Data Annotation

We labelled the ENIGMA-51 dataset with a rich set of fine-grained annotations that can be used and combined to study different aspects of human behavior.

  • 51 egocentric videos with a resolution of 2272x1278 pixels at 30 fps
  • 22 hours of video recordings
  • 45505 RGB frames
  • 2 procedures consisting of instructions that involve humans interacting with the objects
  • 14036 temporally annotated interactions, each labelled with the verb describing the action performed
  • 275135 objects and 56473 hands annotated with bounding boxes
  • 12597 interaction frames annotated with 14036 interactions and 9342 active objects
  • 37314 next-object interactions annotated in past frames
  • 4 basic verb classes and 25 object classes
  • 3D models of the environment and the objects
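The counts in the list above can be combined into a few rough density figures, which help when sizing data loaders or sanity-checking a download. The snippet below is a minimal sketch, not part of the released tooling: the dictionary keys and function names are hypothetical, and it assumes that the hand and object bounding boxes are spread over the 45505 RGB frames, which the summary above does not state explicitly.

```python
# Counts copied from the annotation summary above.
# The key names are illustrative and not taken from the dataset files.
STATS = {
    "videos": 51,
    "rgb_frames": 45505,
    "object_boxes": 275135,
    "hand_boxes": 56473,
    "interaction_frames": 12597,
    "interactions": 14036,
}


def per_unit(numerator: str, denominator: str, stats: dict = STATS) -> float:
    """Return stats[numerator] / stats[denominator], rounded to 2 decimals."""
    return round(stats[numerator] / stats[denominator], 2)


def density_summary(stats: dict = STATS) -> dict:
    """Derive average densities from the raw counts.

    Assumption: all hand/object boxes lie on the listed RGB frames.
    """
    return {
        "rgb_frames_per_video": per_unit("rgb_frames", "videos", stats),
        "object_boxes_per_frame": per_unit("object_boxes", "rgb_frames", stats),
        "hand_boxes_per_frame": per_unit("hand_boxes", "rgb_frames", stats),
        "interactions_per_interaction_frame": per_unit(
            "interactions", "interaction_frames", stats
        ),
    }


if __name__ == "__main__":
    for name, value in density_summary().items():
        print(f"{name}: {value}")
```

Under that assumption, each frame carries roughly six object boxes and a little over one hand box on average, and each interaction frame carries slightly more than one interaction.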


Download

  • Data: Videos
  • Data: Frames
  • Data: Annotations
  • Data: Textual Procedures
  • Data: 3D Models
  • Features: CLIP Features
  • Features: DINOV2 Features
  • Additional: Hands Keypoints (MMPOSE)
  • Additional: Segmentation Masks (SAMHQ)


Paper

F. Ragusa, R. Leonardi, M. Mazzamuto, C. Bonanno, R. Scavo, A. Furnari, G. M. Farinella. ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios. 2023. Cite our paper: ArXiv.

@inproceedings{ragusa2023enigma51,
    title     = {ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios},
    author    = {Francesco Ragusa and Rosario Leonardi and Michele Mazzamuto and Claudia Bonanno and Rosario Scavo and Antonino Furnari and Giovanni Maria Farinella},
    booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
    year      = {2024}
}
   

Visit our page dedicated to First Person Vision Research for other related publications.

Supplementary

More details on the dataset, the annotation phase, and the baselines can be found in the supplementary material associated with the publication.


People

  • Francesco Ragusa (FPV@IPLAB, Next Vision s.r.l.)
  • Rosario Leonardi (FPV@IPLAB, Next Vision s.r.l.)
  • Michele Mazzamuto (FPV@IPLAB, Next Vision s.r.l.)
  • Claudia Bonanno (FPV@IPLAB, Next Vision s.r.l.)
  • Antonino Furnari (FPV@IPLAB, Next Vision s.r.l.)
  • Giovanni Maria Farinella (FPV@IPLAB, Next Vision s.r.l.)