Collecting and labeling large amounts of data is expensive in both time and cost. Motivated by this, we investigated how synthetic data can be used to train first-person vision models and reduce the need for labeled, domain-specific real data. We propose a pipeline for generating and labeling synthetic human-object interactions from a first-person point of view using 3D models of the target environment and objects, which can be cheaply collected with commercial scanners.
We present EgoISM-HOI, a new multimodal synthetic-real dataset of egocentric human-object interactions, which contains a total of 39,304 RGB images, 23,356 depth maps and instance segmentation masks, 59,860 hand annotations, 237,985 object instances across 19 object categories, and 35,416 egocentric human-object interactions.
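Below is a minimal sketch of how the different modalities of the dataset could be iterated over in Python. The directory layout, folder names, and file extensions are assumptions made for illustration only; refer to the released dataset for the actual structure and annotation format.

# Minimal sketch of iterating over EgoISM-HOI samples.
# NOTE: directory layout and file names are hypothetical; adapt them
# to the actual structure of the downloaded dataset.
from pathlib import Path

from PIL import Image  # pip install pillow

DATASET_ROOT = Path("EgoISM-HOI")            # hypothetical root folder
RGB_DIR = DATASET_ROOT / "rgb"               # hypothetical: RGB frames
DEPTH_DIR = DATASET_ROOT / "depth"           # hypothetical: depth maps
MASK_DIR = DATASET_ROOT / "instance_masks"   # hypothetical: instance segmentation masks

for rgb_path in sorted(RGB_DIR.glob("*.jpg")):
    rgb = Image.open(rgb_path).convert("RGB")

    # Depth maps and instance masks are provided for the synthetic subset,
    # so the corresponding files may be missing for real images.
    depth_path = DEPTH_DIR / f"{rgb_path.stem}.png"
    mask_path = MASK_DIR / f"{rgb_path.stem}.png"
    depth = Image.open(depth_path) if depth_path.exists() else None
    mask = Image.open(mask_path) if mask_path.exists() else None

    print(rgb_path.name, rgb.size, depth is not None, mask is not None)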
Download
If you find the code, pre-trained models, or the EgoISM-HOI dataset useful for your research, please cite the following paper:
@article{leonardi2024synthdata,
  title   = {Exploiting multimodal synthetic data for egocentric human-object interaction detection in an industrial scenario},
  journal = {Computer Vision and Image Understanding},
  volume  = {242},
  pages   = {103984},
  year    = {2024},
  issn    = {1077-3142},
  doi     = {https://doi.org/10.1016/j.cviu.2024.103984},
  author  = {Rosario Leonardi and Francesco Ragusa and Antonino Furnari and Giovanni Maria Farinella},
}
Additionally, please consider citing the original paper:
@inproceedings{leonardi2022egocentric,
  title     = {Egocentric Human-Object Interaction Detection Exploiting Synthetic Data},
  author    = {Leonardi, Rosario and Ragusa, Francesco and Furnari, Antonino and Farinella, Giovanni Maria},
  booktitle = {Image Analysis and Processing -- ICIAP 2022},
  pages     = {237--248},
  year      = {2022}
}
Visit our page dedicated to First Person Vision Research for other related publications.