Panoptic Segmentation in Industrial Environments using Synthetic and Real Data

Understanding the relations between the user and the surrounding environment is instrumental in assisting users in a worksite. For instance, recognizing which objects a user is interacting with, from images and video collected through a wearable device, can be used to inform the worker about the usage of specific objects in order to improve productivity and prevent accidents. Although modern vision systems can rely on advanced algorithms for object detection, semantic and panoptic segmentation, these methods still require large quantities of domain-specific labeled data, which can be difficult to obtain in industrial scenarios. Motivated by this observation, we propose a pipeline that generates synthetic images from 3D models of real environments and real objects. The generated images are labeled automatically and are hence effortless to obtain. Using the proposed pipeline, we generate a dataset of synthetic images automatically labeled for panoptic segmentation. This set is complemented by a small number of manually labeled real images for fine-tuning. Experiments show that using synthetic images drastically reduces the number of real images needed to obtain reasonable panoptic segmentation performance.

Dataset

Dataset generation pipeline
To study this problem, we created a dataset composed of two parts: real images with manually annotated segmentation masks, and synthetic images with automatically generated annotations.

Red box: generation of the real dataset: (1) acquisition of real images using HoloLens 2; (2) extraction of frames and the related camera poses; (3) annotation of the segmentation masks.
Blue box: generation of the synthetic dataset: (4) acquisition of the 3D model using a Matterport 3D scanner; (5) generation of the 3D model; (6) semantic labelling of the 3D model using Blender; (7) generation of a random tour inside the 3D model; (8) generation of synthetic frames and semantic labels. Rendering through Blender: a script processes the 3D model and the tour positions to generate the frames and semantic labels. Conversion to COCO format: a script processes the semantic labels to extract JSON annotations in COCO format.
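The page does not describe the conversion script itself. The following is a minimal sketch of what a conversion to the COCO panoptic format can look like, assuming (hypothetically) that the renderer outputs a per-pixel class-id map and a per-pixel instance-id map as integer arrays, and that the set of countable "thing" classes is known. The function names `encode_ids_as_rgb` and `to_coco_panoptic` are illustrative, not the authors' code; only the id encoding (id = R + 256·G + 256²·B) and the `segments_info` fields follow the public COCO panoptic format.

```python
import numpy as np


def encode_ids_as_rgb(id_map: np.ndarray) -> np.ndarray:
    """Encode integer segment ids as an RGB image using the COCO panoptic
    convention id = R + 256 * G + 256**2 * B."""
    rgb = np.zeros((*id_map.shape, 3), dtype=np.uint8)
    rgb[..., 0] = (id_map % 256).astype(np.uint8)
    rgb[..., 1] = ((id_map // 256) % 256).astype(np.uint8)
    rgb[..., 2] = ((id_map // 256**2) % 256).astype(np.uint8)
    return rgb


def to_coco_panoptic(semantic, instance, image_id, file_name, thing_classes, void_class=0):
    """Build the RGB segment PNG content and a COCO panoptic annotation.

    semantic: (H, W) int array of class ids (assumed renderer output).
    instance: (H, W) int array of instance ids, distinct per object in a class.
    thing_classes: class ids treated as countable objects ("things"); every
                   other non-void class is "stuff" and yields one segment.
    """
    segment_map = np.zeros(semantic.shape, dtype=np.int64)  # 0 = unlabeled
    segments_info = []
    next_id = 1
    for cls in np.unique(semantic):
        cls = int(cls)
        if cls == void_class:
            continue
        cls_mask = semantic == cls
        if cls in thing_classes:
            # One segment per object instance of this class.
            parts = [cls_mask & (instance == i) for i in np.unique(instance[cls_mask])]
        else:
            # Stuff (e.g. walls): all pixels of the class form a single segment.
            parts = [cls_mask]
        for mask in parts:
            ys, xs = np.nonzero(mask)
            x0, y0 = int(xs.min()), int(ys.min())
            segment_map[mask] = next_id
            segments_info.append({
                "id": next_id,
                "category_id": cls,
                "area": int(mask.sum()),
                "bbox": [x0, y0, int(xs.max()) - x0 + 1, int(ys.max()) - y0 + 1],
                "iscrowd": 0,
            })
            next_id += 1
    annotation = {"image_id": image_id, "file_name": file_name, "segments_info": segments_info}
    return encode_ids_as_rgb(segment_map), annotation
```

In a full conversion, the returned RGB array would be saved as a PNG named `file_name`, and the per-image annotations would be collected into a single JSON file together with the `images` and `categories` lists.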

Real Dataset

1665 RGB Images
1665 Semantic Images
1665 Panoptic Images
Image Resolution: 1280x720
35 classes represented
Download

Synthetic Dataset

25,079 RGB Images
25,079 Semantic Images
25,079 Panoptic Images
Image Resolution: 1280x720
35 classes represented
Download

Task

In this paper, we investigate the impact of synthetic data on the development of domain-specific applications in industrial environments. As a primary task, useful in many downstream applications, we study panoptic segmentation, which consists of identifying the main semantic elements in the scene, including both structural parts, such as walls, and object instances, such as tools and equipment. Specifically, we study whether a panoptic segmentation approach can be trained effectively using a large amount of labeled synthetic data and a very small amount of labeled real data.
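This page does not state which metric is used to measure panoptic segmentation performance. The standard choice for this task is Panoptic Quality (PQ), defined as the sum of the IoUs of matched segments divided by TP + ½·FP + ½·FN, where a predicted and a ground-truth segment of the same class match when their IoU exceeds 0.5. The sketch below computes PQ for a single class in a single image, assuming segments are given as non-overlapping boolean masks; it omits the void and crowd handling and the dataset-level, per-class averaging of the official evaluation.

```python
import numpy as np


def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def panoptic_quality(pred_masks, gt_masks):
    """Simplified PQ for one class; masks are boolean (H, W) arrays."""
    tp_ious = []
    matched_pred = set()
    for g in gt_masks:
        for pi, p in enumerate(pred_masks):
            if pi in matched_pred:
                continue
            v = iou(g, p)
            # Segments do not overlap, so at most one prediction can have
            # IoU > 0.5 with a given ground-truth segment: the match is unique.
            if v > 0.5:
                tp_ious.append(v)
                matched_pred.add(pi)
                break
    tp = len(tp_ious)
    fp = len(pred_masks) - tp  # unmatched predictions
    fn = len(gt_masks) - tp    # unmatched ground-truth segments
    denom = tp + 0.5 * fp + 0.5 * fn
    # Class absent from both prediction and ground truth: PQ is undefined.
    return sum(tp_ious) / denom if denom else float("nan")
```

PQ factors into segmentation quality (mean IoU of matched segments) times recognition quality (TP / (TP + ½·FP + ½·FN)), so a low score can be traced either to imprecise masks or to missed and spurious segments.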

Paper

C. Quattrocchi, D. Di Mauro, A. Furnari, G. M. Farinella. Panoptic Segmentation in Industrial Environments using Synthetic and Real Data. International Conference on Image Analysis and Processing (ICIAP) 2021. Download the paper.

Acknowledgement

This research is supported by Next Vision s.r.l. and by the project MEGABIT, PIAno di inCEntivi per la RIcerca di Ateneo 2020/2022 (PIACERI), intervention line 2, DMI, University of Catania.

Related Work

Visit our page dedicated to First Person Vision Research @ IPLAB for related publications.