Panoptic Segmentation in Industrial Environments using Synthetic and Real Data

C. Quattrocchi1, D. Di Mauro1, A. Furnari1,2, G. M. Farinella1,2

1IPLab, Department of Mathematics and Computer Science - University of Catania, IT
2Next Vision s.r.l. - Spinoff of the University of Catania, Italy

Understanding the relations between a user and the surrounding environment is instrumental to assisting workers on a worksite. For instance, recognizing which objects a user is interacting with, from images and videos collected through a wearable device, can inform the worker on the correct usage of specific objects in order to improve productivity and prevent accidents. Although modern vision systems can rely on advanced algorithms for object detection and for semantic and panoptic segmentation, these methods still require large quantities of domain-specific labeled data, which can be difficult to obtain in industrial scenarios. Motivated by this observation, we propose a pipeline to generate synthetic images from 3D models of real environments and real objects. The generated images are automatically labeled and hence effortless to obtain. Exploiting the proposed pipeline, we generate a dataset of synthetic images automatically labeled for panoptic segmentation, complemented by a small number of manually labeled real images for fine-tuning. Experiments show that using synthetic images drastically reduces the number of real images needed to reach reasonable panoptic segmentation performance.


Dataset generation pipeline
To study the considered problem, we created a dataset comprising two parts: real images with manually annotated segmentation masks, and synthetic images with automatically generated annotations.

Red box: generation of the real dataset: (1) acquisition of real images using a HoloLens 2; (2) extraction of frames and the related camera poses; (3) annotation of the segmentation masks.
Blue box: generation of the synthetic dataset: (4) acquisition of the environment using a Matterport 3D scanner; (5) generation of the 3D model; (6) semantic labeling of the 3D model using Blender; (7) generation of a random tour inside the 3D model; (8) generation of synthetic frames and semantic labels: the 3D model and the camera positions are rendered through Blender by a script that produces the frames and their semantic labels, and the semantic labels are then processed by a second script that extracts JSON annotations in COCO format.
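The last step of the blue box, converting rendered semantic labels into COCO panoptic annotations, can be sketched as below. The helper names (`panoptic_ids`, `segments_info`) and the `id_to_category` mapping are illustrative assumptions, not the paper's actual script; the id encoding `id = R + 256*G + 256**2*B`, however, is the one defined by the COCO panoptic format:

```python
import numpy as np

def panoptic_ids(label_rgb: np.ndarray) -> np.ndarray:
    """Map an RGB panoptic label image of shape (H, W, 3) to COCO
    panoptic segment ids using id = R + 256*G + 256**2*B."""
    rgb = label_rgb.astype(np.uint32)
    return rgb[..., 0] + 256 * rgb[..., 1] + (256 ** 2) * rgb[..., 2]

def segments_info(id_map: np.ndarray, id_to_category: dict) -> list:
    """Build the per-image `segments_info` list stored in the COCO
    panoptic JSON: one entry per segment with id, category, area
    and bounding box in [x, y, width, height] format."""
    infos = []
    for seg_id in np.unique(id_map):
        mask = id_map == seg_id
        ys, xs = np.nonzero(mask)
        infos.append({
            "id": int(seg_id),
            # Hypothetical mapping from segment id to semantic class.
            "category_id": id_to_category.get(int(seg_id), 0),
            "area": int(mask.sum()),
            "bbox": [int(xs.min()), int(ys.min()),
                     int(xs.max() - xs.min() + 1),
                     int(ys.max() - ys.min() + 1)],
        })
    return infos
```

Running this per rendered frame and collecting the resulting dictionaries yields the `annotations` array of a COCO-style panoptic JSON file.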

Real Dataset

  • 1665 RGB Images
  • 1665 Semantic Images
  • 1665 Panoptic Images
  • Image Resolution: 1280x720
  • 35 classes represented
  • Download
Synthetic Dataset
  • 25,079 RGB Images
  • 25,079 Semantic Images
  • 25,079 Panoptic Images
  • Image Resolution: 1280x720
  • 35 classes represented
  • Download


In this paper, we investigate the impact of synthetic data on the development of domain-specific applications in industrial environments. As a primary task, useful in many downstream applications, we study panoptic segmentation, which consists of identifying the main semantic elements of the scene, including both structural parts, such as walls, and object instances, such as tools and equipment. Specifically, we assess the suitability of training a panoptic segmentation approach on a large amount of labeled synthetic data combined with a very small amount of labeled real data.
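The training protocol studied here, pretraining on the large synthetic set and then fine-tuning on a small real subset, can be sketched as a two-phase schedule. The epoch counts, learning rates, and real-image count below are illustrative assumptions, not the paper's settings:

```python
def two_stage_schedule(n_synthetic, n_real,
                       pretrain_epochs=20, finetune_epochs=10,
                       base_lr=1e-3, finetune_lr_scale=0.1):
    """Return a list of training phases: first pretrain the panoptic
    segmentation model on automatically labeled synthetic images, then
    fine-tune on the few manually labeled real ones at a reduced
    learning rate to adapt to the real domain."""
    return [
        {"phase": "pretrain", "dataset": "synthetic",
         "images": n_synthetic, "epochs": pretrain_epochs,
         "lr": base_lr},
        {"phase": "finetune", "dataset": "real",
         "images": n_real, "epochs": finetune_epochs,
         "lr": base_lr * finetune_lr_scale},
    ]

# Hypothetical usage: all synthetic images, only 100 real images.
schedule = two_stage_schedule(n_synthetic=25079, n_real=100)
```

The key design choice this sketch captures is that the cheap synthetic data carries most of the training, while the expensive real annotations are reserved for a short, low-learning-rate adaptation phase.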


C. Quattrocchi, D. Di Mauro, A. Furnari, G. M. Farinella. Panoptic Segmentation in Industrial Environments using Synthetic and Real Data. International Conference on Image Analysis and Processing (ICIAP) 2021. Download the paper.


This research is supported by Next Vision s.r.l., and the project MEGABIT - PIAno di inCEntivi per la RIcerca di Ateneo 2020/2022 (PIACERI) – linea di intervento 2, DMI - University of Catania.