ICVSS Computer Vision - Where are we?

Features for 6D pose and 3D reconstruction: towards lightweight, monocular and unsupervised

Federico Tombari

Technical University of Munich, DE

Abstract

Features that capture the geometry and shape of surrounding objects and scenes have long been studied in computer vision, enabling tasks such as 6D object pose estimation, SLAM and 3D reconstruction. Recently, their use has been fueled by novel applications in robotics, autonomous driving, healthcare and augmented reality, which pose important challenges in terms of real-time requirements, hardware constraints, and the guaranteed performance needed for safety-critical use cases.

In this talk, we first walk through recent progress on features extracted from common 3D data types (such as point clouds and voxel maps) for object pose estimation and reconstruction. In particular, we analyze their evolution from hand-crafted to deep-learned features, and highlight the challenges of learning from unorganised 3D data. Then, we outline emerging trends in the field that are pushing current techniques to be unsupervised, lightweight and monocular. We observe how the state of the art is replacing 3D data with monocular data to achieve 6D object pose estimation and tracking, and how features learned from 3D cues can enable monocular real-time reconstruction and mapping. In both cases, we focus on the challenges of effectively training neural networks for these tasks, i.e. the need to rely on synthetic and large-scale datasets and to overcome the domain shift problem. Finally, we discuss a selection of emerging applications that rely on features learned from 3D data for scene understanding, such as semantic SLAM, 3D segmentation and shape/scene completion.