Object Recognition and Reconstruction in the Era of LLMs

Georgia Gkioxari

California Institute of Technology, USA

Abstract

In this talk, I will cover modern developments in 2D and 3D visual perception. I will first discuss the state of the art in recognizing and localizing objects in images and in perceiving them in 3D space, including predicting their size, pose, and distance from the camera. I will then cover how to learn general representations for reconstructing object geometry, all from a single image. Over the two hours of the lecture, I hope students will gain a comprehensive understanding of how to design modern object recognition systems, leveraging ideas from large language models and guiding image-centric representations toward the tasks of 2D and 3D recognition.