ICVSS Computer Vision in the Age of Large Language Models

Object Recognition and Reconstruction in the era of LLMs

Georgia Gkioxari

California Institute of Technology, USA

Abstract

In this talk, I will cover modern developments in visual perception in 2D and 3D. I will discuss the current state of methods for recognizing and localizing objects in images and perceiving them in 3D space, including predicting their size, pose, and distance from the camera. I will then cover how to learn general representations for reconstructing their geometry, all from a single image. Over the 2 hours of my lecture, I hope that students gain a comprehensive understanding of how to design modern object recognition systems by leveraging ideas from large language models and guiding image-centric representations for the task of 2D and 3D recognition.