ICVSS Computer Vision in the Age of Large Language Models

Pixels in your feet

Antonio Torralba

Massachusetts Institute of Technology, USA

Abstract

Vision is the most important sense, or is it? Our perceptual system collects information via many different sensors (sounds, smells, forces, temperature, …) that provide very useful information. In particular, tactile sensing is critical for humans to perform everyday tasks, yet there are still no equivalent sensing platforms or large-scale datasets for training systems with tactile sensing. In the era of large language models, it is useful to take a step back and think about some important but forgotten senses. The lecture will have three parts. I will start by reviewing the current state of vision in the era of large language models. I will then move away from language and talk about vision in the context of other senses (such as tactile information). Finally, I will end by moving away from all senses and describe ways in which a vision system might be able to learn using no real data at all.