Self-supervised Learning of Visual Representations for Perception and Action
Abhinav Gupta
Carnegie Mellon University, US
Abstract
In this talk, I will discuss how to learn representations for perception and action without any manual supervision. First, I will discuss how ConvNets for vision can be learned in a completely unsupervised manner using auxiliary tasks, and how different forms of signal already present in the data can act as supervision: context, time, audio, stereo, etc. Next, I will talk about how a robot can physically explore the world and learn visual representations for classification and recognition tasks. Finally, I will talk about end-to-end learning of actions using self-supervision, starting with grasping and then exploring other tasks such as poking, and we will see how this paradigm can be scaled up.
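To make the auxiliary-task idea concrete, below is a minimal, illustrative sketch (not taken from the talk) of one such pretext task: a small ConvNet embeds two image patches and a classifier predicts their relative spatial position, so the "labels" come for free from unlabeled images. All module names, architecture choices, and hyper-parameters here are assumptions for illustration only.

```python
# Illustrative sketch of a context-prediction pretext task (assumed details).
import torch
import torch.nn as nn

class PatchEncoder(nn.Module):
    """Shared ConvNet mapping a 96x96 RGB patch to a feature vector."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, dim),
        )

    def forward(self, x):
        return self.net(x)

class ContextPredictor(nn.Module):
    """Predicts which of 8 neighbouring positions patch B occupies relative to patch A."""
    def __init__(self, dim=128):
        super().__init__()
        self.encoder = PatchEncoder(dim)
        self.classifier = nn.Linear(2 * dim, 8)

    def forward(self, patch_a, patch_b):
        feats = torch.cat([self.encoder(patch_a), self.encoder(patch_b)], dim=1)
        return self.classifier(feats)

# Toy training step on random tensors; in practice the patch pairs and their
# relative-position labels are extracted automatically from unlabeled images.
model = ContextPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
patch_a = torch.randn(16, 3, 96, 96)
patch_b = torch.randn(16, 3, 96, 96)
labels = torch.randint(0, 8, (16,))
loss = nn.CrossEntropyLoss()(model(patch_a, patch_b), labels)
loss.backward()
optimizer.step()
```

After pre-training on such a task, the encoder's features can be reused for downstream classification or recognition, which is the sense in which the pretext signal substitutes for manual labels.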