Learning digital humans that act and interact
Michael J. Black
Max Planck Institute for Intelligent Systems and Meshcapade GmbH, DEU
Abstract
Augmented and virtual reality will require artificial humans that interact with real humans and with real and virtual 3D worlds. This requires a real-time understanding of humans and scenes, as well as the generation of natural and appropriate behavior. We approach the problem of creating such embodied human behavior through capture, modeling, and synthesis. First, we learn realistic and expressive 3D human avatars from 3D scans. We then train neural networks to estimate human pose and shape from images and video, focusing in particular on humans interacting with each other and with the 3D world. By capturing people in action, we can train networks to model and generate human movement and human-scene interaction. To validate these models, we synthesize virtual humans in novel 3D scenes. The goal is to produce realistic human avatars that interact with virtual worlds in ways that are indistinguishable from real humans. This course will introduce students to the current state of the art, including methods for estimating 3D humans from video, generative models of human movement driven by text and high-level goals, and large multi-modal foundation models for reasoning about humans and their behavior.
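To make the avatar representation concrete: a common choice for the kind of parametric 3D body model described above is SMPL, developed at the Max Planck Institute for Intelligent Systems. Below is a minimal sketch, assuming the open-source smplx Python package and locally downloaded SMPL model files (the models/ path is a placeholder); it is illustrative of how pose and shape parameters drive a body mesh, not the specific pipeline covered in the course.

```python
# Minimal sketch: posing a parametric 3D body model with the open-source
# `smplx` package (pip install smplx). Assumes SMPL model files have been
# downloaded separately; "models/" below is a placeholder path.
import torch
import smplx

# Build a gender-neutral SMPL body model.
model = smplx.create("models/", model_type="smpl", gender="neutral")

betas = torch.zeros(1, 10)         # shape coefficients (average body)
body_pose = torch.zeros(1, 69)     # axis-angle rotations for 23 body joints
global_orient = torch.zeros(1, 3)  # root (pelvis) orientation

# Forward pass: parameters -> posed mesh vertices and 3D joint locations.
output = model(betas=betas, body_pose=body_pose, global_orient=global_orient)
print(output.vertices.shape)  # torch.Size([1, 6890, 3]): SMPL mesh vertices
print(output.joints.shape)    # 3D joint positions for the posed body
```

In this framing, the estimation networks mentioned above regress the shape and pose parameters from pixels, while the generative models produce sequences of pose parameters over time.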