ACVR 2024 Program


9:00	Opening Remarks
9:05	Keynote 1: DEEPAK PATHAK, CARNEGIE MELLON UNIVERSITY, US - A Bottom-up Approach to Sensorimotor Learning
9:35	ORAL SESSION 1
10:05	Abstracts
10:30	POSTER SESSION AND COFFEE BREAK
11:30	Keynote 2: LAMBERTO BALLAN, UNIVERSITY OF PADOVA, IT - From Context-aware Motion Prediction to Embodied Visual Navigation
12:00	ORAL SESSION 2
12:30	Doctoral Consortium
13:00	Closing Remarks

Oral Session 1* (8-minute presentation + 2-minute Q&A)

Modelling the Distribution of Human Motion for Sign Language Assessment
OPPH: A Vision-Based Operator for Measuring Body Movements for Personal Healthcare
BurnSafe: Automatic Assistive Tool for Burn Severity Assessment by Semantic Segmentation

Abstracts* (5-minute presentation)

INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding
Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
Audio-Driven Scene Generation: Transforming Original Images with Sound-Induced Changes
Transformer-based encoder for improving perception in visual prosthesis

Poster Session

Poster size: the posters should be portrait (vertical), with a maximum size of 90x180 cm.

Enhancing Human-Robot Collaborative Search through Efficient Space Sharing with On-demand Interaction
Context-Aware Full Body Anonymization using Text-to-Image Diffusion Models
Hand Gesture Recognition using Dual Graph Hierarchical Edges Representation and Graph Transformer Network
DiffSign: AI-Assisted Generation of Customizable Sign Language Videos With Enhanced Realism
Safe Resetless Reinforcement Learning: Enhancing Training Autonomy with Risk-Averse Agents
Multi-view Pose Fusion for Occlusion-Aware 3D Human Pose Estimation
HAVANA: Hierarchical stochastic neighbor embedding for Accelerated Video ANnotAtions
Aligning Object Detector Bounding Boxes with Human Preference
GSK-C2F: Graph Skeleton Modelization for Action Segmentation and Recognition using a Coarse-to-Fine strategy
Machine Learning Approaches for Analyzing Physiological Data in Remote Patient Monitoring
VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis
Video Editing for Video Retrieval
Towards Wearable Multi-Modal Human Activity Recognition with Deep Fusion Networks
Segmenting Object Affordances: Reproducibility and Sensitivity to Scale
Target-Oriented Object Grasping via Multimodal Human Guidance
Assistive Visual Tool: Enhancing Safe Navigation with Video Remapping in AR Headsets
OpenNav: Efficient Open Vocabulary 3D Object Detection for Smart Wheelchair Navigation
(Abstract) Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Procedural Video by Detecting Unpredictable Gaze
(Abstract) Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs
(Abstract) Action Scene Graphs for Long-Form Understanding of Egocentric Videos
(Abstract) Investigating Semantic Segmentation Models to Assist Visually Impaired People

Oral Session 2* (8-minute presentation + 2-minute Q&A)

REST–HANDS: Rehabilitation with Egocentric Vision using Smartglasses for Treatment of Hands after Surviving Stroke
A Light and Smart Wearable Platform with Multimodal Foundation Model for Enhanced Spatial Reasoning
ExeChecker: Where Did I Go Wrong?

Doctoral Consortium* (10-minute presentation, including questions)

*All papers presented in the oral, abstract, or doctoral consortium sessions will also be presented as posters.
