The Doctor is in, officially Dr. Khoa Vo!
HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model
Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation
ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection
Topic: Deformable Articulations Network for Dynamic 3D Human Reconstruction from RGB-D Video
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning (Oral Session)
AOE-Net: Entities Interactions Modeling with Adaptive Attention Mechanism for Temporal Action Proposals Generation
AISFormer: Amodal Instance Segmentation with Transformer
AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation (Oral Session)