ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection
Topic: Deformable Articulations Network for Dynamic 3D Human Reconstruction from RGB-D Video
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning (Oral Session)
AOE-Net: Entities Interactions Modeling with Adaptive Attention Mechanism for Temporal Action Proposals Generation
AISFormer: Amodal Instance Segmentation with Transformer
AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation (Oral Session)