VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning

Jan 1, 2023 · Kashu Yamazaki*, Khoa Vo*, Sang Truong, Bhiksha Raj, Ngan Le

Publication: AAAI (2023)
Last updated on Jan 1, 2023
Video Paragraph Captioning