VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning

Jan 1, 2023 · Kashu Yamazaki*, Khoa Vo*, Sang Truong, Bhiksha Raj, Ngan Le



Publication

AAAI (2023)

Last updated on Jan 1, 2023

Video Paragraph Captioning
