AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos? Paper • 2307.16368 • Published Jul 31, 2023
Vamos: Versatile Action Models for Video Understanding Paper • 2311.13627 • Published Nov 22, 2023
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning Paper • 2309.06597 • Published Sep 12, 2023
Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models Paper • 2405.20305 • Published May 30, 2024
M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models Paper • 2407.14502 • Published Jul 19, 2024
Disentangled Neural Relational Inference for Interpretable Motion Prediction Paper • 2401.03599 • Published Jan 7, 2024
MERGE: Guided Vision-Language Models for Multi-Actor Event Reasoning and Grounding in Human-Robot Interaction Paper • 2603.18988 • Published 13 days ago