Building a Precise Video Language with Human-AI Oversight Paper • 2604.21718 • Published Apr 22 • 17
Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers Paper • 2412.00142 • Published Nov 28, 2024 • 5
Towards Understanding Camera Motions in Any Video Paper • 2504.15376 • Published Apr 21, 2025 • 157
AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis Paper • 2504.13157 • Published Apr 17, 2025 • 20
Motion Prompting: Controlling Video Generation with Motion Trajectories Paper • 2412.02700 • Published Dec 3, 2024 • 16
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples Paper • 2410.14669 • Published Oct 18, 2024 • 39
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation Paper • 2406.13743 • Published Jun 19, 2024 • 2
Evaluating Text-to-Visual Generation with Image-to-Text Generation Paper • 2404.01291 • Published Apr 1, 2024 • 6
Language Models as Black-Box Optimizers for Vision-Language Models Paper • 2309.05950 • Published Sep 12, 2023 • 4