Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs Paper • 2603.16932 • Published 13 days ago • 81
HiMu: Hierarchical Multimodal Frame Selection for Long Video Question Answering Paper • 2603.18558 • Published 8 days ago • 10
Step-Wise Refusal Dynamics in Autoregressive and Diffusion Language Models Paper • 2602.02600 • Published Feb 1 • 13
HERBench: A Benchmark for Multi-Evidence Integration in Video Question Answering Paper • 2512.14870 • Published Dec 16, 2025 • 15