Structured Over Scale: Learning Spatial Reasoning from Educational Video • Paper 2601.23251 • Published Jan 30
HORNet: Task-Guided Frame Selection for Video Question Answering with Vision-Language Models • Paper 2603.18850 • Published 15 days ago