Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing • Paper • 2603.12254 • Published Mar 12 • 22
Tinted Frames: Question Framing Blinds Vision-Language Models • Paper • 2603.19203 • Published Mar 19 • 17
Reconstruction Alignment Improves Unified Multimodal Models • Paper • 2509.07295 • Published Sep 8, 2025 • 40