MR. Video: "MapReduce" is the Principle for Long Video Understanding Paper • 2504.16082 • Published Apr 22 • 5
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception Paper • 2504.11457 • Published Apr 15 • 1
Frozen Transformers in Language Models Are Effective Visual Encoder Layers Paper • 2310.12973 • Published Oct 19, 2023 • 1
Situational Awareness Matters in 3D Vision Language Reasoning Paper • 2406.07544 • Published Jun 11, 2024 • 1
Floating No More: Object-Ground Reconstruction from a Single Image Paper • 2407.18914 • Published Jul 26, 2024 • 20