Video-CoM: Interactive Video Reasoning via Chain of Manipulations Paper • 2511.23477 • Published 13 days ago • 2
Diversity Has Always Been There in Your Visual Autoregressive Models Paper • 2511.17074 • Published 21 days ago • 7
EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards Paper • 2511.16672 • Published 21 days ago • 1
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos Paper • 2411.04923 • Published Nov 7, 2024 • 24
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications Paper • 2508.14039 • Published Aug 19
BiMediX2 Collection BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities • 7 items • Updated Oct 24 • 10
How Good are Foundation Models in Step-by-Step Embodied Reasoning? Paper • 2509.15293 • Published Sep 18
Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts Paper • 2502.14865 • Published Feb 20 • 1
DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding Paper • 2503.10621 • Published Mar 13