MultiHaystack: Benchmarking Multimodal Retrieval and Reasoning over 40K Images, Videos, and Documents Paper • 2603.05697 • Published Mar 5
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling Paper • 2604.28185 • Published 7 days ago • 85
From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors Paper • 2602.21778 • Published Feb 25 • 14
InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration Paper • 2512.02981 • Published Dec 2, 2025 • 1
Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models Paper • 2512.01949 • Published Dec 1, 2025 • 9
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published Nov 25, 2025 • 188
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published Nov 20, 2025 • 96
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19, 2025 • 136
WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation Paper • 2503.19065 • Published Mar 24, 2025 • 11