MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model Paper • 2408.12321 • Published Aug 22, 2024
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization Paper • 2411.11909 • Published Nov 17, 2024 • 22
Decoupling Reasoning and Perception: An LLM-LMM Framework for Faithful Visual Reasoning Paper • 2509.23322 • Published Sep 27, 2025 • 1
From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models Paper • 2602.22859 • Published 1 day ago • 141
OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents Paper • 2510.24563 • Published Oct 28, 2025 • 23