DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution • Paper • 2405.16071 • Published May 25, 2024 • 3
ControlCap: Controllable Region-level Captioning • Paper • 2401.17910 • Published Jan 31, 2024 • 1
Balancing Understanding and Generation in Discrete Diffusion Models • Paper • 2602.01362 • Published 3 days ago • 11
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models • Paper • 2602.02185 • Published 2 days ago • 118
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models • Paper • 2601.22060 • Published 6 days ago • 139
DocReward: A Document Reward Model for Structuring and Stylizing • Paper • 2510.11391 • Published Oct 13, 2025 • 27