From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning Paper • 2603.03825 • Published 9 days ago • 9
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning Paper • 2507.12841 • Published Jul 17, 2025 • 42
EpiCoder: Encompassing Diversity and Complexity in Code Generation Paper • 2501.04694 • Published Jan 8, 2025 • 18
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics Paper • 2501.04686 • Published Jan 8, 2025 • 53
Qwen2-VL Collection Vision-language model series based on Qwen2 • 15 items • Updated 11 days ago • 229
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 160
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability Paper • 2411.19943 • Published Nov 29, 2024 • 62