Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems Paper • 2606.05985 • Published 22 days ago • 10
MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification Paper • 2603.15726 • Published Mar 16 • 187
In-Context Reinforcement Learning for Tool Use in Large Language Models Paper • 2603.08068 • Published Mar 9 • 43
Training Data Efficiency in Multimodal Process Reward Models Paper • 2602.04145 • Published Feb 4 • 80
Thinking with Comics: Enhancing Multimodal Reasoning through Structured Visual Storytelling Paper • 2602.02453 • Published Feb 2 • 36
Language of Thought Shapes Output Diversity in Large Language Models Paper • 2601.11227 • Published Jan 16 • 10
PEAR: Phase Entropy Aware Reward for Efficient Reasoning Paper • 2510.08026 • Published Oct 9, 2025 • 9
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19, 2025 • 137
On the Multi-turn Instruction Following for Conversational Web Agents Paper • 2402.15057 • Published Feb 23, 2024 • 1