VeriEvol: Scaling Multimodal Mathematical Reasoning via Verifiable Evol-Instruct Paper • 2606.23543 • Published 4 days ago • 6
RubricBench: Aligning Model-Generated Rubrics with Human Standards Paper • 2603.01562 • Published Mar 2 • 64
OffSeeker: Online Reinforcement Learning Is Not All You Need for Deep Research Agents Paper • 2601.18467 • Published Jan 26 • 1
Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models Paper • 2603.01571 • Published Mar 2 • 34
AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent Paper • 2512.20745 • Published Dec 23, 2025
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought Paper • 2505.15431 • Published May 21, 2025 • 2
Towards a Unified Paradigm: Integrating Recommendation Systems as a New Language in Large Models Paper • 2412.16933 • Published Dec 22, 2024
WizardLM: Empowering Large Language Models to Follow Complex Instructions Paper • 2304.12244 • Published Apr 24, 2023 • 14
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct Paper • 2308.09583 • Published Aug 18, 2023 • 8
Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena Paper • 2407.10627 • Published Jul 15, 2024
STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability Paper • 2606.19236 • Published 9 days ago • 12
DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs Paper • 2601.03559 • Published Jan 7 • 14
AIR: Post-training Data Selection for Reasoning via Attention Head Influence Paper • 2512.13279 • Published Dec 15, 2025 • 2
Leveraging Large Language Models for NLG Evaluation: A Survey Paper • 2401.07103 • Published Jan 13, 2024 • 4