DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle Paper • 2512.04324 • Published Dec 3, 2025 • 155
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction Paper • 2512.04987 • Published Dec 4, 2025 • 80
Article makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch May 7, 2024 • 115
Article M2.1: Multilingual and Multi-Task Coding with Strong Generalization 22 days ago • 37
Article Aligning to What? Rethinking Agent Generalization in MiniMax M2 Oct 30, 2025 • 42
ToolRM Collection ToolRM: Towards Agentic Tool-Use Reward Modeling • 6 items • Updated 13 days ago • 4
PIPer: On-Device Environment Setup via Online Reinforcement Learning Paper • 2509.25455 • Published Sep 29, 2025 • 38
🦫 PIPer Collection All the resources for our paper "PIPer: On-Device Environment Setup via Online Reinforcement Learning"! • 9 items • Updated Oct 1, 2025 • 3
FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling Paper • 2510.24645 • Published Oct 28, 2025 • 10
Spurious Rewards: Rethinking Training Signals in RLVR Paper • 2506.10947 • Published Jun 12, 2025 • 2
Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning Paper • 2510.27044 • Published Oct 30, 2025 • 6
Data-Efficient RLVR via Off-Policy Influence Guidance Paper • 2510.26491 • Published Oct 30, 2025 • 11
The Path Not Taken: RLVR Provably Learns Off the Principals Paper • 2511.08567 • Published Nov 11, 2025 • 34