Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions Paper • 2606.09076 • Published 18 days ago • 61
UltraData Collection Ultra Scale, Ultra Quality, Ultra Coverage • 11 items • Updated 29 days ago • 98
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published Feb 5 • 356
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning Paper • 2601.06002 • Published Jan 9 • 60
User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale Paper • 2601.08225 • Published Jan 13 • 53
Toward Efficient Agents: Memory, Tool learning, and Planning Paper • 2601.14192 • Published Jan 20 • 57