OpenThinker-Agent-Complete Collection OpenThinkerAgent-32B SFT data-scaling ladder (models + matching datasets, 316->100K) plus TaskTrove & AgentTrove sources. • 15 items • Updated 15 days ago • 4
OpenThinker-Agent2 Collection OpenThinker-Agent2: agentic SFT/RL datasets and 8B/32B models (cold-start SFT, RL, and the OpenThinkerAgent-32B release). • 11 items • Updated 14 days ago • 3
On Data Engineering for Scaling LLM Terminal Capabilities Paper • 2602.21193 • Published Feb 24 • 103
SkillOrchestra: Learning to Route Agents via Skill Transfer Paper • 2602.19672 • Published Feb 23 • 58
Olmo 3 Post-training Collection All artifacts for post-training Olmo 3. Datasets follow the model that resulted from training on them. • 32 items • Updated Dec 23, 2025 • 55
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle Paper • 2512.04324 • Published Dec 3, 2025 • 159
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Paper • 2509.25454 • Published Sep 29, 2025 • 147
Article SmolLM3: smol, multilingual, long-context reasoner +21 eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 780
PokerBench: Training Large Language Models to become Professional Poker Players Paper • 2501.08328 • Published Jan 14, 2025 • 19
EmbedLLM: Learning Compact Representations of Large Language Models Paper • 2410.02223 • Published Oct 3, 2024 • 3