Richard Zhuang PRO

RZ412

https://richardzhuang0412.github.io

AI & ML interests

LLM Routing, LLM + Games, Post-Training, Agents

Recent Activity

upvoted 2 collections about 18 hours ago

OpenThinker-Agent-Complete

Collection

OpenThinkerAgent-32B SFT data-scaling ladder (models + matching datasets, 316->100K) plus TaskTrove & AgentTrove sources. • 15 items • Updated 15 days ago • 4

OpenThinker-Agent2

Collection

OpenThinker-Agent2: agentic SFT/RL datasets and 8B/32B models (cold-start SFT, RL, and the OpenThinkerAgent-32B release). • 11 items • Updated 14 days ago • 7

upvoted a paper about 23 hours ago

OpenThoughts-Agent: Data Recipes for Agentic Models

Paper • 2606.24855 • Published 2 days ago • 31

upvoted a paper 3 months ago

On Data Engineering for Scaling LLM Terminal Capabilities

Paper • 2602.21193 • Published Feb 24 • 103

upvoted a paper 4 months ago

SkillOrchestra: Learning to Route Agents via Skill Transfer

Paper • 2602.19672 • Published Feb 23 • 58

upvoted 2 collections 6 months ago

OpenThinker-Agent

Collection

5 items • Updated 14 days ago • 11

Olmo 3 Post-training

Collection

All artifacts for post-training Olmo 3. Datasets follow the model that resulted from training on them. • 32 items • Updated Dec 23, 2025 • 56

upvoted a paper 7 months ago

DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle

Paper • 2512.04324 • Published Dec 3, 2025 • 159

upvoted a paper 9 months ago

DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29, 2025 • 147

upvoted an article 11 months ago

SmolLM3: smol, multilingual, long-context reasoner

Article • eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 780

upvoted 2 collections 11 months ago

Reasoning Datasets

Collection

50 items • Updated Jun 8, 2025 • 12

Reasoning Models

Collection

53 items • Updated Jun 8, 2025 • 1

upvoted an article about 1 year ago

Reasoning Datasets Competition

Article • bespokelabs • Apr 9, 2025 • 38

upvoted 2 papers over 1 year ago

PokerBench: Training Large Language Models to become Professional Poker Players

Paper • 2501.08328 • Published Jan 14, 2025 • 19

EmbedLLM: Learning Compact Representations of Large Language Models

Paper • 2410.02223 • Published Oct 3, 2024 • 3
