lianghua (Yulianghua)

upvoted an article 2 months ago

Article

Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective

•

Jan 27

• 75

upvoted an article 10 months ago

Article

SmolLM3: smol, multilingual, long-context reasoner

eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf

•

Jul 8, 2025

• 775

upvoted 2 articles about 1 year ago

Article

DualPipe Explained: A Comprehensive Guide to DualPipe That Anyone Can Understand—Even Without a Distributed Training Background

NormalUhr

•

Feb 28, 2025

• 19

Article

Open R1: Update #3

open-r1

•

Mar 11, 2025

• 297

upvoted a collection about 2 years ago

Zephyr ORPO

Collection

Models and datasets to align LLMs with Odds Ratio Preference Optimisation (ORPO). Recipes here: https://github.com/huggingface/alignment-handbook • 3 items • Updated Apr 12, 2024 • 18
