Zhenghao Xu

zhenghaoxu

·

AI & ML interests

None yet

Organizations

upvoted an article 4 months ago

Article

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

+7

aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, nouamanetazi, lvwerra, sergiopaniego

•

Mar 10

• 165

upvoted 2 papers 5 months ago

Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training

Paper • 2602.05933 • Published Feb 5 • 6

Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning

Paper • 2602.01058 • Published Feb 1 • 45