sian cao

sonald


AI & ML interests

AI, big data, OS

Recent Activity

upvoted an article about 1 month ago

Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL

upvoted an article about 1 month ago

Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler

upvoted an article 4 months ago

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries



upvoted 2 articles about 1 month ago

Article

Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL

aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, lvwerra, sergiopaniego • May 27 • 42

Article

Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler

ariG23498, sayakpaul, sergiopaniego, ror, pcuenq • May 29 • 132

upvoted an article 4 months ago

Article

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, nouamanetazi, lvwerra, sergiopaniego • Mar 10 • 168

upvoted an article 5 months ago

Article

SmolVLM Grows Smaller – Introducing the 256M & 500M Models!

andito, mfarre, merve • Jan 23, 2025 • 192

upvoted a paper 6 months ago

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published Jan 8 • 234

upvoted 6 articles 6 months ago

Article

Deriving the DPO Loss from First Principles

garg-aayush • Dec 30, 2025 • 8

Article

Deriving the PPO Loss from First Principles

garg-aayush • Dec 25, 2025 • 46

Article

From GRPO to DAPO and GSPO: What, Why, and How

NormalUhr • Aug 9, 2025 • 129

Article

Efficient MultiModal Data Pipeline

ariG23498, lusxvr, andito, sergiopaniego, pcuenq • Jul 8, 2025 • 72

Article

KV Cache from scratch in nanoVLM

ariG23498, kashif, lusxvr, andito, pcuenq • Jun 4, 2025 • 120

Article

Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers

ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez • Sep 11, 2025 • 188

upvoted 3 articles 7 months ago

Article

Efficient LLM Pretraining: Packed Sequences and Masked Attention

sirluk • Oct 7, 2024 • 71

Article

Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models

nvidia • Dec 15, 2025 • 113

Article

Putting RL back in RLHF

vwxyzjn, ArashAhmadian • Jun 12, 2024 • 112

upvoted a paper 7 months ago

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 306

upvoted 3 articles 8 months ago

Article

Why Did MiniMax M2 End Up as a Full Attention Model?

MiniMax-AI • Oct 30, 2025 • 80

Article

Aligning to What? Rethinking Agent Generalization in MiniMax M2

MiniMax-AI • Oct 30, 2025 • 43

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

ariG23498, lusxvr, andito, sergiopaniego, merve, pcuenq, reach-vb • May 21, 2025 • 262

upvoted 2 articles 9 months ago

Article

Welcome GPT OSS, the new open-source model family from OpenAI!

reach-vb, pcuenq, lewtun, clem, Rocketknight1, clefourrier, celinah, Wauplin, marcsun13, pagezyhf, ahadnagy, joaogante • Aug 5, 2025 • 513

Article

Learn the Hugging Face Kernel Hub in 5 Minutes

drbh, danieldk, Narsil, pcuenq, pagezyhf, merve, reach-vb • Jun 12, 2025 • 164
