Article • Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries • aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, nouamanetazi, lvwerra, sergiopaniego • Mar 10 • 152
Article • SmolVLM Grows Smaller – Introducing the 256M & 500M Models! • andito, mfarre, merve • Jan 23, 2025 • 192
Space • Scaling test-time compute 📈 • 596 • Run advanced search strategies to boost LLM problem solving
Paper • GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization • 2601.05242 • Published Jan 8 • 231
Space • The Ultra-Scale Playbook 🌌 • 3.84k • The ultimate guide to training LLMs on large GPU clusters
Article • Efficient MultiModal Data Pipeline • ariG23498, lusxvr, andito, sergiopaniego, pcuenq • Jul 8, 2025 • 70
Article • KV Cache from scratch in nanoVLM • ariG23498, kashif, lusxvr, andito, pcuenq • Jun 4, 2025 • 119
Article • Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers • ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez • Sep 11, 2025 • 188
Article • Efficient LLM Pretraining: Packed Sequences and Masked Attention • sirluk • Oct 7, 2024 • 71
Space • The Smol Training Playbook 📚 • 3.18k • The secrets to building world-class LLMs
Article • Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models • nvidia • Dec 15, 2025 • 111
Paper • From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence • 2511.18538 • Published Nov 23, 2025 • 304
Article • Why Did MiniMax M2 End Up as a Full Attention Model? • MiniMax-AI • Oct 30, 2025 • 80
Article • Aligning to What? Rethinking Agent Generalization in MiniMax M2 • MiniMax-AI • Oct 30, 2025 • 43
Article • nanoVLM: The simplest repository to train your VLM in pure PyTorch • ariG23498, lusxvr, andito, sergiopaniego, merve, pcuenq, reach-vb • May 21, 2025 • 258