Alex Chen

stone37

AI & ML interests

None yet

Organizations

None yet

upvoted 10 articles 6 months ago

Article

The Unsung Hero Behind ChatGPT: RLHF Explained in Detail

natolambert, LouisCastricato, lvwerra, Dahoas

•

Dec 9, 2022

• 14

Article

SmolLM3: smol, multilingual, long-context reasoner

eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf

•

Jul 8, 2025

• 25

Article

Tricks from OpenAI gpt-oss that YOU 🫵 can use with transformers

ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez

•

Sep 11, 2025

• 14

Article

Preference Optimization Techniques for Large Models: DPO and Its Variants

Junrulu

•

Feb 20, 2025

• 21

Article

Mastering Tensor Dimensions in Transformers

not-lain

•

Jan 12, 2025

• 190

Article

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

NormalUhr

•

Feb 7, 2025

• 296

Article

Deriving the PPO Loss from First Principles

garg-aayush

•

Dec 25, 2025

• 46

Article

The Optimal Architecture for Small Language Models

codelion

•

Dec 26, 2025

• 121

Article

nanoVLM: The simplest, most lightweight pure-PyTorch codebase for training vision-language models

ariG23498, lusxvr, andito, sergiopaniego, merve, pcuenq, reach-vb

•

May 21, 2025

• 30

Article

Streaming datasets: 100x more efficient

andito, lhoestq, burtenshaw, pcuenq, merve

•

Oct 27, 2025

• 7
