Jialong Liu

eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf

•

Jul 8, 2025

• 785

commented on The 4 Things Qwen-3’s Chat Template Teaches Us 11 months ago

Great post, learned a lot! 👍

upvoted an article 11 months ago

Article

The 4 Things Qwen-3’s Chat Template Teaches Us

cfahlgren1

•

Apr 30, 2025

• 89

liked a model over 1 year ago

rasbt/llama-3.2-from-scratch

Updated Jun 12, 2025 • 284

upvoted 2 articles over 1 year ago

Article

Mixture of Experts Explained

osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq

•

Dec 11, 2023

• 1.16k

Article

makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch

AviSoori1x

•

May 7, 2024

• 125

liked a Space over 1 year ago

LLM训练终极指南 | The Ultra-Scale Playbook

🔥

269

Learn about every aspect of LLM training

liked 3 models over 1 year ago

liked a model about 2 years ago

nvidia/Llama3-ChatQA-1.5-70B

Text Generation • 71B • Updated May 24, 2024 • 212 • 334

Galleons's activity

Evaluation Guidebook

Unlocking On-Policy Distillation for Any Model Family

Continuous batching from first principles

SmolLM3: smol, multilingual, long-context reasoner

The 4 Things Qwen-3’s Chat Template Teaches Us

Mixture of Experts Explained

makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch

LLM训练终极指南 | The Ultra-Scale Playbook