Avinash Sooriyarachchi

AviSoori1x

3 12 99

https://www.linkedin.com/in/avi-data-ml/

AI & ML interests

I work at Mistral AI

Recent Activity

upvoted a collection 2 months ago

MolmoAct2-BimanualYAM Dataset

liked a dataset 4 months ago

MuskumPillerum/General-Knowledge

upvoted an article 4 months ago

From GRPO to DAPO and GSPO: What, Why, and How

Organizations

upvoted a collection 2 months ago

MolmoAct2-BimanualYAM Dataset

Collection

Collection of the MolmoAct2-BimanualYAM Dataset • 741 items • Updated 28 days ago • 14

upvoted 2 articles 4 months ago

Article

From GRPO to DAPO and GSPO: What, Why, and How

NormalUhr • Aug 9, 2025 • 129

Article

Continuous batching from first principles

ror, ArthurZ, mcpotato • Nov 25, 2025 • 418

upvoted 2 articles 6 months ago

Article

KV Caching Explained: Optimizing Transformer Inference Efficiency

not-lain • Jan 30, 2025 • 362

Article

You could have designed state of the art positional encoding

FL33TW00D-HF • Nov 25, 2024 • 490

upvoted an article 10 months ago

Article

makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch

AviSoori1x • May 7, 2024 • 122

upvoted an article 12 months ago

Article

SmolLM3: smol, multilingual, long-context reasoner

eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 780

upvoted 4 articles about 1 year ago

Article

Learn the Hugging Face Kernel Hub in 5 Minutes

drbh, danieldk, Narsil, pcuenq, pagezyhf, merve, reach-vb • Jun 12, 2025 • 164

Article

KV Cache from scratch in nanoVLM

ariG23498, kashif, lusxvr, andito, pcuenq • Jun 4, 2025 • 120

Article

No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

toslali-ibm, mirinflim, qgallouedec, esnible, rganti, mudhakar • Jun 3, 2025 • 101

Article

🐯 Liger GRPO meets TRL

shisahni, kashif, smohammadi, ShirinYamani, m0m0chen, liberty4321 • May 25, 2025 • 54

upvoted a paper over 2 years ago

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Paper • 2403.03507 • Published Mar 6, 2024 • 191
