view article Article Vision Language Models (Better, faster, stronger) +3 merve, sergiopaniego, ariG23498, pcuenq, andito • May 12, 2025 • 613
view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge NormalUhr • Feb 7, 2025 • 293
view article Article Everything You Need to Know about Knowledge Distillation Kseniase • Mar 6, 2025 • 80
view article Article makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch AviSoori1x • May 7, 2024 • 121
view article Article Mixture of Experts Explained +4 osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Dec 11, 2023 • 1.13k