Why Did MiniMax M2 End Up as a Full Attention Model? • MiniMax-AI • Oct 30, 2025 • 80
Vision Language Model Alignment in TRL ⚡️ • sergiopaniego, merve, qgallouedec, kashif, ariG23498 • Aug 7, 2025 • 112
Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers • ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez • Sep 11, 2025 • 188
Gaia2 and ARE: Empowering the community to study agents • clefourrier, gregmialz, mlcu, mortimerp9, XciD, tfrere, evijit, RomainFroger, dheeraj7596, CarolinePascal, upiter • Sep 22, 2025 • 136
What’s MXFP4? The 4-Bit Secret Powering OpenAI’s GPT‑OSS Models on Modest Hardware • RakshitAralimatti • Aug 8, 2025 • 36
KV Caching Explained: Optimizing Transformer Inference Efficiency • not-lain • Jan 30, 2025 • 351
ChatML vs Harmony: Understanding the new Format from OpenAI 🔍 • kuotient • Aug 9, 2025 • 59
Mixture of Experts Explained • osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Dec 11, 2023 • 1.15k
Assisted Generation: a new direction toward low-latency text generation • joaogante • May 11, 2023 • 79
Fine-tuning Llama 2 70B using PyTorch FSDP • smangrul, sgugger, lewtun, philschmid • Sep 13, 2023 • 32