Article LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family • 7 days ago • 64
Article Tokenization in Transformers v5: Simpler, Clearer, and More Modular • Dec 18, 2025 • 116
Article Shrinking Giants: The Quantization Mathematics Making LLMs Accessible • May 3, 2025 • 2
Article A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes • Aug 17, 2022 • 122
Space The Smol Training Playbook 📚 • The secrets to building world-class LLMs • 2.93k
Space The Ultra-Scale Playbook 🌌 • The ultimate guide to training LLM on large GPU Clusters • 3.66k
Article Transformers Are Getting Old: Variants and Alternatives Exist! • Jul 5, 2025 • 44
Collection ByteDance Papers • 138 items • Updated about 22 hours ago • 25
Collection Deepseek Papers • 28 items • Updated about 22 hours ago • 317
Article From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate • Jun 13, 2024 • 62