Article Illustrating Reinforcement Learning from Human Feedback (RLHF) Dec 9, 2022 • 411
Article Understanding Gemma 3n: How MatFormer Gives You Many Models in One Jun 26, 2025 • 50
Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI Feb 20 • 505
Efficient Memory Management for Large Language Model Serving with PagedAttention Paper • 2309.06180 • Published Sep 12, 2023 • 54
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published Mar 14, 2025 • 157
Article Performant local mixture-of-experts CPU inference with GPU acceleration in llama.cpp Jan 30 • 24
Article LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family Jan 19 • 93
Article Tokenization in Transformers v5: Simpler, Clearer, and More Modular Dec 18, 2025 • 124
Article Shrinking Giants: The Quantization Mathematics Making LLMs Accessible May 3, 2025 • 2