Article Efficient LLM Pretraining: Packed Sequences and Masked Attention sirluk • Oct 7, 2024 • 71
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 514
Set Block Decoding is a Language Model Inference Accelerator Paper • 2509.04185 • Published Sep 4, 2025 • 54
Article 🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly!– Talking About It? Kseniase • Mar 17, 2025 • 357
Article Open-R1: a fully open reproduction of DeepSeek-R1 eliebak, lvwerra, lewtun • Jan 28, 2025 • 889
Article How to build a custom text classifier without days of human labeling sdiazlor • Oct 17, 2024 • 57
Article Fine-tuning LLMs to 1.58bit: extreme quantization made easy medmekk, marcsun13, lvwerra, pcuenq, osanseviero, thomwolf • Sep 18, 2024 • 280
Article Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers sanchit-gandhi • Nov 3, 2022 • 371
Article Llama-3.1-Storm-8B: Improved SLM with Self-Curation + Model Merging akjindal53244 • Aug 19, 2024 • 79
Article TGI Multi-LoRA: Deploy Once, Serve 30 Models derek-thomas, dmaniloff, drbh • Jul 18, 2024 • 63