Article Prefill and Decode for Concurrent Requests - Optimizing LLM Performance tngtech • Apr 16, 2025 • 78
Phi-4 Collection Phi-4 family of small language, multi-modal and reasoning models. • 17 items • Updated Jul 10, 2025 • 207
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 38 items • Updated Mar 2 • 368
Article Mixture of Experts Explained +4 osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Dec 11, 2023 • 1.13k