view article Article Ettin Suite: SoTA Paired Encoders and Decoders +4 orionweller, kdricci, mmarone, NohTow, dlawrie, vandurme • Jul 16, 2025 • 81
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? Paper • 2507.12415 • Published Jul 16, 2025 • 43
view article Article SmolLM3: smol, multilingual, long-context reasoner +21 eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 780
view article Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM +2 ariG23498, merve, pcuenq, reach-vb • Mar 12, 2025 • 497
view article Article SmolLM - blazingly fast and remarkably powerful +1 loubnabnl, anton-l, eliebak • Jul 16, 2024 • 459
Phi-4 Collection Phi-4 family of small language, multi-modal and reasoning models. • 17 items • Updated Jul 10, 2025 • 212
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published Feb 10, 2025 • 153
view article Article Open-R1: a fully open reproduction of DeepSeek-R1 +1 eliebak, lvwerra, lewtun • Jan 28, 2025 • 889
Qwen2.5-1M Collection The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated Dec 31, 2025 • 128
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models Paper • 2412.06071 • Published Dec 8, 2024 • 11
view article Article Timm ❤️ Transformers: Use any timm model with transformers +3 ariG23498, rwightman, qubvel-hf, pcuenq, reach-vb • Jan 16, 2025 • 55