Article How to Use Multiple GPUs in Hugging Face Transformers: Device Map vs Tensor Parallelism ariG23498 • Feb 12 • 20
Article Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand qgallouedec • Dec 4, 2025 • 69
Article Continuous batching from first principles ror, ArthurZ, mcpotato • Nov 25, 2025 • 393
Article RexBERT: Encoders for a brave new world of E-Commerce thebajajra • Sep 20, 2025 • 50
Article Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez • Sep 11, 2025 • 188
Article SmolLM3: smol, multilingual, long-context reasoner eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 777
Article KV Cache from scratch in nanoVLM ariG23498, kashif, lusxvr, andito, pcuenq • Jun 4, 2025 • 119
Article Welcome Llama 4 Maverick & Scout on Hugging Face burtenshaw, reach-vb, pcuenq, clem, rajatarya, jsulz, lysandre • Apr 5, 2025 • 149
Article Llama 3.1 - 405B, 70B & 8B with multilinguality and long context philschmid, osanseviero, alvarobartt, lvwerra, dvilasuero, reach-vb, marcsun13, pcuenq • Jul 23, 2024 • 241
Article Train Custom Models on Hugging Face Spaces with AutoTrain SpaceRunner abhishek • May 9, 2024 • 25
Indic Parler-TTS Collection Collection of Parler-TTS models adapted to Indian languages. • 3 items • Updated Dec 4, 2024 • 10
Article Mixture of Experts Explained osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Dec 11, 2023 • 1.13k
A Large Encoder-Decoder Family of Foundation Models For Chemical Language Paper • 2407.20267 • Published Jul 24, 2024 • 32