Mixture of Experts Explained • osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Dec 11, 2023 • 1.14k
From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub • jsulz, yuchenglow, znation, saba9 • Feb 12, 2025 • 81
From Files to Chunks: Improving HF Storage Efficiency • jsulz, erinys • Nov 20, 2024 • 73
Deploy LLMs with Hugging Face Inference Endpoints • philschmid • Jul 4, 2023 • 17
Assisted Generation: a new direction toward low-latency text generation • joaogante • May 11, 2023 • 79
How to generate text: using different decoding methods for language generation with Transformers • patrickvonplaten • Mar 1, 2020 • 299
Welcome to Inference Providers on the Hub 🔥 • burkaygur, zeke, aton2006, hassanelmghari, sbrandeis, kramp, julien-c • Jan 28, 2025 • 494