Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 12 items • Updated 4 days ago • 148
Article Welcome Gemma 4: Frontier multimodal intelligence on device +5 merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift • Apr 2 • 890
Article mmBERT: ModernBERT goes Multilingual +4 mmarone, orionweller, will-fleshman, eugene-yang, dlawrie, vandurme • Sep 9, 2025 • 146
Article Finally, a Replacement for BERT: Introducing ModernBERT +13 bwarner, NohTow, bclavie, orionweller, ohallstrom, staghado, alexisgallagher, rbiswasfc, fladhak, tomaarsen, ncoop57, griffin, jph00, johnowhitaker, iacolippo • Dec 19, 2024 • 740
Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face +3 abidlabs, znation, nouamanetazi, sasha, qgallouedec • Jul 29, 2025 • 223
Article A Deepdive into Aya Expanse: Advancing the Frontier of Multilinguality +2 johndang-cohere, shivalikasingh, dsouzadaniel, ArashAhmadian • Oct 24, 2024 • 64
Article Vision Language Models (Better, faster, stronger) +3 merve, sergiopaniego, ariG23498, pcuenq, andito • May 12, 2025 • 611
RLHF Collection A collection of models trained with Reinforcement Learning from Human Feedback (RLHF). • 4 items • Updated 4 days ago • 7
Article Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU +4 edbeeching, ybelkada, lvwerra, smangrul, lewtun, kashif • Mar 9, 2023 • 72
Compact Language Models via Pruning and Knowledge Distillation Paper • 2407.14679 • Published Jul 19, 2024 • 40
MoLE : Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition Paper • 2302.13750 • Published Feb 27, 2023 • 2
Article Llama 3.1 - 405B, 70B & 8B with multilinguality and long context +6 philschmid, osanseviero, alvarobartt, lvwerra, dvilasuero, reach-vb, marcsun13, pcuenq • Jul 23, 2024 • 241
Article Welcome Gemma 2 - Google’s new open LLM +4 philschmid, osanseviero, pcuenq, lewtun, tomaarsen, reach-vb • Jun 27, 2024 • 132
DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published Jun 17, 2024 • 55