Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset Paper • 2508.15096 • Published Aug 20, 2025 • 9
Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 12 items • Updated 18 days ago • 146
🧠 SmolLM3 Collection Smol, multilingual, long-context reasoner • 14 items • Updated Oct 9, 2025 • 101
🇮🇹 Italian NLP Resources Collection Collection of models, datasets and demos relevant to Italian NLP 🇮🇹 • 300 items • Updated Mar 26 • 34
GAD-Models Collection Model checkpoints of Black-Box On-Policy Distillation of Large Language Models • 5 items • Updated Nov 17, 2025 • 6