Mercury: Ultra-Fast Language Models Based on Diffusion Paper • 2506.17298 • Published Jun 17, 2025 • 11
Article BigCodeArena: Judging code generations end to end with code executions bigcode • Oct 7, 2025 • 21
Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 15 items • Updated 14 days ago • 172
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model Paper • 2508.14444 • Published Aug 20, 2025 • 51
Deepseek v3.2 Speciale Collection Distilled models and datasets for Deepseek v3.2 Speciale. • 11 items • Updated Dec 20, 2025 • 8
Gemini 3 Pro Collection Distilled models and datasets for Gemini 3 Pro. • 9 items • Updated Dec 20, 2025 • 7
Article Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset +1 HugoLaurencon, Leyo, VictorSanh • Mar 15, 2024 • 13
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published Dec 2, 2025 • 269
GPT-4 generated datasets Collection Collection of some GPT-4 generated datasets. It may be useful for those looking for the best-quality datasets to train competitive LLMs. • 18 items • Updated Apr 16, 2024 • 10
DINOv3 Collection DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 15 items • Updated Mar 10 • 674
Article PaliGemma – Google's Cutting-Edge Open Vision Language Model +1 merve, andsteing, pcuenq • May 14, 2024 • 287