MA-ProofBench: A Two-Tiered Evaluation of LLMs for Theorem Proving in Mathematical Analysis • Paper • 2606.13782 • Published 17 days ago • 2
Nemotron-Post-Training-v3 • Collection • Collection of datasets used in the post-training phase of Nemotron Nano, Super, and Ultra v3. • 50 items • Updated 16 days ago • 168
Data Science and Technology Towards AGI Part I: Tiered Data Management • Paper • 2602.09003 • Published Feb 9 • 8
UltraData • Collection • Ultra Scale, Ultra Quality, Ultra Coverage • 11 items • Updated about 1 month ago • 98
Essential-Web v1.0: 24T tokens of organized web data • Paper • 2506.14111 • Published Jun 17, 2025 • 48
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning • Paper • 2507.16812 • Published Jul 22, 2025 • 65
RAVine: Reality-Aligned Evaluation for Agentic Search • Paper • 2507.16725 • Published Jul 22, 2025 • 31
Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data • Paper • 2505.05427 • Published May 8, 2025 • 6