Joint Selection for Large-Scale Pre-Training Data via Policy Gradient-based Mask Learning Paper • 2512.24265 • Published 27 days ago • 4
nvidia/Nemotron-Pretraining-Specialized-v1 Viewer • Updated Dec 22, 2025 • 60.7M • 7.64k • 68
Does your data spark joy? Performance gains from domain upsampling at the end of training Paper • 2406.03476 • Published Jun 5, 2024 • 4
agentica-org/DeepScaleR-1.5B-Preview Text Generation • 2B • Updated Apr 9, 2025 • 74.1k • 577
ByteDance-Seed/Seed-OSS-36B-Instruct Text Generation • 36B • Updated Aug 26, 2025 • 7.79k • 477
TxT360: Trillion Extracted Text 📖 Running • 132 • Explore and analyze the TxT360 dataset for LLM pre-training
MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion Paper • 2502.04235 • Published Feb 6, 2025 • 23