[Dataset] Pretrain-corpus
• 69.9k • 155k • 400
EssentialAI/essential-web-v1.0
• 138k • 224
• 52.5B • 946k • 2.8k
HuggingFaceFW/fineweb-edu
• 3.5B • 591k • 1.08k
• 4.48B • 68.4k • 798
data-is-better-together/fineweb-c
• 88.7k • 8.21k • 60
• 170M • 17.2k • 94
• 4.96k • 1.03k
• 621M • 15k • 88
mlfoundations/dclm-baseline-1.0
• 371k • 269
• 105k • 94