[Dataset] Pretrain-corpus
Viewer • 470M rows • 36.9k downloads • 335 likes
EssentialAI/essential-web-v1.0 • Preview • 15.5k downloads • 216 likes
Viewer • 52.5B rows • 201k downloads • 2.63k likes
HuggingFaceFW/fineweb-edu • Viewer • 3.5B rows • 325k downloads • 933 likes
Viewer • 4.48B rows • 105k downloads • 740 likes
data-is-better-together/fineweb-c • Viewer • 88.7k rows • 1.27k downloads • 58 likes
Viewer • 170M rows • 69.4k downloads • 90 likes
2.2k downloads • 985 likes
Viewer • 621M rows • 8.78k downloads • 86 likes
mlfoundations/dclm-baseline-1.0 • Preview • 148k downloads • 252 likes
Preview • 130k downloads • 85 likes