HuggingFaceFW/fineweb-edu • 3.5B rows • 286k downloads • 947 likes
mlfoundations/dclm-baseline-1.0 • 96.4k downloads • 253 likes
4.48B rows • 81.1k downloads • 752 likes
Note: only multimodal data =(
48.3M rows • 7.75k downloads • 350 likes
5.45B rows • 7.72k downloads • 466 likes
Note: doesn't contain the text directly =(
HuggingFaceTB/issues-kaggle-notebooks • 16.1M rows • 184 downloads • 13 likes
Note: only 500k rows
7.89M rows • 11.5k downloads • 184 likes
Note: 1.6M rows with web-0.5-to-1.0
Locutusque/UltraTextbooks • 5.52M rows • 1.55k downloads • 198 likes
tokyotech-llm/swallow-math-v2 • 17.4M rows • 6.14k downloads • 25 likes
tokyotech-llm/swallow-code-v2 • 147M rows • 177k downloads • 30 likes
HuggingFaceFW/finepdfs-edu • 49.5M rows • 7.93k downloads • 77 likes
HuggingFaceTB/smollm-corpus • 237M rows • 21.3k downloads • 432 likes