mlfoundations/MINT-1T-HTML
Viewer • Updated • 623M • 45.2k • 91
mlfoundations/MINT-1T-ArXiv
Viewer • Updated • 5.6M • 14.6k • 55
mlfoundations/MINT-1T-PDF-CC-2024-18
Updated • 42.2k • 20
mlfoundations/dclm-baseline-1.0-parquet
Viewer • Updated • 2.73B • 6.18k • 32
HuggingFaceFW/fineweb-edu
Viewer • Updated • 3.5B • 352k • 929
Viewer • Updated • 52.5B • 205k • 2.63k
Viewer • Updated • 258M • 64.3k • 44
Viewer • Updated • 48.3M • 7.45k • 348
DAMO-NLP-SG/multimodal_textbook
Updated • 745 • 157
fhswf/TinyStoriesV2_cleaned
Viewer • Updated • 2.71M • 77 • 13
Viewer • Updated • 7.1M • 301 • 7
Viewer • Updated • 6.78M • 11 • 5
TinyHelen's First Curriculum: Training and Evaluating Tiny Language Models in a Simpler Language Environment
Paper • 2501.00522 • Published • 2
HuggingFaceH4/Multilingual-Thinking
Viewer • Updated • 1k • 12.3k • 108
nyu-dice-lab/wavepulse-radio-raw-transcripts
Viewer • Updated • 565M • 454 • 8
facebook/recycling_the_web
Viewer • Updated • 60.3M • 840 • 66
Viewer • Updated • 68M • 33.8k • 231
Falcon-H1-Tiny: A series of extremely small, yet powerful language models redefining capabilities at small scale