rain2sun's Collections
High-Quality-Datasets
Updated Dec 2, 2024
High-quality datasets with a high density of knowledge.
wikimedia/wikipedia • Viewer • Updated Jan 9, 2024 • 61.6M • 79.1k • 1.14k
OpenCoder-LLM/opc-annealing-corpus • Viewer • Updated May 29, 2025 • 15.6M • 4.17k • 43
hltcoe/megawika • Updated Jan 31, 2025 • 35.4k • 41
allenai/dolmino-mix-1124 • Viewer • Updated Oct 29, 2025 • 170M • 175k • 90