rain2sun's Collections
High-Quality-Datasets
Updated Dec 2, 2024
High-quality datasets with a high density of knowledge.
wikimedia/wikipedia • Updated Jan 9, 2024 • 61.6M • 255k • 1.23k
OpenCoder-LLM/opc-annealing-corpus • Updated May 29, 2025 • 15.6M • 1.63k • 43
hltcoe/megawika • Updated Jan 31, 2025 • 192k • 41
allenai/dolmino-mix-1124 • Updated Oct 29, 2025 • 170M • 16.8k • 94