itsnotsplat's Collections
Pretraining
Updated Mar 29
This is general pretraining data for training a model from scratch, totaling approximately 2.1 trillion tokens.
ronantakizawa/github-top-code • Updated Feb 23 • 1.12M rows • 538 downloads • 122 likes
HuggingFaceFW/fineweb-edu • Updated Jul 11, 2025 • 3.5B rows • 615k downloads • 1.08k likes
openbmb/UltraData-Math • Updated Apr 15 • 181M rows • 64.2k downloads • 306 likes
nick007x/github-code-2025 • Updated Apr 1 • 148M rows • 1.35k downloads • 117 likes
angie-chen55/python-github-code • Updated May 31, 2022 • 7.23M rows • 5.34k downloads • 37 likes
tiiuae/falcon-refinedweb • Updated Jun 20, 2023 • 968M rows • 21.2k downloads • 913 likes
nick007x/arxiv-papers • Updated Apr 1 • 2.55M rows • 858k downloads • 185 likes
hoskinson-center/proof-pile • Updated Aug 19, 2023 • 363k rows • 2.29k downloads • 67 likes
HuggingFaceTB/finemath • Updated Feb 6, 2025 • 48.3M rows • 40.2k downloads • 360 likes
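The following is a minimal sketch showing how one might peek at any of the datasets in this collection without downloading them in full. It assumes the Hugging Face `datasets` library is installed and the Hub is reachable. `PRETRAINING_DATASETS` and `preview` are hypothetical names introduced here. `load_dataset(..., streaming=True)` returns an iterable dataset that fetches data on demand. Split names and required configuration names vary between repositories, so treat the `train` default and the optional `config` argument as assumptions to check against each dataset card.

```python
"""Sketch: stream a few records from datasets in this collection.

Assumes the Hugging Face `datasets` library is installed and the Hub is
reachable. It is imported lazily, so the ID list works without it.
"""
from itertools import islice

# Repository IDs copied from the collection listing.
PRETRAINING_DATASETS = [
    "ronantakizawa/github-top-code",
    "HuggingFaceFW/fineweb-edu",
    "openbmb/UltraData-Math",
    "nick007x/github-code-2025",
    "angie-chen55/python-github-code",
    "tiiuae/falcon-refinedweb",
    "nick007x/arxiv-papers",
    "hoskinson-center/proof-pile",
    "HuggingFaceTB/finemath",
]


def preview(repo_id, n=3, split="train", config=None):
    """Return the first `n` records of `repo_id` as a list of dicts.

    `split` and `config` are assumptions: some repositories may use other
    split names or require a named configuration to be passed.
    """
    from datasets import load_dataset  # lazy import: only needed when streaming

    # streaming=True yields an IterableDataset that reads shards on demand,
    # so only the records actually consumed are fetched.
    ds = load_dataset(repo_id, name=config, split=split, streaming=True)
    return list(islice(ds, n))
```

For example, `preview("HuggingFaceFW/fineweb-edu", n=2)` should return a list of two record dicts. Streaming keeps the up-front cost small, which matters for corpora with hundreds of millions or billions of rows.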