Collections

Discover the best community collections!

Collections trending this week
Pretraining
General pretraining data for training a model from scratch, totaling around 5.37 trillion tokens.