Pretraining Datasets
togethercomputer/RedPajama-Data-1T • Viewer • Updated Jun 17, 2024 • 1.73M • 2.15k • 1.14k
EleutherAI/the_pile_deduplicated • Viewer • Updated Dec 2, 2022 • 134M • 27.8k • 111
karpathy/climbmix-400b-shuffle • Viewer • Updated Mar 3 • 553M • 195k • 32
allenai/dolma3_mix-6T • Preview • Updated Jan 15 • 109k • 24

Text-Image
microsoft/Florence-2-large • Image-Text-to-Text • 0.8B • Updated Aug 4, 2025 • 1.22M • 1.79k