Text & Image Datasets
airtrain-ai/fineweb-edu-fortified
• Updated • 322M • 132k
• 65
• Updated • 1.29B • 57.8k
• 393
• Updated • 63.1M • 1.07k
• 26
• Updated • 2.87B • 515
• 13
• Updated • 113k • 1.4k
• 1
chestnutlzj/LaTeX_OCR_384x384
• Updated • 76.3k • 169
• Updated • 5.65M • 5.62k
• 2
laicsiifes/flickr30k-pt-br
• Updated • 31k • 59
• 4
Rapidata/Flux-2-pro_t2i_human_preference
• Updated • 44.9k • 220
• 13
• Updated • 14.8M • 11.4k
• 118
• Updated • 24.2M • 99.2k
• 499
• Updated • 122k • 23.7k
• 79
KBlueLeaf/coyo11m-256px-ccrop-latent
• Updated • 9.16M • 102
• 4
HuggingFaceM4/the_cauldron
• Updated • 1.88M • 234k
• 547
• Updated • 770k • 26k
• 36
• Updated • 174k • 63
• 3
BLIP3o/BLIP3o-Pretrain-Long-Caption
• Updated • 27.2M • 10.4k
• 67
• Updated • 97.2M • 1.9k
• 8
• Updated • 936k • 265k
• 346
• Updated • 68M • 6.11k
• 269
• Updated • 200k • 3.26k
• 103
lightonai/LightOnOCR-mix-0126
• Updated • 16.4M • 987
• 112
• Updated • 395M • 4.45k
• 34
karpathy/tinystories-gpt4-clean
• Updated • 2.73M • 2.07k
• 80
pszemraj/cnn_dailymail-cleaned
• Updated • 350k • 183
• Updated • 1.1M • 58
• 4
• Updated • 44.4M • 5
• 1
omarkamali/wikipedia-monthly
• Updated • 195M • 5.78k
• 74
Felladrin/ChatML-SlimOrca-Dedup
• Updated • 363k • 30
• 1
BEE-spoke-data/cosmopedia-v2-mincols
• Updated • 39.1M • 298
• 3
Felladrin/ChatML-hercules-v2.0
• Updated • 1.31M • 25
• 1
pacozaa/alpaca-cleaned-chatml
• Updated • 51.8k • 28
Felladrin/ChatML-ultrachat_200k
• Updated • 208k • 25
• 1
berng/jordiclive-wikipedia-summary-dataset-cutted
• Updated • 7.75M • 65
• 1