admarcosai's Collections: Datasets
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Paper
• 2312.06585
• Published
• 29
TinyGSM: achieving >80% on GSM8k with small language models
Paper
• 2312.09241
• Published
• 39
Viewer
• Updated
• 70k • 1.83k
• 92
Paper
• 2309.17425
• Published
• 6
jondurbin/gutenberg-dpo-v0.1
Viewer
• Updated
• 918 • 582
• 158
garage-bAInd/Open-Platypus
Viewer
• Updated
• 24.9k • 8.76k
• 415
Viewer
• Updated
• 243k • 982
• 219
Viewer
• Updated
• 58.7k • 1.14k
• 46
Viewer
• Updated
• 1.49M • 882
• 153
Viewer
• Updated
• 166k • 726
• 118
Viewer
• Updated
• 198k • 100
• 112
Viewer
• Updated
• 2.75M • 5.15k
• 386
Viewer
• Updated
• 6.2M • 1.44k
• 102
open-web-math/open-web-math
Viewer
• Updated
• 6.32M • 11.8k
• 330
Viewer
• Updated
• 4.04k • 265k
• 220
Viewer
• Updated
• 14.3k • 2.72k
• 51
Viewer
• Updated
• 44.8k • 124
• 53
Viewer
• Updated
• 6.14k • 14.4k
• 204
Viewer
• Updated
• 262k • 3.96k
• 299
argilla/ultrafeedback-binarized-preferences-cleaned
Viewer
• Updated
• 60.9k • 5.48k
• 161
WhiteRabbitNeo/Code-Functions-Level-Cyber
Viewer
• Updated
• 8.44k • 64
• 32
WhiteRabbitNeo/Code-Functions-Level-General
Viewer
• Updated
• 8.69k • 33
• 20
Viewer
• Updated
• 317k • 786
• 33
Updated
• 1.82k
• 132
Viewer
• Updated
• 183k • 1.06k
• 295
selfrag/selfrag_train_data
Viewer
• Updated
• 146k • 153
• 75
Viewer
• Updated
• 463k • 34
• 18
Locutusque/UltraTextbooks
Viewer
• Updated
• 5.52M • 602
• 198
Undi95/ConversationChronicles-sharegpt-SHARDED
Viewer
• Updated
• 787k • 67
• 10
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
Paper
• 2402.10176
• Published
• 38
Viewer
• Updated
• 31.1M • 14.8k
• 676
togethercomputer/RedPajama-Data-1T
Viewer
• Updated
• 1.73M • 2.65k
• 1.14k
Viewer
• Updated
• 968M • 15.1k
• 893
Viewer
• Updated
• 276M • 14.6k
• 165
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
Paper
• 2412.14475
• Published
• 57