Pretraining Datasets
wikimedia/wikipedia • Updated Jan 9, 2024 • 61.6M • 161k • 1.25k
togethercomputer/RedPajama-Data-V2 • Updated Nov 21, 2024 • 7.5k • 403
Skywork/SkyPile-150B • Updated Dec 7, 2023 • 1.76M • 9.96k • 407
Awesome Instruction Tuning Dataset
Open-Orca/OpenOrca • Updated Feb 19, 2025 • 2.94M • 17.8k • 1.55k
glaiveai/glaive-code-assistant • Updated Sep 27, 2023 • 136k • 955 • 100
silk-road/alpaca-data-gpt4-chinese • Updated May 23, 2023 • 52k • 2.06k • 103
anon8231489123/ShareGPT_Vicuna_unfiltered • Updated Apr 12, 2023 • 192k • 882