HuggingFaceH4/ultrachat_200k
Viewer • Updated • 515k • 42.2k • 678
Curated SFT datasets for instruction-following and conversational fine-tuning
Note Multi-turn conversations, 515K samples, MIT license - Best for chat
Note GPT-4 generated, 1M samples, Apache 2.0 - High quality general purpose
Note GPT-3.5/4 augmented FLAN, 2.94M samples, MIT - Large scale training
Note Curated subset, 100K samples, Apache 2.0 - Quick experiments