# Preference/RLHF + Benchmark Data: Full Survey
> Survey date: 2026-02-27
---
## Part 1: Korean Preference/DPO Data
| Dataset | Size | Downloads | Notes |
|---------|------|-----------|-------|
| `kuotient/orca-math-korean-dpo-pairs` | 100K~1M | 111 | Korean math DPO. Large-scale |
| `nayohan/preference-collection-ko-full` | 100K~1M | 30 | Korean general-purpose preference |
| `jojo0217/korean_rlhf_dataset` | 100K~1M | 54 | Korean RLHF |
| `maywell/ko_Ultrafeedback_binarized` | 10K~100K | 108 | Korean translation of UltraFeedback |
| `ChuGyouk/argilla-distilabel-math-preference-dpo-korean` | 1K~10K | 10 | Math DPO, Korean |
| `ohsuz/dpo-v1010-korean` | 10K~100K | 3 | Korean DPO |
| `ohsuz/dpo-v1010-korean-without-finance` | 10K~100K | 3 | Version excluding finance |
| `tellang/yeji-preference-ko-v1` | 10K~100K | 13 | Korean preference |
| `AnonymousLLMer/Safety_preference-ko-cleaned` | 1K~10K | 4 | Safety preference |
| `mncai/distilabel-math-preference-dpo-ko` | 1K~10K | 4 | Math DPO, Korean |
| `vaiv/ko-rag-preference` | <1K | 2 | RAG preference (small) |
### ❌ Inaccessible (404)
- `Bongseok/ko-DPO-v0.1`: deleted
- `HAERAE-HUB/KoRA`: deleted
- `maywell/ko_Ultrafeedback`: deleted (only the binarized version exists)
---
## Part 2: English Preference Data (Ranked by Translation Value)
| Dataset | Size | Downloads | Translation value |
|---------|------|-----------|-------------------|
| `HuggingFaceH4/ultrafeedback_binarized` | 100K~1M (~62K pairs) | 5,158 | ⭐⭐⭐ Best. A Korean translation already exists (maywell) |
| `Anthropic/hh-rlhf` | 100K~1M | 17,609 | ⭐⭐⭐ Human preference. Conversational |
| `nvidia/HelpSteer2` | 10K~100K | 15,448 | ⭐⭐⭐ High-quality, fine-grained scores |
| `openbmb/UltraFeedback` | 10K~100K | 2,317 | ⭐⭐ Original (the binarized version is more useful) |
| `argilla/distilabel-math-preference-dpo` | 1K~10K | 328 | ⭐⭐ Math-focused (a Korean translation already exists) |
| `snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset` | 10K~100K | 71 | ⭐ Automatically generated |
| `HuggingFaceH4/stack-exchange-preferences` | 10M~100M | 3,873 | ⭐ Too large, code-biased |
| `allenai/preference-test-sets` | 10K~100K | 2,777 | For evaluation (unsuitable for training) |
---
## Part 3: Benchmark/Evaluation Data
| Dataset | Size | Downloads | Purpose |
|---------|------|-----------|---------|
| **`HAERAE-HUB/KMMLU`** | 100K~1M | 10,537 | Korean MMLU. Core benchmark |
| `skt/kobest_v1` | 10K~100K | 3,194 | KoBEST, 5 tasks (BoolQ, COPA, WiC, HellaSwag, SentiNeg) |
| `HAERAE-HUB/HAE_RAE_BENCH_1.0` | 1K~10K | 457 | HAE-RAE Bench |
| `HAERAE-HUB/K2-Eval` | <1K | 76 | K2 evaluation |
| `openai/gsm8k` | 10K~100K | 465,032 | Math reasoning (English) |
| `HuggingFaceH4/MATH-500` | <1K | 94,894 | Math benchmark (English) |
| `Rowan/hellaswag` | 10K~100K | 213,419 | Commonsense reasoning (English) |
| `google/IFEval` | <1K | 60,319 | Instruction-following evaluation (English) |
### ❌ Inaccessible (404)
- `coastalcph/mimir`, `kuotient/korean-gsm8k`, `HAERAE-HUB/KorNAT-CV`, `HAERAE-HUB/KorNAT-NL2SQL`, `snunlp/korean-hate-speech`
---
## Part 4: Feasibility of Generating Our Own Preference Data
**Self-Play approach based on the SFT v2 model (18% repetition rate):**
### Method
1. From the prompt pool of the SFT data, sample N=4~8 responses per prompt (temperature 0.7~1.0)
2. Select chosen/rejected responses with automatic quality checks
### Automatic Quality Criteria
- **Repetition detection**: n-gram repetition rate > 20% → rejected
- **Length filter**: too short (<50 characters) or too long (>2000 characters) → rejected
- **Perplexity-based**: scores assigned by an external judge model (GPT-4 or a larger model)
- **Self-consistency**: compare reward model scores across responses to the same prompt
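
The first two criteria are rule-based and can be sketched in a few lines. This is a minimal illustration, not the project's actual filter: the function names, whitespace tokenization, the choice of `n=3`, and the definition of repetition rate as the share of duplicate n-grams are all assumptions, since the note only states the thresholds.

```python
def ngram_repetition_rate(text: str, n: int = 3) -> float:
    """Share of n-grams that duplicate an earlier n-gram.

    Assumption: tokens are whitespace-separated; a morpheme tokenizer
    may be more appropriate for Korean text.
    """
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


def is_rejected(text: str, max_repetition: float = 0.20,
                min_chars: int = 50, max_chars: int = 2000) -> bool:
    """Apply the length filter first, then the repetition threshold."""
    if len(text) < min_chars or len(text) > max_chars:
        return True
    return ngram_repetition_rate(text) > max_repetition
```

Responses that pass both checks would still need the judge-model or reward-model scoring from the remaining two criteria before a chosen/rejected decision is made.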
### Expected Yield
- 10K SFT prompts × 4 samples = 40K responses
- chosen/rejected pairs: ~10K~20K pairs (top 25% vs. bottom 25%)
- **Caution**: when generating with a model that has an 18% repetition rate, rejected responses may be of too low quality → the meaningful training signal may be weakened
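
One way to read the "top 25% vs. bottom 25%" rule is per prompt: with N=4 it yields one pair per prompt (10K pairs for 10K prompts), and with N=8 it yields two (20K pairs). The sketch below assumes that reading; `build_pairs`, its input layout, and the best-with-worst pairing order are hypothetical choices, not something the note specifies.

```python
from collections import defaultdict


def build_pairs(scored, fraction=0.25):
    """Build chosen/rejected pairs from (prompt, response, score) tuples.

    For each prompt, the top `fraction` of responses is paired with the
    bottom `fraction`, best with worst. Ties are skipped because they
    carry no preference signal.
    """
    by_prompt = defaultdict(list)
    for prompt, response, score in scored:
        by_prompt[prompt].append((score, response))

    pairs = []
    for prompt, items in by_prompt.items():
        items.sort(key=lambda item: item[0], reverse=True)
        k = max(1, int(len(items) * fraction))
        if len(items) < 2 * k:
            continue  # not enough responses to form disjoint groups
        top, bottom = items[:k], items[-k:]
        for (hi, chosen), (lo, rejected) in zip(top, reversed(bottom)):
            if hi > lo:
                pairs.append(
                    {"prompt": prompt, "chosen": chosen, "rejected": rejected}
                )
    return pairs
```

The output keys `prompt`, `chosen`, and `rejected` follow the naming used throughout this note for preference pairs.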
### Recommendation
- **Prioritize existing Korean data** over self-generation (see the recommendations below)
- Self-generation is more effective as a second-round Self-Play with the improved model, after the first round of ORPO training
---
## 🎯 Recommended Data Combinations for Starting ORPO Immediately
### Tier 1: Immediate Use (Korean, Minimal Conversion)
| Data | Estimated pairs | Priority |
|------|-----------------|----------|
| `jojo0217/korean_rlhf_dataset` | ~100K+ | 🥇 Most general-purpose |
| `maywell/ko_Ultrafeedback_binarized` | ~60K | 🥇 Korean UltraFeedback, high quality |
| `nayohan/preference-collection-ko-full` | ~100K+ | 🥇 General-purpose preference |
| `kuotient/orca-math-korean-dpo-pairs` | ~100K+ | 🥈 Math-focused |
### Tier 2: Supplementary
| Data | Estimated pairs | Purpose |
|------|-----------------|---------|
| `ohsuz/dpo-v1010-korean` | ~10K+ | Additional diversity |
| `tellang/yeji-preference-ko-v1` | ~10K+ | Additional diversity |
| `ChuGyouk/argilla-distilabel-math-preference-dpo-korean` | ~5K | Math supplement |
### Recommended Combination
```
Total: ~200K~300K pairs obtainable
Stage 1: jojo0217 + maywell + nayohan combined → ~260K pairs (estimated)
Stage 2: add kuotient math → strengthen math ability
```
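
Combining the Stage 1 sources requires mapping each dataset onto one `prompt`/`chosen`/`rejected` schema. The sketch below shows that step in plain Python on already-loaded records. The column mappings are placeholders: the actual column names of these datasets were not checked in this survey and must be confirmed against each dataset card. Deduplication is an added assumption, not part of the original plan.

```python
def normalize(records, mapping, source):
    """Rename source-specific columns to prompt/chosen/rejected.

    `mapping` is hypothetical, e.g. {"prompt": "question", ...};
    verify real column names before use.
    """
    return [
        {
            "prompt": row[mapping["prompt"]],
            "chosen": row[mapping["chosen"]],
            "rejected": row[mapping["rejected"]],
            "source": source,
        }
        for row in records
    ]


def merge_dedup(*groups):
    """Concatenate normalized groups, dropping exact duplicates and
    degenerate pairs whose chosen and rejected texts are identical."""
    seen = set()
    merged = []
    for group in groups:
        for row in group:
            key = (row["prompt"], row["chosen"], row["rejected"])
            if key in seen or row["chosen"] == row["rejected"]:
                continue
            seen.add(key)
            merged.append(row)
    return merged
```

Keeping a `source` field makes it possible to report how many of the estimated ~260K pairs each dataset actually contributes after filtering.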
### Benchmark Evaluation Pipeline
- **KMMLU** (Korean knowledge) + **KoBEST** (Korean NLU): required
- **GSM8K** (math) + **IFEval** (instruction following): auxiliary
- **HAE_RAE_BENCH**: comprehensive Korean evaluation