# Korean LLM Full Roadmap & Decision Framework
> **Date written**: 2026-02-26
> **Current status**: SFT 5,000 steps complete (loss 1.9677), 1.19B-parameter model
> **Goal**: Deploy a Korean LLM that is usable in practice
---
## 0. TL;DR: What to Do Right Now
1. **Quick generation test of the SFT model** (30 min, today): check for repetition degeneration under temperature sampling
2. **Run lm-eval-harness ko_ifeval + ko_winogrande** (2-4 hours): get the numbers
3. **Branch on the results** → see the decision tree below
---
## 1. Where We Are Now
### 1.1 SFT Training Status Summary
| Item | Value |
|------|-----|
| Steps | 5,000 |
| Final Loss | 1.9677 |
| Training time | 0.61h (~37 min) |
| Throughput | ~75,700 tok/s (single B200) |
| LR (final) | 2.00e-06 (fully decayed) |
| Gradient Norm | Stable (1.0-1.4 range) |
| SFT data | Unknown (needs verification) |
**Note**: 5,000 steps is **very few**. SFT typically runs for 1-3 epochs, so whether the step count is sufficient depends on the dataset size.
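Whether 5,000 steps is enough is easier to judge once the table is turned into token counts. The following is a minimal sketch that derives them from the measured throughput and wall-clock time. The dataset size and average sample length in the last lines are hypothetical placeholders (the SFT dataset is still unverified), and the result ignores padding and any non-training overhead included in the 0.61h.

```python
def sft_token_budget(tok_per_s: float, hours: float, steps: int) -> dict:
    """Derive total and per-step token counts from measured throughput."""
    total_tokens = tok_per_s * hours * 3600
    return {
        "total_tokens": total_tokens,
        "tokens_per_step": total_tokens / steps,
    }


def epochs_seen(total_tokens: float, n_samples: int, avg_tokens_per_sample: float) -> float:
    """Approximate number of passes over the dataset."""
    return total_tokens / (n_samples * avg_tokens_per_sample)


# Values taken from the status table above.
budget = sft_token_budget(tok_per_s=75_700, hours=0.61, steps=5_000)
print(f"total tokens    : {budget['total_tokens'] / 1e6:.0f}M")
print(f"tokens per step : {budget['tokens_per_step']:,.0f}")

# Hypothetical dataset: 100K samples averaging 512 tokens (NOT from the source).
print(f"epochs          : {epochs_seen(budget['total_tokens'], 100_000, 512):.2f}")
```

Once the real sample count and average length are known, substituting them into `epochs_seen` gives the epoch count that checklist item 1-3 asks for.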
### 1.2 Position Within the Industry
#### By Benchmark (estimates based on measured Open Ko-LLM Leaderboard values)
```
Model                          Size    ko_ifeval  ko_winogrande  Notes
────────────────────────────────────────────────────────────────────────────────────
EXAONE-3.0-7.8B-Instruct       7.8B    ~55%       ~80%+          8T tokens, SFT+DPO
Llama-3.1-8B-Korean-SFT        8B      ~40%       ~72%           Llama-based Korean adaptation
SOLAR-10.7B-Instruct           10.7B   ~50%       ~78%           Upstage
Gemma-2-9B-Korean              9B      ~45%       ~75%           Google-based
── Realistic 1B SFT benchmarks (third-party examples) ──────────────────────────────
dltjdgh0928/test_instruction   ~?B     24.1%      57.1%          Leaderboard measurement
lookuss/test-llilu             ~?B     22.9%      58.2%          Leaderboard measurement
generic 1B SFT (estimate)      1-2B    20-30%     52-62%         Realistic range
── Expected for our model ──────────────────────────────────────────────────────────
korean_1b_sft (5k steps)       1.19B   15-28%?    50-58%?        Pre-evaluation estimate
```
#### Key Gap Analysis
| Compared against | Parameter ratio | Expected performance gap | Main reason |
|-----------|--------------|---------------|-----------|
| EXAONE-3.0-7.8B | 6.6× | Very large | Scale + data + DPO |
| 8B Korean SFT | 6.7× | Large | Scale difference dominates |
| Third-party 1B SFT | Similar | Small to medium | Differences in data and training method |
**Realistic assessment**: a 1B model **cannot compete directly** with 7-10B models. However:
- **Edge deployment** (local serving, low-latency API): 1B has a clear advantage
- **Resource efficiency**: 1B runs on a single GPU, and even on CPU
- **Specialized domains**: with Korean-specific fine-tuning, it can approach large general-purpose models on specific tasks
### 1.3 Limits and Possibilities at the 1B Scale
**Realistic limits**:
- Exceeding 30% on ko_ifeval is difficult (complexity of instruction following)
- Math/code: nearly impossible without such data in pretraining
- Long-context understanding: degradation starts at 4K context
- Factual recall: insufficient capacity to store fine-grained facts
**What is feasible**:
- Basic Korean QA, summarization, classification
- Simple instruction following (1-2 steps)
- Korean autocompletion and proofreading
- Domain-specific tasks (restricted formats)
---
## 2. Phase-by-Phase Roadmap
### Phase 1: SFT Validation (now → ~1 week)
#### Goal
Determine whether the result of 5,000 SFT steps is good enough for practical use
#### Checklist
```
□ 1-1. Quick generation quality check (30 min)
    - Generate from 10 prompts with temperature=0.8, top_p=0.9
    - Check: repetition degeneration rate (target: < 20%)
    - Check: are Korean verb endings and particles handled naturally?
    - Check: does it follow instructions? (compare with base)
□ 1-2. Official benchmarks (2-4 hours)
    - Install lm-evaluation-harness and run ko_ifeval
    - Run lm-evaluation-harness ko_winogrande
    - Optional: ko_gsm8k (can be skipped if there is no math data)
□ 1-3. SFT data quality check
    - Identify the dataset used for SFT training
    - Data size (how many samples?)
    - 5,000 steps × batch_size (× sequence length, if batch_size counts sequences) = total tokens
    - Epoch count: training entered epoch 2, so at least 1 full epoch is confirmed
□ 1-4. Base vs SFT comparison
    - Compare base (pretrained) and SFT outputs on identical prompts
    - Did SFT impart instruction-following ability?
```
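The checklist asks for a "repetition degeneration rate" but does not define it. One possible definition, sketched below as an assumption rather than the project's actual metric, flags an output as degenerate when too many of its n-grams repeat earlier ones. The n-gram size, the 0.3 threshold, and whitespace tokenization are illustrative choices; tokenizer IDs could be substituted for whitespace tokens for Korean text.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 4) -> float:
    """Fraction of n-grams (whitespace tokens) that duplicate an earlier n-gram."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


def degeneration_rate(outputs: list[str], n: int = 4, threshold: float = 0.3) -> float:
    """Share of outputs whose repeated n-gram ratio exceeds the threshold."""
    if not outputs:
        return 0.0
    flagged = sum(repeated_ngram_ratio(o, n) > threshold for o in outputs)
    return flagged / len(outputs)


samples = [
    "the model answers the question clearly and then stops",
    "I like it I like it I like it I like it I like it",
]
print(degeneration_rate(samples))  # 0.5: only the second sample is flagged
```

Running the same function over greedy and sampled generations from both the base and SFT checkpoints would give directly comparable numbers.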
#### Pass/Fail Criteria (quantified)
| Metric | Pass ✅ | Borderline ⚠️ | Fail ❌ |
|------|---------|-----------|---------|
| ko_ifeval (prompt strict) | > 25% | 15-25% | < 15% |
| ko_winogrande | > 53% | 50-53% | < 50% |
| Repetition degeneration rate (greedy) | < 20% | 20-40% | > 40% |
| Temperature sampling quality | Natural | Awkward | Meaningless |
| SFT improvement over base | Clear instruction following | Marginal improvement | No improvement or worse |
> **Note**: 50% on ko_winogrande is chance level (binary choice). A score needs to be 53%+ to be meaningful.
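The numeric rows of the table can be applied mechanically once the benchmark numbers are in. The helper below is a small sketch of that; the function names are hypothetical, and the two qualitative rows (sampling quality, improvement over base) still need human judgment.

```python
def grade(value: float, pass_at: float, fail_at: float, higher_is_better: bool = True) -> str:
    """Map a metric to 'pass', 'borderline', or 'fail'."""
    if not higher_is_better:
        # Flip signs so the same comparisons work for lower-is-better metrics.
        value, pass_at, fail_at = -value, -pass_at, -fail_at
    if value > pass_at:
        return "pass"
    if value < fail_at:
        return "fail"
    return "borderline"


def grade_phase1(ko_ifeval: float, ko_winogrande: float, repetition: float) -> dict:
    """Apply the numeric thresholds from the table (all values in percent)."""
    return {
        "ko_ifeval": grade(ko_ifeval, pass_at=25, fail_at=15),
        "ko_winogrande": grade(ko_winogrande, pass_at=53, fail_at=50),
        "repetition": grade(repetition, pass_at=20, fail_at=40, higher_is_better=False),
    }


# Example with the third-party 1B leaderboard figures and an illustrative 30% repetition rate.
print(grade_phase1(ko_ifeval=24.1, ko_winogrande=57.1, repetition=30))
```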
#### Response on Failure
- **ko_ifeval < 15% + repetition > 40%**: SFT data problem or too few steps → Phase 2A
- **No improvement over base**: check SFT data format and quality, revisit the learning rate
- **All metrics fail**: re-examine everything, starting from the data pipeline
---
### Phase 2A: SFT Improvement (optional, depending on Phase 1 results)
#### When to Enter
```
Phase 2A entry conditions:
├── ko_ifeval < 25% AND repetition > 20%      → enter immediately
├── ko_ifeval 25-30% AND repetition 20-30%    → enter after data augmentation
└── ko_ifeval > 30% AND repetition < 15%      → go straight to Phase 2B or 4
```
#### Analysis by Option
**Option A: More steps (5k → 10k-20k)**
- **When**: the data is sufficient and training has not yet converged
- **How to check**: is the loss curve still descending? (1.97 at 5,000 steps, close to convergence)
- **Expected effect**: modest improvement (target loss 1.97 → 1.80, ko_ifeval +3-7pp expected)
- **Cost**: 1.5-3 additional hours on one B200
- **Caution**: training has already entered epoch 2, so there is a risk of overfitting
**Option B: Better data**
- **When**: the current SFT data is insufficient or low in quality (the most common cause)
- **Recommended datasets**:
  - `beomi/KoAlpaca-v1.1a`: 21K Korean instructions
  - `HAERAE-HUB/KMMLU`: Korean knowledge QA
  - `nayohan/llama3-baseline-ko-dataset`: diverse instructions
  - `squarelike/sharegpt_deepl_ko_ko-en`: Korean ShareGPT
  - Combined target: 50K-200K high-quality samples
- **Expected effect**: improving data quality is more effective than doubling the steps (rule of thumb)
- **Cost**: 1-2 days of data cleaning, 2-6 hours of additional training
**Option C: ORPO (Odds Ratio Preference Optimization)**
- **When**: preference alignment is needed after an SFT baseline is secured
- **Advantage**: no reference model needed → saves memory and simplifies training
- **Korean data**: `kuotient/orca-math-korean-preference` (193K) and `heegyu/orca-math-korean-preference-cleaned` (192K) are available
- **Expected effect**: repetition degeneration -10-20pp, instruction following +5-10pp
- **Cost**: 1 day of data preparation, 3-6 hours of training
**Option D: DPO (Direct Preference Optimization)**
- **When**: stronger alignment than ORPO is needed, or SFT has already gone reasonably well
- **Advantage**: effect similar to RLHF, more stable than PPO
- **Disadvantage**: requires a reference model (2× memory)
- **Feasibility on B200**: 1.19B × 2 = ~2.4B params, which fits comfortably in a single B200's 183GB
- **Cost**: 4-8 hours of training
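The memory difference between Options C and D follows from the loss functions: DPO compares the policy against a frozen reference model, while ORPO adds an odds-ratio penalty to the ordinary SFT loss and needs only the policy. The sketch below writes both losses for a single preference pair using scalar log-probabilities. It is a pure-Python illustration following the published formulations, not the project's training code; `beta` and `lam` are illustrative defaults.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dpo_loss(pol_w: float, pol_l: float, ref_w: float, ref_l: float, beta: float = 0.1) -> float:
    """DPO: needs sequence log-probs from both the policy and a frozen reference model."""
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    return -log_sigmoid(beta * margin)


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp); avg_logp must be strictly negative."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(avg_logp_w: float, avg_logp_l: float, lam: float = 0.1) -> float:
    """ORPO: SFT negative log-likelihood plus an odds-ratio term; no reference model."""
    sft_nll = -avg_logp_w
    odds_ratio_term = -log_sigmoid(log_odds(avg_logp_w) - log_odds(avg_logp_l))
    return sft_nll + lam * odds_ratio_term


# w = chosen response, l = rejected response (length-normalised log-probs for ORPO).
print(dpo_loss(pol_w=-10.0, pol_l=-12.0, ref_w=-11.0, ref_l=-11.0))
print(orpo_loss(avg_logp_w=-0.5, avg_logp_l=-2.0))
```

`dpo_loss` takes four inputs, two of which come from the reference model, whereas `orpo_loss` takes only policy outputs. That is where the 2× versus 1× memory figures come from.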
#### Recommended Order
```
Data check → Data augmentation (Option B) → More steps (Option A) → ORPO (Option C)
```
---
### Phase 2B: Scale-up to a 3B Model
#### Data Sufficiency Analysis
| Criterion | Tokens required | Currently held | Verdict |
|------|-----------|-----------|------|
| Chinchilla minimum (20×) | 3B × 20 = 60B | ~150B | ✅ Sufficient |
| Chinchilla optimal (70×) | 3B × 70 = 210B | ~150B | ⚠️ 71% of target |
| Llama-style (high-quality focus) | 3B × 100 = 300B | ~150B | ❌ Insufficient |
**Conclusion**: **3B training is feasible with the current data**, though not optimal. Collecting an additional 50B tokens of high-quality data would bring it close to optimal.
#### Estimated Training Time (8× B200)
```
3B model estimates:
- Throughput: ~2.5-3M tok/s (8× B200; 2.64M measured for 1.19B)
  → The 3B model is expected to be ~40% slower (more memory and compute)
  → Effective throughput: ~1.6M tok/s (estimate)
60B tokens (minimum):              60B / 1.6M = 37,500 s  ≈ 10.4 hours
150B tokens (all current data):    150B / 1.6M = 93,750 s ≈ 26 hours
210B tokens (optimal):             210B / 1.6M = 131,250 s ≈ 36.5 hours
→ Realistic training duration: 1-2 days (8× B200)
```
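The arithmetic above is easy to rerun when the measured 3B throughput differs from the estimate. The sketch below uses the document's rounded 1.6M tok/s and the three token-to-parameter ratios from the sufficiency table. The variable names are illustrative.

```python
def train_hours(tokens: float, tok_per_s: float) -> float:
    """Wall-clock hours to process `tokens` at a sustained throughput."""
    return tokens / tok_per_s / 3600


params = 3e9          # target model size
available = 150e9     # tokens currently held
throughput = 1.6e6    # estimated effective tok/s on 8x B200 (rounded, from the text)

for label, ratio in [("20x", 20), ("70x", 70), ("100x", 100)]:
    need = params * ratio
    print(f"{label:>4}: need {need / 1e9:.0f}B tokens, "
          f"have {available / need:.0%} of that, "
          f"{train_hours(need, throughput):.1f} h to train")

print(f"all current data: {train_hours(available, throughput):.1f} h")
```

These figures cover pure compute time only. Checkpointing, evaluation pauses, and restarts are why the text budgets 1-2 days.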
#### 3B Training Preparation
```
□ Change the model architecture settings:
    - d_model: 2048 → 2560 (or 3072)
    - n_layers: 24 → 32
    - n_heads: 16 → 32
    - n_kv_heads (GQA): 4 → 8
    - d_ffn: 5472 → ~8192
    → Expected parameters: ~3B
□ Data preparation:
    - Re-download cc100 ko (after the bug fix)
    - Use CulturaX 24.8B
    - Merge a total of 150B+ tokens of Korean data
□ Write configs/korean_3b_fp8.yaml
□ Checkpoint strategy: save every 5,000 steps
□ Keep the FP8 settings (optimized for B200)
```
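The "~3B" figure can be sanity-checked with a parameter-count estimate. The sketch below assumes a Llama-style decoder (GQA attention, SwiGLU feed-forward with three projection matrices, no biases), a 64,000-entry vocabulary, and tied input/output embeddings. None of these details are confirmed by this document beyond the listed dimensions, but under them the current settings come out at about 1.19B, which matches the reported model size.

```python
def llama_style_params(d_model: int, n_layers: int, n_heads: int, n_kv_heads: int,
                       d_ffn: int, vocab_size: int = 64_000,
                       tied_embeddings: bool = True) -> int:
    """Approximate parameter count of a decoder-only transformer with GQA and SwiGLU.

    Ignores normalisation weights and biases, which are negligible at this scale.
    """
    head_dim = d_model // n_heads
    kv_dim = n_kv_heads * head_dim
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim   # Wq, Wo + Wk, Wv
    ffn = 3 * d_model * d_ffn                             # gate, up, down projections
    embed = vocab_size * d_model * (1 if tied_embeddings else 2)
    return n_layers * (attn + ffn) + embed


current = llama_style_params(2048, 24, 16, 4, 5472)
option_2560 = llama_style_params(2560, 32, 32, 8, 8192)
option_3072 = llama_style_params(3072, 32, 32, 8, 8192)

for name, n in [("current", current), ("d_model=2560", option_2560), ("d_model=3072", option_3072)]:
    print(f"{name:>13}: {n / 1e9:.2f}B")
```

Under these assumptions the two `d_model` candidates land on either side of 3B, so the choice between 2560 and 3072 also changes the token budget computed from the Chinchilla ratios.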
#### How the 1B SFT Results Affect the 3B Decision
```
1B SFT result              → Proceed with 3B?
────────────────────────────────────────────────────────────────────────
ko_ifeval > 30%            → Strongly recommended: 1B is already good, 3B will certainly be better
ko_ifeval 20-30%           → Conditionally recommended: go to 3B after verifying data and methodology
ko_ifeval < 20%            → Root-cause analysis required before 3B: the same problem will reproduce at 3B
Repetition degeneration > 40% → Suspect a pretraining data problem: 3B may show the same issue
No improvement from SFT    → Fix the SFT pipeline, then go to 3B
```
---
### Phase 3: RLHF / Preference Optimization (optional)
#### When Is It Needed?
| Scenario | Necessity |
|----------|--------|
| Service deployment (user-facing) | Strongly needed: safety, coherence |
| Maximizing leaderboard scores | Needed: +5-15pp with DPO/ORPO |
| Internal research and experiments | Not needed |
| RAG system backend | Not needed |
#### ORPO vs DPO vs PPO Comparison
| Method | When | Memory | Complexity | Korean data |
|------|------|--------|--------|---------------|
| **ORPO** | Together with SFT, fast alignment | 1× (no ref) | Low | 193K+ available |
| **DPO** | After SFT, stable alignment | 2× (ref required) | Medium | 193K+ available |
| **SimPO** | DPO-like effect without a ref | 1× | Medium | General-purpose |
| **PPO** | Full RLHF implementation | 3-4× | High | Reward model required |
**Recommended for the B200 environment**: ORPO or SimPO (no reference model, memory-efficient)
#### Korean Preference Data Available (HuggingFace)
```
kuotient/orca-math-korean-preference                      193K samples   Math-focused
heegyu/orca-math-korean-preference-cleaned                192K           Math (cleaned version)
lemon-mint/korean-realqa-reasoning-v01-preference         7.7K           Reasoning
ChuGyouk/argilla-distilabel-math-preference-dpo-korean    2.4K           Small-scale
→ Math-specific data is plentiful. General Korean preference data is scarce.
→ General preference data must be supplemented by in-house generation or translation.
  Method: generate chosen/rejected pairs with GPT-4/Claude (Self-Play)
```
---
### Phase 4: Deployment
#### Serving Options Compared
| Option | Characteristics | B200 suitability | Recommended for |
|------|------|-------------|-----------|
| **vLLM** | PagedAttention, high throughput | ✅ Best | API server, batch inference |
| **TGI (Text Generation Inference)** | Official HF, stable | ✅ Good | HF Hub integration |
| **llama.cpp + GGUF** | Runs on CPU and low-spec hardware | ⚠️ Underuses a B200 | Edge deployment, Ollama |
| **Ollama** | Convenient local deployment | ⚠️ | Personal use, demos |
**Expected vLLM throughput on B200 (1.19B model)**:
```
1.19B model (BF16):
- Memory: ~2.4GB (parameters) + KV cache
- Single B200, 183GB: the KV cache can be maximized
- Expected throughput: 5,000-15,000 tokens/s (batched)
- Single stream: 200-500 tokens/s (as perceived by a user)
→ Can support 100-500 concurrent users (single GPU)
```
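How many concurrent sequences fit is mostly a KV cache question. The sketch below estimates per-token KV cache size from the current architecture settings listed earlier (24 layers, 16 heads, 4 KV heads, d_model 2048) with BF16 cache entries. The 90% usable-memory fraction and the 4K context per user are illustrative assumptions, and activation memory is ignored.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """Bytes of KV cache one token occupies: a key and a value vector in every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


per_token = kv_bytes_per_token(n_layers=24, n_kv_heads=4, head_dim=2048 // 16)

weights_gb = 1.19e9 * 2 / 1e9          # BF16 weights, ~2.4GB
free_gb = 183 * 0.90 - weights_gb      # assume 90% of GPU memory is usable
cacheable_tokens = int(free_gb * 1e9 / per_token)

print(f"KV cache per token : {per_token / 1024:.0f} KiB")
print(f"cacheable tokens   : {cacheable_tokens:,}")
print(f"4K-context users   : {cacheable_tokens // 4096}")
```

With only 4 KV heads the cache is small (48 KiB per token), so for this model memory is unlikely to be the limit on concurrency. Compute and scheduling are more likely to set the practical ceiling.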
#### Quantization Options (B200 environment)
| Format | Precision loss | Size | B200 suitability | Recommendation |
|------|------------|------|-------------|------|
| FP8 (Native) | None | 1.2GB | ✅ Best (HW support) | **Top priority** |
| BF16 | None | 2.4GB | ✅ Default | Baseline |
| AWQ (W4A16) | Very small | 0.6GB | ✅ Good | Edge / low memory |
| GPTQ (W4) | Small | 0.6GB | ✅ Good | CPU offload |
| GGUF Q4_K_M | Small | ~0.7GB | ⚠️ (for CPU) | For Ollama deployment |
**B200 recommendation**: consider FP8 first, then AWQ. With hardware FP8 support, the B200 is already efficient without further quantization.
#### HuggingFace Hub Upload
```
Required tasks:
□ Convert to HF format: config.json, model.safetensors, tokenizer_config.json
□ Write the model card (Korean description, benchmark results, usage)
□ Set the license (Apache 2.0 recommended)
□ Include eval results
□ Submit to the Open Ko-LLM Leaderboard (evaluation request)
```
---
## 3. Decision Tree (numeric)
```
══════════════════════════════════════════════════════════════════
[Phase 1: SFT evaluation results]
══════════════════════════════════════════════════════════════════
├── ko_ifeval > 30% AND repetition rate < 15%
│   ├── Is all 150B of data usable?             → Phase 2B (3B pretraining)
│   └── Is immediate deployment the goal?       → Phase 4 (vLLM serving + HF upload)
│
├── ko_ifeval 20-30% AND repetition rate 15-30%
│   ├── SFT data < 10K samples?                 → Phase 2A-B (data augmentation first)
│   ├── SFT data 10-50K samples?                → Phase 2A-A (more steps) + 2A-C (ORPO)
│   └── SFT data > 50K samples?                 → Phase 2A-A (more steps) OR 2B (3B)
│
├── ko_ifeval 10-20% AND repetition rate 30-50%
│   ├── No difference between base and SFT?     → Check the SFT pipeline for bugs
│   ├── SFT data quality in doubt?              → Audit the full dataset, then Phase 2A-B
│   └── Base PPL is high (> 15)?                → More pretraining needed (add data)
│
└── ko_ifeval < 10% OR repetition rate > 50%
    ├── Base model itself already repeats > 30%? → Pretraining data quality problem
    │   └── → Additional pretraining after cc100 noise filtering
    ├── Did the SFT loss diverge?               → Revisit learning rate and optimizer settings
    └── Are all generations meaningless?        → Check for checkpoint corruption, restore an earlier checkpoint
══════════════════════════════════════════════════════════════════
[Phase 2A internal decisions]
══════════════════════════════════════════════════════════════════
After entering Phase 2A:
├── Current SFT data < 20K samples?
│   └── → Data augmentation is more effective than adding steps (top priority)
│       Data: beomi/KoAlpaca, squarelike/sharegpt_ko, nayohan/llama3-ko
│
├── Loss curve still descending (difference between step 4000 and 5000 > 0.05)?
│   └── → Try doubling the steps (up to 10k)
│
├── Repetition rate > 30% (the main problem)?
│   └── → Apply ORPO or a repetition penalty first
│       ORPO data: kuotient/orca-math-korean-preference (193K)
│
└── ko_ifeval < 20% and no improvement even after data augmentation?
    └── → Switch to 3B pretraining (1B SFT may have reached its limit)
══════════════════════════════════════════════════════════════════
[Phase 2B internal decisions]
══════════════════════════════════════════════════════════════════
When deciding to proceed with 3B pretraining:
├── Are the current 150B tokens Korean-only?
│   └── → Mixing in 10-30% English data is recommended (cross-lingual transfer)
│       Including English math/code can further improve ko_gsm8k and similar tasks
│
├── Is cc100 ko data collection complete?
│   └── No → Can start with CulturaX 24.8B alone (the 60B target is achievable)
│
└── Run an SFT test on an intermediate checkpoint during 3B training?
    └── → If the 3B base responds to SFT better than 1B, proceed directly with 3B SFT
══════════════════════════════════════════════════════════════════
[Phase 4 deployment decisions]
══════════════════════════════════════════════════════════════════
Choosing a deployment method:
├── Research or demo purposes?
│   └── → Upload to HF Hub + create a Gradio Space (free)
│
├── Internal API serving?
│   └── → vLLM (FP8 native) + OpenAI-compatible endpoint
│       Command: vllm serve ./checkpoints/korean_1b_sft --quantization fp8
│
├── Personal or team local use?
│   └── → Convert to GGUF Q4_K_M + Ollama (a Modelfile already exists)
│
└── Listing on the Open Ko-LLM leaderboard?
    └── → HF Hub upload is required → fill out the leaderboard submission form
```
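The top level of the Phase 1 tree can be written as a small routing function, which also makes its gaps visible. This is a sketch with a hypothetical function name. The boundary handling (which side of 20% or 30% a value falls on) is an assumption, since the tree does not say, and the order of the checks puts the "OR" failure branch first.

```python
def route_phase1(ko_ifeval: float, repetition: float) -> str:
    """Return the top-level branch of the Phase 1 decision tree (inputs in percent)."""
    if ko_ifeval < 10 or repetition > 50:
        return "inspect base model, SFT loss, and checkpoint integrity"
    if ko_ifeval > 30 and repetition < 15:
        return "Phase 2B (3B pretraining) or Phase 4 (deployment)"
    if 20 <= ko_ifeval <= 30 and 15 <= repetition <= 30:
        return "Phase 2A (choose the option by SFT dataset size)"
    if 10 <= ko_ifeval < 20 and 30 < repetition <= 50:
        return "debug SFT pipeline, audit SFT data, check base PPL"
    return "no branch matches: the tree leaves this combination undefined"


for scores in [(35, 10), (25, 20), (15, 40), (5, 20), (35, 40)]:
    print(scores, "→", route_phase1(*scores))
```

The last example (high ko_ifeval with high repetition) falls through every branch, so combinations like it need a manual call, for instance by following whichever metric is worse.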
---
## 4. Candidate Follow-up Jobs (in priority order)
### Possible Immediately (on the current server, no additional data needed)
| Priority | Job | Estimated time | Expected benefit |
|----------|-----|-----------|-----------|
| ⭐⭐⭐ | **SFT model generation test** (temperature sampling) | 30 min | Establish the current repetition rate |
| ⭐⭐⭐ | **Install lm-eval-harness + run ko_ifeval** | 2-4 hours | Official benchmark number |
| ⭐⭐⭐ | **Run ko_winogrande** | 1-2 hours | Language understanding number |
| ⭐⭐ | **Base vs SFT comparison generation** (identical prompts) | 1 hour | Measure the effect of SFT |
| ⭐⭐ | **SFT training loss curve analysis** (tensorboard) | 30 min | Judge convergence |
| ⭐⭐ | **Quantitative repetition degeneration measurement** (effect of repetition_penalty) | 1 hour | Judge deployability |
| ⭐ | **vLLM serving test** (FP8) | 1-2 hours | Measure throughput |
| ⭐ | **HF format conversion** (config.json, safetensors) | 2-3 hours | Prepare for HF Hub upload |
### Requires Data Preparation
| Priority | Job | Preparation time | Expected benefit |
|----------|-----|-----------|-----------|
| ⭐⭐⭐ | **SFT data augmentation** (KoAlpaca + ShareGPT-ko, 50K+) | 1-2 days | ko_ifeval +5-15pp |
| ⭐⭐⭐ | **cc100 re-collection** (after the bug fix) | 0.5-1 day | Secure 150B+ tokens |
| ⭐⭐ | **ORPO data preparation** (orca-math-korean 193K) | 0.5 day | Repetition degeneration -20pp |
| ⭐⭐ | **Merge 3B pretraining data** (consolidate 150B tokens) | 1-2 days | Ready for 3B training |
| ⭐ | **Generate general Korean preference data** (using GPT-4) | 3-7 days | General-purpose ORPO/DPO |
| ⭐ | **Add English/code data** (10-30% mix) | 1-3 days | Math/code improvement |
### Requires External Resources
| Priority | Job | Required resources | Expected benefit |
|----------|-----|-------------|-----------|
| ⭐⭐ | **Upload to a HuggingFace Hub account** | HF account, internet access | Enables leaderboard submission |
| ⭐⭐ | **Open Ko-LLM Leaderboard submission** | HF account | Confirm the official ranking |
| ⭐ | **KoMT-Bench / LogicKor evaluation** | External API or scripts | Qualitative evaluation |
| ⭐ | **VRAM expansion or multi-GPU SFT** | Currently 12GB; possibly more needed? | Larger batches |
---
## 5. Risk Analysis
### 5.1 Potential Problems with the Current Training Approach
| Risk | Severity | Current evidence | Mitigation |
|--------|--------|-----------|-----------|
| Too few SFT steps (5k) | 🔴 High | Entered epoch 2, loss still at 1.97 | More steps or data augmentation |
| Insufficient pretraining data (~8.91B) | 🟡 Medium | By Chinchilla, 1B × 20 = 20B is needed → short of that | Additional training on the 150B data |
| No code/math data | 🟡 Medium | ko_gsm8k expected to be near 0 | Mix in English code/math data |
| Repetition degeneration under greedy decoding | 🔴 High | Confirmed at 30% in the base model | SFT + repetition_penalty + ORPO |
### 5.2 cc100 Data Quality Issues
**Known problems**:
- cc100 is web text extracted from CommonCrawl and is **very noisy**
- Korean cc100 in particular contains a lot of ad text, spam, and repeated content
- Duplication rate: document-level duplication estimated at 10-30% (MinHash removal needed)
**Practical impact**:
```
Training on noisy data → model learns ad/spam patterns → lower generation quality
Duplicate data → over-memorization of specific patterns → worse repetition degeneration
```
**Recommended preprocessing**:
```bash
# 1. Deduplication (MinHash LSH)
python scripts/dedup_minhash.py --input cc100_ko.bin --threshold 0.8
# 2. Quality filtering (perplexity-based)
# Low-quality text: remove documents with PPL > 1000
python scripts/quality_filter.py --max_ppl 1000
# 3. Length filtering
# Remove sentences that are too short (< 50 tokens)
```
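For readers unfamiliar with step 1, the following is a minimal, standard-library sketch of MinHash near-duplicate detection at the same 0.8 threshold. It is not the contents of `scripts/dedup_minhash.py`, which this document does not show. The character 5-gram shingles, 64 hash functions, and the quadratic all-pairs comparison are illustrative simplifications; a production pipeline would add LSH banding so that only candidate pairs are compared.

```python
import hashlib


def shingles(text: str, k: int = 5) -> set[str]:
    """Character k-grams after whitespace normalisation."""
    text = " ".join(text.split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash_signature(text: str, num_perm: int = 64) -> list[int]:
    """Keep the minimum hash per seeded hash function; each seed mimics one permutation."""
    grams = shingles(text)
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for g in grams
        ))
    return signature


def estimated_jaccard(a: list[int], b: list[int]) -> float:
    """The fraction of matching signature slots estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def dedup(docs: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a document only if it is not a near-duplicate of one already kept (O(n^2))."""
    kept, signatures = [], []
    for doc in docs:
        sig = minhash_signature(doc)
        if all(estimated_jaccard(sig, s) < threshold for s in signatures):
            kept.append(doc)
            signatures.append(sig)
    return kept


docs = ["alpha beta gamma delta", "alpha  beta gamma delta", "12345 67890 13579 24680"]
print(dedup(docs))  # the second entry differs only in whitespace and is dropped
```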
### 5.3 Impact of the Tokenizer Choice (korean_sp 64K)
**Current setting**: SentencePiece Unigram, 64K vocab, Korean-specific
**Advantages**:
- Optimized for Korean morpheme segmentation → efficient encoding
- The 64K vocab balances token fertility between English and Korean
- One Korean character = 1.2-1.8 tokens on average (efficient compared with BPE)
**Potential problems**:
| Problem | Severity | Description |
|------|--------|------|
| Limited English vocabulary | 🟡 Medium | Inefficient on English code/math (byte fallback) |
| Incompatible with existing models | 🟡 Medium | RLHF data must be re-tokenized |
| Neologisms and loanwords | 🟡 Medium | OOV is handled by byte fallback, but it is slow |
| Differs from the standard Llama/Mistral tokenizers | 🟢 Low | Fine as long as the tokenizer is included in the HF upload |
**Mitigation**:
- For the future 3B model, consider **adopting tiktoken (cl100k_base) or a Llama-family tokenizer**
- Keep the current tokenizer for the current 1.19B model (the cost of retraining is too high)
---
## 6. Scenario List ("if X, then do Y")
| # | Condition (IF) | Action (THEN) |
|---|-----------|-------------|
| 1 | ko_ifeval > 30% AND repetition < 15% | → Upload to HF Hub immediately + submit to the leaderboard + run 3B pretraining in parallel |
| 2 | ko_ifeval 20-30% AND repetition 15-30% | → Augment data with KoAlpaca+ShareGPT-ko, then rerun SFT for 10k steps |
| 3 | ko_ifeval < 20% AND no difference from base | → Check the SFT training pipeline for bugs (data loading, format) |
| 4 | Repetition rate > 40% | → Apply ORPO (orca-math-korean 193K) immediately |
| 5 | ko_ifeval < 20% even after every SFT attempt | → Accept the 1B limit and switch to 3B pretraining |
| 6 | cc100 collection complete (65-100B) | → Start 3B pretraining right away (26 hours, 8× B200) |
| 7 | 3B base reaches PPL < 8 | → 3B SFT (KoAlpaca + ORPO) → leaderboard target ko_ifeval 40%+ |
| 8 | Service deployment decided | → vLLM FP8 serving alongside GGUF Q4_K_M on Ollama |
| 9 | Math/code performance needed | → Retrain 3B with a 20% mix of English math+code data |
| 10 | Want to generate Korean preference data in-house | → Generate 10K chosen/rejected pairs with Claude/GPT-4, then DPO |
---
## 7. Overall Timeline
```
Now (2026-02-26)
│
├─ Week 1: Phase 1 validation
│   ├─ D+0: SFT generation test (30 min)
│   ├─ D+0: lm-eval ko_ifeval + ko_winogrande (4 hours)
│   └─ D+2: Analyze results + decide the next step
│
├─ Week 2-3: Decide on Phase 2A or 2B, then execute
│   ├─ [2A path] Data augmentation (3-5 days) + retraining (1-2 days)
│   └─ [2B path] 3B pretraining (26 hours) + 3B SFT (3-6 hours)
│
├─ Week 4: Phase 3 (if needed)
│   └─ ORPO training (193K samples, 3-6 hours)
│
└─ Week 4-5: Phase 4 deployment
    ├─ HF format conversion (2-3 hours)
    ├─ HF Hub upload + Model Card
    ├─ vLLM serving setup
    └─ Ko-LLM leaderboard submission
Total estimated duration: 3-5 weeks (including the 3B scale-up)
```
---
## 8. Immediate Next Steps (Action Items)
```bash
# Step 1: Install lm-evaluation-harness
pip install lm-eval
# Step 2: Run ko_ifeval (SFT checkpoint)
lm_eval \
--model hf \
--model_args pretrained=/PROJECT/0325120031_A/ghong/taketimes/llm-bang/checkpoints/korean_1b_sft/checkpoint-0005000,dtype=bfloat16 \
--tasks ko_ifeval \
--device cuda:0 \
--output_path ./eval/results/sft_5k_ko_ifeval.json
# Step 3: Run ko_winogrande
lm_eval \
--model hf \
--model_args pretrained=/PROJECT/0325120031_A/ghong/taketimes/llm-bang/checkpoints/korean_1b_sft/checkpoint-0005000,dtype=bfloat16 \
--tasks ko_winogrande \
--device cuda:0 \
--output_path ./eval/results/sft_5k_ko_winogrande.json
```
---
*This document will be updated according to the evaluation results.*
*Next update: after the Phase 1 evaluation is complete (expected: D+1-2)*