
Korean LLM Roadmap & Decision Framework

Written: 2026-02-26
Current status: SFT completed at 5,000 steps (loss 1.9677), 1.19B-parameter model
Goal: deploy a Korean LLM that is usable in practice


0. TL;DR: what to do right now

  1. Quick generation test of the SFT model (30 min, today): check for repetition degeneration with temperature sampling
  2. Run lm-eval-harness ko_ifeval + ko_winogrande (2-4 hours): get the numbers
  3. Branch on the results → see the decision tree below

1. Where we are now

1.1 SFT training summary

Item | Value
Steps | 5,000
Final loss | 1.9677
Training time | 0.61 h (~37 min)
Throughput | ~75,700 tok/s (single B200)
LR (final) | 2.00e-06 (fully decayed)
Gradient norm | stable (range 1.0-1.4)
SFT data | unknown (needs checking)

Note: 5,000 steps is very little. SFT usually runs for 1-3 epochs, so whether 5,000 steps is enough depends on the size of the dataset.
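To make the epoch question concrete, the tokens seen can be derived from the table above. This sketch multiplies the measured throughput by wall-clock time. `dataset_tokens` is a hypothetical placeholder, because the SFT dataset size is still unknown.

```python
def sft_tokens_seen(throughput_tok_s: float, hours: float) -> float:
    """Total tokens processed, from average throughput and wall-clock time."""
    return throughput_tok_s * hours * 3600


def epochs_completed(tokens_seen: float, dataset_tokens: float) -> float:
    """How many passes over the SFT dataset those tokens correspond to."""
    return tokens_seen / dataset_tokens


# Numbers from the table above: ~75,700 tok/s for 0.61 h on one B200.
seen = sft_tokens_seen(75_700, 0.61)
per_step = seen / 5_000  # average tokens per optimizer step

# Hypothetical dataset size: the real SFT data size has not been checked yet.
dataset_tokens = 100e6
print(f"tokens seen: {seen / 1e6:.0f}M, per step: {per_step:,.0f}")
print(f"epochs: {epochs_completed(seen, dataset_tokens):.2f}")
```

With these numbers the run processed roughly 166M tokens, or about 33K per step. If training entered epoch 2 but not epoch 3, the SFT set is somewhere between roughly 83M and 166M tokens as processed, counting any padding.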

1.2 Where we stand in the field

Benchmarks (estimates based on measured Open Ko-LLM Leaderboard scores)

Model                          Size      ko_ifeval   ko_winogrande   Notes
────────────────────────────────────────────────────────────────────────────
EXAONE-3.0-7.8B-Instruct       7.8B      ~55%        ~80%+           8T tokens, SFT+DPO
Llama-3.1-8B-Korean-SFT        8B        ~40%        ~72%            Llama-based Korean adaptation
SOLAR-10.7B-Instruct           10.7B     ~50%        ~78%            Upstage
Gemma-2-9B-Korean              9B        ~45%        ~75%            Google-based

── Realistic 1B SFT benchmarks (other teams' models) ───────────────────────
dltjdgh0928/test_instruction   ~?B       24.1%       57.1%           measured on leaderboard
lookuss/test-llilu             ~?B       22.9%       58.2%           measured on leaderboard
generic 1B SFT (estimate)      1-2B      20-30%      52-62%          realistic range

── Our model (expected) ────────────────────────────────────────────────────
korean_1b_sft (5k steps)       1.19B     15-28%?     50-58%?         pre-evaluation estimate

Key gap analysis

Compared with | Parameter ratio | Expected performance gap | Main reason
EXAONE-3.0-7.8B | 6.6× | very large | scale + data + DPO
8B Korean SFT | 6.7× | large | the scale difference dominates
Other teams' 1B SFT | similar | small to moderate | differences in data and training method

Realistic assessment: a 1B model cannot compete head-to-head with 7-10B models. However:

  • Edge deployment (local serving, low-latency APIs): 1B has a clear advantage
  • Resource efficiency: 1B runs on a single GPU, even on a CPU
  • Specialized domains: Korean-specific fine-tuning can bring it close to large general-purpose models on specific tasks

1.3 Limits and possibilities at 1B scale

Realistic limits:

  • Hard to exceed 30% on ko_ifeval (instruction-following complexity)
  • Math/code: nearly impossible without such data in pretraining
  • Long-context understanding: degradation starts around 4K context
  • Factual recall: not enough capacity to store fine-grained facts

What is feasible:

  • Basic Korean QA, summarization, classification
  • Simple instruction following (1-2 steps)
  • Korean autocompletion and proofreading
  • Domain-specific tasks (constrained formats)

2. Phase-by-phase roadmap

Phase 1: SFT validation (now → ~1 week)

Goal

Decide whether the 5,000-step SFT result is good enough for real use

Checklist

□ 1-1. Quick generation quality check (30 min)
   - Generate for 10 prompts with temperature=0.8, top_p=0.9
   - Check: repetition degeneration rate (target: < 20%)
   - Check: are Korean verb endings and particles natural?
   - Check: does it follow instructions (compared with base)?

□ 1-2. Official benchmarks (2-4 hours)
   - Install lm-evaluation-harness and run ko_ifeval
   - Run ko_winogrande with lm-evaluation-harness
   - Optional: ko_gsm8k (can be skipped if there was no math data)

□ 1-3. SFT data quality check
   - Identify the dataset used for SFT
   - Dataset size (how many samples?)
   - Compute the volume seen: 5,000 steps × batch_size samples (multiply by sequence length for tokens)
   - Compute epochs: training entered epoch 2, so at least one full epoch is confirmed

□ 1-4. Base vs SFT comparison
   - Compare base (pretrained) and SFT outputs on the same prompts
   - Did SFT add instruction-following ability?
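The checklist asks for a repetition degeneration rate but does not define it. One possible operationalization, sketched here as an assumption rather than the project's own metric, flags a generation when any whitespace-separated n-gram repeats at least three times and reports the flagged fraction. `n` and `max_repeats` are hypothetical defaults worth tuning on a few hand-labeled outputs.

```python
from collections import Counter


def is_degenerate(text: str, n: int = 4, max_repeats: int = 3) -> bool:
    """Flag a generation if any word n-gram appears max_repeats or more times.

    Splits on whitespace (Korean eojeol units). n and max_repeats are
    hypothetical defaults, not values taken from this project.
    """
    words = text.split()
    if len(words) < n:
        return False
    ngrams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return max(ngrams.values()) >= max_repeats


def repetition_rate(generations: list[str], **kwargs) -> float:
    """Fraction of generations flagged as degenerate (0.0 to 1.0)."""
    if not generations:
        return 0.0
    flagged = sum(is_degenerate(g, **kwargs) for g in generations)
    return flagged / len(generations)


outputs = [
    "서울은 대한민국의 수도입니다.",
    "저는 학생입니다 저는 학생입니다 저는 학생입니다 저는 학생입니다",
]
print(repetition_rate(outputs, n=2))  # 0.5
```

Running the same prompts with greedy decoding and with temperature sampling gives the two numbers the Pass/Fail table and checklist refer to.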

Pass/Fail criteria (quantified)

Metric | Pass ✅ | Borderline ⚠️ | Fail ❌
ko_ifeval (prompt strict) | > 25% | 15-25% | < 15%
ko_winogrande | > 53% | 50-53% | < 50%
Repetition degeneration rate (greedy) | < 20% | 20-40% | > 40%
Temperature-sampling quality | natural | awkward | meaningless
SFT improvement over base | clear instruction following | marginal improvement | none or worse

Note: 50% on ko_winogrande is chance level (binary choice). A score needs to be 53%+ to be meaningful.
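The numeric rows of the table can be encoded directly so every evaluation run is graded the same way. In this sketch, values exactly on a boundary (for example 25% or 15% on ko_ifeval) count as borderline, which is an assumption because the table does not say. The inputs in the usage line are made-up numbers, not results.

```python
def grade(value: float, pass_above: float, fail_below: float) -> str:
    """Three-way verdict for a higher-is-better metric given in percent."""
    if value > pass_above:
        return "pass"
    if value < fail_below:
        return "fail"
    return "borderline"


def grade_repetition(rate: float) -> str:
    """Lower is better: < 20% pass, 20-40% borderline, > 40% fail."""
    if rate < 20:
        return "pass"
    if rate > 40:
        return "fail"
    return "borderline"


def phase1_verdicts(ko_ifeval: float, ko_winogrande: float, repetition: float) -> dict[str, str]:
    """Apply the Phase 1 thresholds to one evaluation run."""
    return {
        "ko_ifeval": grade(ko_ifeval, pass_above=25, fail_below=15),
        "ko_winogrande": grade(ko_winogrande, pass_above=53, fail_below=50),
        "repetition": grade_repetition(repetition),
    }


print(phase1_verdicts(ko_ifeval=22.0, ko_winogrande=54.1, repetition=12.0))
# {'ko_ifeval': 'borderline', 'ko_winogrande': 'pass', 'repetition': 'pass'}
```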

What to do on failure

  • ko_ifeval < 15% and repetition > 40%: SFT data problem or too few steps → Phase 2A
  • No improvement over base: check SFT data format and quality, revisit the learning rate
  • All metrics fail: re-examine everything, starting from the data pipeline

Phase 2A: SFT improvement (optional, depending on Phase 1 results)

When to enter?

Phase 2A entry conditions:
├── ko_ifeval < 25% AND repetition > 20%     → enter immediately
├── ko_ifeval 25-30% AND repetition 20-30%   → enter after augmenting data
└── ko_ifeval > 30% AND repetition < 15%     → go straight to Phase 2B or 4

Analysis by option

Option A: more steps (5k → 10k-20k)

  • When: the data is sufficient and training has not converged yet
  • How to check: is the loss curve still falling? (1.97 at 5,000 steps, close to convergence)
  • Expected effect: modest gains (target loss 1.97 → 1.80, ko_ifeval +3-7 pp expected)
  • Cost: 1.5-3 extra hours on one B200
  • Caution: training has already entered epoch 2, so there is a risk of overfitting

Option B: better data

  • When: the current SFT data is too small or low quality (the most common cause)
  • Recommended datasets:
    • beomi/KoAlpaca-v1.1a: 21K Korean instructions
    • HAERAE-HUB/KMMLU: Korean knowledge QA (also a multiple-choice evaluation benchmark, so keep its test split out of training)
    • nayohan/llama3-baseline-ko-dataset: diverse instructions
    • squarelike/sharegpt_deepl_ko_ko-en: ShareGPT in Korean
    • Combined target: 50K-200K high-quality samples
  • Expected effect: better data quality helps more than doubling the steps (rule of thumb)
  • Cost: 1-2 days of data cleaning, 2-6 extra hours of training

Option C: ORPO (Odds Ratio Preference Optimization)

  • When: an SFT baseline exists and preference alignment is needed
  • Pros: no reference model → saves memory and simplifies training
  • Korean data: kuotient/orca-math-korean-preference (193K) and heegyu/orca-math-korean-preference-cleaned (192K) are available
  • Expected effect: repetition degeneration -10 to -20 pp, instruction following +5 to +10 pp
  • Cost: 1 day of data preparation, 3-6 hours of training
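For reference, ORPO combines the usual SFT loss on the chosen response with an odds-ratio penalty. It uses length-normalized likelihoods, so no reference model is needed. The scalar sketch below follows that formulation for a single pair. λ = 0.1 is an illustrative default. In practice a library trainer computes this over batches of token log-probabilities (TRL ships an `ORPOTrainer`, for example).

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) where p = exp(avg_logp) is the length-normalized likelihood."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """ORPO loss for one preference pair.

    Inputs are mean per-token log-probabilities (strictly negative) of the chosen
    and rejected responses under the model being trained; no reference model.
    """
    sft_term = -chosen_avg_logp  # ordinary NLL on the chosen response
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    or_term = softplus(-log_odds_ratio)  # equals -log(sigmoid(log_odds_ratio))
    return sft_term + lam * or_term


# The penalty shrinks as the model prefers the chosen answer more strongly.
print(orpo_loss(-0.8, -1.6), orpo_loss(-0.8, -0.4))
```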

Option D: DPO (Direct Preference Optimization)

  • When: stronger alignment than ORPO is needed, or SFT already works reasonably well
  • Pros: RLHF-like effect, more stable than PPO
  • Cons: needs a reference model (2× model weights)
  • Feasibility on B200: 1.19B × 2 ≈ 2.4B params, which fits easily in a single B200 (183 GB)
  • Cost: 4-8 hours of training
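DPO, by contrast, compares policy and reference log-probabilities, and that comparison is why it needs the extra model. The sketch below computes the loss for one pair from summed response log-probs. β = 0.1 is an illustrative value. The reference is frozen and needs no optimizer state, so at 1.19B it adds only about 1.19B × 2 bytes ≈ 2.4 GB of BF16 weights. The "2×" therefore applies to weights, not to total training memory.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair.

    Inputs are summed log-probs of each full response; ref_* come from the
    frozen reference model (typically the SFT checkpoint itself).
    """
    chosen_reward = policy_chosen - ref_chosen
    rejected_reward = policy_rejected - ref_rejected
    margin = beta * (chosen_reward - rejected_reward)
    return softplus(-margin)  # equals -log(sigmoid(margin))


# Policy moved toward the chosen answer relative to the reference: loss < log(2).
print(dpo_loss(-10.0, -30.0, -12.0, -28.0))
```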

Recommended order

Data check → data augmentation (Option B) → more steps (Option A) → ORPO (Option C)

Phase 2B: scale up to a 3B model

Data sufficiency analysis

Criterion | Tokens needed | Currently available | Verdict
Chinchilla compute-optimal (~20×) | 3B × 20 = 60B | ~150B | ✅ sufficient
Extended training (70×) | 3B × 70 = 210B | ~150B | ⚠️ about 71%
Llama-style (high-quality focus, 100×) | 3B × 100 = 300B | ~150B | ❌ insufficient

Conclusion: 3B training is feasible with the current data, though not at the 70× level. Collecting another 50B tokens of high-quality data (≈200B in total) would get close to it.
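The same arithmetic, written out so it can be rerun when the token count changes. The 150B figure is the current estimate from the text.

```python
def tokens_needed(n_params: float, tokens_per_param: float) -> float:
    """Training-token budget for a given tokens-per-parameter ratio."""
    return n_params * tokens_per_param


available = 150e9
for ratio in (20, 70, 100):
    need = tokens_needed(3e9, ratio)
    print(f"{ratio:>3}x: need {need / 1e9:.0f}B tokens, have {available / need:.0%}")

# With 50B more high-quality tokens the total reaches 200B.
print(f"{(available + 50e9) / tokens_needed(3e9, 70):.0%} of the 70x budget")  # 95% of the 70x budget
```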

์˜ˆ์ƒ ํ•™์Šต ์‹œ๊ฐ„ (8ร— B200 ๊ธฐ์ค€)

3B ๋ชจ๋ธ ์„ค์ • ์ถ”์ •:
- ์ฒ˜๋ฆฌ ์†๋„: ~2.5~3M tok/s (8ร— B200, 1.19B ๊ธฐ์ค€ 2.64M)
  โ†’ 3B ๋ชจ๋ธ์€ ์†๋„ ~40% ๊ฐ์†Œ ์˜ˆ์ƒ (๋ฉ”๋ชจ๋ฆฌ/์—ฐ์‚ฐ ์ฆ๊ฐ€)
  โ†’ ์‹คํšจ ์†๋„: ~1.6M tok/s (์ถ”์ •)

60B tokens (์ตœ์†Œ): 60B / 1.6M = 37,500์ดˆ โ‰ˆ 10.4์‹œ๊ฐ„
150B tokens (ํ˜„์žฌ ๋ณด์œ  ์ „๋Ÿ‰): 150B / 1.6M = 93,750์ดˆ โ‰ˆ 26์‹œ๊ฐ„
210B tokens (optimal): 210B / 1.6M = 131,250์ดˆ โ‰ˆ 36.5์‹œ๊ฐ„

โ†’ ํ˜„์‹ค์  ํ•™์Šต ๊ธฐ๊ฐ„: 1~2์ผ (8ร— B200)
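The time estimates can be reproduced as below. As a sanity check, the sketch also computes a second figure that assumes throughput falls in proportion to parameter count (the usual ~6·N FLOPs per token at constant utilization). That assumption is ours, not the document's, and it is more pessimistic than the ~40% slowdown used above.

```python
def training_hours(tokens: float, tok_per_s: float) -> float:
    """Wall-clock hours to process `tokens` at a constant throughput."""
    return tokens / tok_per_s / 3600


measured_1b = 2.64e6                       # tok/s on 8x B200 for the 1.19B model (from the text)
assumed_3b = 1.6e6                         # the ~40% slowdown estimate used above
flop_scaled_3b = measured_1b * 1.19 / 3.0  # alternative: throughput scales as 1/params

for tokens in (60e9, 150e9, 210e9):
    print(f"{tokens / 1e9:.0f}B tokens: "
          f"{training_hours(tokens, assumed_3b):.1f} h at 1.6M tok/s, "
          f"{training_hours(tokens, flop_scaled_3b):.1f} h FLOP-scaled")
```

The first column reproduces the 10.4 / 26.0 / 36.5 h figures. The FLOP-scaled column (~1.05M tok/s) gives about 15.9 / 39.8 / 55.7 h, so the full-data runs could stretch past two days if the 40% estimate proves optimistic.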

3B training prerequisites

□ Change the model architecture config:
  - d_model: 2048 → 2560 (or 3072)
  - n_layers: 24 → 32
  - n_heads: 16 → 32
  - n_kv_heads (GQA): 4 → 8
  - d_ffn: 5472 → ~8192
  → expected parameters: ~3B
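To check that the proposed settings land near 3B, the sketch below counts parameters for a LLaMA-style decoder. The layout is an assumption: GQA attention (Q and O at full width, K and V at `n_kv_heads × head_dim`), a SwiGLU FFN with three matrices, tied input/output embeddings, a 64,000-token vocabulary, and norms ignored. Under these assumptions the current config reproduces the 1.19B figure, which suggests the layout is close.

```python
def llama_style_params(d_model: int, n_layers: int, n_heads: int, n_kv_heads: int,
                       d_ffn: int, vocab: int = 64_000, tied_embeddings: bool = True) -> int:
    """Approximate parameter count for a LLaMA-style decoder (norms and biases ignored)."""
    head_dim = d_model // n_heads
    kv_dim = n_kv_heads * head_dim
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim  # Q and O, plus K and V (GQA)
    ffn = 3 * d_model * d_ffn                            # SwiGLU: gate, up, down
    embeddings = vocab * d_model * (1 if tied_embeddings else 2)
    return n_layers * (attn + ffn) + embeddings


current = llama_style_params(2048, 24, 16, 4, 5472)
option_a = llama_style_params(2560, 32, 32, 8, 8192)
option_b = llama_style_params(3072, 32, 32, 8, 8192)
print(f"{current / 1e9:.2f}B {option_a / 1e9:.2f}B {option_b / 1e9:.2f}B")  # 1.19B 2.70B 3.37B
```

With these assumptions, d_model = 2560 comes out near 2.7B and 3072 near 3.4B. The choice between them therefore shifts the 20×/70× token budgets noticeably. Untying the embeddings would add another vocab × d_model.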

□ Data preparation:
  - Re-download cc100 ko (after the bug fix)
  - Use CulturaX (24.8B tokens)
  - Merge 150B+ tokens of Korean data in total

□ Write configs/korean_3b_fp8.yaml
□ Checkpointing strategy: every 5,000 steps
□ Keep the FP8 settings (B200-optimized)

How the 1B SFT results affect the 3B decision

1B SFT result            → proceed to 3B?
──────────────────────────────────────────────────────────
ko_ifeval > 30%          → strongly recommended: 1B is already good, 3B should do better
ko_ifeval 20-30%         → conditionally recommended: verify data and methodology, then 3B
ko_ifeval < 20%          → root-cause analysis required before 3B: the same problem is likely to recur at 3B
repetition > 40%         → suspect the pretraining data: 3B may have the same problem
no SFT improvement       → fix the SFT pipeline, then 3B

Phase 3: RLHF / Preference Optimization (optional)

When is it needed?

Scenario | Need
Service deployment (user-facing) | strongly needed (safety, coherence)
Maximizing leaderboard scores | needed: DPO/ORPO for +5-15 pp
Internal research and experiments | not needed
RAG system backend | not needed

ORPO vs DPO vs SimPO vs PPO

Method | When | Memory | Complexity | Korean data
ORPO | in the same stage as SFT, fast alignment | 1× (no ref) | low | 193K+ available
DPO | after SFT, stable alignment | 2× (ref needed) | medium | 193K+ available
SimPO | DPO-like effect without a ref | 1× | medium | works with any preference data
PPO | full RLHF | 3-4× | high | needs a reward model

Recommended on B200: ORPO or SimPO (no reference model, memory-efficient)

Korean preference datasets (Hugging Face)

kuotient/orca-math-korean-preference                     193K samples  math-focused
heegyu/orca-math-korean-preference-cleaned               192K          math (cleaned)
lemon-mint/korean-realqa-reasoning-v01-preference        7.7K          reasoning
ChuGyouk/argilla-distilabel-math-preference-dpo-korean   2.4K          small

→ Most of the data is math-specific; general Korean preference data is scarce.
→ General preference data must be supplemented by generating our own or by translation.
  Method: generate chosen/rejected pairs with GPT-4/Claude (synthetic preference data)
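Whatever generates the pairs, storing them as prompt/chosen/rejected records keeps them directly usable with TRL's DPO and ORPO trainers, which accept that layout. A minimal sketch follows. The helper names and the JSONL file are hypothetical.

```python
import json
from typing import Iterable


def make_preference_record(prompt: str, chosen: str, rejected: str) -> dict:
    """One preference pair in the prompt/chosen/rejected layout."""
    if chosen.strip() == rejected.strip():
        raise ValueError("chosen and rejected must differ")
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


def write_jsonl(records: Iterable[dict], path: str) -> None:
    """Write one JSON object per line, keeping Korean text unescaped."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


record = make_preference_record(
    "한국의 수도는 어디인가요?",
    "한국의 수도는 서울입니다.",
    "수도 수도 수도 수도",
)
print(json.dumps(record, ensure_ascii=False))
```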

Phase 4: Deployment

Serving options

Option | Characteristics | B200 fit | Recommended for
vLLM | PagedAttention, high throughput | ✅ best | API servers, batch inference
TGI (Text Generation Inference) | official HF server, stable | ✅ good | HF Hub integration
llama.cpp + GGUF | runs on CPU and low-end hardware | ⚠️ underuses a B200 | edge deployment, Ollama
Ollama | easy local deployment | ⚠️ | personal use, demos

Expected vLLM throughput on B200 (1.19B model):

1.19B model (BF16):
  - Memory: ~2.4 GB (weights) + KV cache
  - Single B200 (183 GB): nearly all memory can go to KV cache
  - Expected throughput: 5,000-15,000 tokens/s (batched)
  - Single stream: 200-500 tokens/s (user-perceived)
  → supports 100-500 concurrent users (single GPU)
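The KV-cache point can be quantified. The sketch below uses the current architecture values listed in the 3B section (24 layers, 16 heads so head_dim 128, 4 KV heads) and assumes BF16 KV entries. The 0.9 utilization mirrors vLLM's default `gpu_memory_utilization`. Activation and CUDA-graph memory are ignored.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes of K and V cached per token across all layers (BF16 = 2 bytes)."""
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes


def kv_token_capacity(gpu_gb: float, weights_gb: float, utilization: float, per_token: int) -> int:
    """Tokens of KV cache that fit after the weights; ignores activations and overhead."""
    usable_bytes = gpu_gb * 1e9 * utilization - weights_gb * 1e9
    return int(usable_bytes // per_token)


# Current 1.19B config: 24 layers, 4 KV heads (GQA), head_dim = 2048 / 16 = 128.
per_token = kv_bytes_per_token(n_layers=24, n_kv_heads=4, head_dim=2048 // 16)
tokens = kv_token_capacity(gpu_gb=183, weights_gb=2.4, utilization=0.9, per_token=per_token)
print(per_token, tokens, tokens // 4096)  # bytes per token, cached tokens, full 4K sequences
```

That is 48 KiB per token, or roughly 3.3M cached tokens (around 800 full 4K-token sequences). KV capacity is therefore unlikely to be what limits the concurrent-user estimate above.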

์–‘์žํ™” ์˜ต์…˜ (B200 ํ™˜๊ฒฝ)

ํฌ๋งท ์ •๋ฐ€๋„ ์†์‹ค ํฌ๊ธฐ B200 ์ ํ•ฉ์„ฑ ์ถ”์ฒœ
FP8 (Native) ์—†์Œ 1.2GB โœ… ์ตœ์šฐ์ˆ˜ (HW ์ง€์›) ์ตœ์šฐ์„ 
BF16 ์—†์Œ 2.4GB โœ… ๊ธฐ๋ณธ ๊ธฐ์ค€์„ 
AWQ (W4A16) ๋งค์šฐ ์ ์Œ 0.6GB โœ… ์šฐ์ˆ˜ ์—ฃ์ง€/์ €๋ฉ”๋ชจ๋ฆฌ
GPTQ (W4) ์ ์Œ 0.6GB โœ… ์šฐ์ˆ˜ CPU ์˜คํ”„๋กœ๋“œ
GGUF Q4_K_M ์ ์Œ ~0.7GB โš ๏ธ (CPU์šฉ) Ollama ๋ฐฐํฌ์šฉ

B200 ๊ถŒ์žฅ: FP8 โ†’ AWQ ์ˆœ์„œ๋กœ ๊ณ ๋ ค. B200์€ FP8 ํ•˜๋“œ์›จ์–ด ์ง€์›์œผ๋กœ ์–‘์žํ™” ์—†์ด ์ด๋ฏธ ํšจ์œจ์ .
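The sizes in the table follow from bits per weight. This sketch reproduces the raw numbers for 1.19B parameters.

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Raw weight storage in decimal GB, before quantization scales and metadata."""
    return n_params * bits_per_weight / 8 / 1e9


for fmt, bits in [("BF16", 16), ("FP8", 8), ("4-bit (AWQ/GPTQ)", 4)]:
    print(f"{fmt:>18}: {weight_gb(1.19e9, bits):.2f} GB")
```

This gives 2.38 GB for BF16, 1.19 GB for FP8 and about 0.6 GB at 4 bits. Real 4-bit files come out somewhat larger because of per-group scales and layers that are often kept at higher precision, which is in line with the ~0.7 GB listed for GGUF Q4_K_M.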

Hugging Face Hub upload

Tasks:
□ Convert to HF format: config.json, model.safetensors, tokenizer_config.json
□ Write the model card (Korean description, benchmark results, usage)
□ Set the license (Apache 2.0 recommended)
□ Include eval results
□ Submit to the Open Ko-LLM Leaderboard (evaluation request)
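Below is a minimal upload sketch using the `huggingface_hub` client. It assumes the checkpoint has already been converted to HF format and that a token is configured (for example via `huggingface-cli login`). The file list mirrors the checklist, with `README.md` as the model card. `repo_id` is whatever name you choose. A sharded checkpoint would have an index file instead of a single `model.safetensors`.

```python
from pathlib import Path

REQUIRED_FILES = ("config.json", "model.safetensors", "tokenizer_config.json", "README.md")


def missing_hf_files(folder: str) -> list[str]:
    """Return the required HF-format files that are absent from folder."""
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def upload_to_hub(folder: str, repo_id: str, private: bool = True) -> None:
    """Create the model repo if needed and upload the folder."""
    missing = missing_hf_files(folder)
    if missing:
        raise FileNotFoundError(f"not in HF format yet, missing: {missing}")
    from huggingface_hub import HfApi  # imported lazily so the file check works offline

    api = HfApi()
    api.create_repo(repo_id=repo_id, repo_type="model", private=private, exist_ok=True)
    api.upload_folder(folder_path=folder, repo_id=repo_id, repo_type="model")
```

Starting with `private=True` keeps the repo hidden until the model card and eval results are in place. Visibility can be switched before the leaderboard submission.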

3. Decision tree (metric-based)

══════════════════════════════════════════════════════════════════
                    [Phase 1: SFT evaluation results]
══════════════════════════════════════════════════════════════════

├── ko_ifeval > 30% AND repetition < 15%
│   ├── All 150B tokens of data usable? → Phase 2B (3B pretraining)
│   └── Goal is to deploy right now? → Phase 4 (vLLM serving + HF upload)
│
├── ko_ifeval 20-30% AND repetition 15-30%
│   ├── SFT data < 10K samples? → Phase 2A-B (data augmentation first)
│   ├── SFT data 10-50K samples? → Phase 2A-A (more steps) + 2A-C (ORPO)
│   └── SFT data > 50K samples? → Phase 2A-A (more steps) OR 2B (3B)
│
├── ko_ifeval 10-20% AND repetition 30-50%
│   ├── No difference between base and SFT? → check the SFT pipeline for bugs
│   ├── SFT data quality suspect? → audit all data, then Phase 2A-B
│   └── Base PPL high (> 15)? → more pretraining needed (add data)
│
└── ko_ifeval < 10% OR repetition > 50%
    ├── Base model itself already > 30% repetition? → pretraining data quality problem
    │   └── → filter cc100 noise, then continue pretraining
    ├── Did SFT loss diverge? → revisit learning rate and optimizer settings
    └── All generations meaningless? → check for checkpoint corruption, restore an earlier checkpoint
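The top level of the tree can also be written as a function, which makes its gaps visible. In this sketch, tie-breaking at the 10/15/20/30/50 boundaries and the `sft_samples` parameter are our assumptions. Any combination the tree does not cover is returned for manual review.

```python
from typing import Optional


def phase1_route(ko_ifeval: float, repetition: float, sft_samples: Optional[int] = None) -> str:
    """Top-level branch of the Phase 1 decision tree (all rates in percent)."""
    if ko_ifeval < 10 or repetition > 50:
        return "recovery: check pretraining data, SFT divergence, checkpoint integrity"
    if ko_ifeval > 30 and repetition < 15:
        return "Phase 2B (3B pretraining) or Phase 4 (deploy)"
    if 20 <= ko_ifeval <= 30 and 15 <= repetition <= 30:
        if sft_samples is None:
            return "Phase 2A: count SFT samples first"
        if sft_samples < 10_000:
            return "Phase 2A-B: augment data first"
        if sft_samples <= 50_000:
            return "Phase 2A-A (more steps) + 2A-C (ORPO)"
        return "Phase 2A-A (more steps) or 2B (3B)"
    if 10 <= ko_ifeval < 20 and 30 <= repetition <= 50:
        return "diagnose: base vs SFT difference, SFT data audit, base PPL"
    return "no branch matched: review manually"


for ifeval, rep in [(35, 10), (25, 20), (15, 40), (25, 40)]:
    print(ifeval, rep, "->", phase1_route(ifeval, rep))
```

For example, ko_ifeval 25% with 40% repetition matches no branch. It is worth deciding in advance which path such a result takes.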

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
                 [Phase 2A ๋‚ด๋ถ€ ์˜์‚ฌ๊ฒฐ์ •]
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

Phase 2A ์ง„์ž… ํ›„:

โ”œโ”€โ”€ ํ˜„์žฌ SFT ๋ฐ์ดํ„ฐ < 20K ์ƒ˜ํ”Œ?
โ”‚   โ””โ”€โ”€ โ†’ ๋ฐ์ดํ„ฐ ๋ณด๊ฐ•์ด steps ์ถ”๊ฐ€๋ณด๋‹ค ํšจ๊ณผ์  (์ตœ์šฐ์„ )
โ”‚       ๋ฐ์ดํ„ฐ: beomi/KoAlpaca, squarelike/sharegpt_ko, nayohan/llama3-ko
โ”‚
โ”œโ”€โ”€ loss curve๊ฐ€ ์•„์ง ํ•˜๊ฐ• ์ค‘ (step 4000~5000 ์ฐจ์ด > 0.05)?
โ”‚   โ””โ”€โ”€ โ†’ steps 2๋ฐฐ ์ถ”๊ฐ€ ์‹œ๋„ (10k๊นŒ์ง€)
โ”‚
โ”œโ”€โ”€ ๋ฐ˜๋ณต์œจ > 30% (์ฃผ์š” ๋ฌธ์ œ)?
โ”‚   โ””โ”€โ”€ โ†’ ORPO ๋˜๋Š” repetition penalty ์ ์šฉ ๋จผ์ €
โ”‚       ORPO ๋ฐ์ดํ„ฐ: kuotient/orca-math-korean-preference (193K)
โ”‚
โ””โ”€โ”€ ko_ifeval < 20% + ๋ฐ์ดํ„ฐ ๋ณด๊ฐ• ํ›„์—๋„ ๊ฐœ์„  ์—†์Œ?
    โ””โ”€โ”€ โ†’ 3B ์‚ฌ์ „ํ•™์Šต์œผ๋กœ ์ „ํ™˜ (1B SFT ํ•œ๊ณ„ ๋„๋‹ฌ ๊ฐ€๋Šฅ์„ฑ)
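The loss-curve check above can be automated. The sketch below smooths the logged loss over a ±100-step window, since raw step losses are noisy, and compares step 4000 with step 5000. The TensorBoard tag `train/loss` is a guess at the logging name. Keep in mind that the LR had fully decayed to 2e-6 by step 5000, so a flat tail may reflect the schedule rather than true convergence.

```python
from statistics import mean


def window_mean(points: list[tuple[int, float]], center: int, half_width: int) -> float:
    """Mean loss over logged steps within half_width of center."""
    values = [loss for step, loss in points if abs(step - center) <= half_width]
    if not values:
        raise ValueError(f"no logged loss within {half_width} steps of step {center}")
    return mean(values)


def still_decreasing(points: list[tuple[int, float]], start: int = 4000, end: int = 5000,
                     half_width: int = 100, min_drop: float = 0.05) -> bool:
    """Phase 2A rule: did the smoothed loss drop by more than min_drop from start to end?"""
    return window_mean(points, start, half_width) - window_mean(points, end, half_width) > min_drop


def load_scalars(logdir: str, tag: str = "train/loss") -> list[tuple[int, float]]:
    """Read (step, value) pairs from TensorBoard event files (the tag name is a guess)."""
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

    acc = EventAccumulator(logdir)
    acc.Reload()
    return [(event.step, event.value) for event in acc.Scalars(tag)]
```

Usage would look like `still_decreasing(load_scalars(logdir))`, where `logdir` is the SFT run's TensorBoard directory.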

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
                 [Phase 2B ๋‚ด๋ถ€ ์˜์‚ฌ๊ฒฐ์ •]
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

3B ์‚ฌ์ „ํ•™์Šต ์ง„ํ–‰ ๊ฒฐ์ • ์‹œ:

โ”œโ”€โ”€ ํ˜„์žฌ 150B ํ† ํฐ์ด ํ•œ๊ตญ์–ด ๋‹จ์ผ ์–ธ์–ด?
โ”‚   โ””โ”€โ”€ โ†’ ์˜์–ด ๋ฐ์ดํ„ฐ 10~30% ํ˜ผํ•ฉ ๊ถŒ์žฅ (cross-lingual transfer)
โ”‚       ์˜์–ด ์ˆ˜ํ•™/์ฝ”๋“œ ํฌํ•จํ•˜๋ฉด ko_gsm8k ๋“ฑ ์ถ”๊ฐ€ ๊ฐœ์„  ๊ฐ€๋Šฅ
โ”‚
โ”œโ”€โ”€ cc100 ko ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘ ์™„๋ฃŒ?
โ”‚   โ””โ”€โ”€ No โ†’ CulturaX 24.8B๋งŒ์œผ๋กœ ์‹œ์ž‘ ๊ฐ€๋Šฅ (60B ๋ชฉํ‘œ ๋‹ฌ์„ฑ ๊ฐ€๋Šฅ)
โ”‚
โ””โ”€โ”€ 3B ํ•™์Šต ์ค‘ ์ค‘๊ฐ„ checkpoint์—์„œ SFT ํ…Œ์ŠคํŠธ?
    โ””โ”€โ”€ โ†’ 1B๋ณด๋‹ค 3B base๊ฐ€ SFT ๋ฐ˜์‘์„ฑ์ด ๋†’์œผ๋ฉด 3B SFT๋กœ ๋ฐ”๋กœ ์ง„ํ–‰

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
                    [Phase 4 ๋ฐฐํฌ ์˜์‚ฌ๊ฒฐ์ •]
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

๋ฐฐํฌ ๋ฐฉ์‹ ์„ ํƒ:

โ”œโ”€โ”€ ์—ฐ๊ตฌ/๋ฐ๋ชจ ๋ชฉ์ ?
โ”‚   โ””โ”€โ”€ โ†’ HF Hub ์—…๋กœ๋“œ + Gradio Space ์ƒ์„ฑ (๋ฌด๋ฃŒ)
โ”‚
โ”œโ”€โ”€ ๋‚ด๋ถ€ API ์„œ๋น™?
โ”‚   โ””โ”€โ”€ โ†’ vLLM (FP8 native) + OpenAI ํ˜ธํ™˜ ์—”๋“œํฌ์ธํŠธ
โ”‚       ์ปค๋งจ๋“œ: vllm serve ./checkpoints/korean_1b_sft --dtype fp8
โ”‚
โ”œโ”€โ”€ ๊ฐœ์ธ/ํŒ€ ๋กœ์ปฌ ์‚ฌ์šฉ?
โ”‚   โ””โ”€โ”€ โ†’ GGUF Q4_K_M ๋ณ€ํ™˜ + Ollama (์ด๋ฏธ Modelfile ์กด์žฌ)
โ”‚
โ””โ”€โ”€ Open Ko-LLM ๋ฆฌ๋”๋ณด๋“œ ๋“ฑ์žฌ?
    โ””โ”€โ”€ โ†’ HF Hub ์—…๋กœ๋“œ ํ•„์ˆ˜ โ†’ ๋ฆฌ๋”๋ณด๋“œ ์ œ์ถœ ์–‘์‹ ์ž‘์„ฑ

4. Candidate follow-up jobs (by priority)

Doable immediately (on the current server, no extra data needed)

Priority | Job | Expected time | Expected benefit
⭐⭐⭐ | SFT model generation test (temperature sampling) | 30 min | current repetition rate
⭐⭐⭐ | Install lm-eval-harness + run ko_ifeval | 2-4 h | official benchmark numbers
⭐⭐⭐ | Run ko_winogrande | 1-2 h | language-understanding score
⭐⭐ | Base vs SFT generation comparison (same prompts) | 1 h | measure the SFT effect
⭐⭐ | SFT loss-curve analysis (TensorBoard) | 30 min | judge convergence
⭐⭐ | Quantify repetition degeneration (effect of repetition_penalty) | 1 h | judge deployability
⭐ | vLLM serving test (FP8) | 1-2 h | measure throughput
⭐ | HF format conversion (config.json, safetensors) | 2-3 h | ready for HF Hub upload

Requires data preparation

Priority | Job | Prep time | Expected benefit
⭐⭐⭐ | Augment SFT data (KoAlpaca + ShareGPT-ko, 50K+) | 1-2 days | ko_ifeval +5-15 pp
⭐⭐⭐ | Re-collect cc100 (after the bug fix) | 0.5-1 day | secure 150B+ tokens
⭐⭐ | Prepare ORPO data (orca-math-korean 193K) | 0.5 day | repetition degeneration -20 pp
⭐⭐ | Merge 3B pretraining data (150B tokens) | 1-2 days | ready for 3B training
⭐ | Generate general Korean preference data (with GPT-4) | 3-7 days | general-purpose ORPO/DPO
⭐ | Add English/code data (10-30% of the mix) | 1-3 days | better math/code

Requires external resources

Priority | Job | Resources needed | Expected benefit
⭐⭐ | Upload to a Hugging Face Hub account | HF account, internet | enables leaderboard submission
⭐⭐ | Submit to the Open Ko-LLM Leaderboard | HF account | official ranking
⭐ | KoMT-Bench / LogicKor evaluation | external API or scripts | qualitative evaluation
⭐ | More VRAM or multi-GPU SFT | currently 12 GB → more needed? | larger batches

5. Risk analysis

5.1 Potential problems with the current training approach

Risk | Severity | Current evidence | Mitigation
Too few SFT steps (5k) | 🔴 high | entered epoch 2, loss still 1.97 | more steps or more data
Insufficient pretraining data (~8.91B tokens) | 🟡 medium | Chinchilla: 1B × 20 = 20B needed → short | continue training on the 150B data
No code/math data | 🟡 medium | ko_gsm8k expected near 0 | mix in English code/math data
Repetition degeneration under greedy decoding | 🔴 high | observed in 30% of base outputs | SFT + repetition_penalty + ORPO

5.2 cc100 data quality issues

Known problems:

  • cc100 is web text extracted from CommonCrawl and is very noisy
  • Korean cc100 in particular: lots of ad text, spam and repeated content
  • Duplication: an estimated 10-30% document-level duplicates (needs MinHash removal)

Practical impact:

Training on noise → the model learns ad/spam patterns → lower generation quality
Duplicate data → over-memorization of specific patterns → worse repetition degeneration

Recommended preprocessing:

# 1. Deduplication (MinHash LSH)
python scripts/dedup_minhash.py --input cc100_ko.bin --threshold 0.8

# 2. Quality filtering (perplexity-based)
# Remove low-quality text: PPL > 1000
python scripts/quality_filter.py --max_ppl 1000

# 3. Length filtering
# Remove very short texts (< 50 tokens)
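To illustrate step 1, here is a small MinHash near-duplicate filter in plain Python. It estimates Jaccard similarity over character 5-grams (a reasonable unit for Korean) and keeps the first document of each group at or above the 0.8 threshold. Unlike real MinHash LSH, it compares signatures pairwise, so it is O(n²) and only suitable for sampling or testing. For all of cc100, a banded LSH index (for example `MinHashLSH` in the `datasketch` library) or the project's own dedup script is the practical route.

```python
import hashlib
import random

PRIME = (1 << 61) - 1  # Mersenne prime used for universal hashing


def shingles(text: str, k: int = 5) -> set[str]:
    """Character k-grams after collapsing whitespace."""
    text = " ".join(text.split())
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


class MinHasher:
    def __init__(self, num_perm: int = 128, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One (a, b) pair per simulated permutation: h(x) = (a * x + b) mod PRIME
        self.coeffs = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_perm)]

    def signature(self, text: str) -> list[int]:
        base = [
            int.from_bytes(hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest(), "big")
            for s in shingles(text)
        ]
        return [min((a * x + b) % PRIME for x in base) for a, b in self.coeffs]


def estimated_jaccard(sig1: list[int], sig2: list[int]) -> float:
    """Fraction of matching minimums, an unbiased estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def dedup(docs: list[str], threshold: float = 0.8) -> list[str]:
    """Keep each document unless it is a near-duplicate of one already kept."""
    hasher = MinHasher()
    kept: list[str] = []
    kept_sigs: list[list[int]] = []
    for doc in docs:
        sig = hasher.signature(doc)
        if all(estimated_jaccard(sig, other) < threshold for other in kept_sigs):
            kept.append(doc)
            kept_sigs.append(sig)
    return kept
```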

5.3 Impact of the tokenizer choice (korean_sp 64K)

Current setup: SentencePiece Unigram, 64K vocab, Korean-specialized

Pros:

  • Optimized for Korean morpheme segmentation → efficient encoding
  • The 64K vocab balances token fertility between English and Korean
  • On average one token covers about 1.2-1.8 Korean characters (more efficient than byte-level BPE)
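Fertility is easy to measure on our own data instead of relying on a rule of thumb. The helper below reports both characters per token and tokens per character for any `encode` function. The commented SentencePiece usage assumes a model path that is hypothetical.

```python
from typing import Callable, Sequence


def fertility(texts: Sequence[str], encode: Callable[[str], Sequence[int]]) -> tuple[float, float]:
    """Return (characters per token, tokens per character), ignoring spaces."""
    n_chars = sum(len(t.replace(" ", "")) for t in texts)
    n_tokens = sum(len(encode(t)) for t in texts)
    return n_chars / n_tokens, n_tokens / n_chars


# Hypothetical usage with the project's SentencePiece model (path is an assumption):
# import sentencepiece as spm
# sp = spm.SentencePieceProcessor(model_file="tokenizer/korean_sp.model")
# print(fertility(korean_sample_texts, sp.encode))
```

Running the same sample through an English-centric tokenizer gives a direct comparison for the 3B tokenizer decision below.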

Potential problems:

Problem | Severity | Description
Limited English vocabulary | 🟡 medium | English code/math is encoded inefficiently (byte fallback)
Incompatible with existing models | 🟡 medium | RLHF data must be re-tokenized
Neologisms and loanwords | 🟡 medium | OOV text is handled by byte fallback, but inefficiently
Differs from standard Llama/Mistral tokenizers | 🟢 low | fine as long as the tokenizer is included in the HF upload

Mitigation:

  • For the future 3B model, consider adopting tiktoken (cl100k_base) or a Llama-family tokenizer
  • Keep the current tokenizer for the 1.19B model (retraining would cost too much)

6. Scenario list ("if X, then do Y")

# | Condition (IF) | Action (THEN)
1 | ko_ifeval > 30% AND repetition < 15% | upload to HF Hub immediately + submit to the leaderboard + start 3B pretraining in parallel
2 | ko_ifeval 20-30% AND repetition 15-30% | augment data with KoAlpaca + ShareGPT-ko, then rerun SFT for 10k steps
3 | ko_ifeval < 20% AND no difference from base | debug the SFT training pipeline (data loading, format)
4 | repetition > 40% | apply ORPO (orca-math-korean 193K) immediately
5 | ko_ifeval still < 20% after all SFT attempts | accept the 1B limit, switch to 3B pretraining
6 | cc100 collection done (65-100B) | start 3B pretraining right away (26 h, 8× B200)
7 | 3B base PPL < 8 reached | 3B SFT (KoAlpaca + ORPO) → leaderboard target ko_ifeval 40%+
8 | decision to deploy as a service | vLLM FP8 serving + GGUF Q4_K_M on Ollama in parallel
9 | math/code performance needed | retrain 3B with 20% English math + code data mixed in
10 | want to generate Korean preference data ourselves | generate 10K chosen/rejected pairs with Claude/GPT-4, then DPO

7. Overall timeline

Now (2026-02-26)
│
├─ Week 1: Phase 1 validation
│  ├─ D+0: SFT generation test (30 min)
│  ├─ D+0: lm-eval ko_ifeval + ko_winogrande (4 h)
│  └─ D+2: analyze results + decide the next step
│
├─ Week 2-3: choose Phase 2A or 2B, then execute
│  ├─ [2A path] data augmentation (3-5 days) + retraining (1-2 days)
│  └─ [2B path] 3B pretraining (26 h) + 3B SFT (3-6 h)
│
├─ Week 4: Phase 3 (if needed)
│  └─ ORPO training (193K samples, 3-6 h)
│
└─ Week 4-5: Phase 4 deployment
   ├─ HF format conversion (2-3 h)
   ├─ HF Hub upload + model card
   ├─ vLLM serving setup
   └─ Ko-LLM leaderboard submission

Total expected duration: 3-5 weeks (including the 3B scale-up)

8. Immediate next steps (action items)

# Step 1: install lm-evaluation-harness
pip install lm-eval

# Step 2: run ko_ifeval (SFT checkpoint)
# --model hf expects a Hugging Face-format checkpoint (config.json + safetensors)
lm_eval \
  --model hf \
  --model_args pretrained=/PROJECT/0325120031_A/ghong/taketimes/llm-bang/checkpoints/korean_1b_sft/checkpoint-0005000,dtype=bfloat16 \
  --tasks ko_ifeval \
  --device cuda:0 \
  --output_path ./eval/results/sft_5k_ko_ifeval.json

# Step 3: run ko_winogrande
lm_eval \
  --model hf \
  --model_args pretrained=/PROJECT/0325120031_A/ghong/taketimes/llm-bang/checkpoints/korean_1b_sft/checkpoint-0005000,dtype=bfloat16 \
  --tasks ko_winogrande \
  --device cuda:0 \
  --output_path ./eval/results/sft_5k_ko_winogrande.json

This document will be updated based on the evaluation results.
Next update: after the Phase 1 evaluation is complete (expected D+1 to D+2)