Darwin-28B-KOREA

ํ•œ๊ตญ์–ด/์˜์–ด ์ด์ค‘์–ธ์–ด ์ถ”๋ก ์— ์ตœ์ ํ™”๋œ 28B ํŒŒ๋ผ๋ฏธํ„ฐ ๋ชจ๋ธ

The first PERFECT parent-pair merge in the VIDRAFT Darwin series. The weights of the two parent models are combined with per-layer dynamic ratios, matching or exceeding both parents in every area, with no area weaker than either parent.

๋ถ€๋ชจ ๋ชจ๋ธ (PERFECT Pair)

| Role   | Model                       | Strength                                |
|--------|-----------------------------|-----------------------------------------|
| Father | FINAL-Bench/Darwin-28B-Opus | English reasoning, speed, token economy |
| Mother | ginigen-ai/Rogue-28B-MIX    | Native Korean, deep Korean reasoning    |

Parent-pair compatibility: hidden=5120, intermediate=17408, layers=64 (exact architectural match — a PERFECT pair).
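A weight-level merge is only possible when these structural fields match exactly. A minimal sketch of such a pre-merge check, using hypothetical config dicts (the keys follow common Hugging Face `config.json` naming, and the values mirror the figures above — this is not the actual pipeline code):

```python
# Hypothetical pre-merge compatibility check. The dicts stand in for the two
# parents' config.json files; values mirror the architecture reported above.
STRUCTURAL_KEYS = ("hidden_size", "intermediate_size", "num_hidden_layers")

father_cfg = {"hidden_size": 5120, "intermediate_size": 17408, "num_hidden_layers": 64}
mother_cfg = {"hidden_size": 5120, "intermediate_size": 17408, "num_hidden_layers": 64}

def is_perfect_pair(a, b, keys=STRUCTURAL_KEYS):
    """True only when every structural field matches exactly."""
    return all(a[k] == b[k] for k in keys)

print(is_perfect_pair(father_cfg, mother_cfg))  # True: per-layer merge is possible
```

Any mismatch in these fields would make per-layer linear interpolation of the weights ill-defined, so the check has to pass before the t vector below is even considered.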

Merge Method

  • Algorithm: per-layer linear interpolation in float32, then cast to bfloat16
  • t vector: 64 per-layer dynamic weights (mean t = 0.513)
    • Golden Reasoning Layer (L47): t = 0.90 (Mother dominant)
    • Output Router (L63): t = 0.53
    • Ratios derived from per-layer probe_distance + hidden_norm analysis of MRI (Model MRI) telemetry
  • Chat template/tokenizer: inherited from the Father (Qwen3_5ForConditionalGeneration, multimodal)
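The per-layer interpolation above can be sketched as follows. This is an illustrative reconstruction, not the actual merge script: plain Python lists stand in for weight tensors, t is assumed to be the Mother's share, and only the L47/L63 values and the mean are taken from this card (the rest of the t vector is filled with the reported mean).

```python
# Illustrative sketch of per-layer linear interpolation:
# merged = (1 - t) * father + t * mother, with t = Mother's share (assumption).
# A real merge would do this in float32 over safetensors shards, then cast
# the result to bfloat16.

def merge_layer(father_w, mother_w, t):
    """Blend one layer's weights element-wise at ratio t."""
    return [(1.0 - t) * f + t * m for f, m in zip(father_w, mother_w)]

# 64-entry t vector, filled with the reported mean (0.513); only layers 47
# and 63 carry the values actually quoted in this card.
t_vector = [0.513] * 64
t_vector[47] = 0.90  # Golden Reasoning Layer: Mother dominant
t_vector[63] = 0.53  # Output Router

father_layer = [1.0, 2.0]  # toy stand-ins for weight tensors
mother_layer = [3.0, 4.0]
merged = merge_layer(father_layer, mother_layer, t_vector[47])
print(merged)  # ≈ [2.8, 3.8]: 90% Mother, 10% Father
```

Each of the 64 transformer layers gets its own t, which is what lets the merge keep the Father's output routing while handing the deep reasoning layers to the Mother.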

Evaluation Results (35-sample 3-way bench, max_tokens=5120)

| Metric                                             | Father | Mother | KOREA (Child) |
|----------------------------------------------------|--------|--------|---------------|
| Accuracy (29 multiple-choice questions)            | 96.6%  | 96.6%  | 96.6%         |
| True accuracy (after fixing gpqa_01 grading error) | 100%   | 100%   | 100%          |
| Korean output rate (23 Korean questions)           | 91.3%  | 95.7%  | 91.3%         |
| English thinking                                   | 31/35  | 10/35  | 31/35         |
| Mean response tokens                               | 458    | 631    | 521           |
| Hit the 5120-token cap                             | 0/35   | 2/35   | 1/35          |

Win/Loss analysis:

  • Father vs Child: 0:0 tie
  • Mother vs Child: 0:0 tie
  • → the Child is fully on par with both parents, with no area weaker than either

Reasoning ๊นŠ์ด ํก์ˆ˜: ํ•œ๊ตญ์–ด ๋…ผ๋ฆฌ ์นดํ…Œ๊ณ ๋ฆฌ์—์„œ Mother(2620t)์™€ Child(2724t) ํ‰๊ท  ๋‹ต ๊ธธ์ด ์œ ์‚ฌ โ†’ Mother์˜ long-chain reasoning ํŒจํ„ด ์ „์ด ์„ฑ๊ณต.

Usage Recommendations

  • Recommended max_tokens: 1024 or more (chain-of-thought answers can be cut off at 256 tokens)
  • Thinking pattern: English reasoning followed by a Korean answer. Recommended when only answer accuracy matters.
  • If you want purely Korean reasoning, use the parent Rogue-28B-MIX on its own.

Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load in bfloat16 (the merge's native dtype) and shard across available devices.
model = AutoModelForCausalLM.from_pretrained(
    "VIDraft/Darwin-28B-KOREA",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tok = AutoTokenizer.from_pretrained("VIDraft/Darwin-28B-KOREA", trust_remote_code=True)

# Prompt: "Summarize the core content of Article 10 of the Korean Constitution in one sentence."
msgs = [{"role": "user", "content": "๋Œ€ํ•œ๋ฏผ๊ตญ ํ—Œ๋ฒ• ์ œ10์กฐ์˜ ํ•ต์‹ฌ ๋‚ด์šฉ์„ ํ•œ ๋ฌธ์žฅ์œผ๋กœ ์š”์•ฝ."}]
inputs = tok.apply_chat_template(msgs, return_tensors="pt", add_generation_prompt=True).to(model.device)
# Greedy decode; keep max_new_tokens >= 1024 so the chain-of-thought is not truncated.
out = model.generate(inputs, max_new_tokens=1024, do_sample=False, pad_token_id=tok.eos_token_id)
# Strip the prompt tokens and print only the newly generated answer.
print(tok.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
```

License

Apache 2.0 (inherited from the parent models).

Citation / Acknowledgment

VIDRAFT Darwin Family โ€” Evolutionary Model Merge Research. Pair: Darwin-28B-Opus ร— Rogue-28B-MIX โ†’ Darwin-28B-KOREA (PERFECT pair, 2026-05-14)


Built with the Darwin Factory pipeline. 16 customer orders bridged by a single base model.
