---
license: apache-2.0
base_model: Qwen/Qwen3.5-2B
tags:
- darwin
- darwin-v8
- opus-distilled
- qwen3.5
- reasoning
- korean
- claude-opus
- lora-merged
language:
- en
- ko
- zh
- ja
pipeline_tag: text-generation
library_name: transformers
---
# 🧠 Darwin-2B-Opus
**The lightweight 2B model of the Darwin V8 series**
A Qwen3.5-2B-based model infused with the reasoning style of Claude Opus 4.5/4.6 and Sonnet 4.6.
---
## 🧬 Pedigree
- 👨 **Father (Base)**: [`Qwen/Qwen3.5-2B`](https://huggingface.co/Qwen/Qwen3.5-2B)
- 👩 **Mother (LoRA Adapter)**: [`FINAL-Bench/Darwin-2B-Opus-LoRA`](https://huggingface.co/FINAL-Bench/Darwin-2B-Opus-LoRA)
- 👶 **Child (This model)**: `FINAL-Bench/Darwin-2B-Opus`, a merged, full-weight standalone model
---
## 🏆 Darwin V8 Series Information
| Item | Value |
|------|-----|
| Model size | 2.3B parameters |
| Architecture | Qwen3.5 (hybrid attention) |
| Training method | SFT with LoRA (all-linear, rank=16) |
| Training data | 9,762 samples (Claude Opus/Sonnet + Korean reasoning) |
| Training time | 29 minutes (8×B200 GPUs) |
| Final loss | 0.837 |
| Token accuracy | 76.6% |
### 📊 Benchmark (GPQA Diamond, 198 questions)
- **Accuracy**: 37.37% (74/198)
- **Accuracy among responses with a successfully extracted answer**: 50.7%
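
The two benchmark figures can be related with a little arithmetic. The sketch below assumes that the 50.7% figure means correct answers divided by the number of responses from which an answer could be extracted. That reading is an interpretation of the label, and the implied counts it prints are estimates, not reported results.

```python
# Relating the two GPQA Diamond figures reported above.
# Assumption: 50.7% = correct / responses with an extractable answer.
TOTAL_QUESTIONS = 198
CORRECT = 74
ACCURACY_WHEN_EXTRACTED = 0.507  # as reported in this README

overall_accuracy = CORRECT / TOTAL_QUESTIONS

# Implied number of responses with an extractable answer (an estimate).
implied_extracted = CORRECT / ACCURACY_WHEN_EXTRACTED
implied_extraction_rate = implied_extracted / TOTAL_QUESTIONS

print(f"overall accuracy: {overall_accuracy:.2%}")
print(f"implied extracted responses: about {implied_extracted:.0f}")
print(f"implied extraction rate: about {implied_extraction_rate:.0%}")
```

Under that assumption, roughly a quarter of the responses would have produced no parseable answer, which would count against the overall score.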
---
## 🚀 Quick Start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "FINAL-Bench/Darwin-2B-Opus"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# Korean prompt: "The 2024 minimum hourly wage in Korea is 9,860 won.
# What is the pay for 40 hours a week × 4 weeks?"
messages = [
    {"role": "user", "content": "2024년 한국 최저시급 9,860원이다. 주 40시간 × 4주 임금은?"}
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=800,
        do_sample=False,  # greedy decoding
        pad_token_id=tok.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
---
## 🧬 Darwin V8 Training Pipeline
```
[Qwen/Qwen3.5-2B] ──── Base model (frozen)
        +
[9,762 Claude Opus/Sonnet + Korean reasoning samples]
        ↓
[SFT Training]
  - LoRA (all-linear, r=16, α=32)
  - Learning rate: 2e-4 (V8 rule: 10× the full fine-tuning LR)
  - 2 epochs, bf16, 8×B200 DDP
  - Loss: 0.991 → 0.837 (-15%)
  - Token accuracy: 73.9% → 76.6% (+2.7 pp)
        ↓
[LoRA merge into base weights]
        ↓
[Darwin-2B-Opus] ← this model
```
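
The merge step in the pipeline can be illustrated numerically. The following is a minimal sketch, assuming the standard LoRA formulation in which the update `B @ A` is scaled by `α / r` and added to the frozen weight. The layer sizes and random matrices are hypothetical toy values, not taken from the model.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[s * x for x in row] for row in a]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_out, d_in = 48, 40     # toy layer size (hypothetical)
r, alpha = 16, 32        # values stated in the pipeline above
scaling = alpha / r      # standard LoRA scaling

W = rand_matrix(d_out, d_in, rng)   # frozen base weight
B = rand_matrix(d_out, r, rng)      # LoRA "up" projection
A = rand_matrix(r, d_in, rng)       # LoRA "down" projection
x = rand_matrix(d_in, 1, rng)       # one input column vector

# Adapter path: base output plus the scaled low-rank update.
y_adapter = add(matmul(W, x), scale(matmul(B, matmul(A, x)), scaling))

# Merged path: fold the update into the weight once, then use one matmul.
W_merged = add(W, scale(matmul(B, A), scaling))
y_merged = matmul(W_merged, x)

max_diff = max(abs(p[0] - q[0]) for p, q in zip(y_adapter, y_merged))
print(f"scaling = {scaling}, max |difference| = {max_diff:.2e}")
```

The two paths agree up to floating-point rounding, which is why a merged checkpoint can be served without the adapter or any extra inference-time cost.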
---
## 📊 Training Data Composition
| Category | Samples | % | Source |
|---------|--------|---|-----|
| General Reasoning | 4,422 | 45% | Opus 4.5/4.6, Sonnet 4.6 |
| Math (English) | 1,960 | 20% | DeepSeek-v3.2 OpenR1-Math |
| Code (English) | 1,680 | 17% | DeepSeek-v3.2 CodeReasoning + GPT-5 Codex |
| Korean Thinking | 200 | 2% | Multilingual-Thinking-Korean |
| **Korean Math** | **1,500** | **15%** | orca-math-word-problems-korean |
| **Total (after filtering)** | **9,762** | 100% | - |
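
The percentages in the table follow directly from the sample counts. A short check, using only the counts listed above:

```python
# Sample counts copied from the table above.
counts = {
    "General Reasoning": 4422,
    "Math (English)": 1960,
    "Code (English)": 1680,
    "Korean Thinking": 200,
    "Korean Math": 1500,
}

total = sum(counts.values())
shares = {name: 100 * n / total for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name:<18} {n:>5}  {shares[name]:5.1f}%")
print(f"{'Total':<18} {total:>5}")

# Combined share of the two Korean-language categories.
korean_share = shares["Korean Thinking"] + shares["Korean Math"]
print(f"Korean share: {korean_share:.1f}%")
```

The counts sum to 9,762 as stated. The rounded per-category percentages add up to 99 rather than 100 purely because of rounding.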
---
## 🎯 Darwin V8 Design Philosophy
1. **LoRA Without Regret**: `all-linear` targets and a 10× learning rate; rank=16 is sufficient
2. **Response Distillation**: cost-efficient distillation from pre-generated Opus traces
3. **Stronger Korean Reasoning**: uses Claude reasoning traces instead of simple KoAlpaca QA
4. **Merge-and-Deploy**: the LoRA adapter is merged in, so deployment needs no additional dependencies
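
The first and last principles map onto a few calls in the Hugging Face PEFT library. The sketch below is one way to express them, not the authors' training script. Only the rank, alpha, target modules, and learning rate come from this README. Dropout and other settings are left at PEFT defaults, and the full fine-tuning learning rate is inferred from the 10× rule. Both function names are hypothetical.

```python
# Hyperparameters as stated in this README. The full fine-tuning learning
# rate is inferred from the "10x" rule and is an assumption, not a reported value.
LORA_RANK = 16
LORA_ALPHA = 32
LORA_LR = 2e-4
ASSUMED_FULL_FT_LR = LORA_LR / 10


def build_lora_config():
    """Return a PEFT LoraConfig using the rank, alpha, and targets listed above."""
    from peft import LoraConfig  # lazy import: the constants work without peft installed

    return LoraConfig(
        r=LORA_RANK,
        lora_alpha=LORA_ALPHA,
        target_modules="all-linear",  # adapt every linear layer
        task_type="CAUSAL_LM",
    )


def merge_adapter(base_id, adapter_id, output_dir):
    """Load the base model, apply the adapter, merge it, and save standalone weights."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    merged = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()
    merged.save_pretrained(output_dir)
    return merged
```

For example, `merge_adapter("Qwen/Qwen3.5-2B", "FINAL-Bench/Darwin-2B-Opus-LoRA", "./merged")` would produce a checkpoint that loads with plain `transformers`, without `peft` at inference time.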
---
## 📝 Sample Test Results (5 problems)
| Type | Answer | Notes |
|-----|:---:|-----|
| English math (train speed) | ✅ 80 km/h | Step-by-step solution in LaTeX |
| English logic (height comparison) | ✅ Carol | States transitivity explicitly |
| English code (primality test) | ✅ Correct | Docstring + complexity analysis |
| **Korean hourly-wage calculation** | ✅ **1,577,600 won** | Step-by-step explanation in Korean |
| **Korean simultaneous equations** | ✅ **1,200 won** | Standard solution + verification |
**5/5 correct** across the English and Korean problems ⭐
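
The hourly-wage answer in the table is easy to verify by hand. The check below uses the figures from the Quick Start prompt (9,860 won per hour, 40 hours a week, 4 weeks) and is a plain multiplication, which is all the prompt asks for.

```python
HOURLY_WAGE_KRW = 9_860   # 2024 minimum hourly wage, as given in the prompt
HOURS_PER_WEEK = 40
WEEKS = 4

pay = HOURLY_WAGE_KRW * HOURS_PER_WEEK * WEEKS
print(f"{pay:,} KRW")  # prints "1,577,600 KRW"
```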
---
## ⚠️ Limitations
- **Scale**: 2.3B parameters (the smallest in the Darwin series)
- **GPQA Diamond**: 37.37% (lower than larger models, but among the highest at the 2B scale)
- **Long context**: trained with `max_length=4,096`
- **Knowledge limits**: as a 2B model, it has limited encyclopedic knowledge
---
## 🔗 Related Models
- 🧩 [`FINAL-Bench/Darwin-2B-Opus-LoRA`](https://huggingface.co/FINAL-Bench/Darwin-2B-Opus-LoRA): the **standalone LoRA adapter** for this model (67MB)
- ⚡ [`FINAL-Bench/Darwin-2B-Opus-ONNX`](https://huggingface.co/FINAL-Bench/Darwin-2B-Opus-ONNX): **quantized ONNX version for browsers/WebGPU** (planned)
### 🏆 Darwin Series
- [`Darwin-31B-Opus`](https://huggingface.co/FINAL-Bench/Darwin-31B-Opus): GPQA 85.9%
- [`Darwin-27B-Opus`](https://huggingface.co/FINAL-Bench/Darwin-27B-Opus): GPQA 86.9%
- [`Darwin-9B-Opus`](https://huggingface.co/FINAL-Bench/Darwin-9B-Opus)
- [`Darwin-4B-Opus`](https://huggingface.co/FINAL-Bench/Darwin-4B-Opus)
- **Darwin-2B-Opus** (this model) ⭐ lightest
---
## 🪪 License
- Base model: Apache 2.0 (Qwen)
- Training data: see each dataset's individual license
- This model: Apache 2.0
---
## 🙏 Credits
- **Base**: Qwen team (Alibaba)
- **Teacher**: Anthropic (Claude Opus 4.5/4.6, Sonnet 4.6)
- **Dataset releases**: nohurry, TeichAI, kuotient, PoSTMEDIA
- **Training & Release**: **FINAL-Bench / VIDRAFT_LAB**
---
*Darwin V8 · Part of the evolutionary model series by FINAL-Bench*
This model is introduced in [Darwin Family](https://arxiv.org/abs/2605.14386).