---
license: apache-2.0
language:
- ko
tags:
- reasoning
- math
- code
- from-scratch
- korean
- gpt
pipeline_tag: text-generation
model-index:
- name: SOVYN-85M
results:
- task:
type: reasoning
name: Custom Reasoning Benchmark
metrics:
- type: accuracy
value: 86.5
name: Overall Accuracy
---
# SOVYN-85M
An 85M-parameter Korean reasoning model trained from scratch. It solves problems step by step across 119 categories, including math, code tracing, logic, physics, chemistry, biology, earth science, Korean history, and calculus.
## Specs
| Spec | Value |
|---|---|
| Parameters | 85.4M |
| Architecture | GPT (decoder-only) |
| Layers | 12 |
| Attention heads | 12 |
| Embedding dimension | 768 |
| Context length | 512 |
| Vocabulary size | 16,384 (BPE) |
| Attention | Flash Attention (SDPA) |
| Precision | float16 |
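The actual architecture is defined in `model.py` on the Hub. As a minimal sketch (field names are illustrative, not necessarily those used in the repo), the table above corresponds to a config like:

```python
from dataclasses import dataclass

@dataclass
class SOVYNConfig:
    n_layer: int = 12          # transformer blocks
    n_head: int = 12           # attention heads per block
    n_embd: int = 768          # embedding / hidden size
    block_size: int = 512      # maximum context length
    vocab_size: int = 16_384   # BPE vocabulary size

cfg = SOVYNConfig()
head_dim = cfg.n_embd // cfg.n_head  # 64 dimensions per attention head
```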
## Training
- Data: 591,261 synthetic reasoning problems (119 categories), 27.97M tokens
- Optimizer: AdamW (lr=3e-4, weight_decay=0.1)
- Schedule: cosine decay with 500 warmup steps
- Batch: 16 × 4 gradient accumulation = effective 64
- Steps: 20,000
- GPU: RTX 5080 16GB
- Training time: ~4 hours
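The schedule above can be sketched as a step-to-learning-rate function; the decay floor `MIN_LR` is an assumption, since the card does not state one:

```python
import math

MAX_LR    = 3e-4     # peak learning rate from the card
WARMUP    = 500      # warmup steps
MAX_STEPS = 20_000   # total training steps
MIN_LR    = 3e-5     # assumed floor; not stated in the card

def lr_at(step: int) -> float:
    # linear warmup up to MAX_LR over the first 500 steps
    if step < WARMUP:
        return MAX_LR * (step + 1) / WARMUP
    # cosine decay from MAX_LR down to MIN_LR over the remaining steps
    progress = (step - WARMUP) / (MAX_STEPS - WARMUP)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))
```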
## Benchmark
In-house benchmark: 52 problems across 10 categories.
| Category | Accuracy |
|---------|--------|
| Arithmetic | 100% |
| Code tracing | 100% |
| Number properties | 100% |
| ์์ ํ | 100% |
| Operator precedence | 88% |
| List operations | 83% |
| Parenthesized expressions | 80% |
| Equations | 80% |
| Logic | 80% |
| Sequences | 33% |
| **Overall** | **86.5%** |
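As a sanity check, the overall figure works out to 45 of the 52 problems answered correctly:

```python
# 45 correct out of 52 rounds to the reported 86.5% overall accuracy
assert round(45 / 52 * 100, 1) == 86.5
```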
## Usage
```bash
pip install torch safetensors tokenizers huggingface_hub
```
```python
import torch
from safetensors.torch import load_file
from tokenizers import Tokenizer
from huggingface_hub import hf_hub_download
# Download the weights, tokenizer, and model definition from the Hub
model_path = hf_hub_download("SOVYN/SOVYN-85M", "model.safetensors")
tok_path = hf_hub_download("SOVYN/SOVYN-85M", "tokenizer.json")
code_path = hf_hub_download("SOVYN/SOVYN-85M", "model.py")
# Load the model architecture from model.py
import importlib.util
spec = importlib.util.spec_from_file_location("model", code_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
# Build the model and load the weights (stored in float16, upcast to float32)
model = mod.SOVYN85M()
state_dict = load_file(model_path)
state_dict = {k: v.float() for k, v in state_dict.items()}
model.load_state_dict(state_dict)
model.eval()
tokenizer = Tokenizer.from_file(tok_path)
# Inference
prompt = "문제: 3x + 7 = 22일 때, x의 값을 구하시오.\n풀이:\n"  # "Problem: If 3x + 7 = 22, find x. Solution:"
ids = torch.tensor([tokenizer.encode(prompt).ids])
out = model.generate(ids, max_new_tokens=200, temperature=0.3)
print(tokenizer.decode(out[0].tolist()))
```
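If the `generate` signature in `model.py` differs from the call above, a manual sampling loop works with any decoder that maps token ids to logits. The `[B, T, vocab]` output shape and the 512-token window are assumptions based on the specs table:

```python
import torch

@torch.no_grad()
def sample(model, ids, max_new_tokens=200, temperature=0.3, context=512):
    """Temperature sampling for a decoder-only LM returning [B, T, vocab] logits."""
    for _ in range(max_new_tokens):
        out = model(ids[:, -context:])                # crop to the context window
        logits = out[0] if isinstance(out, tuple) else out
        probs = torch.softmax(logits[:, -1, :] / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)        # append the sampled token
    return ids
```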
## Prompt format
```
문제: {내용}
풀이:
```
The model generates everything after `풀이:` ("Solution:"), producing a step-by-step solution followed by `답: {정답}` ("Answer: {answer}"). Here `문제:` means "Problem:" and `{내용}` is the problem statement; keep the Korean markers verbatim, as they are the format the model was trained on.
## Limitations
- Trained only on synthetic data; free-form conversation is not supported.
- Weak on sequences (geometric progressions, Fibonacci).
- Context limited to 512 tokens.
## License
Apache-2.0