---
license: apache-2.0
language:
- ko
tags:
- reasoning
- math
- code
- from-scratch
- korean
- gpt
pipeline_tag: text-generation
model-index:
- name: SOVYN-85M
  results:
  - task:
      type: reasoning
      name: Custom Reasoning Benchmark
    metrics:
    - type: accuracy
      value: 86.5
      name: Overall Accuracy
---
# SOVYN-85M
์ฒ˜์Œ๋ถ€ํ„ฐ ํ•™์Šตํ•œ 85M ํŒŒ๋ผ๋ฏธํ„ฐ ํ•œ๊ตญ์–ด ์ถ”๋ก  ๋ชจ๋ธ.
์ˆ˜ํ•™, ์ฝ”๋“œ ํŠธ๋ ˆ์ด์‹ฑ, ๋…ผ๋ฆฌ, ๋ฌผ๋ฆฌ, ํ™”ํ•™, ์ƒ๋ฌผ, ์ง€๊ตฌ๊ณผํ•™, ํ•œ๊ตญ์‚ฌ, ๋ฏธ์ ๋ถ„ ๋“ฑ 119๊ฐœ ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋ฌธ์ œ๋ฅผ ๋‹จ๊ณ„๋ณ„๋กœ ํ’€์ดํ•œ๋‹ค.
## Specs
| Item | Value |
|---|---|
| Parameters | 85.4M |
| Architecture | GPT (Decoder-only) |
| Layers | 12 |
| Attention heads | 12 |
| Embedding dimension | 768 |
| Context length | 512 |
| Vocabulary size | 16,384 (BPE) |
| Attention | Flash Attention (SDPA) |
| Precision | float16 |
## Training
- Data: 591,261 synthetic reasoning problems (119 categories), 27.97M tokens
- Optimizer: AdamW (lr=3e-4, weight_decay=0.1)
- Schedule: Cosine decay + warmup 500 steps
- Batch: 16 x 4 grad_accum = effective 64
- Steps: 20,000
- GPU: RTX 5080 16GB
- Training time: ~4 hours
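The schedule listed above (500 warmup steps, then cosine decay over 20,000 total steps from a peak of 3e-4) can be sketched as a plain function. This is a minimal sketch, not the training script: the linear warmup shape and the floor `MIN_LR = 0.0` are assumptions the README does not state, and `lr_at` is a hypothetical helper name.

```python
import math

PEAK_LR = 3e-4        # from the README
WARMUP_STEPS = 500    # from the README
TOTAL_STEPS = 20_000  # from the README
MIN_LR = 0.0          # assumption: the README gives no floor value


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Assumed linear warmup from near 0 up to the peak.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Cosine decay from the peak down to MIN_LR over the remaining steps.
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


# Effective batch size: 16 sequences per micro-batch x 4 accumulation steps.
EFFECTIVE_BATCH = 16 * 4
```

With these assumptions the rate reaches its peak at the end of warmup and falls to the floor at step 20,000.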
## ๋ฒค์น˜๋งˆํฌ
์ž์ฒด ๋ฒค์น˜๋งˆํฌ 52๋ฌธ์ œ, 10๊ฐœ ์นดํ…Œ๊ณ ๋ฆฌ.
| ์นดํ…Œ๊ณ ๋ฆฌ | ์ •ํ™•๋„ |
|---------|--------|
| ์‚ฐ์ˆ  | 100% |
| ์ฝ”๋“œ ํŠธ๋ ˆ์ด์‹ฑ | 100% |
| ์ˆซ์ž ์„ฑ์งˆ | 100% |
| ์„œ์ˆ ํ˜• | 100% |
| ์—ฐ์‚ฐ ์šฐ์„ ์ˆœ์œ„ | 88% |
| ๋ฆฌ์ŠคํŠธ ์—ฐ์‚ฐ | 83% |
| ๊ด„ํ˜ธ ์—ฐ์‚ฐ | 80% |
| ๋ฐฉ์ •์‹ | 80% |
| ๋…ผ๋ฆฌ | 80% |
| ์ˆ˜์—ด | 33% |
| **์ „์ฒด** | **86.5%** |
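The overall figure appears to be a per-problem average rather than a mean of the category scores. The arithmetic check below uses only numbers from the table. It does not reconstruct the per-category problem counts, which the README does not give.

```python
TOTAL_PROBLEMS = 52  # from the README

# Every whole number of correct answers that rounds to 86.5%.
matches = [c for c in range(TOTAL_PROBLEMS + 1)
           if round(100 * c / TOTAL_PROBLEMS, 1) == 86.5]

# Unweighted mean of the ten category accuracies listed in the table.
category_scores = [100, 100, 100, 100, 88, 83, 80, 80, 80, 33]
macro_average = sum(category_scores) / len(category_scores)

print(matches)        # [45]
print(macro_average)  # 84.4
```

Only 45 correct answers out of 52 round to 86.5%, while the unweighted category mean is 84.4%.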
## Usage
```bash
pip install torch safetensors tokenizers huggingface_hub
```
```python
import importlib.util

import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from tokenizers import Tokenizer

# Download the weights, tokenizer, and model definition from the Hub
model_path = hf_hub_download("SOVYN/SOVYN-85M", "model.safetensors")
tok_path = hf_hub_download("SOVYN/SOVYN-85M", "tokenizer.json")
code_path = hf_hub_download("SOVYN/SOVYN-85M", "model.py")

# Load the architecture by importing model.py as a module
spec = importlib.util.spec_from_file_location("model", code_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

# Load the model; checkpoint tensors are converted to float32
model = mod.SOVYN85M()
state_dict = load_file(model_path)
state_dict = {k: v.float() for k, v in state_dict.items()}
model.load_state_dict(state_dict)
model.eval()
tokenizer = Tokenizer.from_file(tok_path)

# Inference. The prompt reads: "Problem: find x when 3x + 7 = 22. Solution:"
prompt = "문제: 3x + 7 = 22일 때, x의 값을 구하시오.\n풀이:\n"
ids = torch.tensor([tokenizer.encode(prompt).ids])
with torch.no_grad():  # no gradients are needed for generation
    out = model.generate(ids, max_new_tokens=200, temperature=0.3)
print(tokenizer.decode(out[0].tolist()))
```
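## Prompt format
```
문제: {problem text}
풀이:
```
The model generates the text that follows "풀이:" ("Solution:"), after a line starting with "문제:" ("Problem:"). The output is a step-by-step solution followed by "답: {answer}" ("Answer: ...").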
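The format above can be wrapped in two small helpers, one to build the prompt and one to pull the final answer out of generated text. Both `build_prompt` and `extract_answer` are hypothetical names, not part of the released `model.py`. The `sample` string is hand-written for illustration and is not actual model output.

```python
import re


def build_prompt(problem: str) -> str:
    """Wrap a problem statement in the documented prompt format."""
    return f"문제: {problem}\n풀이:\n"


def extract_answer(generated: str):
    """Return the text after the last '답:' marker, or None if it is absent."""
    matches = re.findall(r"답:\s*(.+)", generated)
    return matches[-1].strip() if matches else None


# Hand-written example text (not real model output).
sample = "문제: 3x + 7 = 22일 때, x의 값을 구하시오.\n풀이:\n3x = 15\nx = 5\n답: 5"
print(extract_answer(sample))  # 5
```

Taking the last match is a design choice: if intermediate steps happen to contain the marker, the final line is the more likely answer.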
## Limitations
- Trained only on synthetic data. Free-form conversation is not supported.
- Weak on sequences (geometric/Fibonacci).
- Context is limited to 512 tokens.
## ๋ผ์ด์„ ์Šค
Apache-2.0