SOVYN-85M
An 85M-parameter Korean reasoning model trained from scratch.
It solves problems step by step across 119 categories, including math, code tracing, logic, physics, chemistry, biology, earth science, Korean history, and calculus.
Specs
| Item | Value |
|---|---|
| Parameters | 85.4M |
| Architecture | GPT (decoder-only) |
| Layers | 12 |
| Attention heads | 12 |
| Embedding dimension | 768 |
| Context length | 512 tokens |
| Vocabulary size | 16,384 (BPE) |
| Attention | Flash Attention (SDPA) |
| Precision | float16 |
Training
- Data: 591,261 synthetic reasoning problems (119 categories), 27.97M tokens
- Optimizer: AdamW (lr=3e-4, weight_decay=0.1)
- Schedule: cosine decay with 500 warmup steps
- Batch: 16 x 4 grad_accum = effective 64
- Steps: 20,000
- GPU: RTX 5080 16GB
- Training time: ~4 hours
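
To make the schedule settings above concrete, here is a minimal sketch of a cosine-decay learning-rate curve with warmup. The peak rate, warmup length, and step count come from the list above; the linear warmup shape and the final learning rate of 0 are assumptions, since the card does not state them, and `lr_at` is a hypothetical helper name rather than anything from the training code.

```python
import math

PEAK_LR = 3e-4        # from the training list
WARMUP_STEPS = 500    # from the training list
TOTAL_STEPS = 20_000  # from the training list
MIN_LR = 0.0          # assumption: the final learning rate is not stated


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR (assumed shape), then cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


# Upper bound on tokens per optimizer step, assuming every sequence is
# packed to the full 512-token context (not a figure reported by the card).
EFFECTIVE_BATCH = 16 * 4
tokens_per_step = EFFECTIVE_BATCH * 512
```

Under the full-context packing assumption, 20,000 steps would cover at most about 655M tokens, which is roughly 23 passes over the 27.97M-token dataset. The real number of passes depends on how sequences were actually batched.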
Benchmark
In-house benchmark: 52 problems across 10 categories.
| Category | Accuracy |
|---|---|
| Arithmetic | 100% |
| Code tracing | 100% |
| Number properties | 100% |
| Word problems | 100% |
| Order of operations | 88% |
| List operations | 83% |
| Parenthesized expressions | 80% |
| Equations | 80% |
| Logic | 80% |
| Sequences | 33% |
| Overall | 86.5% |
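
The table does not list how many problems each category contains, so the overall figure cannot be recomputed from the rows. What can be checked is which whole number of correct answers out of 52 rounds to the reported 86.5%. The short sketch below does that arithmetic; it is a consistency check only, not part of the original evaluation.

```python
TOTAL_PROBLEMS = 52      # from the benchmark description
REPORTED_OVERALL = 86.5  # percent, from the table

# Every count of correct answers whose percentage rounds to the reported value.
candidates = [
    correct
    for correct in range(TOTAL_PROBLEMS + 1)
    if round(100 * correct / TOTAL_PROBLEMS, 1) == REPORTED_OVERALL
]
print(candidates)  # prints [45]
```

Only 45 correct answers out of 52 matches the reported overall accuracy.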
Usage
pip install torch safetensors tokenizers huggingface_hub

import importlib.util
import torch
from safetensors.torch import load_file
from tokenizers import Tokenizer
from huggingface_hub import hf_hub_download

# Download the weights, tokenizer, and model definition
model_path = hf_hub_download("SOVYN/SOVYN-85M", "model.safetensors")
tok_path = hf_hub_download("SOVYN/SOVYN-85M", "tokenizer.json")
code_path = hf_hub_download("SOVYN/SOVYN-85M", "model.py")

# Load the architecture from the downloaded model.py
spec = importlib.util.spec_from_file_location("model", code_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

# Load the model, casting the stored weights to float32
model = mod.SOVYN85M()
state_dict = load_file(model_path)
state_dict = {k: v.float() for k, v in state_dict.items()}
model.load_state_dict(state_dict)
model.eval()
tokenizer = Tokenizer.from_file(tok_path)

# Inference (the prompt stays in Korean, matching the training format)
prompt = "문제: 3x + 7 = 22일 때, x의 값을 구하시오.\n풀이:\n"
ids = torch.tensor([tokenizer.encode(prompt).ids])
with torch.no_grad():  # no gradients are needed for generation
    out = model.generate(ids, max_new_tokens=200, temperature=0.3)
print(tokenizer.decode(out[0].tolist()))
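
When calling the model repeatedly, it can help to wrap prompt construction and answer extraction in small helpers. The sketch below assumes the format this card documents: a `문제:` line, a `풀이:` line, and a final `답: ...` line in the output. `build_prompt` and `extract_answer` are hypothetical names that are not part of the repository, and the exact text the model emits around the answer may differ.

```python
import re
from typing import Optional


def build_prompt(problem: str) -> str:
    """Wrap a problem statement in the documented prompt format."""
    return f"문제: {problem}\n풀이:\n"


def extract_answer(generated: str) -> Optional[str]:
    """Return the text after the last '답:' marker, or None if it is absent."""
    matches = re.findall(r"답:\s*(.+)", generated)
    return matches[-1].strip() if matches else None
```

Taking the last match means the helper works whether or not the decoded text still includes the prompt.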
Prompt format
문제: {problem text}
풀이:
The model generates the text that follows "풀이:" ("Solution:"). It outputs a step-by-step solution followed by "답: {answer}" ("Answer: ...").
Limitations
- Trained on synthetic data only; free-form conversation is not supported.
- Weak on sequences (geometric, Fibonacci).
- Context is limited to 512 tokens.
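
Because of the 512-token limit, the prompt and the generated solution have to share one window. A minimal sketch of budgeting `max_new_tokens` against that limit follows. `clamp_new_tokens` is a hypothetical helper, and whether `model.generate` already crops or rejects over-long inputs is not stated in this card, so treat this as a defensive check rather than a requirement.

```python
CONTEXT_LENGTH = 512  # from the spec table


def clamp_new_tokens(prompt_len: int, requested: int,
                     context_len: int = CONTEXT_LENGTH) -> int:
    """Limit the number of new tokens so prompt + output fits the context."""
    if prompt_len >= context_len:
        raise ValueError(
            f"prompt has {prompt_len} tokens; the context holds {context_len}"
        )
    return min(requested, context_len - prompt_len)
```

For example, with a 400-token prompt a request for 200 new tokens would be reduced to 112.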
License
Apache-2.0