---
license: apache-2.0
language:
- ko
tags:
- reasoning
- math
- code
- from-scratch
- korean
- gpt
pipeline_tag: text-generation
model-index:
- name: SOVYN-85M
  results:
  - task:
      type: reasoning
      name: Custom Reasoning Benchmark
    metrics:
    - type: accuracy
      value: 86.5
      name: Overall Accuracy
---

# SOVYN-85M

An 85M-parameter Korean reasoning model trained from scratch.

It solves problems step by step across 119 categories, including math, code tracing, logic, physics, chemistry, biology, earth science, Korean history, and calculus.

## Specifications

| Item | Value |
|---|---|
| Parameters | 85.4M |
| Architecture | GPT (decoder-only) |
| Layers | 12 |
| Attention heads | 12 |
| Embedding dimension | 768 |
| Context length | 512 |
| Vocabulary size | 16,384 (BPE) |
| Attention | Flash Attention (SDPA) |
| Precision | float16 |
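As a sanity check, the dimensions above roughly reproduce the headline parameter count. The sketch below assumes a standard GPT-2-style block (fused QKV projection, 4x MLP, LayerNorms with biases); the actual model.py may differ in detail:

```python
# Back-of-envelope parameter count from the spec table above.
# Assumes a GPT-2-style block: fused QKV, 4x MLP, LayerNorms with biases.
d, n_layers, vocab, ctx = 768, 12, 16_384, 512

qkv = d * (3 * d) + 3 * d                     # fused QKV projection (+bias)
attn_out = d * d + d                          # attention output projection (+bias)
mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)  # up- and down-projection (+bias)
norms = 2 * 2 * d                             # two LayerNorms, weight and bias each
per_block = qkv + attn_out + mlp + norms

block_params = n_layers * per_block           # transformer stack only
embed_params = vocab * d + ctx * d            # token + positional embeddings

print(f"{block_params / 1e6:.1f}M blocks, {embed_params / 1e6:.1f}M embeddings")
```

The twelve blocks alone come to about 85.1M parameters under these assumptions, with embeddings adding roughly 13M more; exactly what the 85.4M headline includes depends on how model.py counts.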

## Training

- Data: 591,261 synthetic reasoning problems (119 categories), 27.97M tokens
- Optimizer: AdamW (lr=3e-4, weight_decay=0.1)
- Schedule: cosine decay + 500 warmup steps
- Batch: 16 x 4 grad_accum = effective 64
- Steps: 20,000
- GPU: RTX 5080 16GB
- Training time: ~4 hours
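The bullets above pin down the token budget. Assuming every training sequence is packed to the full 512-token context, the run makes roughly 23 passes over the 27.97M-token corpus:

```python
# Token arithmetic implied by the training setup above.
# Assumes every step processes full 512-token sequences (packed data).
micro_batch, grad_accum, ctx = 16, 4, 512
steps = 20_000
dataset_tokens = 27.97e6

tokens_per_step = micro_batch * grad_accum * ctx  # 32,768 tokens per update
total_tokens = tokens_per_step * steps            # 655,360,000 tokens seen
epochs = total_tokens / dataset_tokens            # ~23 passes over the data

print(tokens_per_step, int(total_tokens), round(epochs, 1))
```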

## Benchmark

In-house benchmark: 52 problems across 10 categories.

| Category | Accuracy |
|---|---|
| Arithmetic | 100% |
| Code tracing | 100% |
| Number properties | 100% |
| Prime testing | 100% |
| Operator precedence | 88% |
| List operations | 83% |
| Parenthesized expressions | 80% |
| Equations | 80% |
| Logic | 80% |
| Sequences | 33% |
| **Overall** | **86.5%** |
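For reference, the overall score is consistent with 45 of the 52 problems solved correctly:

```python
# The 86.5% overall figure corresponds to 45 of 52 problems correct.
correct, total = 45, 52
print(round(correct / total * 100, 1))  # 86.5
```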

## Usage

```bash
pip install torch safetensors tokenizers huggingface_hub
```

```python
import importlib.util

import torch
from safetensors.torch import load_file
from tokenizers import Tokenizer
from huggingface_hub import hf_hub_download

# Download weights, tokenizer, and the model definition
model_path = hf_hub_download("SOVYN/SOVYN-85M", "model.safetensors")
tok_path = hf_hub_download("SOVYN/SOVYN-85M", "tokenizer.json")
code_path = hf_hub_download("SOVYN/SOVYN-85M", "model.py")

# Load the architecture from the downloaded model.py
spec = importlib.util.spec_from_file_location("model", code_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

# Load the model (weights are stored in float16; cast to float32 for CPU)
model = mod.SOVYN85M()
state_dict = load_file(model_path)
state_dict = {k: v.float() for k, v in state_dict.items()}
model.load_state_dict(state_dict)
model.eval()

tokenizer = Tokenizer.from_file(tok_path)

# Inference
prompt = "문제: 3x + 7 = 22일 때, x의 값을 구하시오.\n풀이:\n"
ids = torch.tensor([tokenizer.encode(prompt).ids])
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=200, temperature=0.3)
print(tokenizer.decode(out[0].tolist()))
```

## Prompt format

```
문제: {problem text}
풀이:
```

The model generates everything after "풀이:" ("Solution:"): a step-by-step solution followed by "답: {answer}" ("Answer:").
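In code, the format amounts to a pair of small helpers (the function names here are illustrative, not part of the repo):

```python
def build_prompt(problem: str) -> str:
    # Wrap a problem in the 문제:/풀이: template the model expects.
    return f"문제: {problem}\n풀이:\n"

def extract_answer(generated: str) -> str:
    # Return whatever follows the last "답:" marker, or "" if absent.
    marker = "답:"
    if marker not in generated:
        return ""
    return generated.rsplit(marker, 1)[1].strip()

print(build_prompt("3 + 4는?"))
print(extract_answer("3 + 4 = 7이다.\n답: 7"))  # 7
```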

## Limitations

- Trained only on synthetic data; no free-form conversation.
- Weak on sequences (geometric/Fibonacci).
- Context limited to 512 tokens.
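Given the 512-token window, long prompts must leave room for the generated solution. A minimal budget check (the 200-token generation length matches the usage example above):

```python
# Make sure prompt + generation fit inside the 512-token context window.
CTX_LEN = 512

def fits(prompt_tokens: int, max_new_tokens: int = 200) -> bool:
    # True when the prompt leaves room for max_new_tokens of output.
    return prompt_tokens + max_new_tokens <= CTX_LEN

print(fits(300))  # True  (300 + 200 <= 512)
print(fits(400))  # False (400 + 200 >  512)
```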

## License

Apache-2.0