Update README.md
README.md
CHANGED
@@ -24,47 +24,106 @@ model-index:
# SOVYN-85M

An 85M-parameter Korean reasoning model trained from scratch.

It solves problems step by step across 119 categories, including math, code tracing, logic, physics, chemistry, biology, earth science, Korean history, and calculus.

## Specs

| | |
|---|---|
| Parameters | 85.4M |
| Architecture | GPT (Decoder-only) |
| Layers | 12 |
| Attention heads | 12 |
| Embedding dimension | 768 |
| Context length | 512 |
| Vocabulary size | 16,384 (BPE) |
| Attention | Flash Attention (SDPA) |
| Precision | float16 |

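As a rough cross-check of the table, the sketch below counts parameters for a GPT-2-style decoder with these dimensions. The layout is an assumption and is not taken from `model.py`: learned positional embeddings, a 4x MLP expansion, biases on every linear layer and LayerNorm, and an output head tied to the token embedding. Under those assumptions the transformer blocks plus positional embeddings come to about 85.4M and the token embedding adds roughly 12.6M more; which of these the 85.4M figure covers is not stated in the table.

```python
# Assumed GPT-2-style layout; the real model.py may differ.
N_LAYER, N_HEAD, D_MODEL = 12, 12, 768
CONTEXT_LEN, VOCAB_SIZE = 512, 16_384
MLP_RATIO = 4  # assumption: MLP hidden size is 4 * D_MODEL


def block_params(d: int, mlp_ratio: int) -> int:
    """Parameters in one transformer block (weights plus biases)."""
    attn = 4 * d * d + 4 * d  # Q, K, V and output projections, each d x d with a bias
    mlp = 2 * mlp_ratio * d * d + mlp_ratio * d + d  # up- and down-projection with biases
    norms = 2 * 2 * d  # two LayerNorms, each with scale and shift
    return attn + mlp + norms


head_dim = D_MODEL // N_HEAD
blocks = N_LAYER * block_params(D_MODEL, MLP_RATIO)
final_norm = 2 * D_MODEL
pos_emb = CONTEXT_LEN * D_MODEL
tok_emb = VOCAB_SIZE * D_MODEL  # counted once if the output head is tied

without_tok_emb = blocks + final_norm + pos_emb
print(f"head dim:            {head_dim}")
print(f"blocks + norm + pos: {without_tok_emb / 1e6:.1f}M")
print(f"token embedding:     {tok_emb / 1e6:.1f}M")
print(f"total (tied head):   {(without_tok_emb + tok_emb) / 1e6:.1f}M")
```
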
## Training

- Data: 591,261 synthetic reasoning problems (119 categories), 27.97M tokens
- Optimizer: AdamW (lr=3e-4, weight_decay=0.1)
- Schedule: Cosine decay + warmup 500 steps
- Batch: 16 x 4 grad_accum = effective 64
- Steps: 20,000
- GPU: RTX 5080 16GB
- Training time: ~4 hours

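The schedule entry can be read as a warmup phase followed by cosine decay. The sketch below is one common way to write that; the linear warmup shape and the final learning rate of 0 (`MIN_LR`) are assumptions, since the list gives only the peak rate, the warmup length, and the step count.

```python
import math

MAX_LR = 3e-4         # from the optimizer entry
WARMUP_STEPS = 500    # from the schedule entry
TOTAL_STEPS = 20_000  # from the steps entry
MIN_LR = 0.0          # assumption: the floor of the decay is not given


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Assumed linear ramp from 0 up to MAX_LR.
        return MAX_LR * step / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


# Effective batch: 16 sequences per micro-batch, accumulated over 4 micro-batches.
EFFECTIVE_BATCH = 16 * 4

for step in (0, 250, 500, 10_250, 20_000):
    print(step, f"{lr_at(step):.2e}")
```
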
## Benchmark

In-house benchmark: 52 problems, 10 categories.

| Category | Accuracy |
|---------|--------|
| Arithmetic | 100% |
| Code Tracing | 100% |
| Number Property | 100% |
| Word Problems | 100% |
| Precedence | 88% |
| List Operations | 83% |
| Parentheses | 80% |
| Equations | 80% |
| Logic | 80% |
| Series | 33% |
| **Overall** | **86.5%** |

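Two quick arithmetic checks on the table, as a sketch. With 52 problems, 86.5% is consistent with 45 correct answers, and the overall figure appears to be accuracy over all problems rather than the unweighted mean of the category rows, which would be 84.4%. Per-category problem counts are not given, so they are not reconstructed here.

```python
TOTAL_PROBLEMS = 52
REPORTED_OVERALL = 86.5

# Which counts of correct answers round to the reported overall accuracy?
matching = [
    k for k in range(TOTAL_PROBLEMS + 1)
    if round(100 * k / TOTAL_PROBLEMS, 1) == REPORTED_OVERALL
]
print(matching)

# Unweighted mean of the ten category rows, for comparison.
category_pct = [100, 100, 100, 100, 88, 83, 80, 80, 80, 33]
macro_avg = sum(category_pct) / len(category_pct)
print(macro_avg)
```
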
## Usage

```bash
pip install torch safetensors tokenizers huggingface_hub
```

```python
import importlib.util

import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from tokenizers import Tokenizer

# Download the weights, tokenizer, and model definition
model_path = hf_hub_download("SOVYN/SOVYN-85M", "model.safetensors")
tok_path = hf_hub_download("SOVYN/SOVYN-85M", "tokenizer.json")
code_path = hf_hub_download("SOVYN/SOVYN-85M", "model.py")

# Load the architecture from the downloaded model.py
spec = importlib.util.spec_from_file_location("model", code_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

# Load the model, casting the weights to float32
model = mod.SOVYN85M()
state_dict = load_file(model_path)
state_dict = {k: v.float() for k, v in state_dict.items()}
model.load_state_dict(state_dict)
model.eval()

tokenizer = Tokenizer.from_file(tok_path)

# Inference. The prompt reads: "Problem: when 3x + 7 = 22, find the value of x. Solution:"
prompt = "문제: 3x + 7 = 22일 때, x의 값을 구하시오.\n풀이:\n"
ids = torch.tensor([tokenizer.encode(prompt).ids])
out = model.generate(ids, max_new_tokens=200, temperature=0.3)
print(tokenizer.decode(out[0].tolist()))
```

## Prompt Format

```
문제: {내용}
풀이:
```

The model generates the text after "풀이:" ("Solution:"). Here "문제:" means "Problem:" and `{내용}` stands for the problem text. The output is a step-by-step solution followed by "답: {정답}" ("Answer: {final answer}").

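A small pair of helpers can wrap this format. This is a sketch and not part of the repository: `build_prompt` and `extract_answer` are hypothetical names, the parser assumes the answer sits on its own line starting with `답:`, and `sample_output` is hand-written for illustration rather than real model output.

```python
import re
from typing import Optional

PROMPT_TEMPLATE = "문제: {question}\n풀이:\n"
ANSWER_RE = re.compile(r"^답:\s*(.+?)\s*$", re.MULTILINE)


def build_prompt(question: str) -> str:
    """Wrap a question in the 문제/풀이 (problem/solution) format."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def extract_answer(generated: str) -> Optional[str]:
    """Return the text after the last '답:' (answer) line, or None if absent."""
    matches = ANSWER_RE.findall(generated)
    return matches[-1] if matches else None


# Hand-written example text, not actual model output.
sample_output = (
    "문제: 3x + 7 = 22일 때, x의 값을 구하시오.\n"
    "풀이:\n"
    "3x = 22 - 7 = 15\n"
    "x = 15 / 3 = 5\n"
    "답: 5\n"
)
print(build_prompt("3x + 7 = 22일 때, x의 값을 구하시오."))
print(extract_answer(sample_output))
```
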
## Limitations

- Trained only on synthetic data; free-form conversation is not supported.
- Weak on series problems (geometric, Fibonacci).
- Context is limited to 512 tokens.

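Because of the 512-token context, the prompt and the generated tokens share one budget. The check below is a sketch of how a caller might guard against overrunning it; `max_prompt_tokens` and `fits_context` are hypothetical helpers, and how `model.generate` itself behaves when the limit is exceeded is not documented here.

```python
CONTEXT_LEN = 512  # from the spec table


def max_prompt_tokens(max_new_tokens: int, context_len: int = CONTEXT_LEN) -> int:
    """Largest prompt length that still leaves room for max_new_tokens."""
    if not 0 <= max_new_tokens <= context_len:
        raise ValueError("max_new_tokens must be between 0 and the context length")
    return context_len - max_new_tokens


def fits_context(prompt_ids: list[int], max_new_tokens: int) -> bool:
    """True if the prompt plus the generation stays within the context window."""
    return len(prompt_ids) + max_new_tokens <= CONTEXT_LEN


print(max_prompt_tokens(200))
```

With `max_new_tokens=200`, this leaves at most 312 tokens for the prompt.
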
## License

Apache-2.0