Update README.md
README.md
CHANGED
|
@@ -1,129 +1,70 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
language:
|
| 4 |
-
- ko
|
| 5 |
-
tags:
|
| 6 |
-
- reasoning
|
| 7 |
-
- math
|
| 8 |
-
- code
|
| 9 |
-
- from-scratch
|
| 10 |
-
- korean
|
| 11 |
-
- gpt
|
| 12 |
-
pipeline_tag: text-generation
|
| 13 |
-
model-index:
|
| 14 |
-
- name: SOVYN-85M
|
| 15 |
-
results:
|
| 16 |
-
- task:
|
| 17 |
-
type: reasoning
|
| 18 |
-
name: Custom Reasoning Benchmark
|
| 19 |
-
metrics:
|
| 20 |
-
- type: accuracy
|
| 21 |
-
value: 86.5
|
| 22 |
-
name: Overall Accuracy
|
| 23 |
-
---
|
| 24 |
-
|
| 25 |
-
# SOVYN-85M
|
| 26 |
-
|
| 27 |
-
Trained from scratch
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
| | |
|
| 34 |
-
|
|
| 35 |
-
|
|
| 36 |
-
|
|
| 37 |
-
|
|
| 38 |
-
|
|
| 39 |
-
|
|
| 40 |
-
|
|
| 41 |
-
|
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
-
|
| 48 |
-
-
|
| 49 |
-
-
|
| 50 |
-
-
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
|
| 60 |
-
|
|
| 61 |
-
|
|
| 62 |
-
|
|
| 63 |
-
|
|
| 64 |
-
|
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
| **Overall** | **86.5%** |
|
| 72 |
-
|
| 73 |
-
## Usage
|
| 74 |
-
|
| 75 |
-
```bash
|
| 76 |
-
pip install torch safetensors tokenizers huggingface_hub
|
| 77 |
-
```
|
| 78 |
-
|
| 79 |
-
```python
|
| 80 |
-
import torch
|
| 81 |
-
from safetensors.torch import load_file
|
| 82 |
-
from tokenizers import Tokenizer
|
| 83 |
-
from huggingface_hub import hf_hub_download
|
| 84 |
-
|
| 85 |
-
# Download
|
| 86 |
-
model_path = hf_hub_download("SOVYN/SOVYN-85M", "model.safetensors")
|
| 87 |
-
tok_path = hf_hub_download("SOVYN/SOVYN-85M", "tokenizer.json")
|
| 88 |
-
code_path = hf_hub_download("SOVYN/SOVYN-85M", "model.py")
|
| 89 |
-
|
| 90 |
-
# Load the architecture
|
| 91 |
-
import importlib.util
|
| 92 |
-
spec = importlib.util.spec_from_file_location("model", code_path)
|
| 93 |
-
mod = importlib.util.module_from_spec(spec)
|
| 94 |
-
spec.loader.exec_module(mod)
|
| 95 |
-
|
| 96 |
-
# Load the model
|
| 97 |
-
model = mod.SOVYN85M()
|
| 98 |
-
state_dict = load_file(model_path)
|
| 99 |
-
state_dict = {k: v.float() for k, v in state_dict.items()}
|
| 100 |
-
model.load_state_dict(state_dict)
|
| 101 |
-
model.eval()
|
| 102 |
-
|
| 103 |
-
tokenizer = Tokenizer.from_file(tok_path)
|
| 104 |
-
|
| 105 |
-
# Inference
|
| 106 |
-
prompt = "문제: 3x + 7 = 22일 때, x의 값을 구하시오.\n풀이:\n"
|
| 107 |
-
ids = torch.tensor([tokenizer.encode(prompt).ids])
|
| 108 |
-
out = model.generate(ids, max_new_tokens=200, temperature=0.3)
|
| 109 |
-
print(tokenizer.decode(out[0].tolist()))
|
| 110 |
-
```
|
| 111 |
-
|
| 112 |
-
## Prompt Format
|
| 113 |
-
|
| 114 |
-
```
|
| 115 |
-
문제: {content}
|
| 116 |
-
풀이:
|
| 117 |
-
```
|
| 118 |
-
|
| 119 |
-
The model generates the text after "풀이:". It outputs a step-by-step solution followed by "답: {answer}".
|
| 120 |
-
|
| 121 |
-
## Limitations
|
| 122 |
-
|
| 123 |
-
- Trained on synthetic data only; free-form conversation is not supported.
|
| 124 |
-
- Weak on sequences (geometric, Fibonacci).
|
| 125 |
-
- Context is limited to 512 tokens.
|
| 126 |
-
|
| 127 |
-
## License
|
| 128 |
-
|
| 129 |
-
Apache-2.0
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
language:
|
| 4 |
+
- ko
|
| 5 |
+
tags:
|
| 6 |
+
- reasoning
|
| 7 |
+
- math
|
| 8 |
+
- code
|
| 9 |
+
- from-scratch
|
| 10 |
+
- korean
|
| 11 |
+
- gpt
|
| 12 |
+
pipeline_tag: text-generation
|
| 13 |
+
model-index:
|
| 14 |
+
- name: SOVYN-85M
|
| 15 |
+
results:
|
| 16 |
+
- task:
|
| 17 |
+
type: reasoning
|
| 18 |
+
name: Custom Reasoning Benchmark
|
| 19 |
+
metrics:
|
| 20 |
+
- type: accuracy
|
| 21 |
+
value: 86.5
|
| 22 |
+
name: Overall Accuracy
|
| 23 |
+
---
|
| 24 |
+
|
| 25 |
+
# SOVYN-85M
|
| 26 |
+
|
| 27 |
+
SOVYN-85M is an 85.4M-parameter Korean reasoning model trained from scratch. It is designed to solve problems step by step across 119 categories, including math, code tracing, and logic.
|
| 28 |
+
|
| 29 |
+
## Model Specifications
|
| 30 |
+
|
| 31 |
+
|
| 32 |
+
| Attribute | Value |
|
| 33 |
+
| :--- | :--- |
|
| 34 |
+
| **Parameters** | 85.4M |
|
| 35 |
+
| **Architecture** | GPT (Decoder-only) |
|
| 36 |
+
| **Layers / Heads** | 12 / 12 |
|
| 37 |
+
| **Embedding Dim** | 768 |
|
| 38 |
+
| **Context Length** | 512 |
|
| 39 |
+
| **Vocabulary Size** | 16,384 (BPE) |
|
| 40 |
+
| **Precision** | Float16 |
|
| 41 |
+
| **Attention** | Flash Attention (SDPA) |
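As a rough sanity check on the table above, the sketch below estimates parameter counts from the listed dimensions. It assumes a GPT-2-style layout (fused QKV plus output projection, a 4x MLP, LayerNorm with weight and bias, learned position embeddings); the README does not confirm any of this, and the actual `model.py` may differ. The function name `gpt_param_estimate` is hypothetical.

```python
def gpt_param_estimate(n_layer, d_model, vocab_size, context_len, bias=True):
    """Estimate parameter counts for an ASSUMED GPT-2-style decoder."""
    # Attention: Q, K, V and output projections, each d_model x d_model.
    attn = 4 * d_model * d_model + (4 * d_model if bias else 0)
    # MLP: d_model -> 4*d_model -> d_model (the 4x width is an assumption).
    mlp = 2 * d_model * (4 * d_model) + (5 * d_model if bias else 0)
    # Two LayerNorms per block (weight, plus bias if enabled).
    norms = 2 * (2 * d_model if bias else d_model)
    return {
        "blocks": n_layer * (attn + mlp + norms),
        "token_embedding": vocab_size * d_model,
        "position_embedding": context_len * d_model,
        "final_norm": 2 * d_model if bias else d_model,
    }


# Dimensions taken from the specification table.
est = gpt_param_estimate(n_layer=12, d_model=768, vocab_size=16_384, context_len=512)
for name, count in est.items():
    print(f"{name:>20}: {count / 1e6:.2f}M")
```

Under these assumptions the transformer blocks come to about 85.05M parameters and the token embedding adds about 12.58M. The README does not say how its 85.4M figure is counted, so this is only an order-of-magnitude check.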
|
| 42 |
+
|
| 43 |
+
## Training Details
|
| 44 |
+
|
| 45 |
+
- **Dataset**: 591,261 synthetic reasoning problems (27.97M tokens)
|
| 46 |
+
- **Optimizer**: AdamW (lr=3e-4, weight_decay=0.1)
|
| 47 |
+
- **Schedule**: Cosine decay (Warmup: 500 steps)
|
| 48 |
+
- **Batch Size**: Effective 64 (16 x 4 grad_accum)
|
| 49 |
+
- **Total Steps**: 20,000
|
| 50 |
+
- **Hardware**: RTX 5080 16GB (Training time: ~4h)
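The schedule in the list above (linear warmup for 500 steps, then cosine decay over 20,000 total steps from a peak of 3e-4) can be sketched as a plain function. This is a minimal illustration, not the author's training code: the linear warmup shape, the floor `MIN_LR = 0.0`, and the name `lr_at` are assumptions the README does not state.

```python
import math

PEAK_LR = 3e-4        # from Training Details
WARMUP_STEPS = 500    # from Training Details
TOTAL_STEPS = 20_000  # from Training Details
MIN_LR = 0.0          # ASSUMPTION: no learning-rate floor is given


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < WARMUP_STEPS:
        # Linear warmup (assumed shape): reaches PEAK_LR on the last warmup step.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Cosine decay from PEAK_LR down to MIN_LR over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))


for s in (0, 499, 500, 10_250, 20_000):
    print(s, f"{lr_at(s):.2e}")
```

With an effective batch of 64 (16 x 4 gradient accumulation), one such scheduler step corresponds to four forward/backward passes of 16 sequences each.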
|
| 51 |
+
|
| 52 |
+
## Benchmarks
|
| 53 |
+
|
| 54 |
+
Results on an in-house benchmark (52 problems, 10 categories).
|
| 55 |
+
|
| 56 |
+
|
| 57 |
+
| Category | Accuracy | Category | Accuracy |
|
| 58 |
+
| :--- | :--- | :--- | :--- |
|
| 59 |
+
| **Arithmetic** | 100% | **Number Property** | 100% |
|
| 60 |
+
| **Code Tracing** | 100% | **Word Problems** | 100% |
|
| 61 |
+
| **Precedence** | 88% | **List Operations** | 83% |
|
| 62 |
+
| **Equations** | 80% | **Logic** | 80% |
|
| 63 |
+
| **Parentheses** | 80% | **Series** | 33% |
|
| 64 |
+
| **Overall** | **86.5%** | | |
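Because the benchmark has only 52 problems, the overall score moves in steps of roughly 1.9 percentage points per problem. The short check below finds which number of correct answers is consistent with the reported 86.5%; it uses only the two figures stated here (52 problems, 86.5%) and assumes the score is rounded to one decimal place.

```python
TOTAL_PROBLEMS = 52   # from the benchmark description
REPORTED_PCT = 86.5   # from the Overall row

# Every count of correct answers that rounds to the reported percentage.
matches = [
    k for k in range(TOTAL_PROBLEMS + 1)
    if round(100 * k / TOTAL_PROBLEMS, 1) == REPORTED_PCT
]
print(matches)                    # [45]
print(100 / TOTAL_PROBLEMS)       # percentage points per problem
```

So the overall figure corresponds to 45 of 52 problems solved, and a single additional miss would lower it to about 84.6%.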
|
| 65 |
+
|
| 66 |
+
## Usage
|
| 67 |
+
|
| 68 |
+
### Dependencies
|
| 69 |
+
```bash
|
| 70 |
+
pip install torch safetensors tokenizers huggingface_hub
|