SOVYN committed on
Commit 8d221f0 · verified · 1 Parent(s): ffeff6b

Update README.md

Files changed (1)
  1. README.md +70 -129
README.md CHANGED
@@ -1,129 +1,70 @@
- ---
- license: apache-2.0
- language:
- - ko
- tags:
- - reasoning
- - math
- - code
- - from-scratch
- - korean
- - gpt
- pipeline_tag: text-generation
- model-index:
- - name: SOVYN-85M
-   results:
-   - task:
-       type: reasoning
-       name: Custom Reasoning Benchmark
-     metrics:
-     - type: accuracy
-       value: 86.5
-       name: Overall Accuracy
- ---
-
- # SOVYN-85M
-
- An 85M-parameter Korean reasoning model trained from scratch.
-
- It solves problems step by step across 119 categories, including math, code tracing, logic, physics, chemistry, biology, earth science, Korean history, and calculus.
-
- ## Specs
-
- | | |
- |---|---|
- | Parameters | 85.4M |
- | Architecture | GPT (Decoder-only) |
- | Layers | 12 |
- | Attention heads | 12 |
- | Embedding dimension | 768 |
- | Context length | 512 |
- | Vocabulary size | 16,384 (BPE) |
- | Attention | Flash Attention (SDPA) |
- | Precision | float16 |
-
- ## Training
-
- - Data: 591,261 synthetic reasoning problems (119 categories), 27.97M tokens
- - Optimizer: AdamW (lr=3e-4, weight_decay=0.1)
- - Schedule: Cosine decay + warmup 500 steps
- - Batch: 16 x 4 grad_accum = effective 64
- - Steps: 20,000
- - GPU: RTX 5080 16GB
- - Training time: ~4 hours
-
- ## Benchmark
-
- In-house benchmark: 52 problems, 10 categories.
-
- | Category | Accuracy |
- |---------|--------|
- | Arithmetic | 100% |
- | Code Tracing | 100% |
- | Number Property | 100% |
- | Word Problems | 100% |
- | Precedence | 88% |
- | List Operations | 83% |
- | Parentheses | 80% |
- | Equations | 80% |
- | Logic | 80% |
- | Series | 33% |
- | **Overall** | **86.5%** |
-
- ## Usage
-
- ```bash
- pip install torch safetensors tokenizers huggingface_hub
- ```
-
- ```python
- import torch
- from safetensors.torch import load_file
- from tokenizers import Tokenizer
- from huggingface_hub import hf_hub_download
-
- # Download
- model_path = hf_hub_download("SOVYN/SOVYN-85M", "model.safetensors")
- tok_path = hf_hub_download("SOVYN/SOVYN-85M", "tokenizer.json")
- code_path = hf_hub_download("SOVYN/SOVYN-85M", "model.py")
-
- # Load the architecture
- import importlib.util
- spec = importlib.util.spec_from_file_location("model", code_path)
- mod = importlib.util.module_from_spec(spec)
- spec.loader.exec_module(mod)
-
- # Load the model
- model = mod.SOVYN85M()
- state_dict = load_file(model_path)
- state_dict = {k: v.float() for k, v in state_dict.items()}
- model.load_state_dict(state_dict)
- model.eval()
-
- tokenizer = Tokenizer.from_file(tok_path)
-
- # Inference
- prompt = "문제: 3x + 7 = 22일 때, x의 값을 구하시오.\n풀이:\n"
- ids = torch.tensor([tokenizer.encode(prompt).ids])
- out = model.generate(ids, max_new_tokens=200, temperature=0.3)
- print(tokenizer.decode(out[0].tolist()))
- ```
-
- ## Prompt Format
-
- ```
- 문제: {content}
- 풀이:
- ```
-
- The model generates the text after "풀이:" ("Solution:"). It outputs a step-by-step solution followed by "답: {answer}" ("Answer: ...").
-
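Reviewer note: the prompt format described in the removed section above can be wrapped in two small helpers, sketched here. The names `build_prompt` and `extract_answer` are hypothetical and not part of the repository; the literal markers `문제:`, `풀이:`, and `답:` come from the README, and the sample output in the usage note is made up for illustration.

```python
import re
from typing import Optional

# Prompt layout taken from the README: "문제: {content}\n풀이:\n".
PROMPT_TEMPLATE = "문제: {content}\n풀이:\n"

# The README says the solution ends with a line of the form "답: {answer}".
ANSWER_PATTERN = re.compile(r"답:\s*(.+)")


def build_prompt(content: str) -> str:
    """Format a problem statement the way the README's usage example does."""
    return PROMPT_TEMPLATE.format(content=content.strip())


def extract_answer(generated: str) -> Optional[str]:
    """Return the text after the last "답:" marker, or None if it is absent."""
    matches = ANSWER_PATTERN.findall(generated)
    return matches[-1].strip() if matches else None
```

For example, `extract_answer("풀이:\n3x = 15\nx = 5\n답: 5")` yields `"5"`. Taking the last match is a design choice that guards against the marker appearing earlier in the generated steps.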
- ## Limitations
-
- - Trained only on synthetic data. Free-form conversation is not supported.
- - Weak on series (geometric, Fibonacci).
- - Context is limited to 512 tokens.
-
- ## License
-
- Apache-2.0
 
+ ---
+ license: apache-2.0
+ language:
+ - ko
+ tags:
+ - reasoning
+ - math
+ - code
+ - from-scratch
+ - korean
+ - gpt
+ pipeline_tag: text-generation
+ model-index:
+ - name: SOVYN-85M
+   results:
+   - task:
+       type: reasoning
+       name: Custom Reasoning Benchmark
+     metrics:
+     - type: accuracy
+       value: 86.5
+       name: Overall Accuracy
+ ---
+
+ # SOVYN-85M
+
+ SOVYN-85M is an 85.4M-parameter Korean reasoning model trained from scratch. It is designed to solve problems step by step across 119 categories, including math, code tracing, and logic.
+
+ ## Model Specifications
+
+
+ | Attribute | Value |
+ | :--- | :--- |
+ | **Parameters** | 85.4M |
+ | **Architecture** | GPT (Decoder-only) |
+ | **Layers / Heads** | 12 / 12 |
+ | **Embedding Dim** | 768 |
+ | **Context Length** | 512 |
+ | **Vocabulary Size** | 16,384 (BPE) |
+ | **Precision** | Float16 |
+ | **Attention** | Flash Attention (SDPA) |
+
+ ## Training Details
+
+ - **Dataset**: 591,261 synthetic reasoning problems (27.97M tokens)
+ - **Optimizer**: AdamW (lr=3e-4, weight_decay=0.1)
+ - **Schedule**: Cosine decay (Warmup: 500 steps)
+ - **Batch Size**: Effective 64 (16 x 4 grad_accum)
+ - **Total Steps**: 20,000
+ - **Hardware**: RTX 5080 16GB (Training time: ~4h)
+
+ ## Benchmarks
53
+
54
+ ์ž์ฒด ๋ฒค์น˜๋งˆํฌ(52๋ฌธ์ œ, 10๊ฐœ ์นดํ…Œ๊ณ ๋ฆฌ) ๊ฒฐ๊ณผ์ž…๋‹ˆ๋‹ค.
55
+
56
+
57
+ | Category | Accuracy | Category | Accuracy |
58
+ | :--- | :--- | :--- | :--- |
59
+ | **Arithmetic** | 100% | **Number Property** | 100% |
60
+ | **Code Tracing** | 100% | **Word Problems** | 100% |
61
+ | **Precedence** | 88% | **List Operations** | 83% |
62
+ | **Equations** | 80% | **Logic** | 80% |
63
+ | **Parentheses** | 80% | **Series** | 33% |
64
+ | **Overall** | **86.5%** | | |
65
+
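Reviewer note: a quick consistency check on the table. An overall accuracy of 86.5% on 52 problems corresponds to 45 correct answers. The card does not give per-category problem counts, so the sketch below assumes each percentage is a rounded fraction with the smallest possible denominator; that reading, and the helper name `smallest_fraction`, are hypothetical.

```python
from fractions import Fraction

TOTAL_PROBLEMS = 52     # from the README
OVERALL_PERCENT = 86.5  # from the README

# Number of correct answers implied by the overall accuracy.
correct = round(TOTAL_PROBLEMS * OVERALL_PERCENT / 100)
misses = TOTAL_PROBLEMS - correct

# Reported accuracies below 100%, in percent (from the table).
partial = {
    "Precedence": 88,
    "List Operations": 83,
    "Equations": 80,
    "Logic": 80,
    "Parentheses": 80,
    "Series": 33,
}


def smallest_fraction(percent: int, max_den: int = 52) -> Fraction:
    """Smallest-denominator fraction that rounds to `percent` (assumed reading)."""
    for den in range(1, max_den + 1):
        for num in range(den + 1):
            if round(100 * num / den) == percent:
                return Fraction(num, den)
    raise ValueError(f"no fraction with denominator <= {max_den} gives {percent}%")


inferred = {name: smallest_fraction(p) for name, p in partial.items()}
# Misses implied by the smallest-denominator reading of each category.
inferred_misses = sum(f.denominator - f.numerator for f in inferred.values())
```

Under this reading the six categories below 100% account for exactly the 7 misses implied by the overall score, so the table is internally consistent; the actual per-category counts may still differ.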
+ ## Usage
+
+ ### Dependencies
+ ```bash
+ pip install torch safetensors tokenizers huggingface_hub