SOVYN committed on
Commit 5c6711b · verified · 1 Parent(s): 8d221f0

Update README.md

Files changed (1)
  1. README.md +91 -32
README.md CHANGED
@@ -24,47 +24,106 @@ model-index:
 
  # SOVYN-85M
 
- SOVYN-85M is a Korean reasoning model with 85.4M parameters, trained from scratch. It is designed to solve problems step by step across 119 categories, including math, code tracing, and logic.
 
- ## Model Specifications
 
 
- | Attribute | Value |
- | :--- | :--- |
- | **Parameters** | 85.4M |
- | **Architecture** | GPT (Decoder-only) |
- | **Layers / Heads** | 12 / 12 |
- | **Embedding Dim** | 768 |
- | **Context Length** | 512 |
- | **Vocabulary Size** | 16,384 (BPE) |
- | **Precision** | Float16 |
- | **Attention** | Flash Attention (SDPA) |
 
- ## Training Details
 
- - **Dataset**: 591,261 synthetic reasoning problems (27.97M tokens)
- - **Optimizer**: AdamW (lr=3e-4, weight_decay=0.1)
- - **Schedule**: Cosine decay (Warmup: 500 steps)
- - **Batch Size**: Effective 64 (16 x 4 grad_accum)
- - **Total Steps**: 20,000
- - **Hardware**: RTX 5080 16GB (Training time: ~4h)
 
- ## Benchmarks
 
- Results on an in-house benchmark (52 problems, 10 categories).
 
 
- | Category | Accuracy | Category | Accuracy |
- | :--- | :--- | :--- | :--- |
- | **Arithmetic** | 100% | **Number Property** | 100% |
- | **Code Tracing** | 100% | **Word Problems** | 100% |
- | **Precedence** | 88% | **List Operations** | 83% |
- | **Equations** | 80% | **Logic** | 80% |
- | **Parentheses** | 80% | **Series** | 33% |
- | **Overall** | **86.5%** | | |
 
- ## Usage
-
- ### Dependencies
  ```bash
  pip install torch safetensors tokenizers huggingface_hub

 
  # SOVYN-85M
 
+ An 85M-parameter Korean reasoning model trained from scratch.
 
+ It solves problems step by step across 119 categories, including math, code tracing, logic, physics, chemistry, biology, earth science, Korean history, and calculus.
 
+ ## Specs
 
+ | | |
+ |---|---|
+ | Parameters | 85.4M |
+ | Architecture | GPT (Decoder-only) |
+ | Layers | 12 |
+ | Attention heads | 12 |
+ | Embedding dim | 768 |
+ | Context length | 512 |
+ | Vocabulary size | 16,384 (BPE) |
+ | Attention | Flash Attention (SDPA) |
+ | Precision | float16 |
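
As a rough sanity check on the spec table, the common `12 * n_layer * d_model^2` approximation for the transformer blocks of a GPT-style decoder can be evaluated from the listed values. This is only a sketch: it assumes standard blocks with a 4x MLP expansion, ignores biases and LayerNorm weights, and assumes learned absolute position embeddings, none of which the README confirms.

```python
# Back-of-envelope parameter estimate from the values in the spec table.
# Assumptions (not confirmed by the README): standard GPT blocks with a
# 4x MLP expansion, no biases or LayerNorm weights counted, and learned
# absolute position embeddings.
n_layer, d_model = 12, 768
vocab_size, context_len = 16_384, 512

# Per layer: 4*d^2 for attention (Q, K, V, output) + 8*d^2 for the MLP.
block_params = 12 * n_layer * d_model**2
token_embedding = vocab_size * d_model
position_embedding = context_len * d_model

print(f"transformer blocks:  {block_params / 1e6:.1f}M")
print(f"token embedding:     {token_embedding / 1e6:.1f}M")
print(f"position embedding:  {position_embedding / 1e6:.1f}M")
```

Under these assumptions the block term alone comes to about 84.9M, close to the reported 85.4M. The README does not say whether the reported figure includes the embedding tables.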
 
+ ## Training
 
+ - Data: 591,261 synthetic reasoning problems (119 categories), 27.97M tokens
+ - Optimizer: AdamW (lr=3e-4, weight_decay=0.1)
+ - Schedule: Cosine decay + warmup 500 steps
+ - Batch: 16 x 4 grad_accum = effective 64
+ - Steps: 20,000
+ - GPU: RTX 5080 16GB
+ - Training time: ~4 hours
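
The schedule listed above can be sketched in a few lines. The peak learning rate, warmup length, step count, and batch settings come from the README; the linear warmup shape and the decay floor of zero are assumptions, since the commit does not state them.

```python
import math

BASE_LR = 3e-4         # from the README
WARMUP_STEPS = 500     # from the README
TOTAL_STEPS = 20_000   # from the README
MIN_LR = 0.0           # assumption: no floor is stated in the README

MICRO_BATCH, GRAD_ACCUM = 16, 4
EFFECTIVE_BATCH = MICRO_BATCH * GRAD_ACCUM        # 64 sequences per optimizer step
SEQUENCES_SEEN = TOTAL_STEPS * EFFECTIVE_BATCH    # sequences processed over the run


def lr_at(step: int) -> float:
    """Linear warmup (assumed shape) followed by cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        return BASE_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1 + math.cos(math.pi * progress))


for step in (0, 250, 500, 10_250, 20_000):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
print(f"effective batch = {EFFECTIVE_BATCH}, sequences seen = {SEQUENCES_SEEN:,}")
```

In a PyTorch training loop the same function could be passed to `torch.optim.lr_scheduler.LambdaLR` after dividing by `BASE_LR`, though the repository may well use a different mechanism.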
 
+ ## Benchmark
 
+ In-house benchmark: 52 problems, 10 categories.
 
+ | Category | Accuracy |
+ |---------|--------|
+ | Arithmetic | 100% |
+ | Code Tracing | 100% |
+ | Number Property | 100% |
+ | Word Problems | 100% |
+ | Precedence | 88% |
+ | List Operations | 83% |
+ | Parentheses | 80% |
+ | Equations | 80% |
+ | Logic | 80% |
+ | Series | 33% |
+ | **Overall** | **86.5%** |
 
+ ## Usage
 
  ```bash
  pip install torch safetensors tokenizers huggingface_hub
+ ```
+
+ ```python
+ import torch
+ from safetensors.torch import load_file
+ from tokenizers import Tokenizer
+ from huggingface_hub import hf_hub_download
+
+ # Download the weights, tokenizer, and model definition
+ model_path = hf_hub_download("SOVYN/SOVYN-85M", "model.safetensors")
+ tok_path = hf_hub_download("SOVYN/SOVYN-85M", "tokenizer.json")
+ code_path = hf_hub_download("SOVYN/SOVYN-85M", "model.py")
+
+ # Load the architecture
+ import importlib.util
+ spec = importlib.util.spec_from_file_location("model", code_path)
+ mod = importlib.util.module_from_spec(spec)
+ spec.loader.exec_module(mod)
+
+ # Load the model
+ model = mod.SOVYN85M()
+ state_dict = load_file(model_path)
+ state_dict = {k: v.float() for k, v in state_dict.items()}  # cast tensors to float32
+ model.load_state_dict(state_dict)
+ model.eval()
+
+ tokenizer = Tokenizer.from_file(tok_path)
+
+ # Inference (the prompt is Korean: "Problem: find x when 3x + 7 = 22. Solution:")
+ prompt = "문제: 3x + 7 = 22일 때, x의 값을 구하시오.\n풀이:\n"
+ ids = torch.tensor([tokenizer.encode(prompt).ids])
+ out = model.generate(ids, max_new_tokens=200, temperature=0.3)
+ print(tokenizer.decode(out[0].tolist()))
+ ```
+
+ ## Prompt format
+
+ ```
+ 문제: {content}
+ 풀이:
+ ```
+
+ The model generates the text after "풀이:" ("Solution:"), following the "문제:" ("Problem:") line. The output is a step-by-step solution plus "답: {answer}" ("Answer: ...").
+
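
A small helper pair can make the template easier to apply and the final answer easier to read back. `build_prompt` and `extract_answer` are hypothetical names, not part of the repository, and the sample completion below is hand-written rather than real model output.

```python
import re
from typing import Optional


def build_prompt(problem: str) -> str:
    """Wrap a problem statement in the 문제/풀이 template described in the README."""
    return f"문제: {problem}\n풀이:\n"


def extract_answer(generated: str) -> Optional[str]:
    """Return the text after the last '답:' marker, or None if there is none."""
    matches = re.findall(r"답:\s*(.+)", generated)
    return matches[-1].strip() if matches else None


prompt = build_prompt("3x + 7 = 22일 때, x의 값을 구하시오.")
# Hand-written stand-in for a completion, used only to exercise the parser.
sample = prompt + "3x = 15\nx = 5\n답: 5"
print(extract_answer(sample))
```

Taking the last match is a design choice for the case where intermediate steps also contain the marker; whether the model ever emits it more than once is not stated in the README.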
+ ## Limitations
+
+ - Trained only on synthetic data; free-form conversation is not supported.
+ - Weak on series problems (geometric, Fibonacci).
+ - Context is limited to 512 tokens.
+
+ ## License
+
+ Apache-2.0