SOVYN committed
Commit c151b29 · verified · 1 Parent(s): 397a9b3

Upload folder using huggingface_hub

Files changed (5)
  1. README.md +107 -0
  2. benchmark_result.json +59 -0
  3. config.json +16 -0
  4. pytorch_model.bin +3 -0
  5. tokenizer.json +0 -0
README.md CHANGED
@@ -1,3 +1,110 @@
  ---
  license: apache-2.0
+ language:
+ - ko
+ tags:
+ - reasoning
+ - math
+ - code
+ - from-scratch
+ - korean
+ - gpt
+ model-index:
+ - name: SOVYN-85M
+   results:
+   - task:
+       type: reasoning
+       name: Custom Reasoning Benchmark
+     metrics:
+     - type: accuracy
+       value: 86.5
+       name: Overall Accuracy
  ---
+
+ # SOVYN-85M
+
+ **An 85.4M-parameter GPT model specialized for Korean-language reasoning**
+
+ A Korean reasoning AI trained entirely from scratch.
+ It works through math, coding, logic, science, and other reasoning problems step by step.
+
+ ## Model Architecture
+
+ | Item | Value |
+ |------|-----|
+ | Architecture | GPT (decoder-only Transformer) |
+ | Parameters | 85.4M |
+ | Layers | 12 |
+ | Heads | 12 |
+ | Embed Dim | 768 |
+ | Context Length | 512 |
+ | Vocab Size | 16,384 (BPE) |
+ | Attention | Flash Attention (SDPA) |
+
+ ## Training Data
+
+ - **591,261** synthetic reasoning problems (119 categories)
+ - **27.97M tokens** (BPE, vocab 16,384)
+ - Categories: math, algebra, calculus, physics, chemistry, biology, earth science, Korean history, coding, logic, English, Korean, functions, and more
+
+ ## Training Setup
+
+ - Optimizer: AdamW (lr=3e-4, weight_decay=0.1)
+ - Schedule: cosine decay with warmup (500 steps)
+ - Batch: 16 × 4 grad_accum = effective 64
+ - Steps: 20,000
+ - Mixed precision: AMP + GradScaler
+ - Hardware: NVIDIA RTX 5080 (16GB)
+
+ ## Benchmark Results
+
+ | Category | Accuracy |
+ |---------|--------|
+ | Basic arithmetic | 100% |
+ | Code tracing | 100% |
+ | Number properties | 100% |
+ | Free-response | 100% |
+ | Operator precedence | 88% |
+ | List operations | 83% |
+ | Parenthesized expressions | 80% |
+ | Equations | 80% |
+ | Logic | 80% |
+ | Sequences | 33% |
+ | **Overall** | **86.5% (Grade A)** |
+
+ ## Usage
+
+ ```python
+ import torch
+ from tokenizers import Tokenizer
+
+ # Load the model (the custom architecture from train_125m.py is required)
+ from train_125m import GPT125M, ModelConfig
+
+ cfg = ModelConfig()
+ model = GPT125M(cfg)
+ state_dict = torch.load("pytorch_model.bin", map_location="cpu")
+ model.load_state_dict(state_dict)
+ model.eval()
+
+ # Tokenizer
+ tokenizer = Tokenizer.from_file("tokenizer.json")
+
+ # Inference ("Problem: If 3x + 7 = 22, find the value of x. / Solution:")
+ prompt = "문제: 3x + 7 = 22일 때, x의 값을 구하시오.\n풀이:\n"
+ input_ids = tokenizer.encode(prompt).ids
+ input_tensor = torch.tensor([input_ids])
+
+ with torch.no_grad():
+     output = model.generate(input_tensor, max_new_tokens=200)
+ result = tokenizer.decode(output[0].tolist())
+ print(result)
+ ```
+
+ ## License
+
+ Apache-2.0
+
+ ## Author
+
+ SOVYN
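Note that `train_125m.py` (which provides `GPT125M` and `ModelConfig`) is not part of this upload, and `model.generate` above is a method of that custom class. As a rough illustration of what greedy decoding does, here is a framework-free sketch; `next_token_logits` is a hypothetical stand-in for a model forward pass, not an API from this repo:

```python
def greedy_generate(next_token_logits, input_ids, max_new_tokens, eos_id=None):
    """Greedy decoding: repeatedly append the highest-scoring next token."""
    ids = list(input_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)  # one score per vocabulary entry
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break  # stop at end-of-sequence
    return ids

# Toy check: a "model" over a 4-token vocabulary that always prefers token 2
toy = lambda ids: [0.0, 0.1, 0.9, 0.2]
print(greedy_generate(toy, [1], max_new_tokens=3))  # [1, 2, 2, 2]
```

Sampling-based decoding (temperature, top-k) would replace the `max` with a draw from the softmaxed logits; the actual method may differ.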
benchmark_result.json ADDED
@@ -0,0 +1,59 @@
+ {
+   "total_correct": 45,
+   "total_count": 52,
+   "overall_accuracy": 86.5,
+   "grade": "A (우수)",
+   "total_time": 6.4,
+   "categories": {
+     "산술_기본": {
+       "correct": 5,
+       "total": 5,
+       "accuracy": 100.0
+     },
+     "연산_우선순위": {
+       "correct": 7,
+       "total": 8,
+       "accuracy": 87.5
+     },
+     "괄호_연산": {
+       "correct": 4,
+       "total": 5,
+       "accuracy": 80.0
+     },
+     "방정식": {
+       "correct": 4,
+       "total": 5,
+       "accuracy": 80.0
+     },
+     "리스트_연산": {
+       "correct": 5,
+       "total": 6,
+       "accuracy": 83.33333333333334
+     },
+     "코드_트레이싱": {
+       "correct": 5,
+       "total": 5,
+       "accuracy": 100.0
+     },
+     "논리": {
+       "correct": 4,
+       "total": 5,
+       "accuracy": 80.0
+     },
+     "숫자_성질": {
+       "correct": 5,
+       "total": 5,
+       "accuracy": 100.0
+     },
+     "서술형": {
+       "correct": 5,
+       "total": 5,
+       "accuracy": 100.0
+     },
+     "수열": {
+       "correct": 1,
+       "total": 3,
+       "accuracy": 33.33333333333333
+     }
+   }
+ }
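The top-level figures follow from the per-category counts as a micro-average over all 52 items. A quick check (category keys below are English stand-ins for the Korean names in the JSON):

```python
# Per-category (correct, total) counts copied from benchmark_result.json
categories = {
    "basic_arithmetic": (5, 5), "operator_precedence": (7, 8),
    "parentheses": (4, 5), "equations": (4, 5), "list_operations": (5, 6),
    "code_tracing": (5, 5), "logic": (4, 5), "number_properties": (5, 5),
    "free_response": (5, 5), "sequences": (1, 3),
}

correct = sum(c for c, _ in categories.values())  # total_correct
total = sum(t for _, t in categories.values())    # total_count
overall = round(100 * correct / total, 1)         # overall_accuracy
print(correct, total, overall)  # 45 52 86.5
```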
config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "model_type": "sovyn-gpt",
+   "architectures": [
+     "GPT125M"
+   ],
+   "vocab_size": 16384,
+   "context_length": 512,
+   "embed_dim": 768,
+   "num_heads": 12,
+   "num_layers": 12,
+   "dropout": 0.1,
+   "bias": false,
+   "parameters": "85.4M",
+   "training_steps": 10000,
+   "best_val_loss": 0.46251316606998444
+ }
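Since `"sovyn-gpt"` is not a registered `transformers` architecture, this config has to be read manually. A minimal sketch, assuming a `ModelConfig` dataclass along the lines of what `train_125m.py` presumably defines (field names and defaults taken from the JSON keys above; metadata keys like `model_type` are filtered out):

```python
import json
from dataclasses import dataclass, fields

@dataclass
class ModelConfig:
    # Hypothetical reconstruction of train_125m.ModelConfig from config.json
    vocab_size: int = 16384
    context_length: int = 512
    embed_dim: int = 768
    num_heads: int = 12
    num_layers: int = 12
    dropout: float = 0.1
    bias: bool = False

def load_config(path="config.json"):
    """Build a ModelConfig from config.json, ignoring non-architecture keys."""
    raw = json.load(open(path))
    known = {f.name for f in fields(ModelConfig)}
    return ModelConfig(**{k: v for k, v in raw.items() if k in known})
```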
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e0b1683b8c5a53e597782f223b28205b6f52d54e91cc8a5418450ad662fc23d
+ size 391833139
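`pytorch_model.bin` is stored as a Git LFS pointer; the real 391,833,139-byte weight file is fetched on checkout or download. The pointer format is simple `key value` lines, so a downloaded file can be checked against it (a sketch; `verify_download` assumes the actual weights were saved locally next to the pointer):

```python
import hashlib

def parse_lfs_pointer(text):
    """Parse a git-lfs spec/v1 pointer into its version, oid, and size fields."""
    kv = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": kv["version"],
        "oid": kv["oid"].removeprefix("sha256:"),
        "size": int(kv["size"]),
    }

def verify_download(path, pointer):
    """Check a downloaded file's SHA-256 against the pointer's oid."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # stream in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]
```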
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff