Upload folder using huggingface_hub

Files changed (5)

README.md +107 -0
benchmark_result.json +59 -0
config.json +16 -0
pytorch_model.bin +3 -0
tokenizer.json +0 -0

README.md CHANGED

@@ -1,3 +1,110 @@
 ---
 license: apache-2.0
+language:
+  - ko
+tags:
+  - reasoning
+  - math
+  - code
+  - from-scratch
+  - korean
+  - gpt
+model-index:
+  - name: SOVYN-85M
+    results:
+      - task:
+          type: reasoning
+          name: Custom Reasoning Benchmark
+        metrics:
+          - type: accuracy
+            value: 86.5
+            name: Overall Accuracy
 ---
+# SOVYN-85M
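+**An 85.4M-parameter GPT model specialized for Korean reasoning**
+A Korean reasoning model trained entirely from scratch.
+It solves reasoning problems step by step across math, coding, logic, science, and other areas.
+## Model Architecture
+| Item | Value |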
+|------|-----|
+| Architecture | GPT (Decoder-only Transformer) |
+| Parameters | 85.4M |
+| Layers | 12 |
+| Heads | 12 |
+| Embed Dim | 768 |
+| Context Length | 512 |
+| Vocab Size | 16,384 (BPE) |
+| Attention | Flash Attention (SDPA) |
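+## Training Data
+- **591,261** synthetic reasoning problems (119 categories)
+- **27.97M tokens** (BPE, vocab 16,384)
+- Categories: math, algebra, calculus, physics, chemistry, biology, earth science, Korean history, coding, logic, English, Korean, functions, and more
+## Training Configuration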
+- Optimizer: AdamW (lr=3e-4, weight_decay=0.1)
+- Schedule: Cosine decay with warmup (500 steps)
+- Batch: 16 × 4 grad_accum = effective 64
+- Steps: 20,000
+- Mixed Precision: AMP + GradScaler
+- Hardware: NVIDIA RTX 5080 (16GB)
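+## Benchmark Results
+| Category | Accuracy |
+|---------|--------|
+| `산술_기본` (basic arithmetic) | 100% |
+| `코드_트레이싱` (code tracing) | 100% |
+| `숫자_성질` (number properties) | 100% |
+| `서술형` (descriptive problems) | 100% |
+| `연산_우선순위` (operator precedence) | 88% |
+| `리스트_연산` (list operations) | 83% |
+| `괄호_연산` (parenthesized expressions) | 80% |
+| `방정식` (equations) | 80% |
+| `논리` (logic) | 80% |
+| `수열` (sequences) | 33% |
+| **Overall** | **86.5% (grade A)** |
+## Usage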
+```python
+import torch
+from tokenizers import Tokenizer
+# Load the model (requires the custom architecture defined in train_125m)
+from train_125m import GPT125M, ModelConfig
+cfg = ModelConfig()
+model = GPT125M(cfg)
+state_dict = torch.load("pytorch_model.bin", map_location="cpu")
+model.load_state_dict(state_dict)
+model.eval()
+# Tokenizer
+tokenizer = Tokenizer.from_file("tokenizer.json")
+# Inference (the prompt reads: "Problem: Find the value of x when 3x + 7 = 22." then "Solution:")
+prompt = "문제: 3x + 7 = 22일 때, x의 값을 구하시오.\n풀이:\n"
+input_ids = tokenizer.encode(prompt).ids
+input_tensor = torch.tensor([input_ids])
+with torch.no_grad():
+    output = model.generate(input_tensor, max_new_tokens=200)
+    result = tokenizer.decode(output[0].tolist())
+    print(result)
+```
+## License
+Apache-2.0
+## Author
+SOVYN
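
The README's training configuration names a cosine decay schedule with a 500-step warmup and a peak learning rate of 3e-4, but does not show how it is computed. The following is a minimal sketch of one common formulation, not the author's training code. `lr_at` and `MIN_LR` are hypothetical names, the linear warmup shape and the floor of 0 are assumptions, and `TOTAL_STEPS` follows the README's 20,000 (config.json records `training_steps` as 10000, so adjust if that is the intended horizon).

```python
import math

BASE_LR = 3e-4        # peak learning rate, from the README
WARMUP_STEPS = 500    # warmup length, from the README
TOTAL_STEPS = 20_000  # from the README; config.json lists 10000
MIN_LR = 0.0          # assumption: the README does not state a floor


def lr_at(step: int) -> float:
    """Learning rate for a 0-indexed optimizer step."""
    if step < WARMUP_STEPS:
        # Assumed linear warmup up to BASE_LR.
        return BASE_LR * (step + 1) / WARMUP_STEPS
    # Cosine decay from BASE_LR down to MIN_LR over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 499, 500, 10_250, 20_000):
        print(s, lr_at(s))
```

With PyTorch, a function like this could be wrapped in `torch.optim.lr_scheduler.LambdaLR` by returning the ratio `lr_at(step) / BASE_LR` as the multiplier.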

benchmark_result.json ADDED

	@@ -0,0 +1,59 @@

+{
+  "total_correct": 45,
+  "total_count": 52,
+  "overall_accuracy": 86.5,
+  "grade": "A (우수)",
+  "total_time": 6.4,
+  "categories": {
+    "산술_기본": {
+      "correct": 5,
+      "total": 5,
+      "accuracy": 100.0
+    },
+    "연산_우선순위": {
+      "correct": 7,
+      "total": 8,
+      "accuracy": 87.5
+    },
+    "괄호_연산": {
+      "correct": 4,
+      "total": 5,
+      "accuracy": 80.0
+    },
+    "방정식": {
+      "correct": 4,
+      "total": 5,
+      "accuracy": 80.0
+    },
+    "리스트_연산": {
+      "correct": 5,
+      "total": 6,
+      "accuracy": 83.33333333333334
+    },
+    "코드_트레이싱": {
+      "correct": 5,
+      "total": 5,
+      "accuracy": 100.0
+    },
+    "논리": {
+      "correct": 4,
+      "total": 5,
+      "accuracy": 80.0
+    },
+    "숫자_성질": {
+      "correct": 5,
+      "total": 5,
+      "accuracy": 100.0
+    },
+    "서술형": {
+      "correct": 5,
+      "total": 5,
+      "accuracy": 100.0
+    },
+    "수열": {
+      "correct": 1,
+      "total": 3,
+      "accuracy": 33.33333333333333
+    }
+  }
+}
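
As a quick consistency check, the overall figure can be recomputed from the per-category counts in benchmark_result.json. The sketch below copies the counts from the file and is illustrative only. It also computes an unweighted (macro) average to show that the reported 86.5% is the count-weighted figure, i.e. total correct over total questions.

```python
# (correct, total) per category, copied from benchmark_result.json
counts = {
    "산술_기본": (5, 5),
    "연산_우선순위": (7, 8),
    "괄호_연산": (4, 5),
    "방정식": (4, 5),
    "리스트_연산": (5, 6),
    "코드_트레이싱": (5, 5),
    "논리": (4, 5),
    "숫자_성질": (5, 5),
    "서술형": (5, 5),
    "수열": (1, 3),
}

total_correct = sum(c for c, _ in counts.values())
total_count = sum(t for _, t in counts.values())

# Count-weighted (micro) accuracy: matches "overall_accuracy" in the file.
overall = round(100 * total_correct / total_count, 1)

# Unweighted (macro) average of the per-category accuracies, for comparison.
per_category = {name: 100 * c / t for name, (c, t) in counts.items()}
macro = sum(per_category.values()) / len(per_category)

if __name__ == "__main__":
    print(total_correct, total_count, overall, round(macro, 1))
```

Note that several categories have only 3 to 8 questions, so a single wrong answer moves a category's accuracy by 12 to 33 points.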

config.json ADDED

	@@ -0,0 +1,16 @@

+{
+  "model_type": "sovyn-gpt",
+  "architectures": [
+    "GPT125M"
+  ],
+  "vocab_size": 16384,
+  "context_length": 512,
+  "embed_dim": 768,
+  "num_heads": 12,
+  "num_layers": 12,
+  "dropout": 0.1,
+  "bias": false,
+  "parameters": "85.4M",
+  "training_steps": 10000,
+  "best_val_loss": 0.46251316606998444
+}
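
The fields in config.json are enough for a rough parameter estimate. The sketch below assumes a GPT-2-style layout that the config does not confirm: a 4x MLP expansion, learned absolute position embeddings, bias-free linear layers and LayerNorms (following `"bias": false`), and an output head tied to the token embedding. Under those assumptions the transformer blocks plus position embeddings come to roughly 85.3M, which would be consistent with the reported 85.4M if that figure excludes the token embedding. The counting convention actually used is not stated.

```python
# Values copied from config.json
cfg = {"vocab_size": 16384, "context_length": 512, "embed_dim": 768, "num_layers": 12}

d = cfg["embed_dim"]

attn = 4 * d * d   # Q, K, V, and output projections (no biases)
mlp = 8 * d * d    # assumption: d -> 4d -> d feed-forward (no biases)
norms = 2 * d      # two LayerNorm weight vectors per block (no biases)
per_block = attn + mlp + norms

# All blocks plus the final LayerNorm weight.
blocks = cfg["num_layers"] * per_block + d

tok_emb = cfg["vocab_size"] * d       # token embedding (assumed tied with the output head)
pos_emb = cfg["context_length"] * d   # assumption: learned absolute positions

without_token_embedding = blocks + pos_emb
total_tied = blocks + pos_emb + tok_emb

if __name__ == "__main__":
    print(f"per block:               {per_block:,}")
    print(f"without token embedding: {without_token_embedding:,}")
    print(f"total (tied head):       {total_tied:,}")
```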

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3e0b1683b8c5a53e597782f223b28205b6f52d54e91cc8a5418450ad662fc23d
+size 391833139
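
The three lines above are a Git LFS pointer, not the weights themselves: the actual file is fetched by its SHA-256 digest. A minimal sketch of reading such a pointer follows. `parse_lfs_pointer` is a hypothetical helper, and the float32 storage assumption used for the rough value count is not confirmed by the source.

```python
# Pointer text copied from the pytorch_model.bin entry in this commit.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:3e0b1683b8c5a53e597782f223b28205b6f52d54e91cc8a5418450ad662fc23d
size 391833139
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    # Each non-empty line is "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


info = parse_lfs_pointer(pointer_text)
size_mb = info["size_bytes"] / 1e6

# Assumption: weights stored as 4-byte float32; serialization overhead is ignored.
approx_fp32_values = info["size_bytes"] // 4

if __name__ == "__main__":
    print(f"{size_mb:.1f} MB, about {approx_fp32_values / 1e6:.1f}M float32 values")
```

If the float32 assumption holds, the file size corresponds to about 98M stored values, more than the 85.4M headline figure, which would be expected if the checkpoint also stores embedding weights.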

tokenizer.json ADDED

The diff for this file is too large to render.