gyung committed · Commit 4cee208 · verified · 1 Parent(s): 963ad17

Update README.md

Files changed (1)
  1. README.md +42 -26
README.md CHANGED
@@ -13,24 +13,26 @@ tags:

 # HybriKo: Korean Hybrid Language Model

- A Griffin-inspired hybrid architecture combining RNN and Attention mechanisms for Korean language modeling.

- ## Model Details

- - **Parameters**: 117.8M
- - **Architecture**: 2:1 RNN-to-Attention ratio (Griffin-inspired)
- - **Context Length**: 1024 tokens
- - **Vocab Size**: 32,000 (SentencePiece)
- - **Training Data**: Korean Wikipedia

- ## Training Results (Exp3)

 | Phase | Steps | Loss | PPL |
 |-------|-------|------|-----|
 | Phase 1 | 0-10K | 1.80 | ~6.0 |
 | Phase 2 | 10K-30K | 1.60 | ~4.95 |

- ## Architecture

 ```
 HybriKo (117.8M params)
@@ -38,54 +40,68 @@ HybriKo (117.8M params)
 ├── Layers (12x)
 │ ├── Layer 1,2: GriffinBlock (RNN)
 │ ├── Layer 3: AttentionBlock
- │ └── (pattern repeats)
 └── LM Head (weight-tied)
 ```

- Key features:
 - **RGLRU**: Real-Gated Linear Recurrent Unit
 - **GQA**: Grouped Query Attention (1:4 KV reduction)
- - **Flash Attention 2**: Optimized attention computation
- - **GeGLU**: Gated activation in FFN

- ## Usage

 ```python
 import torch
 from hybridko.model import HybriKoModel, HybriKoConfig
 from hybridko.data import load_tokenizer

- # Load model
 config = HybriKoConfig.from_yaml("config.yaml")
 model = HybriKoModel(config)
 model.load_state_dict(torch.load("pytorch_model.pt"))

- # Load tokenizer
 tokenizer = load_tokenizer("HybriKo_tok.model")

- # Generate
 from hybridko.inference import generate_with_cache
 output = generate_with_cache(model, tokenizer, "한국의 수도는", max_tokens=50)
 print(output)
 ```

- ## Files

- - `pytorch_model.pt`: Model weights (450MB)
- - `config.yaml`: Model configuration
- - `HybriKo_tok.model`: SentencePiece tokenizer
- - `HybriKo_tok.vocab`: Tokenizer vocabulary

- ## Citation

 ```bibtex
- @misc{hybridko2024,
 title={HybriKo: Korean Hybrid Language Model},
- year={2024},
 url={https://huggingface.co/gyunggyung/HybriKo-117M}
 }
 ```

- ## License

 Apache 2.0
 
 # HybriKo: Korean Hybrid Language Model

+ A Korean hybrid language model based on a Griffin architecture that combines RNN and Attention mechanisms.

+ ## Model Details

+ - **Parameters**: 117.8M
+ - **Architecture**: 2:1 RNN-to-Attention ratio (Griffin-based)
+ - **Context Length**: 1024 tokens
+ - **Vocab Size**: 32,000 (SentencePiece)
+ - **Training Data**: Korean Wikipedia

+ ## Training Results (Exp3)

 | Phase | Steps | Loss | PPL |
 |-------|-------|------|-----|
 | Phase 1 | 0-10K | 1.80 | ~6.0 |
 | Phase 2 | 10K-30K | 1.60 | ~4.95 |
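Incidentally, the PPL column is consistent with the Loss column, since perplexity is just exp(loss). A quick stdlib check (illustrative, not part of the training code):

```python
import math

# Perplexity is exp(cross-entropy loss); verify the table's Loss/PPL pairs.
for loss, reported_ppl in [(1.80, 6.0), (1.60, 4.95)]:
    ppl = math.exp(loss)
    print(f"loss={loss:.2f} -> ppl={ppl:.2f} (reported ~{reported_ppl})")
# exp(1.80) ≈ 6.05 and exp(1.60) ≈ 4.95, matching the table.
```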
 
+ ## Architecture
+
+ ![HybriKo Architecture](Architecture.png)

 ```
 HybriKo (117.8M params)
 ├── Layers (12x)
 │ ├── Layer 1,2: GriffinBlock (RNN)
 │ ├── Layer 3: AttentionBlock
+ │ └── (pattern repeats)
 └── LM Head (weight-tied)
 ```
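The 2:1 layer schedule in the tree above (two GriffinBlocks followed by one AttentionBlock, repeated over 12 layers) can be sketched as a simple list, using the block names from the tree as labels:

```python
# Build the 12-layer block schedule: two recurrent (Griffin) blocks
# followed by one attention block, repeated four times.
N_LAYERS = 12
PATTERN = ["GriffinBlock", "GriffinBlock", "AttentionBlock"]

schedule = [PATTERN[i % len(PATTERN)] for i in range(N_LAYERS)]
print(schedule)
# 8 recurrent blocks and 4 attention blocks -> the 2:1 RNN-to-Attention ratio.
```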

+ Key features:
 - **RGLRU**: Real-Gated Linear Recurrent Unit
 - **GQA**: Grouped Query Attention (1:4 KV reduction)
+ - **Flash Attention 2**: Optimized attention computation
+ - **GeGLU**: Gated activation in the FFN
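The 1:4 KV reduction in GQA means each key/value head serves four query heads. A minimal sketch of the query-to-KV head mapping (the head counts below are assumed for illustration; the model's actual counts are not listed in this README):

```python
# Grouped Query Attention: map each query head to a shared KV head.
# Head counts here are illustrative, not taken from the model config.
n_query_heads = 8
n_kv_heads = n_query_heads // 4  # 1:4 KV reduction -> 2 KV heads

group_size = n_query_heads // n_kv_heads
kv_head_for_query = [q // group_size for q in range(n_query_heads)]
print(kv_head_for_query)  # four consecutive query heads share each KV head
```

Shrinking the KV head count this way cuts the KV-cache size by the same 4x factor at inference time.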
 
+ ## Quick Start (Google Colab)

 ```python
 import torch
 from hybridko.model import HybriKoModel, HybriKoConfig
 from hybridko.data import load_tokenizer

+ # Load model
 config = HybriKoConfig.from_yaml("config.yaml")
 model = HybriKoModel(config)
 model.load_state_dict(torch.load("pytorch_model.pt"))

+ # Load tokenizer
 tokenizer = load_tokenizer("HybriKo_tok.model")

+ # Generate text
 from hybridko.inference import generate_with_cache
 output = generate_with_cache(model, tokenizer, "한국의 수도는", max_tokens=50)
 print(output)
 ```

+ ### Testing multiple prompts
+
+ ```python
+ prompts = ["한국어", "대한민국", "서울", "인공지능", "오늘 날씨가"]
+
+ for prompt in prompts:
+     input_ids = torch.tensor([[2] + sp.EncodeAsIds(prompt)]).to(device)
+     output = model.generate(input_ids, max_new_tokens=30, temperature=0.8, top_k=50)
+     generated = sp.DecodeIds(output[0].tolist())
+     print(f"📝 {prompt}")
+     print(f"  → {generated}")
+     print("-" * 50)
+ ```
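The `temperature=0.8, top_k=50` arguments above control how each next token is drawn. A stdlib-only sketch of temperature-scaled top-k sampling (illustrative; not the model's actual `generate` implementation):

```python
import math
import random

def sample_top_k(logits, temperature=0.8, top_k=3, rng=random.Random(0)):
    # Keep only the top_k highest logits, scale by temperature,
    # softmax the survivors, then draw one index from that distribution.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]

logits = [2.0, 0.5, -1.0, 1.5, 0.0]
idx = sample_top_k(logits)
print(idx)  # one of the top-3 indices: 0, 3, or 1
```

Lower temperature sharpens the distribution toward the highest logit; smaller `top_k` truncates the tail entirely.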
+
+ ## Files

+ - `pytorch_model.pt`: Model weights (450MB)
+ - `config.yaml`: Model configuration
+ - `HybriKo_tok.model`: SentencePiece tokenizer
+ - `HybriKo_tok.vocab`: Tokenizer vocabulary

+ ## Citation

 ```bibtex
+ @misc{hybridko2026,
 title={HybriKo: Korean Hybrid Language Model},
+ year={2026},
 url={https://huggingface.co/gyunggyung/HybriKo-117M}
 }
 ```

+ ## License

 Apache 2.0