# HybriKo: Korean Hybrid Language Model

A Korean hybrid language model based on the Griffin architecture, which combines RNN and attention mechanisms.

## Model Details

- **Parameters**: 117.8M
- **Architecture**: 2:1 RNN-to-Attention ratio (Griffin-based)
- **Context length**: 1024 tokens
- **Vocabulary size**: 32,000 (SentencePiece)
- **Training data**: Korean Wikipedia
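
The 2:1 RNN-to-Attention ratio comes from a repeating layer pattern: two Griffin RNN blocks followed by one attention block. A minimal sketch of that schedule, with a hypothetical helper that is not part of the hybridko package:

```python
# Hypothetical sketch of the 2:1 RNN-to-Attention layer schedule; the helper
# name is illustrative and not part of the hybridko package.
def layer_schedule(n_layers=12):
    # Repeating pattern: two GriffinBlocks (RNN), then one AttentionBlock
    return ["rnn" if i % 3 != 2 else "attention" for i in range(n_layers)]

schedule = layer_schedule()
print(schedule.count("rnn"), schedule.count("attention"))  # 8 4
```

With 12 layers this yields 8 RNN blocks and 4 attention blocks, matching the stated 2:1 ratio.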

## Training Results (Exp3)

| Phase | Steps | Loss | PPL |
|-------|-------|------|-----|
| Phase 1 | 0-10K | 1.80 | ~6.0 |
| Phase 2 | 10K-30K | 1.60 | ~4.95 |

## Architecture

```
HybriKo (117.8M params)
├── Layers (12x)
│   ├── Layer 1,2: GriffinBlock (RNN)
│   ├── Layer 3: AttentionBlock
│   └── ... (pattern repeats)
└── LM Head (weight-tied)
```
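
The weight-tied LM head in the diagram means the output projection reuses the embedding matrix instead of keeping a separate one. A framework-free sketch of the idea, with toy sizes rather than the real 32,000-token vocabulary:

```python
# Weight tying illustrated with a plain list standing in for the embedding
# matrix: the LM head holds the same object, not a copy, so the two layers
# always share parameters (toy sizes, not the real vocabulary).
embedding_table = [[0.0] * 4 for _ in range(10)]
lm_head_weight = embedding_table  # tied: same object, no copy
embedding_table[3][0] = 1.5
print(lm_head_weight[3][0])  # 1.5
```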

Key features:

- **RGLRU**: Real-Gated Linear Recurrent Unit
- **GQA**: Grouped Query Attention (1:4 KV reduction)
- **Flash Attention 2**: optimized attention computation
- **GeGLU**: gated activation in the FFN
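
The 1:4 KV reduction in GQA means four query heads share each key/value head, shrinking the KV cache fourfold. A sketch with hypothetical head counts (the model's actual counts are not stated here):

```python
# GQA grouping sketch: with a 1:4 KV reduction, e.g. 8 query heads share
# 2 KV heads, so each KV head serves a group of 4 query heads.
n_q_heads, n_kv_heads = 8, 2
group_size = n_q_heads // n_kv_heads  # 4
kv_head_for_query = [q // group_size for q in range(n_q_heads)]
print(kv_head_for_query)  # [0, 0, 0, 0, 1, 1, 1, 1]
```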

## Quick Start (Google Colab)

```python
import torch
from hybridko.model import HybriKoModel, HybriKoConfig
from hybridko.data import load_tokenizer

# Load the model
config = HybriKoConfig.from_yaml("config.yaml")
model = HybriKoModel(config)
model.load_state_dict(torch.load("pytorch_model.pt"))

# Load the tokenizer
tokenizer = load_tokenizer("HybriKo_tok.model")

# Generate text
from hybridko.inference import generate_with_cache
output = generate_with_cache(model, tokenizer, "한국의 수도는", max_tokens=50)  # "The capital of Korea is"
print(output)
```

### Testing multiple prompts

```python
import sentencepiece as spm

# Continues from the quick start above; `sp` and `device` were not defined in
# the original snippet, so they are set up here as a reasonable sketch.
sp = spm.SentencePieceProcessor(model_file="HybriKo_tok.model")
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

# "Korean (language)", "Republic of Korea", "Seoul", "artificial intelligence", "today's weather"
prompts = ["한국어", "대한민국", "서울", "인공지능", "오늘 날씨가"]

for prompt in prompts:
    input_ids = torch.tensor([[2] + sp.EncodeAsIds(prompt)]).to(device)  # prepend BOS (id 2)
    output = model.generate(input_ids, max_new_tokens=30, temperature=0.8, top_k=50)
    generated = sp.DecodeIds(output[0].tolist())
    print(f"Prompt: {prompt}")
    print(f"  → {generated}")
    print("-" * 50)
```
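
The `temperature=0.8` and `top_k=50` arguments above control sampling. As an illustrative sketch (not the hybridko implementation): keep the `top_k` highest logits, divide by the temperature, and sample from the resulting softmax:

```python
import math
import random

# Illustrative top-k + temperature sampling; not the hybridko implementation.
def sample_next(logits, temperature=0.8, top_k=50, rng=random):
    # Keep indices of the top_k largest logits
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]

# With top_k=2, the smallest logit (index 2) can never be sampled
print(sample_next([5.0, 1.0, 0.5], top_k=2))  # 0 or 1, never 2
```

Lower temperatures sharpen the distribution toward the highest-logit token; `top_k` hard-limits the candidate set before sampling.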

## Files

- `pytorch_model.pt`: model weights (450MB)
- `config.yaml`: model configuration
- `HybriKo_tok.model`: SentencePiece tokenizer
- `HybriKo_tok.vocab`: tokenizer vocabulary

## Citation

```bibtex
@misc{hybridko2026,
  title={HybriKo: Korean Hybrid Language Model},
  year={2026},
  url={https://huggingface.co/gyunggyung/HybriKo-117M}
}
```

## License

Apache 2.0