Yaongi/HybriKo-117M

@@ -1,5 +1,5 @@
 ---
-license: mit
 language:
 - ko
 tags:
@@ -9,164 +9,83 @@ tags:
 - attention
 - griffin
 - language-model
-datasets:
-- wikipedia
-pipeline_tag: text-generation
 ---
-# HybriKo-117M
-A Korean hybrid language model inspired by the Griffin architecture.
-## Model Description
-HybriKo is a hybrid language model inspired by Google's Griffin architecture. It combines an RNN (Real-Gated Linear Recurrent Unit) with an attention mechanism in a 2:1 ratio.
-### Architecture
-- **Type**: Hybrid RNN-Attention (2:1 ratio)
-- **Parameters**: 117.8M
-- **Hidden Dimension**: 768
-- **Layers**: 12 (8 RNN + 4 Attention)
-- **Attention Heads**: 12 (GQA, 3 KV heads)
-- **Vocabulary**: 32,000 (SentencePiece Unigram)
-- **Max Sequence Length**: 512
-### Layer Pattern
-```
-Layer 1-2: GriffinBlock (RGLRU-based RNN)
-Layer 3: AttentionBlock (Multi-Head Attention with RoPE)
-Layer 4-5: GriffinBlock
-Layer 6: AttentionBlock
-... (pattern repeats)
-```
-### Architecture Diagram
-![HybriKo Architecture](Architecture.png)
-## Training
-- **Dataset**: Korean Wikipedia (~509K documents)
-- **Training Steps**: 1,000
-- **Batch Size**: 16
-- **Learning Rate**: 3e-4
-- **Hardware**: A100
-### Training Results
-| Metric | Value |
-|------|-----|
-| Initial Loss | 10.27 |
-| Final Loss | 3.65 |
-| Average Loss | 3.97 |
-## Quick Start (Google Colab)
-You can copy the code below and run it as-is:
 ```python
-# Install dependencies
-!pip install transformers sentencepiece -q
 import torch
-import sentencepiece as spm
-from huggingface_hub import hf_hub_download, list_repo_files
-# Auto-detect GPU
-device = "cuda" if torch.cuda.is_available() else "cpu"
-print(f"🖥️ Device in use: {device}")
-# Download the model code
-config_path = hf_hub_download("Yaongi/HybriKo-117M", "configuration_hybridko.py")
-model_path = hf_hub_download("Yaongi/HybriKo-117M", "modeling_hybridko.py")
-import sys, os
-sys.path.insert(0, os.path.dirname(config_path))
-from configuration_hybridko import HybriKoConfig
-from modeling_hybridko import HybriKoModel
-# 1. Create the model
-config = HybriKoConfig()
 model = HybriKoModel(config)
-# 2. Auto-detect and load the latest checkpoint
-files = list_repo_files("Yaongi/HybriKo-117M")
-checkpoints = sorted([f for f in files if f.startswith("checkpoint_step_") and f.endswith(".pt")])
-latest_checkpoint = checkpoints[-1] if checkpoints else "checkpoint_step_1000.pt"
-print(f"📦 Loading checkpoint: {latest_checkpoint}")
-checkpoint_path = hf_hub_download("Yaongi/HybriKo-117M", latest_checkpoint)
-checkpoint = torch.load(checkpoint_path, map_location=device)
-model.load_state_dict(checkpoint["model_state_dict"])
-model = model.to(device)
-model.eval()
-# 3. Load the tokenizer
-tokenizer_path = hf_hub_download("Yaongi/HybriKo-117M", "HybriKo_tok.model")
-sp = spm.SentencePieceProcessor()
-sp.Load(tokenizer_path)
-# 4. Generate text
-prompt = "한국의 수도는"
-input_ids = torch.tensor([[2] + sp.EncodeAsIds(prompt)]).to(device)
-output = model.generate(input_ids, max_new_tokens=50, temperature=0.8)
-print(sp.DecodeIds(output[0].tolist()))
-```
-## Testing Multiple Prompts
-```python
-prompts = ["한국어", "대한민국", "서울", "인공지능", "오늘 날씨가"]
-for prompt in prompts:
-    input_ids = torch.tensor([[2] + sp.EncodeAsIds(prompt)]).to(device)  # 👈 .to(device) added
-    output = model.generate(input_ids, max_new_tokens=30, temperature=0.8, top_k=50)
-    generated = sp.DecodeIds(output[0].tolist())
-    print(f"📝 {prompt}")
-    print(f"   → {generated}")
-    print("-" * 50)
 ```
-## Generation Parameters
-| Parameter | Description | Recommended Value |
-|----------|------|---------|
-| `temperature` | Randomness (lower is more deterministic) | 0.7 - 1.0 |
-| `top_k` | Sample only from the top K tokens | 50 |
-| `top_p` | Nucleus sampling threshold | 0.9 |
-| `max_new_tokens` | Number of tokens to generate | 30 - 100 |
-## Limitations
-⚠️ **This model is a proof-of-concept trained for only 1,000 steps.**
-Generated text shows early-training patterns (frequent numbers and years). For better quality:
-- Train for 10,000+ steps
-- Scale up to 1B+ parameters
-- Fine-tune on a specific task
-## Repository Files
-| File | Description |
-|------|------|
-| `checkpoint_step_*.pt` | Model weights |
-| `HybriKo_tok.model` | SentencePiece tokenizer |
-| `HybriKo_tok.vocab` | Tokenizer vocabulary |
-| `config.json` | HuggingFace configuration |
-| `configuration_hybridko.py` | Config class |
-| `modeling_hybridko.py` | Model architecture |
-## Citation
 ```bibtex
-@misc{hybridko2026,
-  title={HybriKo: Korean Hybrid Language Model},
-  author={Yaongi Team},
-  year={2026},
-  url={https://huggingface.co/Yaongi/HybriKo-117M}
 }
 ```
-## License
-MIT License

 ---
+license: apache-2.0
 language:
 - ko
 tags:
 - attention
 - griffin
 - language-model
 ---
+# HybriKo: Korean Hybrid Language Model
+A Griffin-inspired hybrid architecture combining RNN and Attention mechanisms for Korean language modeling.
+## Model Details
+- **Parameters**: 117.8M
+- **Architecture**: 2:1 RNN-to-Attention ratio (Griffin-inspired)
+- **Context Length**: 1024 tokens
+- **Vocab Size**: 32,000 (SentencePiece)
+- **Training Data**: Korean Wikipedia
+## Training Results (Exp3)
+| Phase | Steps | Loss | PPL |
+|-------|-------|------|-----|
+| Phase 1 | 0-10K | 1.80 | ~6.0 |
+| Phase 2 | 10K-30K | 1.60 | ~4.95 |
+## Architecture
+```
+HybriKo (117.8M params)
+├── Embedding (32000 → 768)
+├── Layers (12x)
+│   ├── Layer 1,2: GriffinBlock (RNN)
+│   ├── Layer 3: AttentionBlock
+│   └── (pattern repeats)
+└── LM Head (weight-tied)
+```
+Key features:
+- **RGLRU**: Real-Gated Linear Recurrent Unit
+- **GQA**: Grouped Query Attention (one KV head per four query heads)
+- **Flash Attention 2**: Optimized attention computation
+- **GeGLU**: Gated activation in FFN
+## Usage
 ```python
 import torch
+from hybridko.model import HybriKoModel, HybriKoConfig
+from hybridko.data import load_tokenizer
+# Load model
+config = HybriKoConfig.from_yaml("config.yaml")
 model = HybriKoModel(config)
+model.load_state_dict(torch.load("pytorch_model.pt"))
+# Load tokenizer
+tokenizer = load_tokenizer("HybriKo_tok.model")
+# Generate
+from hybridko.inference import generate_with_cache
+output = generate_with_cache(model, tokenizer, "한국의 수도는", max_tokens=50)
+print(output)
 ```
+## Files
+- `pytorch_model.pt`: Model weights (450MB)
+- `config.yaml`: Model configuration
+- `HybriKo_tok.model`: SentencePiece tokenizer
+- `HybriKo_tok.vocab`: Tokenizer vocabulary
+## Citation
 ```bibtex
+@misc{hybridko2024,
+  title={HybriKo: Korean Hybrid Language Model},
+  year={2024},
+  url={https://huggingface.co/gyunggyung/HybriKo-117M}
 }
 ```
+## License
+Apache 2.0
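
Review note on the Training Results (Exp3) table in the new README: the PPL column can be sanity-checked against the Loss column. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats (the usual PyTorch convention, though the README does not say so), in which case perplexity is simply `exp(loss)`. The `reported` dictionary and the `perplexity` helper are illustrative names, not part of the repository.

```python
import math

# (loss, PPL) pairs copied from the Training Results (Exp3) table.
reported = {"Phase 1": (1.80, 6.0), "Phase 2": (1.60, 4.95)}


def perplexity(loss: float) -> float:
    """Perplexity for a mean per-token cross-entropy loss, assumed to be in nats."""
    return math.exp(loss)


for phase, (loss, table_ppl) in reported.items():
    print(f"{phase}: loss={loss:.2f} -> PPL={perplexity(loss):.2f} (table: ~{table_ppl})")
```

Under that assumption both rows agree with the table to within rounding.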

pytorch_model.pt ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e02a596f9dc1a993cd1aa0a65022a5d4ec95409620be3c43f2829432e93b5077
+size 471349067
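
Review note: the three added lines are a Git LFS pointer, not the weights themselves. A downloaded `pytorch_model.pt` can be checked against it by comparing byte size and SHA-256 digest. The following is a minimal sketch using only the Python standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local file path in the usage comment is an assumption.

```python
import hashlib
from pathlib import Path

# Pointer content from the diff above, without the leading "+" markers.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e02a596f9dc1a993cd1aa0a65022a5d4ec95409620be3c43f2829432e93b5077
size 471349067
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.strip().split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest named by the pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Read in chunks so a ~450 MB checkpoint is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(f"expected size: {pointer['size'] / 2**20:.1f} MiB")
# Usage, assuming the weights were downloaded to the current directory:
# print(matches_pointer("pytorch_model.pt", pointer))
```

The pointer's size of 471,349,067 bytes is about 449.5 MiB, consistent with the 450MB listed under Files in the new README.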