🧠 HybridLM-85M (RNN + GRU + Transformer + Mamba + Routed Fusion)
Multi-Architecture Routed Language Model (Numeric Token LM)
An experimental hybrid language model: four different architectures plus a router, working inside a single model.
🚀 OVERVIEW
Instead of a single classical architecture, HybridLM runs several branches in parallel:
- RNN (temporal bias)
- GRU (gated memory)
- Transformer (global context)
- Linear/Mamba-like (fallback projection)
and merges their outputs with a single Router.
❗ This model is not a classical MoE
✔ It is a soft-routed hybrid fusion model
🧱 MODEL FEATURES
- 🔢 Token type: Numeric (0–91)
- 📏 Sequence length: 64
- 🧠 Parameters: ~85M
- ⚙️ Device: CPU compatible
- 🎛️ Routing: Softmax-based weighted fusion
🧠 ARCHITECTURE FLOW
          Input Tokens (B, T)
                    │
                    ▼
           Embedding (V → D)
                    │
      ┌─────────┬───┴──────┬──────────┐
      ▼         ▼          ▼          ▼
     RNN       GRU    Transformer   Linear
      │         │          │          │
      └─────────┴─────┬────┴──────────┘
                      ▼
      Last Hidden States (r, g, t, m)
                      │
                      ▼
           Context Mean Pooling
                      │
                      ▼
                   Router
             (Softmax weights)
                      │
                      ▼
   Weighted Fusion: w₁r + w₂g + w₃t + w₄m
                      │
                      ▼
                 Linear Head
                      │
                      ▼
             Next Token Logits
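The flow above can be sketched as a single PyTorch module. This is a minimal, hypothetical reconstruction, not the released code: the class name `HybridLM` and all sizes are assumptions, and dimensions are reduced from the real config for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridLM(nn.Module):
    """Sketch only: four parallel branches fused by a softmax router."""

    def __init__(self, vocab=92, dim=64, heads=4, layers=2, temp=0.7):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.RNN(dim, dim, batch_first=True)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        enc = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc, layers)
        self.linear_branch = nn.Linear(dim, dim)  # Mamba-like fallback projection
        self.router = nn.Linear(dim, 4)           # one logit per branch
        self.head = nn.Linear(dim, vocab)
        self.temp = temp

    def forward(self, tokens):                    # tokens: (B, T)
        x = self.embed(tokens)                    # (B, T, D)
        r = self.rnn(x)[0][:, -1]                 # last hidden state per branch
        g = self.gru(x)[0][:, -1]
        t = self.transformer(x)[:, -1]
        m = self.linear_branch(x)[:, -1]
        ctx = x.mean(dim=1)                       # context mean pooling
        w = F.softmax(self.router(ctx) / self.temp, dim=-1)  # (B, 4)
        fused = (w[:, 0:1] * r + w[:, 1:2] * g
                 + w[:, 2:3] * t + w[:, 3:4] * m)  # weighted fusion
        return self.head(fused)                    # next-token logits (B, V)

logits = HybridLM()(torch.randint(0, 92, (2, 64)))
print(logits.shape)  # torch.Size([2, 92])
```

Note that the model predicts only the next token after the full window, so every branch contributes just its final hidden state.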
ROUTER MECHANISM
Router input:
ctx = torch.mean(x, dim=1)
Routing:
weights = softmax(router(ctx) / temperature)
Fusion:
out = w1*r + w2*g + w3*t + w4*m
Router Properties
- Soft selection (no hard routing)
- All architectures contribute
- Low risk of router collapse
- Stable training
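The division by `temperature` controls how sharp the fusion weights are. A quick standalone illustration (pure Python; the router logits below are made up for the example):

```python
import math

def softmax_with_temp(logits, temperature):
    """Softmax over router logits, scaled by temperature first."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.2, 0.4, 2.0, 0.1]            # hypothetical router outputs for r, g, t, m
sharp = softmax_with_temp(logits, 0.7)   # ROUTER_TEMP from the config
soft = softmax_with_temp(logits, 5.0)    # higher temperature -> closer to uniform
print(sharp, soft)
```

A lower temperature concentrates weight on the strongest branch, but the selection stays soft: no branch weight ever reaches exactly zero, which is why all architectures keep receiving gradient.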
🧠 ARCHITECTURE ROLES

| Architecture | Role |
|---|---|
| Transformer | Main representational power |
| GRU | Pattern + gating |
| RNN | Temporal bias |
| Linear | Stabilization / fallback |
| Router | Dynamic weighting |
⚙️ CONFIG
{
"SEQ_LEN": 64,
"VOCAB_SIZE": 92,
"DIM": 640,
"N_LAYERS": 12,
"N_HEADS": 8,
"FFN": 4096,
"ROUTER_TEMP": 0.7
}
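A quick sanity check on the config values (a small illustrative sketch; only `config.json`'s contents come from the card):

```python
import json

# The config block from this card, inlined as a string for a self-contained example.
CONFIG = """{
  "SEQ_LEN": 64, "VOCAB_SIZE": 92, "DIM": 640,
  "N_LAYERS": 12, "N_HEADS": 8, "FFN": 4096,
  "ROUTER_TEMP": 0.7
}"""

cfg = json.loads(CONFIG)
# The model dimension must divide evenly across attention heads.
assert cfg["DIM"] % cfg["N_HEADS"] == 0
print(cfg["DIM"] // cfg["N_HEADS"])  # per-head dimension: 80
```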
📊 PARAMETER BREAKDOWN (~85M)

| Component | Params |
|---|---|
| Transformer | ~83M |
| GRU | ~2.4M |
| RNN | ~0.8M |
| Linear | ~0.4M |
| Embedding + Head | ~0.1M |
🧪 TRAINING
- Loss: CrossEntropy
- Target: next token (last position)
- Optimizer: AdamW
- Checkpoint: every 500 steps
- Dataset: numeric token stream
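A minimal training step matching the recipe above (hypothetical sketch: the tiny stand-in model, batch size, and learning rate are all illustrative; any module mapping (B, T) token ids to (B, V) logits slots in):

```python
import torch
import torch.nn as nn

SEQ_LEN, VOCAB = 64, 92

# Toy stand-in for the real model, just to make the step runnable.
model = nn.Sequential(nn.Embedding(VOCAB, 32), nn.Flatten(),
                      nn.Linear(32 * SEQ_LEN, VOCAB))
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

stream = torch.randint(0, VOCAB, (1024,))  # numeric token stream
for step in range(3):
    i = torch.randint(0, len(stream) - SEQ_LEN - 1, (8,))
    x = torch.stack([stream[j:j + SEQ_LEN] for j in i.tolist()])  # (B, T) windows
    y = stream[i + SEQ_LEN]                # target: next token after each window
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # if step % 500 == 0: torch.save(model.state_dict(), "ckpt.pt")  # periodic checkpoint
print(float(loss))
```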
💬 INFERENCE
- Autoregressive generation
- Softmax sampling
- Padding supported (fixed seq_len)
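Autoregressive sampling with a fixed-width window can be sketched as follows (hypothetical: `model` is any (B, T) → (B, V) scorer, and the pad id of 0 is an assumption, not documented behavior):

```python
import torch
import torch.nn.functional as F

SEQ_LEN, VOCAB, PAD = 64, 92, 0

def generate(model, prompt, n_new):
    """Sample n_new tokens; left-pad the context so input is always SEQ_LEN wide."""
    tokens = list(prompt)
    for _ in range(n_new):
        ctx = tokens[-SEQ_LEN:]
        ctx = [PAD] * (SEQ_LEN - len(ctx)) + ctx      # fixed seq_len via padding
        logits = model(torch.tensor([ctx]))           # (1, V)
        probs = F.softmax(logits[0], dim=-1)          # softmax sampling
        tokens.append(int(torch.multinomial(probs, 1)))
    return tokens

# Toy scorer standing in for the real model, just to run the loop.
toy = lambda x: torch.randn(x.shape[0], VOCAB)
out = generate(toy, [5, 9, 13], 10)
print(len(out))  # 13
```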
⚠️ LIMITATIONS
❌ No text tokenizer
❌ Limited semantic understanding
❌ Numeric dataset → no semantic meaning is learned
❌ Not a true MoE (soft fusion)
🔥 STRENGTHS
✔ Multi-architecture learning
✔ Router-based dynamic fusion
✔ CPU compatible large model
✔ Experimental research design
FUTURE WORK
- Hard routing (top-1 expert)
- Tokenizer integration
- Training on a real dataset
- Expert specialization loss
- KV-cache inference
HybridLM is:
❗ not a classical transformer
❗ not a classical MoE
✔ a new hybrid approach
AUTHOR
BRSX Labs