
🧠 HybridLM-85M (RNN + GRU + Transformer + Mamba + Routed Fusion)

Multi-Architecture Routed Language Model (Numeric Token LM)
An experimental hybrid language model that runs 4 different architectures plus a router inside a single model.


🚀 OVERVIEW

Instead of a single classic architecture, HybridLM combines the outputs of:

  • RNN (temporal bias)
  • GRU (gated memory)
  • Transformer (global context)
  • Linear/Mamba-like (fallback projection)
  • Router

into one representation through a single router.

❗ This model is not a classic MoE.
It is a soft-routed hybrid fusion model.


🧱 MODEL FEATURES

  • 🔢 Token type: Numeric (0–91)
  • 📏 Sequence length: 64
  • 🧠 Parameters: ~85M
  • ⚙️ Device: CPU compatible
  • 🎛️ Routing: Softmax-based weighted fusion

🧠 ARCHITECTURE FLOW

```
Input Tokens (B, T)
        │
        ▼
Embedding (V → D)
        │
        ├───────────────┬───────────────┬───────────────┬───────────────┐
        ▼               ▼               ▼               ▼
      RNN             GRU         Transformer        Linear
        │               │               │               │
        └───────┬───────┴───────┬───────┴───────┬───────┘
                ▼               ▼               ▼
             Last Hidden States (r, g, t, m)
                        │
                        ▼
               Context Mean Pooling
                        │
                        ▼
                    Router
              (Softmax weights)
                        │
                        ▼
     Weighted Fusion: w₁r + w₂g + w₃t + w₄m
                        │
                        ▼
                 Linear Head
                        │
                        ▼
                 Next Token Logits
```
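The flow above can be sketched end-to-end in PyTorch. This is a minimal illustration scaled down from the real config (DIM=640, N_LAYERS=12, FFN=4096); the class and attribute names are hypothetical, not the repository's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridLM(nn.Module):
    """Illustrative sketch of the routed four-branch architecture."""
    def __init__(self, vocab_size=92, dim=64, temperature=0.7):
        super().__init__()
        self.temperature = temperature
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.RNN(dim, dim, batch_first=True)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=1)
        self.linear_branch = nn.Linear(dim, dim)   # Mamba-like fallback projection
        self.router = nn.Linear(dim, 4)            # one routing logit per branch
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):                     # tokens: (B, T)
        x = self.embed(tokens)                     # (B, T, D)
        r = self.rnn(x)[0][:, -1]                  # last hidden state per branch
        g = self.gru(x)[0][:, -1]
        t = self.transformer(x)[:, -1]
        m = self.linear_branch(x)[:, -1]
        ctx = x.mean(dim=1)                        # context mean pooling
        w = F.softmax(self.router(ctx) / self.temperature, dim=-1)  # (B, 4)
        fused = (w[:, 0:1] * r + w[:, 1:2] * g +
                 w[:, 2:3] * t + w[:, 3:4] * m)    # weighted fusion
        return self.head(fused)                    # next-token logits (B, V)
```

All four branches run on every input; only their mixing weights vary per sample.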

ROUTER MECHANISM

Router input:

```python
ctx = torch.mean(x, dim=1)
```

Routing:

```python
weights = softmax(router(ctx) / temperature)
```

Fusion:

```python
out = w1*r + w2*g + w3*t + w4*m
```
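Putting the three steps together, a self-contained sketch (the shapes and the `router` module here are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

B, T, D = 2, 8, 16
temperature = 0.7
x = torch.randn(B, T, D)                    # embedded input sequence
r, g, t, m = (torch.randn(B, D) for _ in range(4))  # branch last-hidden states
router = torch.nn.Linear(D, 4)

ctx = torch.mean(x, dim=1)                  # (B, D) pooled context vector
weights = F.softmax(router(ctx) / temperature, dim=-1)  # (B, 4), rows sum to 1
w1, w2, w3, w4 = weights.unbind(dim=-1)     # per-branch weights, (B,) each
out = (w1[:, None] * r + w2[:, None] * g +
       w3[:, None] * t + w4[:, None] * m)   # (B, D) fused representation
```

Because the weights come from a softmax, every branch receives a nonzero share of the gradient, which is what keeps collapse risk low.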

Router properties:

  • Soft selection (no hard routing)
  • All architectures contribute
  • Low risk of router collapse
  • Stable training
🧠 ARCHITECTURE ROLES

| Architecture | Role |
|---|---|
| Transformer | Main representational capacity |
| GRU | Pattern learning + gating |
| RNN | Temporal bias |
| Linear | Stabilization / fallback |
| Router | Dynamic weighting |
⚙️ CONFIG

```json
{
  "SEQ_LEN": 64,
  "VOCAB_SIZE": 92,
  "DIM": 640,
  "N_LAYERS": 12,
  "N_HEADS": 8,
  "FFN": 4096,
  "ROUTER_TEMP": 0.7
}
```
📊 PARAMETER DISTRIBUTION (~85M)

| Component | Params |
|---|---|
| Transformer | ~83M |
| GRU | ~2.4M |
| RNN | ~0.8M |
| Linear | ~0.4M |
| Embedding + Head | ~0.1M |
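The counts above can be cross-checked with a generic parameter counter. As a sanity check, a single-layer GRU at the real width (DIM=640) lands in the same ballpark as the table's ~2.4M:

```python
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# rough cross-check against the table (exact layer counts are assumptions)
gru = nn.GRU(640, 640, batch_first=True)
print(f"GRU params: {count_params(gru) / 1e6:.1f}M")  # prints: GRU params: 2.5M
```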
🧪 TRAINING

  • Loss: CrossEntropy
  • Target: next token (last position)
  • Optimizer: AdamW
  • Checkpoint: every 500 steps
  • Dataset: numeric token stream
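A minimal sketch of this training setup, assuming any model that maps (B, T) tokens to (B, VOCAB_SIZE) logits; a tiny stand-in model keeps the example self-contained:

```python
import torch
import torch.nn as nn

SEQ_LEN, VOCAB_SIZE = 64, 92

# stand-in for the real model: (B, SEQ_LEN) tokens -> (B, VOCAB_SIZE) logits
model = nn.Sequential(
    nn.Embedding(VOCAB_SIZE, 32),
    nn.Flatten(),
    nn.Linear(SEQ_LEN * 32, VOCAB_SIZE),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
criterion = nn.CrossEntropyLoss()

for step in range(3):  # a real run iterates over the numeric token stream
    batch = torch.randint(0, VOCAB_SIZE, (8, SEQ_LEN + 1))
    inputs, targets = batch[:, :-1], batch[:, -1]  # predict token at last position
    logits = model(inputs)
    loss = criterion(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (step + 1) % 500 == 0:  # checkpoint cadence from the list above
        torch.save(model.state_dict(), f"ckpt_{step + 1}.pt")
```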
💬 INFERENCE

  • Autoregressive generation
  • Softmax sampling
  • Padding supported (fixed seq_len)
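The generation loop can be sketched as follows; the left-padding scheme and the `model` callable are assumptions, since the model card does not show the actual inference code:

```python
import torch

SEQ_LEN, VOCAB_SIZE, PAD = 64, 92, 0

def generate(model, prompt, n_new=16, temperature=1.0):
    """Autoregressive sampling over a fixed-length token window."""
    tokens = list(prompt)
    for _ in range(n_new):
        window = tokens[-SEQ_LEN:]
        window = [PAD] * (SEQ_LEN - len(window)) + window  # pad to seq_len
        logits = model(torch.tensor([window]))             # (1, VOCAB_SIZE)
        probs = torch.softmax(logits[0] / temperature, dim=-1)
        tokens.append(torch.multinomial(probs, 1).item())  # softmax sampling
    return tokens

# usage with a dummy model that emits uniform logits
dummy = lambda x: torch.zeros(1, VOCAB_SIZE)
out = generate(dummy, [5, 7, 11], n_new=4)
```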
⚠️ LIMITATIONS

  • ❌ No text tokenizer
  • ❌ Limited semantic understanding
  • ❌ Numeric dataset → no semantics learned
  • ❌ Not a true MoE (soft fusion)
🔥 STRENGTHS

  • ✔ Multi-architecture learning
  • ✔ Router-based dynamic fusion
  • ✔ CPU-compatible large model
  • ✔ Experimental research design
FUTURE IMPROVEMENTS

  • Hard routing (top-1 expert)
  • Tokenizer integration
  • Training on a real dataset
  • Expert specialization loss
  • KV-cache inference
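For reference, the first listed direction, hard top-1 routing, would replace the soft weighted sum with a single selected branch per sample; a minimal sketch of the difference (shapes are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

B, D = 2, 16
branches = torch.randn(4, B, D)           # stacked branch outputs r, g, t, m
router_logits = torch.randn(B, 4)

# soft fusion (current behaviour): weighted sum over all branches
w = F.softmax(router_logits, dim=-1)
soft = torch.einsum("bk,kbd->bd", w, branches)

# hard routing (future direction): keep only the top-1 expert per sample
idx = router_logits.argmax(dim=-1)        # (B,) index of the winning branch
hard = branches[idx, torch.arange(B)]     # (B, D) selected expert output only
```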

HybridLM is:

❗ not a classic Transformer
❗ not a classic MoE
✔ a new hybrid approach

Tags: hybrid-lm, multi-architecture, rnn, gru, transformer, router, experimental, numeric-lm
AUTHOR

BRSX Labs 