GPT-OSS Adamba: Hybrid MoE + Mamba
21.9B parameters | 32 experts | Mamba-enhanced reasoning backbone
GitHub | Original Adamba
Available Checkpoints
| Variant | Parameters | Dim | Features | Status | Download |
|---|---|---|---|---|---|
| gptoss_phase1 | 21.9B | 2880 | mamba_integration, moe_32experts | ✅ | Download |
| gptoss_phase2 | 21.9B | 2880 | matryoshka, early_exit, moe_32experts | ⏳ | — |
| gptoss_phase3 | 30B+ | 4096 | matryoshka, early_exit, moe_32experts, expansion | ⏳ | — |
| gptoss_sft | 21.9B | 2880 | matryoshka, moe_32experts, sft | ⏳ | — |
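Published checkpoints can be pulled with `huggingface_hub`; a minimal sketch is below, assuming the usual `snapshot_download` workflow. The repo id is a placeholder, not the real one.

```python
# Minimal download sketch. "unixsysdev/gptoss-adamba-phase1" is a hypothetical
# repo id used for illustration; substitute the actual checkpoint repo.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="unixsysdev/gptoss-adamba-phase1",       # placeholder id
    allow_patterns=["*.safetensors", "*.json"],      # weights + config/tokenizer files
)
print(f"Checkpoint downloaded to: {local_dir}")
```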
Architecture
Built on OpenAI GPT-OSS 20B with Mamba integration:
| Component | Spec |
|---|---|
| Base Model | GPT-OSS 20B MoE |
| Hidden Dim | 2880 |
| Attention | 24 layers (alternating sliding-window and full attention) |
| Mamba | 12 layers (interleaved at a 2:1 attention-to-Mamba ratio) |
| MoE | 32 experts, top-4 routing |
| Vocab | 201,088 tokens |
| Total Blocks | 36 (24 Attn + 12 Mamba) |
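For reference, the table condenses into a small config object like the sketch below. The field names are illustrative assumptions, not the repo's actual config schema.

```python
# Illustrative config mirroring the architecture table above.
from dataclasses import dataclass

@dataclass
class AdambaConfig:
    hidden_dim: int = 2880         # GPT-OSS 20B hidden size
    n_attention_layers: int = 24   # alternating sliding-window / full attention
    n_mamba_layers: int = 12       # injected, interleaved 2 attention : 1 Mamba
    n_experts: int = 32            # MoE experts per layer
    top_k_experts: int = 4         # top-4 routing
    vocab_size: int = 201_088

    @property
    def total_blocks(self) -> int:
        return self.n_attention_layers + self.n_mamba_layers  # 36
```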
┌──────────────────────────────────────────────┐
│  GPT-OSS 20B (Attention + MoE)               │
│    ↓ Surgery (inject 12 Mamba layers)        │
│  Hybrid: A-A-M-A-A-M-... pattern             │
│    ↓ Phase 1 (train Mamba only)              │
│  Mamba learns to "speak GPT-OSS language"    │
│    ↓ Phase 2 (enable Matryoshka)             │
│  Adaptive compute: 128 → 2880 dim per layer  │
└──────────────────────────────────────────────┘
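The A-A-M pattern in the diagram can be generated mechanically from the 2:1 interleave; a minimal sketch (the helper name is made up for illustration):

```python
def build_layer_pattern(n_attn: int = 24, n_mamba: int = 12) -> list[str]:
    """Return the hybrid block order: ['A', 'A', 'M', 'A', 'A', 'M', ...]."""
    assert n_attn == 2 * n_mamba, "this sketch assumes a strict 2:1 ratio"
    pattern: list[str] = []
    for _ in range(n_mamba):
        pattern += ["A", "A", "M"]   # two attention blocks, then one Mamba block
    return pattern


layers = build_layer_pattern()
assert len(layers) == 36 and layers.count("A") == 24 and layers.count("M") == 12
```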
Training Status
Phase 1 (in progress): Mamba integration. The base Attention and MoE weights are frozen; only the newly injected Mamba layers are trained, as sketched below.
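In PyTorch terms, the Phase 1 setup amounts to toggling `requires_grad`. The name filter below is an assumption about how the Mamba modules are named, not the actual training script.

```python
# Illustrative Phase-1 freeze (assumes a PyTorch model whose injected Mamba
# modules contain "mamba" in their parameter names; not the repo's real code).
import torch.nn as nn

def freeze_for_phase1(model: nn.Module) -> None:
    for name, param in model.named_parameters():
        # Base GPT-OSS attention + MoE weights stay frozen; Mamba weights train.
        param.requires_grad = "mamba" in name.lower()

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable params: {trainable:,} / {total:,} ({trainable / total:.1%})")
```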
Usage
# Coming soon - inference code
# See: https://github.com/unixsysdev/adamba
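Until the official inference code lands, loading will presumably follow the usual `transformers` custom-model path. The snippet below is a speculative sketch with a placeholder repo id, not the documented API.

```python
# Speculative sketch only: assumes the released checkpoint ships custom modeling
# code loadable via trust_remote_code. The repo id is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "unixsysdev/gptoss-adamba-phase1"  # placeholder; see the GitHub repo for the real id
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

inputs = tokenizer("The hybrid Mamba/attention model", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```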
License
Apache 2.0 (same as GPT-OSS)