GPT-OSS Adamba: Hybrid MoE + Mamba
21.9B parameters | 32 experts | Mamba-enhanced reasoning backbone
GitHub | Original Adamba
Available Checkpoints
| Variant | Parameters | Dim | Features | Status | Download |
|---|---|---|---|---|---|
| gptoss_phase1 | 21.9B | 2880 | mamba_integration, moe_32experts | ✅ | Download |
| gptoss_phase2 | 21.9B | 2880 | matryoshka, early_exit, moe_32experts | ⏳ | — |
| gptoss_phase3 | 30B+ | 4096 | matryoshka, early_exit, moe_32experts, expansion | ⏳ | — |
| gptoss_sft | 21.9B | 2880 | matryoshka, moe_32experts, sft | ⏳ | — |
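Published checkpoints can be pulled with `huggingface_hub`; a minimal sketch is below, assuming the usual `snapshot_download` workflow. The repo id is a placeholder, not the real one.

```python
# Minimal download sketch. "unixsysdev/gptoss-adamba-phase1" is a hypothetical
# repo id used for illustration; substitute the actual checkpoint repo.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="unixsysdev/gptoss-adamba-phase1",       # placeholder id
    allow_patterns=["*.safetensors", "*.json"],      # weights + config/tokenizer files
)
print(f"Checkpoint downloaded to: {local_dir}")
```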
Architecture
Built on OpenAI GPT-OSS 20B with Mamba integration:
| Component | Spec |
|---|---|
| Base Model | GPT-OSS 20B MoE |
| Hidden Dim | 2880 |
| Attention | 24 layers (alternating sliding-window and full attention) |
| Mamba | 12 layers (interleaved at a 2:1 attention-to-Mamba ratio) |
| MoE | 32 experts, top-4 routing |
| Vocab | 201,088 tokens |
| Total Blocks | 36 (24 Attn + 12 Mamba) |
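For reference, the table condenses into a small config object like the sketch below. The field names are illustrative assumptions, not the repo's actual config schema.

```python
# Illustrative config mirroring the architecture table above.
from dataclasses import dataclass

@dataclass
class AdambaConfig:
    hidden_dim: int = 2880         # GPT-OSS 20B hidden size
    n_attention_layers: int = 24   # alternating sliding-window / full attention
    n_mamba_layers: int = 12       # injected, interleaved 2 attention : 1 Mamba
    n_experts: int = 32            # MoE experts per layer
    top_k_experts: int = 4         # top-4 routing
    vocab_size: int = 201_088

    @property
    def total_blocks(self) -> int:
        return self.n_attention_layers + self.n_mamba_layers  # 36
```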
┌──────────────────────────────────────────────┐
│  GPT-OSS 20B (Attention + MoE)               │
│    ↓ Surgery (inject 12 Mamba layers)        │
│  Hybrid: A-A-M-A-A-M-... pattern             │
│    ↓ Phase 1 (train Mamba only)              │
│  Mamba learns to "speak GPT-OSS language"    │
│    ↓ Phase 2 (enable Matryoshka)             │
│  Adaptive compute: 128 → 2880 dim per layer  │
└──────────────────────────────────────────────┘
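The A-A-M pattern in the diagram can be generated mechanically from the 2:1 interleave; a minimal sketch (the helper name is made up for illustration):

```python
def build_layer_pattern(n_attn: int = 24, n_mamba: int = 12) -> list[str]:
    """Return the hybrid block order: ['A', 'A', 'M', 'A', 'A', 'M', ...]."""
    assert n_attn == 2 * n_mamba, "this sketch assumes a strict 2:1 ratio"
    pattern: list[str] = []
    for _ in range(n_mamba):
        pattern += ["A", "A", "M"]   # two attention blocks, then one Mamba block
    return pattern


layers = build_layer_pattern()
assert len(layers) == 36 and layers.count("A") == 24 and layers.count("M") == 12
```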
Training Status
Phase 1 (in progress): Mamba integration. The base Attention and MoE weights are frozen; only the newly injected Mamba layers are trained, as sketched below.
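In PyTorch terms, the Phase 1 setup amounts to toggling `requires_grad`. The name filter below is an assumption about how the Mamba modules are named, not the actual training script.

```python
# Illustrative Phase-1 freeze (assumes a PyTorch model whose injected Mamba
# modules contain "mamba" in their parameter names; not the repo's real code).
import torch.nn as nn

def freeze_for_phase1(model: nn.Module) -> None:
    for name, param in model.named_parameters():
        # Base GPT-OSS attention + MoE weights stay frozen; Mamba weights train.
        param.requires_grad = "mamba" in name.lower()

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable params: {trainable:,} / {total:,} ({trainable / total:.1%})")
```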
Usage
# Coming soon - inference code
# See: https://github.com/unixsysdev/adamba
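Until the official inference code lands, loading will presumably follow the usual `transformers` custom-model path. The snippet below is a speculative sketch with a placeholder repo id, not the documented API.

```python
# Speculative sketch only: assumes the released checkpoint ships custom modeling
# code loadable via trust_remote_code. The repo id is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "unixsysdev/gptoss-adamba-phase1"  # placeholder; see the GitHub repo for the real id
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

inputs = tokenizer("The hybrid Mamba/attention model", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```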
License
Apache 2.0 (same as GPT-OSS)