---
license: apache-2.0
tags:
- pytorch
- transformer
- mamba
- moe
- hybrid
- matryoshka
- gpt-oss
- adaptive-compute
pipeline_tag: text-generation
---

# GPT-OSS Adamba: Hybrid MoE + Mamba

> **21.9B** parameters | **32 experts** | **Mamba-enhanced** reasoning backbone

**[GitHub](https://github.com/unixsysdev/adamba)** | **[Original Adamba](https://huggingface.co/datasysdev/adamba)**

## Available Checkpoints

| Variant | Parameters | Dim | Features | Status | Download |
|---------|------------|-----|----------|--------|----------|
| gptoss_phase1 | 21.9B | 2880 | mamba_integration, moe_32experts | ✅ | [Download](./checkpoints/gptoss_phase1.pt) |
| gptoss_phase2 | 21.9B | 2880 | matryoshka, early_exit, moe_32experts | ⏳ | — |
| gptoss_phase3 | 30B+ | 4096 | matryoshka, early_exit, moe_32experts, expansion | ⏳ | — |
| gptoss_sft | 21.9B | 2880 | matryoshka, moe_32experts, sft | ⏳ | — |

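The Phase 1 checkpoint can be fetched programmatically. The snippet below is a minimal sketch using `huggingface_hub`; the `repo_id` is left as a placeholder for this repository, and the filename is taken from the Download link in the table above.

```python
# Minimal sketch: fetch the Phase 1 checkpoint with huggingface_hub.
# "<this-repo-id>" is a placeholder for this model repo's id; the filename
# matches the Download link in the table above.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="<this-repo-id>",
    filename="checkpoints/gptoss_phase1.pt",
)
print(f"Checkpoint saved to: {ckpt_path}")
```
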
## Architecture

Built on [OpenAI GPT-OSS 20B](https://huggingface.co/openai/gpt-oss-20b) with Mamba integration:

| Component | Spec |
|-----------|------|
| **Base Model** | GPT-OSS 20B MoE |
| **Hidden Dim** | 2880 |
| **Attention** | 24 layers (sliding + full, alternating) |
| **Mamba** | 12 layers (interleaved 2:1) |
| **MoE** | 32 experts, top-4 routing |
| **Vocab** | 201,088 tokens |
| **Total Blocks** | 36 (24 Attn + 12 Mamba) |

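To make the layer counts above concrete, the illustrative sketch below (not code from the repo) enumerates the 36-block A-A-M ordering implied by the 2:1 interleave.

```python
# Illustrative only: the hybrid block ordering implied by the table above
# (24 attention blocks : 12 Mamba blocks, repeating A-A-M groups).
def hybrid_layout(num_groups=12):
    layout = []
    for _ in range(num_groups):
        layout += ["attn", "attn", "mamba"]  # one A-A-M group
    return layout

blocks = hybrid_layout()
assert len(blocks) == 36
assert blocks.count("attn") == 24 and blocks.count("mamba") == 12
print(blocks[:6])  # ['attn', 'attn', 'mamba', 'attn', 'attn', 'mamba']
```
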
```
GPT-OSS 20B (Attention + MoE)
  ↓ Surgery (inject 12 Mamba layers)
Hybrid: A-A-M-A-A-M-... pattern
  ↓ Phase 1 (train Mamba only)
Mamba learns to "speak GPT-OSS language"
  ↓ Phase 2 (enable Matryoshka)
Adaptive compute: 128 → 2880 dim per layer
```

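The "128 → 2880 dim per layer" line refers to Matryoshka-style nested widths: the leading slice of each layer's hidden dimension forms a usable sub-network, so compute can be dialed per layer. The sketch below is a conceptual illustration of that idea, not the Adamba implementation.

```python
# Conceptual Matryoshka-width sketch: evaluate a projection at a truncated
# hidden width by slicing the leading rows/columns of its weight.
# This illustrates the idea only; it is not this repo's code.
import torch
import torch.nn as nn

class MatryoshkaLinear(nn.Module):
    def __init__(self, dim=2880):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(dim, dim) * 0.02)
        self.bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x, active_dim=2880):
        w = self.weight[:active_dim, :active_dim]  # leading slice of the weight
        b = self.bias[:active_dim]
        return x[..., :active_dim] @ w.t() + b

layer = MatryoshkaLinear()
x = torch.randn(1, 8, 2880)
print(layer(x, active_dim=128).shape)   # low-compute path  -> (1, 8, 128)
print(layer(x, active_dim=2880).shape)  # full-width path   -> (1, 8, 2880)
```
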
## Training Status

**Phase 1**: Mamba integration (freeze Attention+MoE, train Mamba)

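A minimal sketch of this recipe is shown below: freeze every parameter, then re-enable gradients only for the injected Mamba layers. The `"mamba"` name filter is an assumption about how those layers are named, not taken from the repo.

```python
# Sketch of Phase-1-style selective training: train only the Mamba layers.
# The "mamba" substring match is an assumed naming convention.
import torch.nn as nn

def freeze_for_phase1(model: nn.Module) -> None:
    for name, param in model.named_parameters():
        param.requires_grad = "mamba" in name.lower()

# Usage: freeze_for_phase1(model), then count trainable parameters with
# sum(p.numel() for p in model.parameters() if p.requires_grad).
```
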
## Usage

```python
# Coming soon - inference code
# See: https://github.com/unixsysdev/adamba
```

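Until the official inference code lands, the snippet below is only a hedged sketch for inspecting a downloaded checkpoint, assuming the `.pt` file is a plain PyTorch checkpoint (an assumption; see the GitHub repo for the supported loading path).

```python
# Hedged sketch: peek inside the checkpoint, assuming a standard torch .pt file.
import torch

state = torch.load("gptoss_phase1.pt", map_location="cpu")
if isinstance(state, dict):
    print(f"{len(state)} top-level entries")
    for key in list(state)[:5]:
        print(key)
```
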
## License

Apache 2.0 (same as GPT-OSS)