---
license: apache-2.0
tags:
- pytorch
- transformer
- mamba
- moe
- hybrid
- matryoshka
- gpt-oss
- adaptive-compute
pipeline_tag: text-generation
---

# 🌀 GPT-OSS Adamba: Hybrid MoE + Mamba

> **21.9B** parameters | **32 experts** | **Mamba-enhanced** reasoning backbone

📂 **[GitHub](https://github.com/unixsysdev/adamba)** | 🤗 **[Original Adamba](https://huggingface.co/datasysdev/adamba)**

## Available Checkpoints

| Variant | Parameters | Dim | Features | Status | Download |
|---------|------------|-----|----------|--------|----------|
| gptoss_phase1 | 21.9B | 2880 | mamba_integration, moe_32experts | ✅ | [Download](./checkpoints/gptoss_phase1.pt) |
| gptoss_phase2 | 21.9B | 2880 | matryoshka, early_exit, moe_32experts | ⏳ | — |
| gptoss_phase3 | 30B+ | 4096 | matryoshka, early_exit, moe_32experts, expansion | ⏳ | — |
| gptoss_sft | 21.9B | 2880 | matryoshka, moe_32experts, sft | ⏳ | — |

## Architecture

Built on [OpenAI GPT-OSS 20B](https://huggingface.co/openai/gpt-oss-20b) with Mamba integration:

| Component | Spec |
|-----------|------|
| **Base Model** | GPT-OSS 20B MoE |
| **Hidden Dim** | 2880 |
| **Attention** | 24 layers (sliding + full, alternating) |
| **Mamba** | 12 layers (interleaved 2:1) |
| **MoE** | 32 experts, top-4 routing |
| **Vocab** | 201,088 tokens |
| **Total Blocks** | 36 (24 Attn + 12 Mamba) |

```
┌─────────────────────────────────────────────┐
│ GPT-OSS 20B (Attention + MoE)               │
│   ↓ Surgery (inject 12 Mamba layers)        │
│ Hybrid: A-A-M-A-A-M-... pattern             │
│   ↓ Phase 1 (train Mamba only)              │
│ Mamba learns to "speak GPT-OSS language"    │
│   ↓ Phase 2 (enable Matryoshka)             │
│ Adaptive compute: 128 → 2880 dim per layer  │
└─────────────────────────────────────────────┘
```

## Training Status

**Phase 1**: Mamba integration (freeze Attention+MoE, train Mamba)

## Usage

```python
# Coming soon - inference code
# See: https://github.com/unixsysdev/adamba
```

Until official inference code lands, a hedged checkpoint-inspection sketch is included at the end of this card.

## License

Apache 2.0 (same as GPT-OSS)
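
## Layer Schedule (Unofficial Sketch)

The hybrid layout in the architecture diagram (24 attention blocks and 12 Mamba blocks, interleaved 2:1 in an A-A-M pattern) can be expressed as a tiny schedule builder. This is a minimal sketch for illustration only; the block labels and the helper name `build_block_schedule` are assumptions, not the actual module names used in the checkpoint.

```python
# Minimal sketch of the 36-block hybrid layout: 24 attention blocks and
# 12 Mamba blocks interleaved 2:1 (A-A-M repeated). Labels are illustrative.
def build_block_schedule(n_attn: int = 24, n_mamba: int = 12) -> list[str]:
    assert n_attn == 2 * n_mamba, "2:1 interleave assumes twice as many attention blocks"
    schedule: list[str] = []
    for _ in range(n_mamba):
        schedule += ["attn", "attn", "mamba"]  # two attention blocks, then one Mamba block
    return schedule


blocks = build_block_schedule()
print(len(blocks))   # 36
print(blocks[:6])    # ['attn', 'attn', 'mamba', 'attn', 'attn', 'mamba']
```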
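
## Checkpoint Inspection (Unofficial Sketch)

Until the official inference code is published, one safe way to use the Phase 1 checkpoint is to load it on CPU and inspect which tensors the Mamba surgery added. The assumed file layout (either a raw `state_dict` or a dict wrapping one under a `"model"` key) is a guess; adjust the key handling if the checkpoint is saved differently.

```python
import torch

# Phase 1 checkpoint from the "Available Checkpoints" table above.
CKPT = "checkpoints/gptoss_phase1.pt"

obj = torch.load(CKPT, map_location="cpu")
# Assumption: the file is either a raw state_dict or wraps one under "model".
state_dict = obj.get("model", obj) if isinstance(obj, dict) else obj

total_params = sum(t.numel() for t in state_dict.values() if torch.is_tensor(t))
mamba_keys = [k for k in state_dict if "mamba" in k.lower()]

print(f"total parameters: {total_params / 1e9:.1f}B")   # expected ~21.9B
print(f"tensors with 'mamba' in the name: {len(mamba_keys)}")
print("examples:", mamba_keys[:5])
```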