---
license: apache-2.0
tags:
- pytorch
- transformer
- mamba
- moe
- hybrid
- matryoshka
- gpt-oss
- adaptive-compute
pipeline_tag: text-generation
---

# GPT-OSS Adamba: Hybrid MoE + Mamba

> **21.9B** parameters | **32 experts** | **Mamba-enhanced** reasoning backbone

**[GitHub](https://github.com/unixsysdev/adamba)** | 🤗 **[Original Adamba](https://huggingface.co/datasysdev/adamba)**

## Available Checkpoints

| Variant | Parameters | Dim | Features | Status | Download |
|---------|------------|-----|----------|--------|----------|
| gptoss_phase1 | 21.9B | 2880 | mamba_integration, moe_32experts | ✅ | [Download](./checkpoints/gptoss_phase1.pt) |
| gptoss_phase2 | 21.9B | 2880 | matryoshka, early_exit, moe_32experts | ⏳ | – |
| gptoss_phase3 | 30B+ | 4096 | matryoshka, early_exit, moe_32experts, expansion | ⏳ | – |
| gptoss_sft | 21.9B | 2880 | matryoshka, moe_32experts, sft | ⏳ | – |

## Architecture

Built on [OpenAI GPT-OSS 20B](https://huggingface.co/openai/gpt-oss-20b) with Mamba integration:

| Component | Spec |
|-----------|------|
| **Base Model** | GPT-OSS 20B MoE |
| **Hidden Dim** | 2880 |
| **Attention** | 24 layers (alternating sliding-window and full attention) |
| **Mamba** | 12 layers (interleaved 2:1) |
| **MoE** | 32 experts, top-4 routing |
| **Vocab** | 201,088 tokens |
| **Total Blocks** | 36 (24 Attn + 12 Mamba) |
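
The "32 experts, top-4 routing" row means each token is scored against all 32 experts but only its 4 highest-scoring experts are executed, with their outputs mixed by renormalized router weights. A minimal, illustrative PyTorch sketch of that routing step (not the actual GPT-OSS router; tensor names and shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def route_top4(hidden, router_weight, top_k=4):
    """Illustrative top-k MoE routing: pick 4 of 32 experts per token.

    hidden:        (tokens, 2880) activations entering the MoE block
    router_weight: (2880, 32) learned router projection (hypothetical name)
    """
    logits = hidden @ router_weight                 # (tokens, 32) expert scores
    topk_logits, topk_idx = logits.topk(top_k, dim=-1)
    gates = F.softmax(topk_logits, dim=-1)          # renormalize over the chosen 4
    return topk_idx, gates                          # expert ids and mixing weights per token

# Toy example: 8 tokens, hidden dim 2880, 32 experts.
idx, gates = route_top4(torch.randn(8, 2880), torch.randn(2880, 32))
print(idx.shape, gates.shape)  # torch.Size([8, 4]) torch.Size([8, 4])
```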

```
┌─────────────────────────────────────────────┐
│ GPT-OSS 20B (Attention + MoE)               │
│   ↓ Surgery (inject 12 Mamba layers)        │
│ Hybrid: A-A-M-A-A-M-... pattern             │
│   ↓ Phase 1 (train Mamba only)              │
│ Mamba learns to "speak GPT-OSS language"    │
│   ↓ Phase 2 (enable Matryoshka)             │
│ Adaptive compute: 128 → 2880 dim per layer  │
└─────────────────────────────────────────────┘
```
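
The A-A-M pattern above is just a 2:1 layer schedule over the 36 blocks. A small sketch of how that schedule could be enumerated (illustrative only; the surgery code in the repo may build it differently):

```python
def build_block_schedule(n_attention=24, n_mamba=12):
    """Enumerate the hybrid A-A-M-A-A-M-... block order (illustrative)."""
    schedule = []
    for _ in range(n_mamba):
        schedule += ["attn", "attn", "mamba"]        # one 2:1 repeating unit
    assert schedule.count("attn") == n_attention
    assert len(schedule) == n_attention + n_mamba    # 36 blocks total
    return schedule

print(build_block_schedule()[:6])  # ['attn', 'attn', 'mamba', 'attn', 'attn', 'mamba']
```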

## Training Status

**Phase 1**: Mamba integration. The pretrained Attention and MoE weights are frozen; only the newly injected Mamba layers are trained.
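
A minimal sketch of what the Phase 1 freeze could look like in PyTorch, assuming the injected Mamba modules can be identified by a "mamba" substring in their parameter names (the actual names in the repo may differ):

```python
import torch

def freeze_for_phase1(model: torch.nn.Module) -> int:
    """Freeze Attention + MoE weights; leave only Mamba parameters trainable."""
    trainable = 0
    for name, param in model.named_parameters():
        param.requires_grad = "mamba" in name.lower()   # name match is an assumption
        if param.requires_grad:
            trainable += param.numel()
    return trainable  # number of trainable (Mamba) parameters

# The optimizer then only receives the unfrozen Mamba parameters, e.g.:
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4
# )
```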

## Usage

```python
# Coming soon - inference code
# See: https://github.com/unixsysdev/adamba
```
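
Until the inference code lands, the Phase 1 checkpoint can at least be inspected with plain PyTorch. A sketch that assumes the `.pt` file is either a raw state dict or a dict wrapping one under a "model" key (the file layout is an assumption):

```python
import torch

# Load on CPU; a 21.9B-parameter checkpoint needs a large amount of RAM.
ckpt = torch.load("checkpoints/gptoss_phase1.pt", map_location="cpu")

# Unwrap if the file stores metadata alongside the weights (assumed layout).
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

# Peek at the first few entries to confirm parameter names and shapes.
for name, value in list(state_dict.items())[:10]:
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(name, shape)
```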

## License

Apache 2.0 (same as GPT-OSS)