---
license: apache-2.0
tags:
- pytorch
- transformer
- mamba
- moe
- hybrid
- matryoshka
- gpt-oss
- adaptive-compute
pipeline_tag: text-generation
---
# GPT-OSS Adamba: Hybrid MoE + Mamba
> **21.9B** parameters | **32 experts** | **Mamba-enhanced** reasoning backbone
**[GitHub](https://github.com/unixsysdev/adamba)** | **[Original Adamba](https://huggingface.co/datasysdev/adamba)**
## Available Checkpoints
| Variant | Parameters | Dim | Features | Status | Download |
|---------|------------|-----|----------|--------|----------|
| gptoss_phase1 | 21.9B | 2880 | mamba_integration, moe_32experts | ✅ | [Download](./checkpoints/gptoss_phase1.pt) |
| gptoss_phase2 | 21.9B | 2880 | matryoshka, early_exit, moe_32experts | ⏳ | — |
| gptoss_phase3 | 30B+ | 4096 | matryoshka, early_exit, moe_32experts, expansion | ⏳ | — |
| gptoss_sft | 21.9B | 2880 | matryoshka, moe_32experts, sft | ⏳ | — |
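Once downloaded, the Phase 1 checkpoint can be inspected like any PyTorch checkpoint. A minimal sketch, assuming the `.pt` file is a plain PyTorch state dict (the actual on-disk format is not yet documented, so a toy stand-in checkpoint is used here):

```python
import torch

# Toy stand-in: save and reload a state dict the same way one would
# inspect gptoss_phase1.pt. The key name below is hypothetical.
dummy = {"blocks.0.mamba.in_proj.weight": torch.zeros(4, 4)}
torch.save(dummy, "demo_ckpt.pt")

# map_location="cpu" keeps the (21.9B-parameter) weights off the GPU
# while you look around.
state = torch.load("demo_ckpt.pt", map_location="cpu")
tensor_names = sorted(state.keys())
```

For the real checkpoint, swap `"demo_ckpt.pt"` for `"checkpoints/gptoss_phase1.pt"`.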
## Architecture
Built on [OpenAI GPT-OSS 20B](https://huggingface.co/openai/gpt-oss-20b) with Mamba integration:
| Component | Spec |
|-----------|------|
| **Base Model** | GPT-OSS 20B MoE |
| **Hidden Dim** | 2880 |
| **Attention** | 24 layers (sliding + full alternating) |
| **Mamba** | 12 layers (interleaved 2:1) |
| **MoE** | 32 experts, top-4 routing |
| **Vocab** | 201,088 tokens |
| **Total Blocks** | 36 (24 Attn + 12 Mamba) |
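The router picks 4 of the 32 experts for each token. A minimal sketch of top-k softmax gating, in pure Python for illustration (the actual GPT-OSS router implementation may normalize differently):

```python
import math

NUM_EXPERTS = 32
TOP_K = 4

def route(logits, k=TOP_K):
    """Pick the top-k experts and renormalize their gate weights.

    `logits` is one token's router output: NUM_EXPERTS scores.
    Returns (expert_indices, gate_weights) with weights summing to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only (a common MoE convention).
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

# Example: a token whose router prefers experts 3 and 17.
logits = [0.0] * NUM_EXPERTS
logits[3], logits[17], logits[8], logits[25] = 2.0, 1.5, 0.5, 0.2
experts, gates = route(logits)
```

Only the 4 selected experts run their FFNs for that token, which is what keeps per-token compute far below the full 21.9B parameter count.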
```
┌───────────────────────────────────────────────────────┐
│  GPT-OSS 20B (Attention + MoE)                        │
│    ↓ Surgery (inject 12 Mamba layers)                 │
│  Hybrid: A-A-M-A-A-M-... pattern                      │
│    ↓ Phase 1 (train Mamba only)                       │
│  Mamba learns to "speak GPT-OSS language"             │
│    ↓ Phase 2 (enable Matryoshka)                      │
│  Adaptive compute: 128 → 2880 dim per layer           │
└───────────────────────────────────────────────────────┘
```
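The 2:1 interleave above can be sketched as a small layout helper (hypothetical; the repo may order blocks differently, this just assumes two attention blocks per Mamba block, repeating):

```python
def hybrid_layout(num_attn=24, num_mamba=12):
    """Build the A-A-M-A-A-M-... block order: two attention ("A")
    blocks per Mamba ("M") block, 36 blocks total."""
    assert num_attn == 2 * num_mamba, "expected a 2:1 attention:Mamba ratio"
    layout = []
    for _ in range(num_mamba):
        layout += ["A", "A", "M"]
    return layout

blocks = hybrid_layout()
```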
## Training Status
**Phase 1**: Mamba integration (freeze Attention+MoE, train Mamba)
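The Phase 1 freeze can be expressed as a name-based filter over the model's parameters. A sketch assuming PyTorch-style `named_parameters()` names (all names below are hypothetical, not the repo's actual module paths):

```python
# Hypothetical parameter names in `named_parameters()` style.
param_names = [
    "blocks.0.attn.q_proj.weight",
    "blocks.0.moe.experts.0.w1.weight",
    "blocks.2.mamba.in_proj.weight",
    "blocks.2.mamba.ssm.A_log",
]

def is_trainable(name):
    """Phase 1 rule: train only the injected Mamba layers;
    the pretrained Attention + MoE weights stay frozen."""
    return "mamba" in name

trainable = [n for n in param_names if is_trainable(n)]
frozen = [n for n in param_names if not is_trainable(n)]

# Applied to a real model, this would be:
# for name, p in model.named_parameters():
#     p.requires_grad = is_trainable(name)
```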
## Usage
```python
# Coming soon - inference code
# See: https://github.com/unixsysdev/adamba
```
## License
Apache 2.0 (same as GPT-OSS)