---
license: apache-2.0
tags:
- pytorch
- transformer
- mamba
- moe
- hybrid
- matryoshka
- gpt-oss
- adaptive-compute
pipeline_tag: text-generation
---
# 🌀 GPT-OSS Adamba: Hybrid MoE + Mamba
> **21.9B** parameters | **32 experts** | **Mamba-enhanced** reasoning backbone
📂 **[GitHub](https://github.com/unixsysdev/adamba)** | 🤗 **[Original Adamba](https://huggingface.co/datasysdev/adamba)**
## Available Checkpoints
| Variant | Parameters | Hidden Dim | Features | Status | Download |
|---------|------------|------------|----------|--------|----------|
| gptoss_phase1 | 21.9B | 2880 | mamba_integration, moe_32experts | ✅ | [Download](./checkpoints/gptoss_phase1.pt) |
| gptoss_phase2 | 21.9B | 2880 | matryoshka, early_exit, moe_32experts | ⏳ | – |
| gptoss_phase3 | 30B+ | 4096 | matryoshka, early_exit, moe_32experts, expansion | ⏳ | – |
| gptoss_sft | 21.9B | 2880 | matryoshka, moe_32experts, sft | ⏳ | – |
## Architecture
Built on [OpenAI GPT-OSS 20B](https://huggingface.co/openai/gpt-oss-20b) with Mamba integration:
| Component | Spec |
|-----------|------|
| **Base Model** | GPT-OSS 20B MoE |
| **Hidden Dim** | 2880 |
| **Attention** | 24 layers (alternating sliding-window and full attention) |
| **Mamba** | 12 layers (interleaved 2:1 with attention) |
| **MoE** | 32 experts, top-4 routing |
| **Vocab** | 201,088 tokens |
| **Total Blocks** | 36 (24 Attn + 12 Mamba) |
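For reference, the spec above boils down to a handful of numbers. A minimal sketch of a config object (field names are illustrative, not the repo's actual config class):
```python
from dataclasses import dataclass

@dataclass
class AdambaConfig:
    """Spec from the table above; field names are illustrative only."""
    hidden_dim: int = 2880
    num_attn_layers: int = 24      # alternating sliding-window / full attention
    num_mamba_layers: int = 12     # interleaved 2:1 with attention blocks
    num_experts: int = 32          # MoE experts per layer
    experts_per_token: int = 4     # top-4 routing
    vocab_size: int = 201_088

cfg = AdambaConfig()
assert cfg.num_attn_layers + cfg.num_mamba_layers == 36  # total blocks
```
The surgery and training flow: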
```
┌────────────────────────────────────────────────┐
│  GPT-OSS 20B (Attention + MoE)                 │
│        ↓ Surgery (inject 12 Mamba layers)      │
│  Hybrid: A-A-M-A-A-M-... pattern               │
│        ↓ Phase 1 (train Mamba only)            │
│  Mamba learns to "speak GPT-OSS language"      │
│        ↓ Phase 2 (enable Matryoshka)           │
│  Adaptive compute: 128 → 2880 dim per layer    │
└────────────────────────────────────────────────┘
```
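The Matryoshka step is easiest to see on a single linear layer: one full-width weight matrix serves every nested width, so a layer can run anywhere from 128 to 2880 dimensions depending on the compute budget. A minimal sketch of the idea, assuming plain prefix slicing (the actual Adamba layers may differ):
```python
import torch
import torch.nn as nn

class MatryoshkaLinear(nn.Module):
    """One full-width weight matrix, usable at any nested width.
    Sketch of the adaptive-compute idea (128 -> 2880 dims per layer);
    the real Adamba layers may slice differently."""
    def __init__(self, full_dim: int = 2880):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(full_dim, full_dim) / full_dim ** 0.5)

    def forward(self, x: torch.Tensor, d_active: int) -> torch.Tensor:
        # Only the top-left d_active x d_active block participates,
        # so compute shrinks with the chosen width.
        w = self.weight[:d_active, :d_active]
        return x[..., :d_active] @ w.T

layer = MatryoshkaLinear()
x = torch.randn(1, 16, 2880)
print(layer(x, d_active=128).shape)   # torch.Size([1, 16, 128])  -- cheap pass
print(layer(x, d_active=2880).shape)  # torch.Size([1, 16, 2880]) -- full pass
```
In the Matryoshka recipe, the same weights are typically trained at several nested widths so that any prefix remains usable at inference time.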
## Training Status
**Phase 1** (checkpoint available above): Mamba integration. The original attention and MoE weights are frozen; only the injected Mamba layers are trained.
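In code, this phase amounts to switching off gradients for everything except the injected Mamba blocks. A minimal sketch, assuming the Mamba modules can be identified by a `"mamba"` substring in their parameter names (the checkpoint's real naming may differ):
```python
import torch

def freeze_for_phase1(model: torch.nn.Module, mamba_keyword: str = "mamba") -> None:
    """Freeze attention + MoE weights; leave only Mamba blocks trainable.
    `mamba_keyword` is an assumed naming convention, not a guarantee about
    the checkpoint's real parameter names."""
    for name, param in model.named_parameters():
        param.requires_grad = mamba_keyword in name.lower()

# Only the still-trainable (Mamba) parameters go to the optimizer:
# optimizer = torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)
```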
## Usage
```python
# Coming soon - inference code
# See: https://github.com/unixsysdev/adamba
```
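Until the inference code lands, the Phase 1 checkpoint can still be inspected. A minimal sketch, assuming `gptoss_phase1.pt` holds an ordinary PyTorch state dict (possibly wrapped under a key):
```python
import torch

# Load on CPU; at ~21.9B parameters the file needs a lot of RAM.
ckpt = torch.load("checkpoints/gptoss_phase1.pt", map_location="cpu")

# Some checkpoints wrap the weights, e.g. under "state_dict" or "model".
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt

n_params = sum(t.numel() for t in state_dict.values() if torch.is_tensor(t))
print(f"{len(state_dict)} entries, ~{n_params / 1e9:.1f}B parameters")

# Peek at a few tensor names/shapes to see the hybrid layer layout.
for name, t in list(state_dict.items())[:5]:
    if torch.is_tensor(t):
        print(name, tuple(t.shape))
```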
## License
Apache 2.0 (same as GPT-OSS)