# πŸŒ€ GPT-OSS Adamba: Hybrid MoE + Mamba

21.9B parameters | 32 experts | Mamba-enhanced reasoning backbone

πŸ“‚ GitHub | πŸ€— Original Adamba

## Available Checkpoints

| Variant | Parameters | Dim | Features | Status | Download |
|---|---|---|---|---|---|
| `gptoss_phase1` | 21.9B | 2880 | mamba_integration, moe_32experts | βœ… | Download |
| `gptoss_phase2` | 21.9B | 2880 | matryoshka, early_exit, moe_32experts | ⏳ | β€” |
| `gptoss_phase3` | 30B+ | 4096 | matryoshka, early_exit, moe_32experts, expansion | ⏳ | β€” |
| `gptoss_sft` | 21.9B | 2880 | matryoshka, moe_32experts, sft | ⏳ | β€” |
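
A minimal sketch of pulling the Phase 1 checkpoint with `huggingface_hub`; the repo id below is a placeholder, since the actual Hub path is not stated on this card.

```python
# Hypothetical download sketch. "unixsysdev/gptoss-adamba-phase1" is a
# placeholder repo id, not a confirmed Hub path for the gptoss_phase1 checkpoint.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="unixsysdev/gptoss-adamba-phase1")
print(f"Checkpoint files downloaded to: {local_dir}")
```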

## Architecture

Built on OpenAI GPT-OSS 20B with Mamba integration:

| Component | Spec |
|---|---|
| Base Model | GPT-OSS 20B MoE |
| Hidden Dim | 2880 |
| Attention | 24 layers (alternating sliding-window and full attention) |
| Mamba | 12 layers (interleaved 2:1 with attention) |
| MoE | 32 experts, top-4 routing |
| Vocab | 201,088 tokens |
| Total Blocks | 36 (24 attention + 12 Mamba) |

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  GPT-OSS 20B (Attention + MoE)                       β”‚
β”‚       ↓ Surgery (inject 12 Mamba layers)             β”‚
β”‚  Hybrid: A-A-M-A-A-M-... pattern                     β”‚
β”‚       ↓ Phase 1 (train Mamba only)                   β”‚
β”‚  Mamba learns to "speak GPT-OSS language"            β”‚
β”‚       ↓ Phase 2 (enable Matryoshka)                  β”‚
β”‚  Adaptive compute: 128 β†’ 2880 dim per layer          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
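
The exact insertion points used by the surgery step are not spelled out on this card; below is a minimal sketch of one way the 2:1 A-A-M interleaving from the diagram could be laid out (the `build_block_pattern` helper is hypothetical).

```python
# Minimal sketch of the hybrid block ordering, assuming the A-A-M pattern from
# the diagram repeats 12 times (2 attention blocks per Mamba block, 36 total).
def build_block_pattern(n_mamba: int = 12) -> list[str]:
    pattern: list[str] = []
    for _ in range(n_mamba):
        pattern.extend(["attention", "attention", "mamba"])
    return pattern

layers = build_block_pattern()
assert len(layers) == 36 and layers.count("mamba") == 12
print(layers[:6])  # ['attention', 'attention', 'mamba', 'attention', 'attention', 'mamba']
```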

## Training Status

Phase 1: Mamba integration (the pretrained attention and MoE weights are frozen; only the newly injected Mamba layers are trained).
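
A minimal sketch of that Phase 1 freezing scheme in PyTorch, assuming the injected Mamba submodules carry "mamba" in their parameter names (the actual naming depends on the surgery code in the adamba repo):

```python
import torch

def freeze_for_phase1(model: torch.nn.Module) -> int:
    """Freeze attention + MoE weights; leave only Mamba parameters trainable."""
    trainable = 0
    for name, param in model.named_parameters():
        param.requires_grad = "mamba" in name.lower()  # assumed naming convention
        if param.requires_grad:
            trainable += param.numel()
    return trainable

# Usage: only the unfrozen Mamba parameters go into the optimizer.
# n_trainable = freeze_for_phase1(model)
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4
# )
```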

## Usage

```python
# Coming soon - inference code
# See: https://github.com/unixsysdev/adamba
```
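
Until the official inference code lands, here is a rough loading sketch with πŸ€— Transformers; the repo id is a placeholder, and it assumes the checkpoint ships custom modeling code for the hybrid Attention + Mamba blocks.

```python
# Hypothetical loading sketch; "unixsysdev/gptoss-adamba-phase1" is a placeholder
# repo id, and trust_remote_code assumes custom hybrid modeling code is bundled
# with the checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "unixsysdev/gptoss-adamba-phase1"  # placeholder, not confirmed
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Explain why hybrid attention + SSM stacks can be efficient at long context:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```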

## License

Apache 2.0 (same as GPT-OSS)
