---
license: apache-2.0
tags:
- pytorch
- transformer
- mamba
- moe
- hybrid
- matryoshka
- gpt-oss
- adaptive-compute
pipeline_tag: text-generation
---

# GPT-OSS Adamba: Hybrid MoE + Mamba

> **21.9B** parameters | **32 experts** | **Mamba-enhanced** reasoning backbone

**[GitHub](https://github.com/unixsysdev/adamba)** | 🤗 **[Original Adamba](https://huggingface.co/datasysdev/adamba)**

## Available Checkpoints

| Variant | Parameters | Dim | Features | Status | Download |
|---------|------------|-----|----------|--------|----------|
| gptoss_phase1 | 21.9B | 2880 | mamba_integration, moe_32experts | ✅ | [Download](./checkpoints/gptoss_phase1.pt) |
| gptoss_phase2 | 21.9B | 2880 | matryoshka, early_exit, moe_32experts | ⏳ | – |
| gptoss_phase3 | 30B+ | 4096 | matryoshka, early_exit, moe_32experts, expansion | ⏳ | – |
| gptoss_sft | 21.9B | 2880 | matryoshka, moe_32experts, sft | ⏳ | – |

## Architecture

Built on [OpenAI GPT-OSS 20B](https://huggingface.co/openai/gpt-oss-20b) with Mamba integration:

| Component | Spec |
|-----------|------|
| **Base Model** | GPT-OSS 20B MoE |
| **Hidden Dim** | 2880 |
| **Attention** | 24 layers (alternating sliding-window and full attention) |
| **Mamba** | 12 layers (interleaved 2:1) |
| **MoE** | 32 experts, top-4 routing |
| **Vocab** | 201,088 tokens |
| **Total Blocks** | 36 (24 Attn + 12 Mamba) |
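
The "32 experts, top-4 routing" row means each token is scored against all 32 experts but only its 4 highest-scoring experts are executed, with their outputs mixed by renormalized router weights. A minimal, illustrative PyTorch sketch of that routing step (not the actual GPT-OSS router; tensor names and shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def route_top4(hidden, router_weight, top_k=4):
    """Illustrative top-k MoE routing: pick 4 of 32 experts per token.

    hidden:        (tokens, 2880) activations entering the MoE block
    router_weight: (2880, 32) learned router projection (hypothetical name)
    """
    logits = hidden @ router_weight                 # (tokens, 32) expert scores
    topk_logits, topk_idx = logits.topk(top_k, dim=-1)
    gates = F.softmax(topk_logits, dim=-1)          # renormalize over the chosen 4
    return topk_idx, gates                          # expert ids and mixing weights per token

# Toy example: 8 tokens, hidden dim 2880, 32 experts.
idx, gates = route_top4(torch.randn(8, 2880), torch.randn(2880, 32))
print(idx.shape, gates.shape)  # torch.Size([8, 4]) torch.Size([8, 4])
```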

```
┌─────────────────────────────────────────────┐
│ GPT-OSS 20B (Attention + MoE)               │
│   ↓ Surgery (inject 12 Mamba layers)        │
│ Hybrid: A-A-M-A-A-M-... pattern             │
│   ↓ Phase 1 (train Mamba only)              │
│ Mamba learns to "speak GPT-OSS language"    │
│   ↓ Phase 2 (enable Matryoshka)             │
│ Adaptive compute: 128 → 2880 dim per layer  │
└─────────────────────────────────────────────┘
```
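
The A-A-M pattern above is just a 2:1 layer schedule over the 36 blocks. A small sketch of how that schedule could be enumerated (illustrative only; the surgery code in the repo may build it differently):

```python
def build_block_schedule(n_attention=24, n_mamba=12):
    """Enumerate the hybrid A-A-M-A-A-M-... block order (illustrative)."""
    schedule = []
    for _ in range(n_mamba):
        schedule += ["attn", "attn", "mamba"]        # one 2:1 repeating unit
    assert schedule.count("attn") == n_attention
    assert len(schedule) == n_attention + n_mamba    # 36 blocks total
    return schedule

print(build_block_schedule()[:6])  # ['attn', 'attn', 'mamba', 'attn', 'attn', 'mamba']
```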

## Training Status

**Phase 1**: Mamba integration. The pretrained Attention and MoE weights are frozen; only the newly injected Mamba layers are trained.
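
A minimal sketch of what the Phase 1 freeze could look like in PyTorch, assuming the injected Mamba modules can be identified by a "mamba" substring in their parameter names (the actual names in the repo may differ):

```python
import torch

def freeze_for_phase1(model: torch.nn.Module) -> int:
    """Freeze Attention + MoE weights; leave only Mamba parameters trainable."""
    trainable = 0
    for name, param in model.named_parameters():
        param.requires_grad = "mamba" in name.lower()   # name match is an assumption
        if param.requires_grad:
            trainable += param.numel()
    return trainable  # number of trainable (Mamba) parameters

# The optimizer then only receives the unfrozen Mamba parameters, e.g.:
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4
# )
```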

## Usage

```python
# Coming soon - inference code
# See: https://github.com/unixsysdev/adamba
```
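
Until the inference code lands, the Phase 1 checkpoint can at least be inspected with plain PyTorch. A sketch that assumes the `.pt` file is either a raw state dict or a dict wrapping one under a "model" key (the file layout is an assumption):

```python
import torch

# Load on CPU; a 21.9B-parameter checkpoint needs a large amount of RAM.
ckpt = torch.load("checkpoints/gptoss_phase1.pt", map_location="cpu")

# Unwrap if the file stores metadata alongside the weights (assumed layout).
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

# Peek at the first few entries to confirm parameter names and shapes.
for name, value in list(state_dict.items())[:10]:
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(name, shape)
```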

## License

Apache 2.0 (same as GPT-OSS)