---
license: apache-2.0
tags:
- pytorch
- transformer
- mamba
- moe
- hybrid
- matryoshka
- gpt-oss
- adaptive-compute
pipeline_tag: text-generation
---

# 🌀 GPT-OSS Adamba: Hybrid MoE + Mamba

> **21.9B** parameters | **32 experts** | **Mamba-enhanced** reasoning backbone

📂 **[GitHub](https://github.com/unixsysdev/adamba)** | 🤗 **[Original Adamba](https://huggingface.co/datasysdev/adamba)**

## Available Checkpoints

| Variant | Parameters | Dim | Features | Status | Download |
|---------|------------|-----|----------|--------|----------|
| gptoss_phase1 | 21.9B | 2880 | mamba_integration, moe_32experts | ✅ | [Download](./checkpoints/gptoss_phase1.pt) |
| gptoss_phase2 | 21.9B | 2880 | matryoshka, early_exit, moe_32experts | ⏳ | — |
| gptoss_phase3 | 30B+ | 4096 | matryoshka, early_exit, moe_32experts, expansion | ⏳ | — |
| gptoss_sft | 21.9B | 2880 | matryoshka, moe_32experts, sft | ⏳ | — |

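Finished checkpoints can be pulled straight from the Hub. A minimal sketch using `huggingface_hub`; the `repo_id` below is a placeholder, substitute this repo's actual id:

```python
# Minimal download sketch; repo_id is a placeholder, use this repo's actual id.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="datasysdev/gptoss-adamba",       # placeholder repo id (assumption)
    filename="checkpoints/gptoss_phase1.pt",  # path from the table above
)
print(ckpt_path)
```
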
## Architecture

Built on [OpenAI GPT-OSS 20B](https://huggingface.co/openai/gpt-oss-20b) with Mamba integration:

| Component | Spec |
|-----------|------|
| **Base Model** | GPT-OSS 20B MoE |
| **Hidden Dim** | 2880 |
| **Attention** | 24 layers (alternating sliding / full attention) |
| **Mamba** | 12 layers (interleaved 2:1) |
| **MoE** | 32 experts, top-4 routing (sketched below) |
| **Vocab** | 201,088 tokens |
| **Total Blocks** | 36 (24 attention + 12 Mamba) |

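The top-4-of-32 routing in the table is standard token-choice MoE gating. A toy sketch of that mechanism, not the repo's actual router code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    """Toy token-choice gate: score 32 experts, keep the 4 best per token."""
    def __init__(self, dim=2880, n_experts=32, top_k=4):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, dim)
        scores = self.gate(x)                    # (tokens, 32)
        weights, expert_ids = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the chosen 4
        return weights, expert_ids               # mix weights + expert indices

router = TopKRouter()
w, ids = router(torch.randn(5, 2880))
print(ids.shape, w.sum(dim=-1))                  # torch.Size([5, 4]); rows sum to 1
```
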
```
┌─────────────────────────────────────────────┐
│ GPT-OSS 20B (Attention + MoE)               │
│   ↓ Surgery (inject 12 Mamba layers)        │
│ Hybrid: A-A-M-A-A-M-... pattern             │
│   ↓ Phase 1 (train Mamba only)              │
│ Mamba learns to "speak GPT-OSS language"    │
│   ↓ Phase 2 (enable Matryoshka)             │
│ Adaptive compute: 128 → 2880 dim per layer  │
└─────────────────────────────────────────────┘
```
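
The A-A-M pattern above places one Mamba block after every two attention blocks. A small sketch of how the 36-block schedule follows from the counts in the spec table:

```python
# Derive the 2:1 interleave from the spec table: 24 attention + 12 Mamba = 36 blocks.
def build_layer_pattern(n_attn=24, n_mamba=12):
    assert n_attn == 2 * n_mamba, "pattern assumes a 2:1 attention:Mamba ratio"
    pattern = ["A", "A", "M"] * n_mamba          # A-A-M repeated
    assert pattern.count("A") == n_attn and pattern.count("M") == n_mamba
    return pattern

print("".join(build_layer_pattern()))            # AAMAAMAAM... (36 chars)
```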
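
Phase 2's adaptive compute follows the Matryoshka idea of nested widths: one shared weight matrix is trained so its leading sub-blocks also work as smaller models. A toy sketch of that slicing; the repo's actual mechanism may differ:

```python
import torch
import torch.nn as nn

class MatryoshkaLinear(nn.Module):
    """Toy nested projection: the top-left `width x width` sub-block of one
    shared weight matrix acts as a cheaper model at inference time."""
    def __init__(self, dim=2880):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(dim, dim) * dim ** -0.5)

    def forward(self, x, width):                 # x: (..., width), 128 <= width <= 2880
        w = self.weight[:width, :width]          # slice instead of swapping models
        return x @ w.T

layer = MatryoshkaLinear()
print(layer(torch.randn(2, 128), width=128).shape)    # cheap path:  (2, 128)
print(layer(torch.randn(2, 2880), width=2880).shape)  # full path:   (2, 2880)
```
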
## Training Status

**Phase 1**: Mamba integration (freeze the Attention + MoE weights, train only the injected Mamba layers), as sketched below.

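In code, that recipe amounts to flipping `requires_grad` by parameter name. A sketch with a stand-in module; the `"mamba"` name-match is an assumption about how the hybrid labels its injected layers:

```python
import torch.nn as nn

# Stand-in for the real hybrid; in practice this is the surgered GPT-OSS model.
model = nn.ModuleDict({
    "attn_block_0": nn.Linear(8, 8),
    "mamba_block_0": nn.Linear(8, 8),
    "moe_block_0": nn.Linear(8, 8),
})

# Phase-1 recipe: freeze Attention + MoE, leave only Mamba parameters trainable.
for name, param in model.named_parameters():
    param.requires_grad = "mamba" in name

print([n for n, p in model.named_parameters() if p.requires_grad])
# ['mamba_block_0.weight', 'mamba_block_0.bias']
```
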
## Usage

```python
# Coming soon - inference code
# See: https://github.com/unixsysdev/adamba
```

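Until the official inference code lands, the raw checkpoint can at least be inspected. A sketch assuming the `.pt` files are plain PyTorch state dicts:

```python
import torch

# Assumption: the checkpoint is a plain state dict (or a dict containing one).
state = torch.load("checkpoints/gptoss_phase1.pt", map_location="cpu")
print(type(state))
print(list(state)[:5] if isinstance(state, dict) else state)
# model.load_state_dict(state)   # once the model class from the repo is available
```
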
## License

Apache 2.0 (same as GPT-OSS).