Add Stage 2b to roadmap
README.md
@@ -34,6 +34,7 @@ See [`stage_0/`](stage_0/) for the classifier config, discovery pipeline, and fu
 | 0 | Baseline 1-param classifier | Uses the full EUPE-ViT-B backbone unchanged | shipped | F1 0.889 · 85.64M backbone · 1 free param |
 | 1 | Output-channel pruning | Slice the 40 dims the classifier reads; fuse the head | shipped | F1 0.889 (parity) · same backbone · cleaner interface |
 | 2 | Attention-head pruning | Ablate heads that do not contribute to those dims | shipped | **F1 0.916** (+0.022) at K=10 heads pruned · 1.97M params masked |
+| 2b | Structural head removal | Physically shrink qkv/proj tensors, reduce per-block `num_heads` | shipped | F1 0.9159 preserved · backbone 85.64M → 83.68M (1.97M saved, 2.30 %) |
 | 3 | Depth reduction | Drop transformer blocks that do not route signal | shipped | F1 0.876 at K=1 block · F1 collapses at K≥3 · hard ceiling |
 | 4 | Specialist backbone | Train a small student that emits only the target dims | shipped | 3.27M-param student · F1 0.710 · proof of concept, gap to baseline |
 | 5 | Circuit-level synthesis | Synthesize the Stage 0 classifier to gates | shipped | **3,220 gates** (1,172 AND + 1,318 NOT + 730 XOR) |
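For readers checking the Stage 2b row, the following is a minimal NumPy sketch of what "physically shrink qkv/proj tensors" could look like. It is not the repository's code. It assumes a timm-style fused `qkv` weight of shape `(3*dim, dim)` with rows ordered `[q | k | v]`, a `proj` weight whose input columns are grouped by head, and that Stage 2 masks a head by zeroing its output before the projection. The function names (`attention`, `remove_heads`, `params_saved_per_head`) are hypothetical. Under the further assumption of standard ViT-B attention dimensions (width 768, 12 heads), removing one head frees 196,800 parameters, so 10 heads give 1,968,000, which is consistent with the 1.97M figure in the table.

```python
import numpy as np


def attention(x, qkv_w, qkv_b, proj_w, proj_b, num_heads, head_mask=None):
    """Single-sequence multi-head self-attention with an optional per-head mask."""
    n = x.shape[0]
    qkv = x @ qkv_w.T + qkv_b                      # (n, 3 * inner)
    inner = qkv.shape[1] // 3                      # num_heads * head_dim
    head_dim = inner // num_heads
    q, k, v = (
        qkv[:, s * inner:(s + 1) * inner]
        .reshape(n, num_heads, head_dim)
        .transpose(1, 0, 2)                        # (heads, n, head_dim)
        for s in range(3)
    )
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ v                              # (heads, n, head_dim)
    if head_mask is not None:
        # Assumed masking scheme: zero a pruned head's output before proj.
        out = out * np.asarray(head_mask, dtype=out.dtype)[:, None, None]
    out = out.transpose(1, 0, 2).reshape(n, inner)
    return out @ proj_w.T + proj_b


def remove_heads(qkv_w, qkv_b, proj_w, keep, num_heads):
    """Slice the fused qkv rows and proj input columns down to the kept heads."""
    dim = qkv_w.shape[1]
    head_dim = dim // num_heads
    idx = np.concatenate(
        [np.arange(h * head_dim, (h + 1) * head_dim) for h in keep]
    )
    rows = np.concatenate([idx + s * dim for s in range(3)])  # q, k, v sections
    return qkv_w[rows], qkv_b[rows], proj_w[:, idx]


def params_saved_per_head(dim, num_heads):
    """qkv weight rows + qkv bias entries + proj input columns for one head."""
    head_dim = dim // num_heads
    return 3 * head_dim * dim + 3 * head_dim + dim * head_dim


# Small random layer: masking heads and structurally removing them should agree.
rng = np.random.default_rng(0)
dim, num_heads, n = 16, 4, 5
x = rng.standard_normal((n, dim))
qkv_w = rng.standard_normal((3 * dim, dim))
qkv_b = rng.standard_normal(3 * dim)
proj_w = rng.standard_normal((dim, dim))
proj_b = rng.standard_normal(dim)

keep = [0, 2, 3]                                   # head 1 is pruned
mask = [1.0 if h in keep else 0.0 for h in range(num_heads)]
masked = attention(x, qkv_w, qkv_b, proj_w, proj_b, num_heads, head_mask=mask)

small_qkv_w, small_qkv_b, small_proj_w = remove_heads(
    qkv_w, qkv_b, proj_w, keep, num_heads
)
pruned = attention(
    x, small_qkv_w, small_qkv_b, small_proj_w, proj_b, num_heads=len(keep)
)

assert np.allclose(masked, pruned)
print(params_saved_per_head(768, 12) * 10)         # parameters freed by 10 heads
```

The `proj` bias is left untouched because it does not depend on any single head, which is why the per-head count only includes the `qkv` rows, the `qkv` bias entries, and the `proj` input columns. If masking and removal compute the same function, as they do in this sketch, that would explain why the F1 score is reported as preserved between Stage 2 and Stage 2b.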