Roadmap: add stage 4b and 5b
README.md
CHANGED
@@ -37,7 +37,9 @@ See [`stage_0/`](stage_0/) for the classifier config, discovery pipeline, and fu
 | 2b | Structural head removal | Physically shrink qkv/proj tensors, reduce per-block `num_heads` | shipped | F1 0.9159 preserved · backbone 85.64M → 83.68M (1.97M saved, 2.30 %) |
 | 3 | Depth reduction | Drop transformer blocks that do not route signal | shipped | F1 0.876 at K=1 block · F1 collapses at K≥3 · hard ceiling |
 | 4 | Specialist backbone | Train a small student that emits only the target dims | shipped | 3.27M-param student · F1 0.710 · proof of concept, gap to baseline |
+| 4b | Bigger specialist, cosine loss | 15.67 M student, cosine similarity on full 768-D pooled teacher | shipped | F1 0.723 (+0.013 over Stage 4) · gap to baseline persists |
 | 5 | Circuit-level synthesis | Synthesize the Stage 0 classifier to gates | shipped | **3,220 gates** (1,172 AND + 1,318 NOT + 730 XOR) |
+| 5b | Popcount reformulation | Per-dim INT8 threshold → popcount → comparator | shipped | **907 gates** (−71 % vs Stage 5 folded), F1 0.876 (−0.008) |
 
 ## Headline numbers
 
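The Stage 5b row describes its pipeline only as "per-dim INT8 threshold → popcount → comparator". As a rough illustration of that shape, and not the repository's actual implementation, the sketch below turns each selected dimension into one bit by comparing it with a threshold, counts the set bits, and compares the count with a minimum vote. The function name `popcount_classify`, the thresholds, and the `min_votes` value are all hypothetical and chosen only for the example.

```python
from typing import Sequence


def popcount_classify(
    features: Sequence[int],
    thresholds: Sequence[int],
    min_votes: int,
) -> bool:
    """Hypothetical sketch of threshold -> popcount -> comparator.

    features   : INT8-range activations for the selected dims (assumed input)
    thresholds : one INT8-range threshold per dim (assumed, not from the source)
    min_votes  : how many dims must fire for a positive decision (assumed)
    """
    if len(features) != len(thresholds):
        raise ValueError("features and thresholds must have the same length")

    # Per-dim threshold: each dimension contributes a single bit.
    bits = [1 if x > t else 0 for x, t in zip(features, thresholds)]

    # Popcount: number of dimensions whose bit is set.
    votes = sum(bits)

    # Comparator: positive when enough dimensions fired.
    return votes >= min_votes


if __name__ == "__main__":
    # Two of the four dims exceed their thresholds (12 > 10 and 7 > 5).
    print(popcount_classify([12, -3, 40, 7], [10, 0, 50, 5], min_votes=2))
```

In hardware terms, each comparison against a constant threshold, the bit count, and the final comparison are all small fixed circuits, which is one plausible reason a formulation like this needs fewer gates than a general synthesized classifier; the diff itself does not state the mechanism behind the 907-gate figure.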