phanerozoic/1-parameter-classifier

@@ -10,6 +10,7 @@ tags:
   - interpretability
   - vision-transformer
   - feature-engram
 library_name: pytorch
 datasets:
   - detection-datasets/coco
@@ -28,14 +29,21 @@ See [`stage_0/`](stage_0/) for the classifier config, discovery pipeline, and fu
 ## Roadmap
-| Stage | Name | What changes |
-|---|---|---|
-| 0 | Baseline 1-param classifier | Uses the full EUPE-ViT-B backbone unchanged |
-| 1 | Output-channel pruning | Keep only the 100 feature dims the classifier reads |
-| 2 | Attention-head pruning | Ablate heads that do not contribute to those 100 dims |
-| 3 | Depth reduction | Drop transformer blocks that do not route signal to the 100 dims |
-| 4 | Specialist backbone | Train a small student that emits only the 100 target dims |
-| 5 | Circuit-level synthesis | Synthesize the entire fixed-weight pipeline to gates and dead-code eliminate everything that does not reach the classifier output |
 ## Source backbone

   - interpretability
   - vision-transformer
   - feature-engram
+  - circuit-synthesis
 library_name: pytorch
 datasets:
   - detection-datasets/coco
 ## Roadmap
+| Stage | Name | What changes | Status | Result |
+|---|---|---|---|---|
+| 0 | Baseline 1-param classifier | Uses the full EUPE-ViT-B backbone unchanged | shipped | F1 0.889 · 85.64M backbone · 1 free param |
+| 1 | Output-channel pruning | Slice the 40 dims the classifier reads; fuse the head | shipped | F1 0.889 (parity) · same backbone · cleaner interface |
+| 2 | Attention-head pruning | Ablate heads that do not contribute to those dims | shipped | **F1 0.916** (+0.022 over 0.894 on the 1K-image calibration pool) at K=10 heads pruned · 1.97M params masked |
+| 3 | Depth reduction | Drop transformer blocks that do not route signal | shipped | F1 0.876 at K=1 block · F1 collapses at K≥3 · hard ceiling |
+| 4 | Specialist backbone | Train a small student that emits only the target dims | shipped | 3.27M-param student · F1 0.710 · proof of concept, gap to baseline |
+| 5 | Circuit-level synthesis | Synthesize the Stage 0 classifier to gates | shipped | **3,220 gates** (1,172 AND + 1,318 NOT + 730 XOR) |
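
The "1.97M params masked" figure in the Stage 2 row can be sanity-checked with a short calculation. This is only a sketch: it assumes the backbone uses the standard ViT-B attention shape (768-dim embedding, 12 heads per block, biased Q/K/V projections), which the card does not confirm for EUPE-ViT-B, and the helper name `params_per_head` is hypothetical.

```python
# Assumption: standard ViT-B attention dimensions (not confirmed for EUPE-ViT-B).
EMBED_DIM = 768
NUM_HEADS = 12
HEAD_DIM = EMBED_DIM // NUM_HEADS  # 64 dims per head


def params_per_head(embed_dim: int, head_dim: int, bias: bool = True) -> int:
    """Count the parameters owned by one attention head (hypothetical helper)."""
    qkv_weights = 3 * embed_dim * head_dim    # this head's slice of W_q, W_k, W_v
    qkv_biases = 3 * head_dim if bias else 0  # this head's slice of the Q/K/V biases
    out_proj = head_dim * embed_dim           # this head's rows of the output projection
    return qkv_weights + qkv_biases + out_proj


heads_pruned = 10  # K=10 from the Stage 2 row
masked = heads_pruned * params_per_head(EMBED_DIM, HEAD_DIM)
print(f"{masked:,} parameters (~{masked / 1e6:.2f}M)")
```

Under these assumptions, masking 10 heads accounts for roughly 1.97M parameters, which is consistent with the table.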
+## Headline numbers
+- Stage 2 pruning *improves* the classifier: removing 10 redundant or noise-injecting attention heads raises F1 from 0.894 to 0.916, both measured on the same 1K-image calibration pool.
+- Stage 3 shows the backbone is depth-critical: only 1 of 12 blocks is cleanly removable.
+- Stage 4's specialist student fits the full person-classification pipeline in 3.27M parameters at F1 0.710. That is 26× smaller than the 85.64M-parameter teacher, with a known path forward for closing the F1 gap (see stage_4 README).
+- Stage 5 reduces the actual decision circuit to 3,220 gates (1,172 AND + 1,318 NOT + 730 XOR), with sub-millisecond combinational latency and sub-milliwatt power, small enough to fit as a camera-ISP block.
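
The arithmetic behind these headline figures can be re-derived from the numbers in the roadmap table. The sketch below uses only values quoted in this card; the variable names are illustrative.

```python
# Gate breakdown from the Stage 5 row.
gates = {"AND": 1172, "NOT": 1318, "XOR": 730}
total_gates = sum(gates.values())

# Teacher (Stage 0 backbone) vs. Stage 4 student parameter counts.
teacher_params = 85.64e6
student_params = 3.27e6
compression = teacher_params / student_params

# Stage 2 F1 before and after pruning, on the 1K-image calibration pool.
f1_before, f1_after = 0.894, 0.916
f1_delta = round(f1_after - f1_before, 3)

print(f"total gates: {total_gates:,}")
print(f"student is about {compression:.1f}x smaller than the teacher")
print(f"Stage 2 F1 change: +{f1_delta}")
```

The gate counts sum to 3,220, the parameter ratio rounds to 26×, and the Stage 2 gain is +0.022 relative to the calibration-pool baseline of 0.894 rather than the 0.889 reported for Stage 0.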
 ## Source backbone