phanerozoic committed on
Commit b52512b · verified · 1 parent: 958c1e0

README: add Stage 4C row + 4C/5b headline bullets; refresh Stage 4/4B to peak-checkpoint F1

Files changed (1)
  1. README.md +7 -4
README.md CHANGED
@@ -36,8 +36,9 @@ See [`stage_0/`](stage_0/) for the classifier config, discovery pipeline, and fu
 | 2 | Attention-head pruning | Ablate heads that do not contribute to those dims | shipped | **F1 0.916** (+0.022) at K=10 heads pruned · 1.97M params masked |
 | 2b | Structural head removal | Physically shrink qkv/proj tensors, reduce per-block `num_heads` | shipped | F1 0.9159 preserved · backbone 85.64M → 83.68M (1.97M saved, 2.30 %) |
 | 3 | Depth reduction | Drop transformer blocks that do not route signal | shipped | F1 0.876 at K=1 block · F1 collapses at K≥3 · hard ceiling |
-| 4 | Specialist backbone | Train a small student that emits only the target dims | shipped | 3.27M-param student · F1 0.710 · proof of concept, gap to baseline |
-| 4b | Bigger specialist, cosine loss | 15.67 M student, cosine similarity on full 768-D pooled teacher | shipped | F1 0.723 (+0.013 over Stage 4) · gap to baseline persists |
+| 4 | Specialist backbone | Train a small student that emits only the target dims | shipped | 3.27M-param student · F1 0.717 · proof of concept, gap to baseline |
+| 4b | Bigger specialist, cosine loss | 15.67 M student, cosine similarity on full 768-D pooled teacher | shipped | F1 0.726 (+0.009 over Stage 4) · gap to baseline persists |
+| 4c | Direct scalar supervision | Same 3.27 M student, MSE on the classifier sum-difference scalar | shipped | F1 0.734 · threshold converges to 25.0 (teacher 25.3) · calibration aligned |
 | 5 | Circuit-level synthesis | Synthesize the Stage 0 classifier to gates | shipped | **3,220 gates** (1,172 AND + 1,318 NOT + 730 XOR) |
 | 5b | Popcount reformulation | Per-dim INT8 threshold → popcount → comparator | shipped | **907 gates** (−71 % vs Stage 5 folded), F1 0.876 (−0.008) |
 
@@ -45,8 +46,10 @@ See [`stage_0/`](stage_0/) for the classifier config, discovery pipeline, and fu
 
 - Stage 2 pruning *improves* the classifier: removing 10 redundant / noise-injecting attention heads raises F1 from 0.894 (1K-image calibration) to 0.916 on the same calibration pool.
 - Stage 3 shows the backbone is depth-critical: only 1 of 12 blocks is cleanly removable.
-- Stage 4 specialist student fits the full person-classification pipeline in 3.27M parameters at F1 0.710 26× smaller than the teacher, with a known path forward for closing the F1 gap (see stage_4 README).
-- Stage 5 puts the actual decision circuit at 3,220 universal gates. Sub-millisecond combinational latency; sub-milliwatt power. Fits as a camera-ISP block.
+- Stage 4 specialist student fits the full person-classification pipeline in 3.27M parameters at F1 0.717, 26× smaller than the teacher (full path forward in the stage_4 README).
+- Stage 4C's direct scalar supervision on the same 3.27M student lifts F1 to 0.734 at the same footprint, with the student's threshold converging to 25.0 against the teacher's 25.3.
+- Stage 5 puts the decision circuit at 3,220 universal gates. Sub-millisecond combinational latency; sub-milliwatt power. Fits as a camera-ISP block.
+- Stage 5b's popcount reformulation drops that to 907 gates (−71 %) at F1 0.876, with most of the saving coming from eliminating the signed 8-bit adder tree.
 
 ## Source backbone
 
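As a rough cross-check of the Stage 2b numbers, the sketch below counts the parameters tied to one attention head in a block with a fused `qkv` projection and an output `proj`. It assumes a ViT-B-style layout with embedding width 768 (matching the 768-D pooled teacher named in the Stage 4b row) and a per-head width of 64; that head width and the helper names `params_per_head` and `removal_savings` are illustrative assumptions, not taken from the repository.

```python
# Rough check of parameters freed by physically removing attention heads.
# ASSUMPTIONS (not from the source): ViT-B-style block with embed_dim=768,
# head_dim=64, fused qkv Linear(embed_dim, 3*embed_dim) and proj Linear(embed_dim, embed_dim).


def params_per_head(embed_dim: int, head_dim: int) -> int:
    """Parameters that belong to a single head in the qkv and proj layers."""
    qkv_weights = 3 * head_dim * embed_dim  # q, k and v each lose head_dim output rows
    qkv_biases = 3 * head_dim               # matching slices of the qkv bias
    proj_weights = embed_dim * head_dim     # proj loses head_dim input columns
    # proj's bias is indexed by output channel, not by head, so it is unchanged.
    return qkv_weights + qkv_biases + proj_weights


def removal_savings(heads_removed: int, total_params: float,
                    embed_dim: int = 768, head_dim: int = 64) -> tuple[int, float]:
    """Return (parameters saved, fraction of total_params saved)."""
    saved = heads_removed * params_per_head(embed_dim, head_dim)
    return saved, saved / total_params


saved, fraction = removal_savings(10, 85.64e6)
print(f"{saved:,} parameters saved ({fraction:.2%})")  # 1,968,000 parameters saved (2.30%)
```

Under those assumptions, 10 heads come to 1,968,000 parameters, about 2.30 % of 85.64M, which lines up with the 1.97M figure in the Stage 2b row.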
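The Stage 5b row describes the decision path as per-dim INT8 threshold → popcount → comparator. A minimal software model of that dataflow might look like the following; it assumes a `>=` comparison per dim and a single vote threshold `k`, and the comparison direction, the threshold values, and the function names are all hypothetical rather than the repository's parameters.

```python
# Software model of a threshold -> popcount -> comparator decision path.
# ASSUMPTIONS (hypothetical): ">=" per-dim comparison, the threshold values,
# and the vote threshold k are illustrative, not the repository's parameters.

INT8_MIN, INT8_MAX = -128, 127


def clamp_int8(v: int) -> int:
    """Saturate an integer to the signed 8-bit range."""
    return max(INT8_MIN, min(INT8_MAX, v))


def popcount_classify(features: list[int], thresholds: list[int], k: int) -> bool:
    """Return True when at least k dims meet or exceed their INT8 threshold."""
    if len(features) != len(thresholds):
        raise ValueError("need exactly one threshold per feature dim")
    # Per-dim comparator: each INT8 feature becomes a single bit.
    bits = 0
    for i, (x, t) in enumerate(zip(features, thresholds)):
        if clamp_int8(x) >= clamp_int8(t):
            bits |= 1 << i
    # Popcount: count the set bits (int.bit_count() also works on Python 3.10+).
    votes = bin(bits).count("1")
    # Final comparator against the vote threshold.
    return votes >= k


feats = [40, -12, 7, 90, 3]   # quantized feature values (made up)
ths = [10, 0, 5, 50, 20]      # per-dim thresholds (made up)
print(popcount_classify(feats, ths, k=3))  # True: dims 0, 2 and 3 fire
```

Each dim collapses to a single bit before anything is added, so the counting stage works on 1-bit inputs rather than signed 8-bit values. That is consistent with the Stage 5b bullet attributing most of the gate saving to removing the signed 8-bit adder tree, though this model says nothing about the actual gate-level netlist.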
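Stages 4b and 4c differ mainly in the training target: cosine similarity against the full 768-D pooled teacher versus MSE on a single scalar. The sketch below contrasts the two on a toy 4-D vector. It assumes the classifier scalar is a sum over one group of pooled dims minus a sum over another (one reading of "sum-difference"); the index groups, vectors, and function names are hypothetical, and only the 25.3 teacher threshold comes from the Stage 4c row.

```python
import math

# ASSUMPTION: the classifier scalar is sum(pooled[pos_dims]) - sum(pooled[neg_dims]).
# Index groups and vectors below are hypothetical toy values, not from the source.


def sum_difference(pooled: list[float], pos_dims: list[int], neg_dims: list[int]) -> float:
    """Decision scalar: sum of one dim group minus sum of another."""
    return sum(pooled[i] for i in pos_dims) - sum(pooled[i] for i in neg_dims)


def cosine_loss(student: list[float], teacher: list[float]) -> float:
    """Stage 4b-style target: 1 - cosine similarity over the whole vector."""
    dot = sum(s * t for s, t in zip(student, teacher))
    norm = math.sqrt(sum(s * s for s in student) * sum(t * t for t in teacher))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return 1.0 - dot / norm


def scalar_mse(student_scalar: float, teacher_scalar: float) -> float:
    """Stage 4c-style target: squared error on the single decision scalar."""
    return (student_scalar - teacher_scalar) ** 2


THRESHOLD = 25.3                       # teacher threshold reported in the Stage 4c row
teacher = [30.0, 5.0, 2.0, 1.0]
student = [0.5 * x for x in teacher]   # same direction, half the magnitude
pos_dims, neg_dims = [0, 1], [2, 3]    # hypothetical index groups

t_score = sum_difference(teacher, pos_dims, neg_dims)  # 32.0
s_score = sum_difference(student, pos_dims, neg_dims)  # 16.0
print("cosine loss:", cosine_loss(student, teacher))   # ~0: vectors point the same way
print("scalar MSE:", scalar_mse(s_score, t_score))     # 256.0
print("teacher positive:", t_score > THRESHOLD, "| student positive:", s_score > THRESHOLD)
```

One property worth noting: cosine loss ignores scale, so a student whose output is a scaled copy of the teacher's scores zero cosine loss while its scalar lands on the other side of a fixed threshold. MSE on the scalar penalizes that shift directly, which is one plausible way to read the "calibration aligned" note in the Stage 4c row.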