---
license: other
license_name: fair-research-license
license_link: https://huggingface.co/facebook/EUPE-ViT-B/blob/main/LICENSE
base_model: facebook/EUPE-ViT-B
tags:
  - image-classification
  - binary-classification
  - minimal-models
  - interpretability
  - vision-transformer
  - feature-engram
  - circuit-synthesis
library_name: pytorch
datasets:
  - detection-datasets/coco
pipeline_tag: image-classification
---

# 1-Parameter Classifier

Progressively reducing the model budget for image-level person classification on EUPE-ViT-B features. Each stage is a deeper reduction or transformation of the previous: the classifier shrinks across stages while the backbone it draws features from is pruned, shortened, and distilled in parallel.

## Stage 0: Baseline

A 1-free-parameter image-level person classifier on the frozen EUPE-ViT-B backbone. The classifier reads 20 pre-selected person-positive and 20 pre-selected person-negative feature dimensions, sums the positives, subtracts the negatives, and compares the result to a single learned threshold. F1 = 0.889 on COCO val 2017 image-level person presence, measured through the live Argus forward pass at 768-pixel input.
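
The decision rule above can be sketched in a few lines. This is a minimal sketch, not the shipped head: the index sets and threshold value below are placeholders (the real ones come from the discovery pipeline in `stage_0/`).

```python
import torch

# Illustrative constants -- the shipped index sets and threshold live in stage_0/.
POS_IDX = list(range(20))      # placeholder person-positive feature dims
NEG_IDX = list(range(20, 40))  # placeholder person-negative feature dims
THRESHOLD = 25.0               # the single learned free parameter

def person_score(feats: torch.Tensor) -> torch.Tensor:
    """Sum the positive dims, subtract the negative dims.

    feats: pooled EUPE-ViT-B feature vector, shape (768,).
    """
    return feats[POS_IDX].sum() - feats[NEG_IDX].sum()

def is_person(feats: torch.Tensor) -> bool:
    # Everything except THRESHOLD is frozen; the comparison point is
    # the only trained quantity in the whole classifier.
    return bool(person_score(feats) > THRESHOLD)
```

Because the index sets are fixed by the discovery pipeline, training the classifier reduces to fitting one scalar cut point on the score distribution.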

See [`stage_0/`](stage_0/) for the classifier config, discovery pipeline, and full characterization of the person axis in the backbone.

## Roadmap

| Stage | Name | What changes | Status | Result |
|---|---|---|---|---|
| 0 | Baseline 1-param classifier | Uses the full EUPE-ViT-B backbone unchanged | shipped | F1 0.889 · 85.64M backbone · 1 free param |
| 1 | Output-channel pruning | Slice the 40 dims the classifier reads; fuse the head | shipped | F1 0.889 (parity) · same backbone · cleaner interface |
| 2 | Attention-head pruning | Ablate heads that do not contribute to those dims | shipped | **F1 0.916** (+0.022) at K=10 heads pruned · 1.97M params masked |
| 2b | Structural head removal | Physically shrink qkv/proj tensors, reduce per-block `num_heads` | shipped | F1 0.9159 preserved · backbone 85.64M → 83.68M (1.97M saved, 2.30%) |
| 3 | Depth reduction | Drop transformer blocks that do not route signal | shipped | F1 0.876 at K=1 block · F1 collapses at K≥3 · hard ceiling |
| 4 | Specialist backbone | Train a small student that emits only the target dims | shipped | 3.27M-param student · F1 0.717 · proof of concept, gap to baseline |
| 4b | Bigger specialist, cosine loss | 15.67M student, cosine similarity on full 768-D pooled teacher | shipped | F1 0.726 (+0.009 over Stage 4) · gap to baseline persists |
| 4c | Direct scalar supervision | Same 3.27M student, MSE on the classifier sum-difference scalar | shipped | F1 0.734 · threshold converges to 25.0 (teacher 25.3) · calibration aligned |
| 5 | Circuit-level synthesis | Synthesize the Stage 0 classifier to gates | shipped | **3,220 gates** (1,172 AND + 1,318 NOT + 730 XOR) |
| 5b | Popcount reformulation | Per-dim INT8 threshold → popcount → comparator | shipped | **907 gates** (−71% vs Stage 5 folded), F1 0.876 (−0.008) |
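
The Stage 5b row describes a three-step pipeline that can be sketched in plain Python. All constants below are illustrative placeholders, not the synthesized per-dim values; the point is the shape of the computation, in which the only arithmetic left is a vote count and one integer compare.

```python
# Stage 5b sketch: per-dim INT8 threshold -> popcount -> comparator.
# Index sets, per-dim thresholds, and the vote count are placeholders.
POS_IDX = list(range(20))       # placeholder person-positive dims
NEG_IDX = list(range(20, 40))   # placeholder person-negative dims
DIM_THRESH = {d: 0 for d in POS_IDX + NEG_IDX}  # per-dim INT8 cut points
VOTE_THRESH = 24                # integer comparator threshold

def is_person_popcount(feats_int8):
    """feats_int8: 768 INT8 activations (a plain list of ints here).

    Each read dim is binarized against its own threshold; a positive dim
    that fires and a negative dim that stays quiet both vote 'person'.
    The decision is a single integer compare on the vote count, which is
    what removes the signed 8-bit adder tree of the folded Stage 5 circuit.
    """
    votes = sum(feats_int8[d] > DIM_THRESH[d] for d in POS_IDX)
    votes += sum(feats_int8[d] <= DIM_THRESH[d] for d in NEG_IDX)
    return votes >= VOTE_THRESH
```

In hardware, each binarization is one comparator, the vote count is a popcount tree over 40 one-bit signals, and the final compare is a small unsigned magnitude comparator.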

## Headline numbers

- Stage 2 pruning *improves* the classifier: removing 10 redundant / noise-injecting attention heads raises F1 from 0.894 (1K-image calibration) to 0.916 on the same calibration pool.
- Stage 3 shows the backbone is depth-critical: only 1 of 12 blocks is cleanly removable.
- Stage 4 specialist student fits the full person-classification pipeline in 3.27M parameters at F1 0.717, 26× smaller than the teacher (full path forward in the stage_4 README).
- Stage 4C's direct scalar supervision on the same 3.27M student lifts F1 to 0.734 at the same footprint, with the student's threshold converging to 25.0 against the teacher's 25.3.
- Stage 5 puts the decision circuit at 3,220 universal gates. Sub-millisecond combinational latency; sub-milliwatt power. Fits as a camera-ISP block.
- Stage 5b's popcount reformulation drops that to 907 gates (−71%) at F1 0.876, with most of the saving coming from eliminating the signed 8-bit adder tree.
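
The Stage 4c objective from the bullets above can be sketched as follows. This is a sketch under assumptions: the index sets are placeholders, and the student is assumed to emit features in the same 768-D layout as the teacher (the 40-dim student variant would collapse the indexing to a fixed slice).

```python
import torch
import torch.nn.functional as F

POS_IDX = list(range(20))      # placeholder person-positive dims
NEG_IDX = list(range(20, 40))  # placeholder person-negative dims

def sum_diff(feats: torch.Tensor) -> torch.Tensor:
    """The classifier's sum-difference scalar, per batch element.

    feats: pooled features, shape (batch, 768).
    """
    return feats[:, POS_IDX].sum(dim=-1) - feats[:, NEG_IDX].sum(dim=-1)

def stage4c_loss(student_feats: torch.Tensor,
                 teacher_feats: torch.Tensor) -> torch.Tensor:
    # Supervise only the one scalar the downstream classifier thresholds,
    # instead of matching the full 768-D pooled teacher representation.
    return F.mse_loss(sum_diff(student_feats), sum_diff(teacher_feats))
```

Because only this scalar is supervised, the student's learned threshold becomes directly comparable to the teacher's, which is why the converged values (25.0 vs 25.3) can be read side by side.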

## Source backbone

EUPE-ViT-B from Meta FAIR ([arXiv:2603.22387](https://arxiv.org/abs/2603.22387), Zhu et al., March 2026), distilled from PEcore-G + PElang-G + DINOv3-H+ via a 1.9B proxy teacher. License: FAIR Research License (non-commercial). The 1-parameter classifier is an artifact derived from that backbone's feature geometry.