Stage 5: Circuit-Level Synthesis
The Stage 0 classifier synthesized to gates.
Input
person_classifier_1p.v describes the Stage 0 decision as a purely combinational Verilog module. It takes 40 signed INT8 inputs (the already-selected, already-layernormed, already-max-pooled feature values at the 40 classifier dims), computes sum(positive dims) - sum(negative dims), and compares the result against a signed 16-bit threshold. One output bit: person_present.
No multipliers, no sequential logic, no memory.
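The decision described above can be sketched as a small behavioral model in Python, useful as a golden reference when checking the netlist in simulation. This is only an illustration: the index split between positive and negative dims (`POSITIVE_IDX`, `NEGATIVE_IDX`) is hypothetical, and the strict `>` comparison is an assumption; the real assignment and comparison live in person_classifier_1p.v.

```python
from typing import Sequence, Tuple

NUM_DIMS = 40                      # classifier inputs, per the module description
INT8_MIN, INT8_MAX = -128, 127
INT16_MIN, INT16_MAX = -32768, 32767

# Hypothetical split of the 40 inputs; the real one is defined in the Verilog.
POSITIVE_IDX = tuple(range(0, 20))
NEGATIVE_IDX = tuple(range(20, 40))


def person_present(features: Sequence[int], threshold: int) -> int:
    """Return 1 if sum(positive dims) - sum(negative dims) exceeds threshold."""
    if len(features) != NUM_DIMS:
        raise ValueError(f"expected {NUM_DIMS} features, got {len(features)}")
    if any(not INT8_MIN <= f <= INT8_MAX for f in features):
        raise ValueError("features must be signed INT8 values")
    if not INT16_MIN <= threshold <= INT16_MAX:
        raise ValueError("threshold must be a signed 16-bit value")
    score = (sum(features[i] for i in POSITIVE_IDX)
             - sum(features[i] for i in NEGATIVE_IDX))
    # Assumption: strict greater-than; the Verilog may use >= instead.
    return int(score > threshold)


def score_bounds(n_pos: int, n_neg: int) -> Tuple[int, int]:
    """Worst-case score range for n_pos added and n_neg subtracted INT8 values."""
    lo = n_pos * INT8_MIN - n_neg * INT8_MAX
    hi = n_pos * INT8_MAX - n_neg * INT8_MIN
    return lo, hi


if __name__ == "__main__":
    print(person_present([1] * 20 + [0] * 20, threshold=19))
    print(score_bounds(20, 20))
```

Whatever the real split is, the score magnitude of 40 INT8 inputs cannot exceed 40 × 128 = 5,120, so the accumulated value always fits within the signed 16-bit range used for the threshold.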
Synthesis
Run with Yosys from the OSS CAD Suite (yosys.exe -s synth.ys). Pass sequence: hierarchy, proc, opt, flatten, opt_clean, synth -top, abc -g AND,XOR, opt_clean, stat. The ABC stage is restricted to the {AND, XOR} gate set so that the reported count is in technology-independent 2-input gates rather than cells from a vendor library. Inverters are not part of that set and are reported as separate NOT cells in the statistics.
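For orientation, the pass sequence can be written out as a Yosys script. The sketch below builds such a script as text in Python; it is a reconstruction from the pass list, not a copy of the shipped synth.ys. The top module name, the `-check` flag on `hierarchy`, and the `read_verilog` / `write_verilog` lines are assumptions added to make the script complete.

```python
# Assumptions (not confirmed by the source): the top module shares its name
# with the source file, and the netlist is written to synthesized.v.
TOP = "person_classifier_1p"
SOURCE = "person_classifier_1p.v"
NETLIST = "synthesized.v"


def build_script(top: str = TOP, source: str = SOURCE, netlist: str = NETLIST) -> str:
    """Return a Yosys script following the documented pass sequence."""
    lines = [
        f"read_verilog {source}",            # assumed: load the RTL
        f"hierarchy -check -top {top}",      # elaborate the design hierarchy
        "proc",                              # convert processes to netlists
        "opt",
        "flatten",                           # collapse any submodules
        "opt_clean",
        f"synth -top {top}",                 # generic synthesis
        "abc -g AND,XOR",                    # map logic to AND/XOR gates
        "opt_clean",
        "stat",                              # print cell statistics
        f"write_verilog -noattr {netlist}",  # assumed: emit gate-level netlist
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_script(), end="")
```

Saving the printed text to a file and passing it to `yosys -s` would run the flow under these assumptions; the shipped synth.ys remains the authoritative version.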
Result
Total cells  : 3,220
  AND        : 1,172
  NOT        : 1,318
  XOR        :   730
Wires        : 3,261
Public ports : 42 (40 data + 1 threshold + 1 output)
Port bits    : 337 (40 × 8 data + 16 threshold + 1 output)
3,220 primitive gates (AND, XOR, NOT) for a 40-input INT8 combinational person-scene detector.
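The reported statistics are internally consistent, which is quick to confirm. The sketch below recomputes the cell total and the port-bit count from the figures in the result; the variable names are illustrative only.

```python
# Cell counts as reported by the Yosys stat pass.
cells = {"AND": 1172, "NOT": 1318, "XOR": 730}
total_cells = sum(cells.values())

# Port widths from the module description:
# 40 signed INT8 inputs, one signed 16-bit threshold, one output bit.
data_ports, data_width = 40, 8
threshold_width = 16
output_width = 1

public_ports = data_ports + 1 + 1
port_bits = data_ports * data_width + threshold_width + output_width

assert total_cells == 3220
assert public_ports == 42
assert port_bits == 337

# Share of each cell type in the netlist.
for name, count in cells.items():
    print(f"{name}: {count} ({count / total_cells:.1%})")
```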
Scale comparison
From the prior cofiber-detection repo's synthesis scaling (circuit/README.md there): a 768-input INT8 multiply-accumulate extrapolates to ~65,000 gates per MAC. A full 6-MAC 4,614-parameter person detector was estimated at ~391,000 gates. Our 1-parameter classifier is roughly 120× smaller than that reference, because it replaces the 768×{cls, reg, ctr} MAC array with 40 selected additions + one comparator.
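The arithmetic behind the comparison can be laid out explicitly. In the sketch below, the gate figures are the ones quoted above; reading 4,614 parameters as six 768-weight heads plus six biases is an assumption that happens to reproduce the number, not something confirmed here.

```python
GATES_PER_MAC = 65_000        # extrapolated 768-input INT8 MAC (prior repo)
NUM_MACS = 6
REFERENCE_ESTIMATE = 391_000  # full detector estimate (prior repo)
THIS_STAGE_GATES = 3_220      # Yosys result for this stage

macs_only = GATES_PER_MAC * NUM_MACS            # 390,000
ratio = REFERENCE_ESTIMATE / THIS_STAGE_GATES   # about 121

# Assumed parameter breakdown: 6 heads x 768 weights, plus one bias per head.
assumed_params = 6 * 768 + 6

print(f"MAC array alone: {macs_only:,} gates")
print(f"Size ratio: {ratio:.0f}x")
print(f"Assumed parameter count: {assumed_params:,}")
```

The six MACs account for nearly all of the reference estimate, which is why dropping the MAC array in favor of 40 additions and one comparator shrinks the design by about two orders of magnitude.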
What this stage ships
person_classifier_1p.v : the classifier as synthesizable Verilog
synth.ys : Yosys script
synthesized.v : post-synthesis gate-level netlist
synth.log : full synthesis log with statistics
Deployment implication
At a modern process (e.g., 22 nm FD-SOI for microcontroller-class ASICs), 3,220 gates occupy on the order of 0.01–0.03 mm². Because the logic is purely combinational, latency is a single propagation delay through the netlist, well under a millisecond. Switching power for a single-frame evaluation is sub-milliwatt. The block fits inside the ISP block of a camera sensor, or as a macro next to an always-on wake circuit.
The classifier's inputs (40 selected INT8 feature values) still require the backbone to produce them. Everything upstream of this module — EUPE-ViT-B or a specialist student (Stage 4) — must also be synthesizable or runnable in some other primitive before the full camera-to-bit pipeline is gate-level. That upstream cost is where the other stages sit; this stage closes the loop at the decision end.