Stage 5: Circuit-Level Synthesis

This stage synthesizes the Stage 0 classifier down to a gate-level netlist.

Input

person_classifier_1p.v describes the Stage 0 decision as a purely combinational Verilog module. It takes 40 signed INT8 inputs (the already-selected, already-layernormed, already-max-pooled feature values at the 40 classifier dims), computes sum(positive dims) - sum(negative dims), and compares the result against a signed 16-bit threshold. One output bit: person_present.

No multipliers, no sequential logic, no memory.
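
The decision rule can be mirrored in Python as a software reference model, for example to cross-check the netlist in simulation. This is a sketch under assumptions: the split into positive and negative dims (POS_IDX, NEG_IDX), the comparison direction (>=), and the function names are hypothetical and are not taken from person_classifier_1p.v.

```python
from __future__ import annotations

from typing import Sequence

N_DIMS = 40                       # classifier inputs, per the module description
INT8_MIN, INT8_MAX = -128, 127
INT16_MIN, INT16_MAX = -32768, 32767

# Hypothetical split: the real positive/negative dim assignment lives in the
# Verilog source and is not reproduced here.
POS_IDX = tuple(range(0, 20))
NEG_IDX = tuple(range(20, 40))


def person_present(features: Sequence[int], threshold: int) -> bool:
    """Reference model: sum(positive dims) - sum(negative dims) vs. threshold."""
    if len(features) != N_DIMS:
        raise ValueError(f"expected {N_DIMS} features, got {len(features)}")
    if any(not INT8_MIN <= f <= INT8_MAX for f in features):
        raise ValueError("features must be signed INT8")
    if not INT16_MIN <= threshold <= INT16_MAX:
        raise ValueError("threshold must be signed 16-bit")

    score = sum(features[i] for i in POS_IDX) - sum(features[i] for i in NEG_IDX)
    # Assumption: the hardware comparator is ">="; it may equally be ">".
    return score >= threshold


def score_bounds() -> tuple[int, int]:
    """Worst-case score range for the assumed split."""
    hi = len(POS_IDX) * INT8_MAX - len(NEG_IDX) * INT8_MIN
    lo = len(POS_IDX) * INT8_MIN - len(NEG_IDX) * INT8_MAX
    return lo, hi
```

With any split of the 40 dims the score magnitude cannot exceed 40 × 128 = 5,120, so the signed 16-bit threshold range comfortably covers every reachable score.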

Synthesis

Run with Yosys (OSS CAD Suite, yosys.exe -s synth.ys). Pass sequence: hierarchy, proc, opt, flatten, opt_clean, synth -top, abc -g AND,XOR, opt_clean, stat. The ABC stage is restricted to AND and XOR gates; ABC always emits NOT cells as well, so the final netlist contains only AND, XOR, and NOT. The reported count is therefore directly in primitive gates rather than in cells of a vendor library.
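
The pass sequence can be written out as a script. The Python sketch below only generates script text; it does not run Yosys. The top-module name, the read_verilog and write_verilog lines, their position in the script, and the helper name build_synth_script are assumptions. The shipped synth.ys is authoritative.

```python
# Pass sequence as listed in this README; "{top}" is filled in below.
PASSES = [
    "hierarchy",
    "proc",
    "opt",
    "flatten",
    "opt_clean",
    "synth -top {top}",
    "abc -g AND,XOR",
    "opt_clean",
    "stat",
]


def build_synth_script(
    src: str = "person_classifier_1p.v",
    top: str = "person_classifier_1p",   # assumed to match the file name
    out: str = "synthesized.v",
) -> str:
    """Return the text of a Yosys script following the listed pass order."""
    lines = [f"read_verilog {src}"]                 # assumed input step
    lines += [p.format(top=top) for p in PASSES]
    lines.append(f"write_verilog {out}")            # assumed output step
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_synth_script(), end="")
```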

Result

Total cells   : 3,220
  AND         : 1,172
  NOT         : 1,318
  XOR         :   730
Wires         : 3,261
Public ports  :    42   (40 data + 1 threshold + 1 output)
Port bits     :   337
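
The totals in this table can be cross-checked from the figures already stated. The snippet below is plain arithmetic on the reported numbers; the variable names are illustrative.

```python
# Cell counts as reported by the Yosys stat pass.
cells = {"AND": 1172, "NOT": 1318, "XOR": 730}
total_cells = sum(cells.values())

# Ports: 40 INT8 data inputs, one signed 16-bit threshold, one output bit.
n_data, data_width = 40, 8
threshold_width = 16
output_width = 1

public_ports = n_data + 1 + 1
port_bits = n_data * data_width + threshold_width + output_width

print(total_cells, public_ports, port_bits)
```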

3,220 primitive gates (AND, XOR, NOT) for a 40-input INT8 combinational person-scene detector.

Scale comparison

From the prior cofiber-detection repo's synthesis scaling (circuit/README.md there): a 768-input INT8 multiply-accumulate extrapolates to ~65,000 gates per MAC, and a full 6-MAC, 4,614-parameter person detector was estimated at ~391,000 gates. Our 1-parameter classifier is roughly 120× smaller than that reference (391,000 / 3,220 ≈ 121), because it replaces the 768×{cls, reg, ctr} MAC array with 40 selected additions + one comparator.
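
The comparison reduces to a few lines of arithmetic on the numbers quoted above. One assumption is labeled in the code: reading 4,614 parameters as 768 weights plus one bias per MAC is an inference from the arithmetic, not something stated here.

```python
GATES_PER_MAC = 65_000        # extrapolated, from the prior repo
N_MACS = 6
REFERENCE_GATES = 391_000     # prior estimate for the full detector
CLASSIFIER_GATES = 3_220      # this stage's Yosys result

# Six MACs at the rounded per-MAC figure land close to the quoted estimate.
extrapolated_gates = GATES_PER_MAC * N_MACS

# Size ratio between the reference detector and this classifier.
ratio = REFERENCE_GATES / CLASSIFIER_GATES

# Assumption: each MAC has 768 weights and one bias.
reference_params = N_MACS * (768 + 1)

print(extrapolated_gates, round(ratio, 1), reference_params)
```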

What this stage ships

  • person_classifier_1p.v — the classifier as synthesizable Verilog
  • synth.ys — Yosys script
  • synthesized.v — post-synthesis gate-level netlist
  • synth.log — full synthesis log with statistics

Deployment implication

As an order-of-magnitude estimate, 3,220 gates at a modern process (e.g., 22 nm FD-SOI for microcontroller-class ASICs) occupy roughly 0.01–0.03 mm². Combinational latency is sub-millisecond, and switching power for a single-frame evaluation is sub-milliwatt. The block fits inside the ISP of a camera sensor or as a macro next to an always-on wake circuit.
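
To see what the area estimate implies per gate, the stated range can be divided by the gate count. This only rearranges the figures above; it does not use any measured standard-cell area, and the variable names are illustrative.

```python
AREA_RANGE_MM2 = (0.01, 0.03)   # area range stated above
GATES = 3_220
UM2_PER_MM2 = 1_000_000         # 1 mm² = 10^6 µm²

# Implied average footprint per gate, in µm², at each end of the range.
per_gate_um2 = tuple(a * UM2_PER_MM2 / GATES for a in AREA_RANGE_MM2)

print([round(x, 1) for x in per_gate_um2])
```

The result is a per-gate budget of roughly 3 to 9 µm², including whatever routing and placement overhead the estimate assumes.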

The classifier's inputs (40 selected INT8 feature values) still have to be produced by the backbone. Everything upstream of this module, whether EUPE-ViT-B or a specialist student (Stage 4), must also be synthesized, or run on some other primitive, before the full camera-to-bit pipeline is gate-level. That upstream cost is where the other stages sit; this stage closes the loop at the decision end.