# Stage 5: Circuit-Level Synthesis
This stage synthesizes the Stage 0 classifier down to logic gates.
## Input
`person_classifier_1p.v` describes the Stage 0 decision as a purely combinational Verilog module. It takes 40 signed INT8 inputs (the already-selected, already-layernormed, already-max-pooled feature values at the 40 classifier dims), computes `sum(positive dims) - sum(negative dims)`, and compares the result against a signed 16-bit threshold. One output bit: `person_present`.
No multipliers, no sequential logic, no memory.
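As a cross-check for the Verilog, the decision described above can be sketched as a small Python reference model. This is an illustration under assumptions, not the contents of `person_classifier_1p.v`: the function name, the split of the 40 dims into positive and negative sets, and the strict `>` comparison are all hypothetical, since this README does not state them.

```python
from typing import Sequence

NUM_DIMS = 40                        # classifier inputs, per the module description
INT8_MIN, INT8_MAX = -128, 127       # signed INT8 feature range
INT16_MIN, INT16_MAX = -32768, 32767 # signed 16-bit threshold range


def person_present(
    features: Sequence[int],
    positive_dims: Sequence[int],
    negative_dims: Sequence[int],
    threshold: int,
) -> bool:
    """Reference model: sum(positive dims) - sum(negative dims) vs. threshold.

    Assumption: the comparison is strict (score > threshold); the real
    module may use >= instead.
    """
    if len(features) != NUM_DIMS:
        raise ValueError(f"expected {NUM_DIMS} features, got {len(features)}")
    if any(not INT8_MIN <= f <= INT8_MAX for f in features):
        raise ValueError("feature outside signed INT8 range")
    if not INT16_MIN <= threshold <= INT16_MAX:
        raise ValueError("threshold outside signed 16-bit range")

    # Additions and subtractions only: no multipliers, no state.
    score = sum(features[i] for i in positive_dims) - sum(
        features[i] for i in negative_dims
    )
    return score > threshold


# Hypothetical example: first 20 dims counted positive, last 20 negative.
example_features = [10] * 20 + [3] * 20
example_positive = range(0, 20)
example_negative = range(20, 40)
example_result = person_present(
    example_features, example_positive, example_negative, threshold=100
)
print(example_result)  # score is 200 - 60 = 140, so this prints True
```

The largest possible score magnitude is 40 × 128 = 5,120, which sits comfortably inside the signed 16-bit range used for the threshold.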
## Synthesis
Run with Yosys (OSS CAD Suite, `yosys.exe -s synth.ys`). Pass sequence: `hierarchy`, `proc`, `opt`, `flatten`, `opt_clean`, `synth -top`, `abc -g AND,XOR`, `opt_clean`, `stat`. The gate set is restricted to `{AND, XOR}` at the ABC stage, so the reported count is in generic 2-input gates rather than cells of a vendor library. Inverters still appear in the result as `NOT` cells, since ABC emits them regardless of the gate list.
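The pass sequence above can be assembled into a script as in the sketch below. It is not the shipped `synth.ys`: the top-module name `person_classifier_1p`, the `-top` flag on `hierarchy`, and the surrounding `read_verilog` / `write_verilog` commands are assumptions added so the sequence forms a complete flow.

```python
def build_synth_script(
    source: str = "person_classifier_1p.v",
    top: str = "person_classifier_1p",   # assumed module name, not confirmed here
    netlist: str = "synthesized.v",
) -> str:
    """Return a Yosys script following the pass order listed in this README."""
    commands = [
        f"read_verilog {source}",        # assumption: load step before the listed passes
        f"hierarchy -top {top}",         # assumption: top selected explicitly
        "proc",
        "opt",
        "flatten",
        "opt_clean",
        f"synth -top {top}",
        "abc -g AND,XOR",                # restrict ABC mapping to AND/XOR (NOT is still emitted)
        "opt_clean",
        "stat",
        f"write_verilog {netlist}",      # assumption: netlist written at the end
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    print(build_synth_script(), end="")
```

Redirecting the printed text to a file and passing it to `yosys -s` would run the flow, assuming the module name matches.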
## Result
```
Total cells : 3,220
AND : 1,172
NOT : 1,318
XOR : 730
Wires : 3,261
Public ports : 42 (40 data + 1 threshold + 1 output)
Port bits : 337
```
3,220 gates in total (AND, XOR, and NOT) for a 40-input INT8 combinational person-scene detector.
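The reported statistics are internally consistent, which a few lines of arithmetic confirm. The variable names below are illustrative; the numbers are copied from the result block.

```python
# Cell counts as reported by the Yosys `stat` pass.
cells = {"AND": 1172, "NOT": 1318, "XOR": 730}
total_cells = sum(cells.values())

# Port widths from the module description: 40 signed INT8 inputs,
# one signed 16-bit threshold, one output bit.
DATA_PORTS, DATA_WIDTH = 40, 8
THRESHOLD_WIDTH = 16
OUTPUT_WIDTH = 1

public_ports = DATA_PORTS + 1 + 1
port_bits = DATA_PORTS * DATA_WIDTH + THRESHOLD_WIDTH + OUTPUT_WIDTH

print(total_cells)   # 3220
print(public_ports)  # 42
print(port_bits)     # 337
print(f"NOT share: {cells['NOT'] / total_cells:.1%}")
```

Inverters account for roughly 41% of the cells, so the count of 2-input gates alone (AND plus XOR) is 1,902.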
## Scale comparison
The prior cofiber-detection repo's synthesis scaling (`circuit/README.md` in that repo) extrapolates a 768-input INT8 multiply-accumulate to ~65,000 gates per MAC, and estimates a full 6-MAC, 4,614-parameter person detector at ~391,000 gates. Our 1-parameter classifier is roughly **120× smaller** than that reference (391,000 / 3,220 ≈ 121), because it replaces the 768×{cls, reg, ctr} MAC array with 40 selected additions + one comparator.
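The comparison reduces to two divisions, reproduced here with the figures quoted above. The variable names are illustrative, and the reference numbers are the prior repo's estimates, not measurements made in this stage.

```python
# Reference figures quoted from the prior repo's scaling estimate.
gates_per_mac = 65_000         # ~gates for one 768-input INT8 MAC
mac_count = 6
reference_estimate = 391_000   # quoted estimate for the full 6-MAC detector

# This stage's synthesized cell count.
classifier_gates = 3_220

rough_reference = gates_per_mac * mac_count            # 390,000, close to the quoted figure
size_ratio = reference_estimate / classifier_gates     # about 121

print(rough_reference)
print(round(size_ratio, 1))
```

Six MACs at ~65,000 gates each give 390,000, in line with the quoted ~391,000, and the ratio rounds to the 120× stated above.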
## What this stage ships
- `person_classifier_1p.v` — the classifier as synthesizable Verilog
- `synth.ys` — Yosys script
- `synthesized.v` — post-synthesis gate-level netlist
- `synth.log` — full synthesis log with statistics
## Deployment implication
At a modern process (e.g., 22 nm FD-SOI for microcontroller-class ASICs), 3,220 gates occupy on the order of 0.01–0.03 mm². Combinational latency is sub-millisecond, and switching power for a single-frame evaluation is sub-milliwatt. The block fits inside the ISP of a camera sensor, or as a macro next to an always-on wake circuit.
The classifier's inputs (40 selected INT8 feature values) still require the backbone to produce them. Everything upstream of this module — EUPE-ViT-B or a specialist student (Stage 4) — must also be synthesizable or runnable in some other primitive before the full camera-to-bit pipeline is gate-level. That upstream cost is where the other stages sit; this stage closes the loop at the decision end.