# Stage 5: Circuit-Level Synthesis

This stage synthesizes the Stage 0 classifier down to gates.

## Input

`person_classifier_1p.v` describes the Stage 0 decision as a purely combinational Verilog module. It takes 40 signed INT8 inputs (the already-selected, already-layernormed, already-max-pooled feature values at the 40 classifier dims), computes `sum(positive dims) - sum(negative dims)`, and compares the result against a signed 16-bit threshold. One output bit: `person_present`.
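As a cross-check on the description above, here is a minimal behavioral sketch of the same decision in Python. It is not the shipped Verilog. The function name, the `positive_dims` argument (which dims count as positive is not listed here), and the strict `>` comparison are all assumptions made for illustration.

```python
from typing import Iterable, Sequence

NUM_DIMS = 40  # number of selected classifier dims, per the module description


def person_present(features: Sequence[int],
                   positive_dims: Iterable[int],
                   threshold: int) -> bool:
    """Reference model: sum(positive dims) - sum(negative dims) vs. threshold.

    `positive_dims` is a hypothetical parameter: indices not listed in it
    are treated as negative dims.
    """
    if len(features) != NUM_DIMS:
        raise ValueError(f"expected {NUM_DIMS} features, got {len(features)}")
    if any(not -128 <= v <= 127 for v in features):
        raise ValueError("features must fit in signed INT8")
    if not -32768 <= threshold <= 32767:
        raise ValueError("threshold must fit in signed 16 bits")

    positive = set(positive_dims)
    # Add the positive dims, subtract the negative ones. No multiplies needed.
    score = sum(v if i in positive else -v for i, v in enumerate(features))
    # Assumption: strict greater-than; the Verilog may use >= instead.
    return score > threshold
```

The score magnitude is bounded by 40 × 128 = 5,120, so it fits comfortably in the signed 16-bit range used for the threshold.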

No multipliers, no sequential logic, no memory.

## Synthesis

Run with Yosys (OSS CAD Suite, `yosys.exe -s synth.ys`). Pass sequence: `hierarchy`, `proc`, `opt`, `flatten`, `opt_clean`, `synth -top`, `abc -g AND,XOR`, `opt_clean`, `stat`. The ABC stage restricts the target gate set to `{AND, XOR}`, so the reported count is in generic 2-input gates rather than cells from a vendor library. The `abc` pass always generates NOT gates in addition to the requested types, which is why inverters appear in the result.
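For readers who want to see the pass sequence laid out as a script, the following Python sketch assembles one. It is not the shipped `synth.ys`. The `read_verilog` and `write_verilog` lines, the `-top` flag on `hierarchy`, and the top-module name used in the example call are assumptions; only the passes between them come from the list above.

```python
def build_synth_script(source: str, top: str, netlist: str) -> str:
    """Return a Yosys script with the documented pass order.

    `source`, `top`, and `netlist` are supplied by the caller; the top-module
    name is not stated in this README, so it is left as a parameter.
    """
    passes = [
        f"read_verilog {source}",   # assumption: how the design is loaded
        f"hierarchy -top {top}",    # assumption: -top flag on hierarchy
        "proc",
        "opt",
        "flatten",
        "opt_clean",
        f"synth -top {top}",
        "abc -g AND,XOR",           # restrict mapping to AND/XOR (plus NOT)
        "opt_clean",
        "stat",
        f"write_verilog {netlist}", # assumption: how the netlist is written
    ]
    return "\n".join(passes) + "\n"


# Hypothetical top-module name, chosen to match the source file name.
script = build_synth_script(
    "person_classifier_1p.v", "person_classifier_1p", "synthesized.v"
)
print(script)
```

Generating the script from a list keeps the pass order explicit and makes it easy to change the gate set in one place.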

## Result

```
Total cells   : 3,220
  AND         : 1,172
  NOT         : 1,318
  XOR         :   730
Wires         : 3,261
Public ports  :    42   (40 data + 1 threshold + 1 output)
Port bits     :   337
```

That is 3,220 gates in total (AND, XOR, and NOT) for a 40-input INT8 combinational person-scene detector.
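The totals in the table can be re-derived from its own rows. The short check below uses only the numbers reported above; the variable names are illustrative.

```python
# Per-type cell counts as reported by `stat`.
cells = {"AND": 1172, "NOT": 1318, "XOR": 730}
total_cells = sum(cells.values())

# Port bits: 40 INT8 data inputs, one 16-bit threshold, one output bit.
port_bits = 40 * 8 + 16 + 1

print(f"total cells: {total_cells}")  # matches the reported 3,220
print(f"port bits:   {port_bits}")    # matches the reported 337
```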

## Scale comparison

From the prior cofiber-detection repo's synthesis scaling (`circuit/README.md` there): a 768-input INT8 multiply-accumulate extrapolates to ~65,000 gates per MAC. A full 6-MAC 4,614-parameter person detector was estimated at ~391,000 gates. Our 1-parameter classifier is roughly **120× smaller** than that reference (391,000 / 3,220 ≈ 121), because it replaces the 768×{cls, reg, ctr} MAC array with 40 selected additions and subtractions plus one comparator.
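The ratio follows directly from the figures quoted above. The sketch below only reproduces that arithmetic; the constant names are illustrative, and the reference numbers are the estimates cited from the prior repo, not new measurements.

```python
GATES_PER_MAC = 65_000      # ~gates per 768-input INT8 MAC (prior estimate)
NUM_MACS = 6                # MACs in the reference person detector
REFERENCE_GATES = 391_000   # ~total quoted for the reference detector
CLASSIFIER_GATES = 3_220    # this stage's synthesized cell count

# The MAC array alone accounts for nearly all of the reference estimate.
mac_array_gates = GATES_PER_MAC * NUM_MACS

# Size ratio between the reference detector and this classifier.
ratio = REFERENCE_GATES / CLASSIFIER_GATES

print(f"MAC array: {mac_array_gates:,} gates")
print(f"reference / classifier: {ratio:.1f}x")
```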

## What this stage ships

- `person_classifier_1p.v` — the classifier as synthesizable Verilog
- `synth.ys` — Yosys script
- `synthesized.v` — post-synthesis gate-level netlist
- `synth.log` — full synthesis log with statistics

## Deployment implication

A 3,220-gate design in a modern process (e.g., 22 nm FD-SOI for microcontroller-class ASICs) occupies on the order of 0.01–0.03 mm², with sub-millisecond combinational latency and sub-milliwatt switching power for single-frame evaluation. It fits inside the ISP block of a camera sensor or as a macro next to an always-on wake circuit.

The classifier's inputs (40 selected INT8 feature values) still require the backbone to produce them. Everything upstream of this module — EUPE-ViT-B or a specialist student (Stage 4) — must also be synthesizable or runnable in some other primitive before the full camera-to-bit pipeline is gate-level. That upstream cost is where the other stages sit; this stage closes the loop at the decision end.