IEEE-754 add: composing the stages
The threshold network ships separate float16.unpack, float16.add, float16.normalize, and float16.pack (and the float32.* siblings). Each is an independent threshold-logic subcircuit with its own external ports.
What you actually get from each stage
| Stage | Inputs | Outputs |
|---|---|---|
| float16.unpack | 1 generic input bit | 16 bit0..bit15 outputs (sign / exp / mantissa fields) |
| float16.add | 5 exp_a + 5 exp_b + 5 generic input bits = 15 inputs | 499 outputs covering align, exp_diff, mant_add, mant_sub, mant_select, sign_xor stages |
| float16.normalize | 2 generic input bits | 356 outputs covering exp_adj and mantissa-shift stages |
| float16.pack | 1 generic input bit | 16 bit0..bit15 outputs (assembled IEEE-754 word) |
Use python -m safetensors2verilog ... --circuit float16.<stage> --inspect to print the live contract for any stage from the variant you're using.
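To make the sign / exp / mantissa split concrete, here is a minimal software sketch of the standard IEEE-754 binary16 field layout (1 sign bit, 5 exponent bits, 10 mantissa bits). It assumes bitN in the table means bit N of the 16-bit word with bit0 as the least significant bit, which is consistent with bit15 being the sign but is not stated by the stage contract itself. The helper names are illustrative and are not part of safetensors2verilog.

```python
import struct


def float_to_word(x: float) -> int:
    """Encode a Python float as a 16-bit IEEE-754 binary16 word."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def float16_fields(word: int) -> tuple[int, int, int]:
    """Split a binary16 word into (sign, exponent, mantissa).

    Assumption: bit15 is the sign, bits 14..10 the biased exponent,
    bits 9..0 the mantissa (the standard binary16 layout).
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("word must fit in 16 bits")
    sign = (word >> 15) & 0x1
    exponent = (word >> 10) & 0x1F
    mantissa = word & 0x3FF
    return sign, exponent, mantissa


def float16_word(sign: int, exponent: int, mantissa: int) -> int:
    """Reassemble the three fields into a 16-bit word (inverse of float16_fields)."""
    return ((sign & 0x1) << 15) | ((exponent & 0x1F) << 10) | (mantissa & 0x3FF)


if __name__ == "__main__":
    word = float_to_word(-2.5)
    print(f"{word:#06x}", float16_fields(word))
```

A software split like this is useful as a cross-check when reading the per-bit outputs printed by --inspect.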
What's missing for a single composed block
None of the four stages exposes ports named in a way that says "I produce the operand the next stage consumes." float16.unpack outputs float16.unpack.bit0..15; float16.add consumes $float16_exp_a[0..4] and an opaque $input[0..4]. The wire-up between unpack and add is not encoded in the safetensors metadata. It exists only in the original construction code in build.py and in the gate-fitness harness in eval.py.
Two consequences:
- Subcircuit extraction works, but each stage compiles to a standalone module whose bit-bag inputs and outputs need a hand-written wrapper to chain.
- A single float16.add_full Verilog block that runs the complete IEEE-754 pipeline for a 16-bit operand pair is not derivable from the published safetensors files alone.
The intended path forward
The right fix is in build.py: when generating each float stage, register the inter-stage signal IDs so that, e.g., float16.unpack.bit15 (sign of a) and float16.add.sign_a are the same signal in the global registry, with metadata that exposes the per-stage contract by role (sign_a, exp_a, mant_a) rather than by ad-hoc port name. The schema-versioned metadata from safetensors2verilog's frontend (see core.SIGNAL_REGISTRY_SCHEMA_VERSION_LATEST) is the place to land this.
Once that is in place, safetensors2verilog --circuit float16.add_full would emit a single composed top-level module by walking the now-explicit cross-stage wiring. The dependency-closure extractor already does the heavy lifting; it currently produces correct output for any circuit whose internal wiring is already explicit (e.g. arithmetic.ripplecarry8bit).
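The idea of two stage ports resolving to one signal, tagged by role, can be pictured with a toy registry. This is a hypothetical sketch only: the SignalRegistry class and its methods are invented for illustration and do not reflect the actual schema in build.py or safetensors2verilog. The only names taken from this document are float16.unpack.bit15, float16.add.sign_a, and the role sign_a.

```python
class SignalRegistry:
    """Toy registry (hypothetical): maps port names to shared signal IDs with roles."""

    def __init__(self) -> None:
        self._port_to_id: dict[str, int] = {}
        self._id_to_role: dict[int, str] = {}
        self._next_id = 0

    def register(self, port: str, role: str) -> int:
        """Create a new signal for `port` and tag it with a role such as 'sign_a'."""
        if port in self._port_to_id:
            raise ValueError(f"port already registered: {port}")
        signal_id = self._next_id
        self._next_id += 1
        self._port_to_id[port] = signal_id
        self._id_to_role[signal_id] = role
        return signal_id

    def alias(self, port: str, existing_port: str) -> int:
        """Make `port` refer to the same signal as `existing_port`."""
        signal_id = self._port_to_id[existing_port]
        self._port_to_id[port] = signal_id
        return signal_id

    def signal_id(self, port: str) -> int:
        return self._port_to_id[port]

    def role(self, port: str) -> str:
        return self._id_to_role[self._port_to_id[port]]


if __name__ == "__main__":
    reg = SignalRegistry()
    reg.register("float16.unpack.bit15", role="sign_a")
    reg.alias("float16.add.sign_a", "float16.unpack.bit15")
    # Both port names now resolve to one signal, so a closure walk can cross stages.
    print(reg.signal_id("float16.unpack.bit15") == reg.signal_id("float16.add.sign_a"))
```

With ports aliased this way, a dependency-closure walk that follows signal IDs rather than port names would cross the stage boundary without a hand-written wrapper.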
Workaround today
For end-to-end float16-add evaluation, run the existing gate-level fitness suite, which exercises the composed pipeline through Python eval:
python eval_all.py variants/neural_alu8.safetensors --debug 2>&1 | grep -A 20 "FLOAT16 ADD"
For Verilog-side experimentation, pick one stage at a time and feed it driven testbench inputs that match the role each port plays in the original pipeline (read off build.py's infer_* routines for that stage).
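When driving a testbench by hand, it helps to have a software reference for the expected sum. The sketch below builds (a, b, expected) vectors as 16-bit words using Python's struct half-precision format. It assumes round-to-nearest-even and overflow to infinity; whether the circuit matches this in every corner case (NaN payloads, subnormals) is not stated here and should be checked against eval.py. The function names are illustrative.

```python
import struct


def f16_to_bits(x: float) -> int:
    """Round a Python float to binary16 and return the 16-bit word."""
    try:
        return struct.unpack("<H", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses finite values beyond the binary16 range; assume
        # IEEE-754 default behavior and saturate to signed infinity.
        return 0xFC00 if x < 0 else 0x7C00


def bits_to_f16(word: int) -> float:
    """Decode a 16-bit binary16 word into a Python float."""
    return struct.unpack("<e", struct.pack("<H", word))[0]


def reference_add(a_bits: int, b_bits: int) -> int:
    """Expected binary16 sum of two binary16 words.

    The sum of two binary16 values is exact in double precision, so the
    only rounding happens in the final conversion back to binary16.
    """
    return f16_to_bits(bits_to_f16(a_bits) + bits_to_f16(b_bits))


def make_vectors(pairs):
    """Return (a, b, expected_sum) triples for a list of operand pairs."""
    return [(a, b, reference_add(a, b)) for a, b in pairs]


if __name__ == "__main__":
    operand_pairs = [(0x3C00, 0x3C00), (0x3C00, 0xBC00), (0x7BFF, 0x7BFF)]
    for a, b, s in make_vectors(operand_pairs):
        print(f"{a:04x} {b:04x} {s:04x}")
```

The printed hex triples can be pasted into a testbench as stimulus and expected values, with each word then split into per-port bits according to the role that port plays in the stage under test.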