CharlesCNorton

IEEE-754 add: composing the stages

The threshold network ships float16.unpack, float16.add, float16.normalize, and float16.pack as separate circuits (along with their float32.* siblings). Each is an independent threshold-logic subcircuit with its own external ports.

What you actually get from each stage

| Stage | Inputs | Outputs |
| --- | --- | --- |
| float16.unpack | 1 generic input bit | 16 bit0..bit15 outputs (sign / exp / mantissa fields) |
| float16.add | 5 exp_a + 5 exp_b + 5 generic input bits = 15 inputs | 499 outputs covering align, exp_diff, mant_add, mant_sub, mant_select, sign_xor stages |
| float16.normalize | 2 generic input bits | 356 outputs covering exp_adj and mantissa-shift stages |
| float16.pack | 1 generic input bit | 16 bit0..bit15 outputs (assembled IEEE-754 word) |
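
For orientation, the sign / exp / mantissa fields named in the table follow the standard IEEE-754 binary16 layout. The sketch below splits a 16-bit word into those fields in plain Python. The helper names are hypothetical, and the LSB-first bit list assumes bit0 is the least significant bit; check the README's bit-ordering rules before relying on that for a given stage.

```python
def split_float16(word: int) -> dict:
    """Split a 16-bit IEEE-754 binary16 word into its three fields."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("expected a 16-bit unsigned value")
    return {
        "sign": (word >> 15) & 0x1,   # bit 15
        "exp": (word >> 10) & 0x1F,   # bits 14..10, biased by 15
        "mant": word & 0x3FF,         # bits 9..0 (leading 1 is implicit for normals)
    }


def to_bits_lsb_first(word: int, width: int = 16) -> list:
    """Expand a word into a list of bits, index 0 = least significant bit.

    Assumption: this matches the bit0..bit15 port ordering; verify per stage.
    """
    return [(word >> i) & 1 for i in range(width)]


fields_one = split_float16(0x3C00)        # 0x3C00 encodes 1.0
fields_neg_three = split_float16(0xC200)  # 0xC200 encodes -3.0
```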

Use python -m safetensors2verilog ... --circuit float16.<stage> --inspect to print the live contract for any stage from the variant you're using.

What's missing for a single composed block

None of the four stages names its ports in a way that says "I produce the operand the next stage consumes." float16.unpack outputs float16.unpack.bit0..15; float16.add consumes $float16_exp_a[0..4], the corresponding five exp_b bits, and an opaque $input[0..4]. The wiring between unpack and add is not encoded in the safetensors metadata. It exists only in the construction code in build.py and in the gate-fitness harness in eval.py.

Two consequences:

  1. Subcircuit extraction works, but each stage compiles to a standalone module whose bit-bag inputs and outputs need a hand-written wrapper to chain.
  2. A single float16.add_full Verilog block that runs the complete IEEE-754 pipeline for a 16-bit operand pair is not derivable from the published safetensors files alone.

The intended path forward

The right fix is in build.py: when generating each float stage, register the inter-stage signal IDs so that, for example, float16.unpack.bit15 (sign of a) and float16.add.sign_a are the same signal in the global registry. The metadata should then expose each stage's contract by role (sign_a, exp_a, mant_a) rather than by ad-hoc port name. The schema-versioned metadata from safetensors2verilog's frontend (see core.SIGNAL_REGISTRY_SCHEMA_VERSION_LATEST) is the place to land this.
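
As a rough illustration of the idea, the sketch below models a registry in which two port names resolve to one signal ID and each stage lists its ports by role. The `SignalRegistry` class and its methods are hypothetical and are not the registry that build.py or safetensors2verilog actually implements; only the two port names are taken from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class SignalRegistry:
    """Hypothetical registry: port names share integer signal IDs."""
    ids: dict = field(default_factory=dict)    # port name -> signal ID
    roles: dict = field(default_factory=dict)  # (stage, role) -> list of signal IDs
    next_id: int = 0

    def new_signal(self, name: str) -> int:
        """Allocate a fresh signal ID for a producer port."""
        self.ids[name] = self.next_id
        self.next_id += 1
        return self.ids[name]

    def alias(self, name: str, existing: str) -> int:
        """Register a consumer port as another name for an existing signal."""
        self.ids[name] = self.ids[existing]
        return self.ids[name]

    def set_role(self, stage: str, role: str, names: list) -> None:
        """Record which signals play a given role in a stage's contract."""
        self.roles[(stage, role)] = [self.ids[n] for n in names]


reg = SignalRegistry()
reg.new_signal("float16.unpack.bit15")                      # producer: sign of a
reg.alias("float16.add.sign_a", "float16.unpack.bit15")     # consumer: same signal
reg.set_role("float16.add", "sign_a", ["float16.add.sign_a"])
```

With the alias recorded, a tool that walks signal IDs no longer needs to know that the two port names are related; it sees one signal.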

Once that is in place, safetensors2verilog --circuit float16.add_full would emit a single composed top-level module by walking the now-explicit cross-stage wiring. The dependency-closure extractor already does the heavy lifting; it currently produces correct output for any circuit whose internal wiring is already explicit (e.g. arithmetic.ripplecarry8bit).
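
To show why explicit wiring matters to a dependency-closure walk, here is a minimal sketch of a backward traversal over a toy netlist. It is a generic breadth-first closure, not the extractor's actual code, and the signal names in the toy netlist are made up for illustration.

```python
from collections import deque


def dependency_closure(drivers: dict, outputs: list) -> set:
    """Return every signal reachable backwards from `outputs`.

    `drivers` maps a signal to the list of signals that feed it.
    Signals without an entry are treated as primary inputs.
    """
    seen = set()
    queue = deque(outputs)
    while queue:
        sig = queue.popleft()
        if sig in seen:
            continue
        seen.add(sig)
        queue.extend(drivers.get(sig, ()))
    return seen


# Toy netlist (hypothetical names): the unpack -> add link is not recorded.
drivers = {
    "add.sum0": ["add.sign_a", "add.exp_a0"],
    "unpack.bit15": ["unpack.in"],
}
closure_implicit = dependency_closure(drivers, ["add.sum0"])

# Same netlist with the cross-stage wire made explicit.
drivers_explicit = {**drivers, "add.sign_a": ["unpack.bit15"]}
closure_explicit = dependency_closure(drivers_explicit, ["add.sum0"])
```

In the first call the walk stops at `add.sign_a` and treats it as a primary input; in the second it continues through the unpack stage to `unpack.in`, which is what a composed top-level module needs.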

Workaround today

For end-to-end float16-add evaluation, run the existing gate-level fitness suite, which exercises the composed pipeline through Python eval:

python eval_all.py variants/neural_alu8.safetensors --debug 2>&1 | grep -A 20 "FLOAT16 ADD"

For Verilog-side experimentation, pick one stage at a time and feed it driven testbench inputs that match the role each port plays in the original pipeline (read off build.py's infer_* routines for that stage).
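
When building testbench vectors by hand, a small software reference for the expected end-to-end sum can help. The sketch below uses only Python's `struct` half-precision format code. It assumes round-to-nearest-even and overflow to a signed infinity; whether the threshold network matches that in every corner case (NaN payloads, subnormals) is not stated here and should be checked against eval.py. The function names are hypothetical.

```python
import struct


def half_to_float(bits: int) -> float:
    """Interpret a 16-bit pattern as an IEEE-754 binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def float_to_half(value: float) -> int:
    """Round a Python float to binary16 and return its 16-bit pattern."""
    try:
        return struct.unpack("<H", struct.pack("<e", value))[0]
    except OverflowError:
        # struct rejects finite values beyond the binary16 range;
        # assumed behaviour here: overflow to a signed infinity.
        return 0xFC00 if value < 0 else 0x7C00


def ref_float16_add(a_bits: int, b_bits: int) -> int:
    """Reference sum: add exactly in double precision, then round once."""
    total = half_to_float(a_bits) + half_to_float(b_bits)
    return float_to_half(total)


expected = ref_float16_add(0x3C00, 0x4000)  # 1.0 + 2.0
```

Because any two binary16 values sum exactly in double precision, the only rounding happens in the final conversion back to 16 bits.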