# IEEE-754 add: composing the stages
The threshold network ships separate `float16.unpack`, `float16.add`, `float16.normalize`, and `float16.pack` (and the `float32.*` siblings). Each is an independent threshold-logic subcircuit with its own external ports.
## What you actually get from each stage
| Stage | Inputs | Outputs |
|---|---|---|
| `float16.unpack` | 1 generic input bit | 16 `bit0..bit15` outputs (sign / exp / mantissa fields) |
| `float16.add` | 5 `exp_a` + 5 `exp_b` + 5 generic input bits = 15 inputs | 499 outputs covering align, exp_diff, mant_add, mant_sub, mant_select, sign_xor stages |
| `float16.normalize` | 2 generic input bits | 356 outputs covering exp_adj and mantissa-shift stages |
| `float16.pack` | 1 generic input bit | 16 `bit0..bit15` outputs (assembled IEEE-754 word) |
Use `python -m safetensors2verilog ... --circuit float16.<stage> --inspect` to print the live port contract for any stage of the variant you're using.
## What's missing for a single composed block
None of the four stages exposes ports named in a way that says "I produce the operand the next stage consumes." `float16.unpack` outputs `float16.unpack.bit0..15`; `float16.add` consumes `$float16_exp_a[0..4]` and an opaque `$input[0..4]`. The wire-up between `unpack` and `add` is not encoded in the safetensors metadata; it exists only in the original construction code (`build.py`) and in the gate-fitness harness (`eval.py`).
Two consequences:
1. **Subcircuit extraction works**, but each stage compiles to a standalone module whose bit-bag inputs and outputs need a hand-written wrapper to chain.
2. **A single `float16.add_full` Verilog block** that runs the complete IEEE-754 pipeline for a 16-bit operand pair is not derivable from the published safetensors files alone.
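To make the missing glue concrete, here is a hedged Python sketch of the hand-written rewiring a wrapper has to do today. `eval_unpack` and the role names (`sign_a`, `exp_a`, `mant_a`) are hypothetical stand-ins; the actual mapping lives only in `build.py` and `eval.py`, and the only part taken from the spec is the IEEE-754 binary16 field layout (bit 15 = sign, bits 14:10 = exponent, bits 9:0 = mantissa).

```python
# Hypothetical sketch: chaining unpack -> add by hand.
# eval_unpack stands in for whatever eval.py exposes; the port-name
# mapping below is exactly the knowledge that is NOT in the safetensors.

def eval_unpack(word16):
    """Toy stand-in for the unpack stage: one bit-bag dict out."""
    return {f"bit{i}": (word16 >> i) & 1 for i in range(16)}

def wire_unpack_to_add(out_a, out_b):
    """Hand-written rewiring: map unpack's bit-bag outputs onto
    role-named add inputs, using the binary16 layout
    ([15]=sign, [14:10]=exp, [9:0]=mantissa, LSB-first lists)."""
    inputs = {}
    for tag, outs in (("a", out_a), ("b", out_b)):
        inputs[f"sign_{tag}"] = outs["bit15"]
        inputs[f"exp_{tag}"] = [outs[f"bit{i}"] for i in range(10, 15)]
        inputs[f"mant_{tag}"] = [outs[f"bit{i}"] for i in range(0, 10)]
    return inputs

a = eval_unpack(0x3C00)  # 1.0 in IEEE-754 binary16
b = eval_unpack(0x4000)  # 2.0
add_in = wire_unpack_to_add(a, b)
print(add_in["sign_a"], add_in["exp_a"], add_in["exp_b"])
```

The same per-field bookkeeping has to be repeated (by hand) at the `add`→`normalize` and `normalize`→`pack` boundaries, which is exactly why a wrapper is required per chain.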
## The intended path forward
The right fix is in `build.py`: when generating each float stage, register the inter-stage signal IDs so that, e.g., `float16.unpack.bit15` (sign of `a`) and `float16.add.sign_a` are the same signal in the global registry, with metadata that exposes the per-stage contract by *role* (`sign_a`, `exp_a`, `mant_a`) rather than by ad-hoc port name. The schema-versioned metadata from `safetensors2verilog`'s frontend (see `core.SIGNAL_REGISTRY_SCHEMA_VERSION_LATEST`) is the place to land this.
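One possible shape for that role-keyed metadata, as a hedged sketch: every field name below (`signals`, `producers`, `consumers`, `role`) and the example values are illustrative, not the actual schema behind `core.SIGNAL_REGISTRY_SCHEMA_VERSION_LATEST`.

```python
# Hypothetical registry entry: one global signal id shared by
# float16.unpack's producer port and float16.add's consumer port,
# exposed by role rather than ad-hoc port name.
registry = {
    "schema_version": 2,  # illustrative value
    "signals": [
        {
            "id": 4711,   # made-up global signal id
            "producers": [{"stage": "float16.unpack", "port": "bit15"}],
            "consumers": [{"stage": "float16.add", "port": "sign_a"}],
            "role": "sign_a",
        },
    ],
}

def role_map(registry, stage):
    """Resolve a stage's inputs by role instead of by port name."""
    return {
        s["role"]: s["id"]
        for s in registry["signals"]
        if any(c["stage"] == stage for c in s["consumers"])
    }

print(role_map(registry, "float16.add"))
```

With entries like this in place, a composer never needs `build.py`: it can join stages purely by matching producer and consumer records on the shared signal id.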
Once that is in place, `safetensors2verilog --circuit float16.add_full` would emit a single composed top-level module by walking the now-explicit cross-stage wiring. The dependency-closure extractor already does the heavy lifting; it currently produces correct output for any circuit whose internal wiring is already explicit (e.g. `arithmetic.ripplecarry8bit`).
## Workaround today
For end-to-end float16-add evaluation, run the existing gate-level fitness suite, which exercises the composed pipeline through Python eval:
```bash
python eval_all.py variants/neural_alu8.safetensors --debug 2>&1 | grep -A 20 "FLOAT16 ADD"
```
For Verilog-side experimentation, pick one stage at a time and drive its testbench inputs with values matching the role each port plays in the original pipeline (read the roles off `build.py`'s `infer_*` routines for that stage).
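A small helper keeps the role bookkeeping straight when generating stimulus by hand. This sketch uses only the standard binary16 field layout; the role names are the hypothetical ones used throughout this note, and `struct`'s `"e"` format (half precision) does the encoding:

```python
import struct

def f16_fields(value):
    """Split a Python float into the sign/exp/mant roles a stage's
    testbench inputs expect (binary16 layout: 1/5/10 bits)."""
    (word,) = struct.unpack("<H", struct.pack("<e", value))
    return {
        "sign": (word >> 15) & 0x1,
        "exp": (word >> 10) & 0x1F,
        "mant": word & 0x3FF,
    }

print(f16_fields(1.5))  # 1.5 = 1.1b * 2^0 -> biased exp 15, mant 512
```

Fields produced this way can be dropped onto whichever ports `build.py`'s `infer_*` routines say carry that role for the stage under test.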