# IEEE-754 add: composing the stages
The threshold network ships separate `float16.unpack`, `float16.add`, `float16.normalize`, and `float16.pack` (and the `float32.*` siblings). Each is an independent threshold-logic subcircuit with its own external ports.
## What you actually get from each stage
| Stage | Inputs | Outputs |
|---|---|---|
| `float16.unpack` | 1 generic input bit | 16 `bit0..bit15` outputs (sign / exp / mantissa fields) |
| `float16.add` | 5 `exp_a` + 5 `exp_b` + 5 generic input bits = 15 inputs | 499 outputs covering align, exp_diff, mant_add, mant_sub, mant_select, sign_xor stages |
| `float16.normalize` | 2 generic input bits | 356 outputs covering exp_adj and mantissa-shift stages |
| `float16.pack` | 1 generic input bit | 16 `bit0..bit15` outputs (assembled IEEE-754 word) |
Use `python -m safetensors2verilog ... --circuit float16.<stage> --inspect` to print the live port contract for any stage of the variant you're using.
## What's missing for a single composed block
None of the four stages exposes ports named in a way that says "I produce the operand the next stage consumes." `float16.unpack` outputs `float16.unpack.bit0..15`; `float16.add` consumes `$float16_exp_a[0..4]` and an opaque `$input[0..4]`. The wire-up between `unpack` and `add` is not encoded in the safetensors metadata; it exists only in the original construction code (`build.py`) and in the gate-fitness harness (`eval.py`).
Two consequences:
1. **Subcircuit extraction works**, but each stage compiles to a standalone module whose bit-bag inputs and outputs need a hand-written wrapper to chain.
2. **A single `float16.add_full` Verilog block** that runs the complete IEEE-754 pipeline for a 16-bit operand pair is not derivable from the published safetensors files alone.
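To make the missing glue concrete, here is a hedged Python sketch of the hand-written rewiring a wrapper has to do today. `eval_unpack` and the role names (`sign_a`, `exp_a`, `mant_a`) are hypothetical stand-ins; the actual mapping lives only in `build.py` and `eval.py`, and the only part taken from the spec is the IEEE-754 binary16 field layout (bit 15 = sign, bits 14:10 = exponent, bits 9:0 = mantissa).

```python
# Hypothetical sketch: chaining unpack -> add by hand.
# eval_unpack stands in for whatever eval.py exposes; the port-name
# mapping below is exactly the knowledge that is NOT in the safetensors.

def eval_unpack(word16):
    """Toy stand-in for the unpack stage: one bit-bag dict out."""
    return {f"bit{i}": (word16 >> i) & 1 for i in range(16)}

def wire_unpack_to_add(out_a, out_b):
    """Hand-written rewiring: map unpack's bit-bag outputs onto
    role-named add inputs, using the binary16 layout
    ([15]=sign, [14:10]=exp, [9:0]=mantissa, LSB-first lists)."""
    inputs = {}
    for tag, outs in (("a", out_a), ("b", out_b)):
        inputs[f"sign_{tag}"] = outs["bit15"]
        inputs[f"exp_{tag}"] = [outs[f"bit{i}"] for i in range(10, 15)]
        inputs[f"mant_{tag}"] = [outs[f"bit{i}"] for i in range(0, 10)]
    return inputs

a = eval_unpack(0x3C00)  # 1.0 in IEEE-754 binary16
b = eval_unpack(0x4000)  # 2.0
add_in = wire_unpack_to_add(a, b)
print(add_in["sign_a"], add_in["exp_a"], add_in["exp_b"])
```

The same per-field bookkeeping has to be repeated (by hand) at the `add`→`normalize` and `normalize`→`pack` boundaries, which is exactly why a wrapper is required per chain.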
## The intended path forward
The right fix is in `build.py`: when generating each float stage, register the inter-stage signal IDs so that, e.g., `float16.unpack.bit15` (sign of `a`) and `float16.add.sign_a` are the same signal in the global registry, with metadata that exposes the per-stage contract by *role* (`sign_a`, `exp_a`, `mant_a`) rather than by ad-hoc port name. The schema-versioned metadata from `safetensors2verilog`'s frontend (see `core.SIGNAL_REGISTRY_SCHEMA_VERSION_LATEST`) is the place to land this.
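One possible shape for that role-keyed metadata, as a hedged sketch: every field name below (`signals`, `producers`, `consumers`, `role`) and the example values are illustrative, not the actual schema behind `core.SIGNAL_REGISTRY_SCHEMA_VERSION_LATEST`.

```python
# Hypothetical registry entry: one global signal id shared by
# float16.unpack's producer port and float16.add's consumer port,
# exposed by role rather than ad-hoc port name.
registry = {
    "schema_version": 2,  # illustrative value
    "signals": [
        {
            "id": 4711,   # made-up global signal id
            "producers": [{"stage": "float16.unpack", "port": "bit15"}],
            "consumers": [{"stage": "float16.add", "port": "sign_a"}],
            "role": "sign_a",
        },
    ],
}

def role_map(registry, stage):
    """Resolve a stage's inputs by role instead of by port name."""
    return {
        s["role"]: s["id"]
        for s in registry["signals"]
        if any(c["stage"] == stage for c in s["consumers"])
    }

print(role_map(registry, "float16.add"))
```

With entries like this in place, a composer never needs `build.py`: it can join stages purely by matching producer and consumer records on the shared signal id.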
Once that is in place, `safetensors2verilog --circuit float16.add_full` would emit a single composed top-level module by walking the now-explicit cross-stage wiring. The dependency-closure extractor already does the heavy lifting; it currently produces correct output for any circuit whose internal wiring is already explicit (e.g. `arithmetic.ripplecarry8bit`).
## Workaround today
For end-to-end float16-add evaluation, run the existing gate-level fitness suite, which exercises the composed pipeline through Python eval:
```bash
python eval_all.py variants/neural_alu8.safetensors --debug 2>&1 | grep -A 20 "FLOAT16 ADD"
```
For Verilog-side experimentation, pick one stage at a time and drive its testbench inputs with values matching the role each port plays in the original pipeline (read the roles off `build.py`'s `infer_*` routines for that stage).
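A small helper keeps the role bookkeeping straight when generating stimulus by hand. This sketch uses only the standard binary16 field layout; the role names are the hypothetical ones used throughout this note, and `struct`'s `"e"` format (half precision) does the encoding:

```python
import struct

def f16_fields(value):
    """Split a Python float into the sign/exp/mant roles a stage's
    testbench inputs expect (binary16 layout: 1/5/10 bits)."""
    (word,) = struct.unpack("<H", struct.pack("<e", value))
    return {
        "sign": (word >> 15) & 0x1,
        "exp": (word >> 10) & 0x1F,
        "mant": word & 0x3FF,
    }

print(f16_fields(1.5))  # 1.5 = 1.1b * 2^0 -> biased exp 15, mant 512
```

Fields produced this way can be dropped onto whichever ports `build.py`'s `infer_*` routines say carry that role for the stage under test.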