Refresh README format + repro details

- fix encoding artifacts and switch to ASCII-only notation
- add float16 architecture overview and step-by-step
- document format_version, signal registry, and .inputs resolution
- add reproduce steps with expected runtimes

README.md +65 -15

@@ -16,7 +16,7 @@ pipeline_tag: other
 Digital circuits encoded as neural network weights.
-Each gate is a threshold logic unit: `output = step(weights · inputs + bias)`. The step function fires when the weighted sum ≥ 0. This maps digital logic to tensor operations.
 ## What's Here
@@ -76,27 +76,47 @@ Accuracy/rounding:
 A threshold gate computes:
 ```
-output = 1 if (w₁x₁ + w₂x₂ + ... + wₙxₙ + bias) >= 0 else 0
 ```
 This is a perceptron with Heaviside step activation.
 **AND gate**: weights = [1, 1], bias = -1.5
-- (0,0): 0 + 0 - 1.5 = -1.5 < 0 → 0
-- (0,1): 0 + 1 - 1.5 = -0.5 < 0 → 0
-- (1,0): 1 + 0 - 1.5 = -0.5 < 0 → 0
-- (1,1): 1 + 1 - 1.5 = 0.5 ≥ 0 → 1
 **XOR** requires two layers (not linearly separable):
 - Layer 1: OR and NAND in parallel
 - Layer 2: AND of both outputs
 ## Self-Documenting Format
 Each gate has three tensors in `arithmetic.safetensors`:
-- `.weight` — input weights
-- `.bias` — threshold
-- `.inputs` — int64 tensor of signal IDs
 Signal registry in metadata maps IDs to names:
@@ -112,9 +132,39 @@ with safe_open('arithmetic.safetensors', framework='pt') as f:
 ```
 Signal naming:
-- `$name` — circuit input (e.g., `$a`, `$dividend[0]`)
-- `#0`, `#1` — constants
-- `gate.path` — output of another gate
 ## Running Eval
@@ -122,7 +172,7 @@ Signal naming:
 python eval.py
 ```
-Tests all circuits exhaustively. 8-bit operations test all 256 or 65,536 input combinations. Float16 tests cover special cases (NaN, Inf, ±0, subnormals) plus normal arithmetic.
 Eval runs full + verbose by default; there is no quick/verbose mode. Use --circuit to filter reported circuits.
 For coverage and input-routing validation:
@@ -135,9 +185,9 @@ python eval.py --coverage --inputs-coverage
 ## Development History
-Started as an 8-bit CPU project. Built boolean gates, then arithmetic (adders → multipliers → dividers), then CPU control logic. The CPU worked but the arithmetic core turned out to be the useful part, so it was extracted.
-Float16 was added later. The commit history shows the iterative process—float16.add went through multiple rounds of bug fixes for edge cases (zero handling, sign logic, normalization). Mul and div required multi-bit carry infrastructure.
 ## Project Origin

 Digital circuits encoded as neural network weights.
+Each gate is a threshold logic unit: `output = step(weights * inputs + bias)`. The step function fires when the weighted sum >= 0. This maps digital logic to tensor operations.
 ## What's Here
 A threshold gate computes:
 ```
+output = 1 if (w1*x1 + w2*x2 + ... + wn*xn + bias) >= 0 else 0
 ```
 This is a perceptron with Heaviside step activation.
 **AND gate**: weights = [1, 1], bias = -1.5
+- (0,0): 0 + 0 - 1.5 = -1.5 < 0 -> 0
+- (0,1): 0 + 1 - 1.5 = -0.5 < 0 -> 0
+- (1,0): 1 + 0 - 1.5 = -0.5 < 0 -> 0
+- (1,1): 1 + 1 - 1.5 = 0.5 >= 0 -> 1
 **XOR** requires two layers (not linearly separable):
 - Layer 1: OR and NAND in parallel
 - Layer 2: AND of both outputs
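The gate definition and the two-layer XOR construction can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: the helper names (`step`, `gate`, `xor_gate`) are hypothetical, and the OR and NAND weights and biases are assumed values chosen to satisfy the truth tables. Only the AND parameters come from the README.

```python
def step(x: float) -> int:
    """Heaviside step: fires when the weighted sum is >= 0."""
    return 1 if x >= 0 else 0


def gate(weights, bias, inputs) -> int:
    """Threshold gate: step(w1*x1 + ... + wn*xn + bias)."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def and_gate(a: int, b: int) -> int:
    # Parameters taken from the README: weights = [1, 1], bias = -1.5
    return gate([1, 1], -1.5, [a, b])


def or_gate(a: int, b: int) -> int:
    # Assumed parameters (not from the README)
    return gate([1, 1], -0.5, [a, b])


def nand_gate(a: int, b: int) -> int:
    # Assumed parameters (not from the README)
    return gate([-1, -1], 1.5, [a, b])


def xor_gate(a: int, b: int) -> int:
    # Layer 1: OR and NAND in parallel; Layer 2: AND of both outputs
    return and_gate(or_gate(a, b), nand_gate(a, b))


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND:", and_gate(a, b), "XOR:", xor_gate(a, b))
```

Half-integer biases keep every weighted sum away from exactly 0, so the `>= 0` boundary never decides the result for binary inputs.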
+## Float16 Architecture (Short)
+High-level dataflow:
+```
+float16.<op>
+  a,b -> unpack -> classify -> core op -> normalize/round -> pack -> out
+```
+Step-by-step (condensed):
+1) Unpack sign/exponent/mantissa. Subnormals use implicit 0, normals use implicit 1.
+2) Classify inputs: zero, subnormal, normal, inf, NaN.
+3) Core op:
+   - add/sub: align exponents, add/sub mantissas, compute sign.
+   - mul/div: add/sub exponents (minus bias), multiply/divide mantissas.
+   - unary LUT: lookup output for each 16-bit input (torch.float16), with NaN canonicalization.
+   - pow: ln(a) -> mul(b, ln(a)) -> exp, rounded at each stage.
+4) Normalize and round-to-nearest-even (CLZ + shifts).
+5) Pack sign/exponent/mantissa and mux special cases (NaN/Inf/zero).
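Steps 1 and 2 above can be illustrated on raw 16-bit patterns. This is a reference sketch using ordinary integer operations, assuming the standard IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits). The function names are hypothetical, and the actual circuits compute the same quantities with threshold gates rather than Python arithmetic.

```python
def unpack(bits: int):
    """Split a 16-bit pattern into (sign, exponent, fraction) fields."""
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF
    return sign, exponent, fraction


def classify(bits: int) -> str:
    """Return one of: zero, subnormal, normal, inf, nan."""
    _, exponent, fraction = unpack(bits)
    if exponent == 0:
        return "zero" if fraction == 0 else "subnormal"
    if exponent == 0x1F:
        return "inf" if fraction == 0 else "nan"
    return "normal"


def significand(bits: int) -> int:
    """11-bit significand: implicit 0 for zero/subnormal, implicit 1 for normals.

    Only meaningful for finite inputs (exponent != 0x1F).
    """
    _, exponent, fraction = unpack(bits)
    return fraction if exponent == 0 else (1 << 10) | fraction


if __name__ == "__main__":
    for pattern in (0x0000, 0x0001, 0x3C00, 0x7C00, 0x7E00, 0x8000):
        print(f"{pattern:#06x}", unpack(pattern), classify(pattern))
```

For example, `0x3C00` (1.0) unpacks to sign 0, exponent 15, fraction 0, and its significand is `0x400` once the implicit leading 1 is restored.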
 ## Self-Documenting Format
 Each gate has three tensors in `arithmetic.safetensors`:
+- `.weight` -- input weights
+- `.bias` -- threshold
+- `.inputs` -- int64 tensor of signal IDs (ordered to match `.weight`)
 Signal registry in metadata maps IDs to names:
 ```
 Signal naming:
+- `$name` -- circuit input (e.g., `$a`, `$dividend[0]`)
+- `#0`, `#1` -- constants
+- `gate.path` -- output of another gate
+Format details:
+- Metadata includes `signal_registry` (JSON map from ID to name) and `format_version` (currently `2.0`).
+- `.inputs` stores global signal IDs; these IDs are resolved through `signal_registry`.
+- External inputs are names starting with `$` or containing `.$` (e.g., `float16.add.$a[3]`).
+- All gates include `.inputs`; `build.py` infers them and `--inputs-coverage` fails if resolution is missing.
+## How to Reproduce
+Rebuild tensors:
+```bash
+python build.py
+```
+Run full evaluation (always full + verbose):
+```bash
+python eval.py
+```
+Run coverage and input-routing validation:
+```bash
+python eval.py --coverage --inputs-coverage
+```
+Expected runtimes (ballpark, CPU dependent):
+- `build.py`: ~1-2 minutes, produces ~247 MB `arithmetic.safetensors`
+- `eval.py --coverage --inputs-coverage`: ~3-4 minutes for 211,581 tests
 ## Running Eval
 python eval.py
 ```
+Tests all circuits. Small circuits are exhaustive; 16-bit arithmetic is sampled on grids (plus edge cases). Float16 tests cover special cases (NaN, Inf, +/-0, subnormals) plus normal arithmetic.
 Eval runs full + verbose by default; there is no quick/verbose mode. Use --circuit to filter reported circuits.
 For coverage and input-routing validation:
 ## Development History
+Started as an 8-bit CPU project. Built boolean gates, then arithmetic (adders -> multipliers -> dividers), then CPU control logic. The CPU worked but the arithmetic core turned out to be the useful part, so it was extracted.
+Float16 was added later. The commit history shows the iterative process--float16.add went through multiple rounds of bug fixes for edge cases (zero handling, sign logic, normalization). Mul and div required multi-bit carry infrastructure.
 ## Project Origin