phanerozoic/threshold-calculus

@@ -1,123 +1,172 @@
 # Threshold Calculus TODO
-## High Priority
-### Floating Point Circuits - Remaining Work
-#### `float16.mul` -- IEEE 754 multiplication (~800 gates, ~55/84 tests)
-**Problem**: Multi-bit carry propagation in 11x11 mantissa multiplier.
-**Background**: The mantissa multiplier produces a 22-bit product from two 11-bit mantissas (including implicit leading 1). Each column `i` has `min(i+1, 21-i)` partial products (AND gates). Column 10 has the maximum of 11 partial products.
-**Current Implementation**:
-- Column sums computed via threshold gates: `col_sum = parity(PP_0, PP_1, ..., PP_n)`
-- Parity computed as `(ge1 AND NOT ge2) OR (ge3 AND NOT ge4) OR ...`
-- `col_bit1` = floor(sum/2) mod 2 (carry to next position)
-- `col_bit2` = floor(sum/4) mod 2 (carry to position i+2)
-- `col_bit3` = floor(sum/8) mod 2 (carry to position i+3)
-- Carry accumulator gates sum incoming carries from multiple columns
-**Remaining Issue**: The carry accumulator can itself produce a carry (`carry_acc_carry`) when the sum of incoming carry bits is >= 2. This secondary carry needs to propagate to position i+1, creating a cascading effect that requires either:
-1. A proper CSA (Carry Save Adder) tree structure, or
-2. A secondary FA chain for accumulated carries, or
-3. Iterating until carry stabilization
-**Files**: `convert_to_explicit_inputs.py` lines 5350-5650 (build), lines 2400-2700 (infer)
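The column scheme described in the removed text can be sketched in a few lines. This is a minimal sketch, not the code in `convert_to_explicit_inputs.py`: the names `ge`, `count_bit`, `columns`, and `carry_rows` are hypothetical, and plain Python integers stand in for gates. It checks the `min(i+1, 21-i)` column heights and shows that each column's count splits into four weighted bits, which leaves four rows that still need a real carry-propagating addition.

```python
def ge(bits, k):
    """Threshold gate: 1 when at least k of the inputs are 1."""
    return int(sum(bits) >= k)


def count_bit(bits, j):
    """Bit j of the number of 1s in `bits`, using only threshold outputs.

    For j = 0 this is the parity form from the notes:
    (ge1 AND NOT ge2) OR (ge3 AND NOT ge4) OR ...
    """
    n, step, out = len(bits), 1 << j, 0
    for lo in range(step, n + 1, 2 * step):
        hi = lo + step
        # A threshold above the input count can never fire, so "NOT ge_hi" is 1.
        below_hi = 1 - ge(bits, hi) if hi <= n else 1
        out |= ge(bits, lo) & below_hi
    return out


def columns(a, b, width=11):
    """Partial products (AND of one bit of a and one bit of b), grouped by column."""
    cols = [[] for _ in range(2 * width - 1)]
    for i in range(width):
        for j in range(width):
            cols[i + j].append((a >> i) & (b >> j) & 1)
    return cols


def carry_rows(a, b, width=11):
    """Four rows: row j collects bit j of every column count at position i + j."""
    rows = [0, 0, 0, 0]
    for i, col in enumerate(columns(a, b, width)):
        for j in range(4):
            rows[j] += count_bit(col, j) << (i + j)
    return rows


if __name__ == "__main__":
    a, b = 0x7FF, 0x7FF
    rows = carry_rows(a, b)
    print([hex(r) for r in rows], hex(sum(rows)), hex(a * b))
```

Up to four bits can land on the same output position (one from each row), so a single accumulator bit per position cannot absorb them. That matches the `carry_acc_carry` problem described above.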
 ---
-#### `float16.div` -- IEEE 754 division (~1900 gates, ~5/53 tests)
-**Problem**: Same multi-bit carry issue as multiplication, plus potential issues in the non-restoring division algorithm.
-**Background**: Division uses non-restoring algorithm with 11-bit dividend and divisor. The quotient mantissa is computed iteratively, and similar column reduction issues arise.
-**Current Implementation**:
-- NaN output bit 9 fixed (canonical NaN = 0x7E00)
-- Column sum parity gates similar to multiplication
-**Remaining Issues**:
-1. Same multi-bit carry propagation problem as multiplication
-2. May have additional issues in division-specific logic (partial remainder computation)
-**Files**: `convert_to_explicit_inputs.py` lines 5700-6200 (build), lines 2700-3100 (infer)
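For reference when debugging the partial-remainder logic, here is a minimal integer sketch of non-restoring division. It assumes an unsigned dividend of `n` bits and a nonzero divisor. The function name `nonrestoring_divmod` is hypothetical, and the real circuit works on mantissas with its own alignment and rounding, which this sketch does not model.

```python
def nonrestoring_divmod(dividend, divisor, n):
    """Unsigned non-restoring division of an n-bit dividend.

    The partial remainder r is allowed to go negative. Instead of restoring
    it, the next step adds the divisor rather than subtracting it.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    r, q = 0, 0
    for i in range(n - 1, -1, -1):
        bit = (dividend >> i) & 1
        if r >= 0:
            r = 2 * r + bit - divisor   # previous step succeeded: subtract
        else:
            r = 2 * r + bit + divisor   # previous step overshot: add back
        q = (q << 1) | (1 if r >= 0 else 0)
    if r < 0:
        r += divisor                    # single final correction
    return q, r


if __name__ == "__main__":
    print(nonrestoring_divmod(0x7FF, 0x401, 11))
```

Comparing the quotient bit chosen at each step against `divmod` on the same operands is one way to tell whether a failing test comes from the column reduction or from the division-specific logic.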
 ---
-### Potential Solutions for Carry Propagation
-1. **Wallace Tree**: Replace column reduction with proper Wallace tree structure. More gates but handles arbitrary partial product counts correctly.
-2. **Dadda Tree**: Similar to Wallace but minimizes gate count per level.
-3. **Iterative Carry Resolution**: After initial FA chain, detect remaining carries and iterate until stable. Simple but slow.
-4. **Hybrid Approach**: Use threshold gates for small columns (2-3 PPs) and proper tree reduction for larger columns.
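Option 1 can be illustrated with a small carry-save column compressor. This is a sketch in the spirit of a Wallace tree, assuming full adders only and a greedy per-column grouping. It is not a gate-optimal Wallace or Dadda layout, and the names `full_adder`, `reduce_columns`, and `multiply` are hypothetical. The full-adder carry is a majority function, so it maps directly onto a 2-of-3 threshold gate.

```python
def full_adder(x, y, z):
    """3:2 compressor: sum is the parity, carry is the 2-of-3 majority."""
    s = x ^ y ^ z
    c = (x & y) | (x & z) | (y & z)
    return s, c


def reduce_columns(cols):
    """Apply full adders level by level until no column holds more than 2 bits."""
    cols = [list(c) for c in cols]
    while max(len(c) for c in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for i, col in enumerate(cols):
            k = 0
            while len(col) - k >= 3:
                s, c = full_adder(col[k], col[k + 1], col[k + 2])
                nxt[i].append(s)        # sum stays in this column
                nxt[i + 1].append(c)    # carry moves one column up
                k += 3
            nxt[i].extend(col[k:])      # 0 to 2 leftover bits pass through
        cols = nxt
    return cols


def multiply(a, b, width=11):
    """Multiply two width-bit values via column compression plus one final add."""
    cols = [[] for _ in range(2 * width - 1)]
    for i in range(width):
        for j in range(width):
            cols[i + j].append((a >> i) & (b >> j) & 1)
    cols = reduce_columns(cols)
    row0 = sum(col[0] << i for i, col in enumerate(cols) if len(col) > 0)
    row1 = sum(col[1] << i for i, col in enumerate(cols) if len(col) > 1)
    # Python's + stands in for the final carry-propagate adder (e.g. ripple carry).
    return row0 + row1


if __name__ == "__main__":
    print(hex(multiply(0x7FF, 0x7FF)), hex(0x7FF * 0x7FF))
```

Every full adder removes one bit from the array, so the loop always terminates. No carry is ever left unaccounted for, because carries are ordinary bits in the next column.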
 ---
-## Medium Priority
-### Extended Integer Arithmetic
-- [ ] `arithmetic.ripplecarry16bit` -- 16-bit addition
-- [ ] `arithmetic.multiplier16x16` -- 16-bit multiplication
-- [ ] `arithmetic.div16bit` -- 16-bit division
-- [ ] `arithmetic.sqrt8bit` -- integer square root
-- [ ] `arithmetic.gcd8bit` -- greatest common divisor
-- [ ] `arithmetic.lcm8bit` -- least common multiple
-## Low Priority
-### Transcendental Approximations
-- [ ] `approx.sin8bit` -- sine via CORDIC or lookup
-- [ ] `approx.cos8bit` -- cosine
-- [ ] `approx.exp8bit` -- exponential
-- [ ] `approx.log8bit` -- logarithm
-### Pruning Experiments
-- [ ] Weight magnitude pruning study
-- [ ] Quantization to int8/int4
-- [ ] Sparse representation conversion
-- [ ] Knowledge distillation to smaller networks
-### Documentation
-- [ ] Circuit diagrams for complex circuits (divider, multiplier)
-- [ ] Tutorial: building custom circuits
-- [ ] Tutorial: integrating with transformers
 ## Completed
-### Floating Point Circuits
-- [x] `float16.unpack` -- extract sign, exponent, mantissa (16 gates, 63/63 tests)
-- [x] `float16.pack` -- assemble from components (16 gates, 63/63 tests)
-- [x] `float16.cmp` -- comparison a > b (14 gates, 113/113 tests)
-- [x] `float16.normalize` -- CLZ-based shift calculator (51 gates, 14/14 tests)
-- [x] `float16.add` -- IEEE 754 addition (~998 gates, 125/125 tests)
-- [x] `float16.sub` -- IEEE 754 subtraction (via add with -b, 115/115 tests)
-- [x] `float16.toint` -- float16 to int16 (401 gates, 93/93 tests)
-- [x] `float16.fromint` -- int16 to float16 (478 gates, 53/53 tests)
-- [x] `float16.neg` -- sign flip (16 gates, 58/58 tests)
-- [x] `float16.abs` -- clear sign bit (16 gates, 58/58 tests)
-### Supporting Infrastructure
-- [x] `arithmetic.clz8bit` -- 8-bit count leading zeros (30 gates, 256/256 tests)
-- [x] `arithmetic.clz16bit` -- 16-bit count leading zeros (63 gates, 217/217 tests)
-- [x] Full circuit evaluation using .inputs topology
-- [x] Exhaustive testing for boolean, threshold, CLZ, float16, comparator circuits
-- [x] Automatic topological sort from signal registry
-### Core Circuits
 - [x] Boolean gates (AND, OR, NOT, NAND, NOR, XOR, XNOR, IMPLIES, BIIMPLIES)
-- [x] Arithmetic adders (half, full, ripple carry 2/4/8 bit)
-- [x] Arithmetic subtraction (SUB, SBC, NEG)
-- [x] Arithmetic multiplication (2x2, 4x4, 8x8)
-- [x] Arithmetic division (8-bit with quotient and remainder)
-- [x] Comparators (>, <, >=, <=, ==)
-- [x] Shifts and rotates (ASR, ROL, ROR)
 - [x] Threshold gates (k-of-n for k=1..8)
 - [x] Modular arithmetic (mod 2-12)
-- [x] Pattern recognition (popcount, all zeros/ones, one-hot, symmetry)
 - [x] Combinational (mux, demux, encoder, decoder, barrel shifter)
-- [x] Self-documenting format with .inputs tensors
 - [x] Signal registry in safetensors metadata

 # Threshold Calculus TODO
+Goal: Complete arithmetic coprocessor for LLM mathematical reasoning.
 ---
+## High Priority -- Core Mathematical Functions
+### Powers and Roots (float16)
+- [ ] `float16.sqrt` -- square root via Newton-Raphson or digit-by-digit
+- [ ] `float16.rsqrt` -- reciprocal square root (useful for normalization)
+- [ ] `float16.pow` -- x^y for arbitrary y (via exp/ln)
+- [ ] `float16.sq` -- x² (optimized special case)
+- [ ] `float16.cube` -- x³ (optimized special case)
+- [ ] `float16.cbrt` -- cube root
+### Exponentials and Logarithms (float16)
+- [ ] `float16.exp` -- e^x via range reduction + polynomial
+- [ ] `float16.exp2` -- 2^x (simpler, useful for pow)
+- [ ] `float16.ln` -- natural logarithm
+- [ ] `float16.log2` -- base-2 logarithm (extract exponent + correction)
+- [ ] `float16.log10` -- base-10 logarithm
+### Trigonometry (float16, CORDIC)
+- [ ] `float16.sin` -- sine
+- [ ] `float16.cos` -- cosine
+- [ ] `float16.tan` -- tangent (sin/cos)
+- [ ] `float16.sincos` -- both sin and cos (CORDIC gives both)
+- [ ] `float16.asin` -- arc sine
+- [ ] `float16.acos` -- arc cosine
+- [ ] `float16.atan` -- arc tangent
+- [ ] `float16.atan2` -- two-argument arc tangent (quadrant-aware)
+### Hyperbolic Functions (float16)
+- [ ] `float16.sinh` -- hyperbolic sine
+- [ ] `float16.cosh` -- hyperbolic cosine
+- [ ] `float16.tanh` -- hyperbolic tangent (critical for ML activations)
 ---
+## Medium Priority -- Extended Operations
+### Rounding and Truncation (float16)
+- [ ] `float16.floor` -- round toward -∞
+- [ ] `float16.ceil` -- round toward +∞
+- [ ] `float16.trunc` -- round toward zero
+- [ ] `float16.round` -- round to nearest
+- [ ] `float16.frac` -- fractional part
+- [ ] `float16.fmod` -- floating-point modulo
+### Comparisons and Selection (float16)
+- [ ] `float16.min` -- minimum of two values
+- [ ] `float16.max` -- maximum of two values
+- [ ] `float16.clamp` -- clamp to range [lo, hi]
+- [ ] `float16.sign` -- sign function (-1, 0, +1)
+- [ ] `float16.copysign` -- copy sign from y to x
+- [ ] `float16.isnan` -- NaN test
+- [ ] `float16.isinf` -- infinity test
+- [ ] `float16.isfinite` -- finite test
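The three classification tests reduce to checks on the exponent and fraction fields of the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits). A minimal reference sketch on raw 16-bit patterns follows. The function names are hypothetical and only mirror the planned circuit names.

```python
def fields(h):
    """Split a 16-bit pattern into (sign, exponent, fraction)."""
    return (h >> 15) & 1, (h >> 10) & 0x1F, h & 0x3FF


def isnan(h):
    _, exp, frac = fields(h)
    return exp == 0x1F and frac != 0    # all-ones exponent, nonzero fraction


def isinf(h):
    _, exp, frac = fields(h)
    return exp == 0x1F and frac == 0    # all-ones exponent, zero fraction


def isfinite(h):
    _, exp, _ = fields(h)
    return exp != 0x1F                  # zeros, subnormals, and normals


if __name__ == "__main__":
    for h in (0x0000, 0x8000, 0x0001, 0x3C00, 0x7C00, 0xFC00, 0x7E00):
        print(f"0x{h:04X}", isnan(h), isinf(h), isfinite(h))
```

Because the input space is only 65536 patterns, a circuit for any of these can be checked exhaustively against such a reference.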
+### Integer Arithmetic (16-bit)
+- [ ] `arithmetic.add16` -- 16-bit addition
+- [ ] `arithmetic.sub16` -- 16-bit subtraction
+- [ ] `arithmetic.mul16` -- 16-bit multiplication
+- [ ] `arithmetic.div16` -- 16-bit division with remainder
+- [ ] `arithmetic.sqrt16` -- 16-bit integer square root
+- [ ] `arithmetic.abs16` -- 16-bit absolute value
+### Number Theory
+- [ ] `arithmetic.gcd` -- greatest common divisor (Euclidean)
+- [ ] `arithmetic.lcm` -- least common multiple
+- [ ] `arithmetic.isprime8` -- primality test (8-bit)
+- [ ] `arithmetic.factorial8` -- factorial (8! = 40320 fits in unsigned 16-bit but not in signed int16)
+- [ ] `arithmetic.comb` -- binomial coefficient nCr
+- [ ] `arithmetic.perm` -- permutation nPr
+---
+## Lower Priority -- Specialized Functions
+### ML Activation Functions (float16)
+- [ ] `float16.relu` -- max(0, x)
+- [ ] `float16.leaky_relu` -- x if x > 0 else αx
+- [ ] `float16.sigmoid` -- 1/(1+e^(-x))
+- [ ] `float16.softplus` -- ln(1+e^x)
+- [ ] `float16.gelu` -- Gaussian error linear unit
+- [ ] `float16.silu` -- x * sigmoid(x)
+### Constants (float16 encoded)
+- [ ] `const.pi` -- π = 3.14159...
+- [ ] `const.e` -- e = 2.71828...
+- [ ] `const.phi` -- φ = 1.61803... (golden ratio)
+- [ ] `const.sqrt2` -- √2 = 1.41421...
+- [ ] `const.ln2` -- ln(2) = 0.69314...
+- [ ] `const.log2e` -- log₂(e) = 1.44269...
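The float16 bit patterns for these constants can be generated with Python's `struct` module, whose `e` format is IEEE 754 binary16. This is a small sketch: the helper names and the `CONSTANTS` table are hypothetical, and the rounding is whatever `struct.pack` applies (round-half-to-even), which may or may not be the rounding the circuits should use.

```python
import math
import struct


def to_float16_bits(x):
    """Round a Python float to binary16 and return the raw 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def from_float16_bits(h):
    """Decode a raw 16-bit pattern as a binary16 value."""
    return struct.unpack("<e", struct.pack("<H", h))[0]


# Names mirror the planned const.* entries; values come from the math module.
CONSTANTS = {
    "pi": math.pi,
    "e": math.e,
    "phi": (1 + math.sqrt(5)) / 2,
    "sqrt2": math.sqrt(2),
    "ln2": math.log(2),
    "log2e": math.log2(math.e),
}

if __name__ == "__main__":
    for name, value in CONSTANTS.items():
        bits = to_float16_bits(value)
        # Show the pattern and the value it actually represents after rounding.
        print(f"const.{name}: 0x{bits:04X} -> {from_float16_bits(bits)!r}")
```

With only 10 fraction bits, each constant keeps roughly three significant decimal digits, so the decoded value is worth recording next to the pattern.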
+### Statistics (float16, multi-input)
+- [ ] `stats.sum` -- sum of array
+- [ ] `stats.mean` -- arithmetic mean
+- [ ] `stats.min_array` -- minimum of array
+- [ ] `stats.max_array` -- maximum of array
+- [ ] `stats.variance` -- population variance
+- [ ] `stats.stddev` -- standard deviation
+### Bit Manipulation (16-bit)
+- [ ] `bits.popcnt16` -- population count
+- [x] `bits.clz16` -- count leading zeros (done; see CLZ under Completed)
+- [ ] `bits.ctz16` -- count trailing zeros
+- [ ] `bits.reverse16` -- bit reversal
+- [ ] `bits.bswap16` -- byte swap
 ---
+## Infrastructure
+### Testing
+- [ ] Exhaustive float16 tests for new operations
+- [ ] Edge case coverage (±0, ±inf, NaN, subnormals)
+- [ ] Accuracy tests against reference implementations
+### Documentation
+- [ ] Circuit diagrams for CORDIC, Newton-Raphson
+- [ ] Tutorial: implementing new circuits
+- [ ] Tutorial: LLM integration patterns
+- [ ] API reference for all operations
+### Optimization
+- [ ] Gate count reduction analysis
+- [ ] Critical path optimization
+- [ ] Weight quantization study (int8/int4)
+---
 ## Completed
+### Float16 Core Arithmetic
+- [x] `float16.add` -- IEEE 754 addition (~998 gates)
+- [x] `float16.sub` -- IEEE 754 subtraction
+- [x] `float16.mul` -- IEEE 754 multiplication (~1302 gates)
+- [x] `float16.div` -- IEEE 754 division (~1854 gates)
+- [x] `float16.neg` -- sign flip
+- [x] `float16.abs` -- absolute value
+- [x] `float16.cmp` -- comparison
+### Float16 Utilities
+- [x] `float16.unpack` -- extract sign, exponent, mantissa
+- [x] `float16.pack` -- assemble components
+- [x] `float16.normalize` -- CLZ-based normalization
+- [x] `float16.toint` -- convert to int16
+- [x] `float16.fromint` -- convert from int16
+### Integer Arithmetic (8-bit)
+- [x] Adders (half, full, ripple carry 2/4/8 bit)
+- [x] Subtraction, negation
+- [x] Multiplication (2x2, 4x4, 8x8)
+- [x] Division (8-bit with remainder)
+- [x] Comparators (all relations)
+- [x] CLZ (8-bit and 16-bit)
+### Logic and Patterns
 - [x] Boolean gates (AND, OR, NOT, NAND, NOR, XOR, XNOR, IMPLIES, BIIMPLIES)
 - [x] Threshold gates (k-of-n for k=1..8)
 - [x] Modular arithmetic (mod 2-12)
+- [x] Pattern recognition (popcount, one-hot, symmetry)
 - [x] Combinational (mux, demux, encoder, decoder, barrel shifter)
+- [x] Shifts and rotates
+### Infrastructure
+- [x] Self-documenting .inputs tensors
 - [x] Signal registry in safetensors metadata
+- [x] Full circuit evaluation with topological sort
+- [x] Comprehensive test suite (7177 tests, 100% pass)