CharlesCNorton committed on
Commit
c94d9ea
·
1 Parent(s): 1a5962e

Expand TODO with comprehensive LLM coprocessor operations

Files changed (1)
  1. TODO.md +147 -98
TODO.md CHANGED
@@ -1,123 +1,172 @@
  # Threshold Calculus TODO

- ## High Priority
-
- ### Floating Point Circuits - Remaining Work
-
- #### `float16.mul` -- IEEE 754 multiplication (~800 gates, ~55/84 tests)
-
- **Problem**: Multi-bit carry propagation in 11x11 mantissa multiplier.
-
- **Background**: The mantissa multiplier produces a 22-bit product from two 11-bit mantissas (including implicit leading 1). Each column `i` has `min(i+1, 21-i)` partial products (AND gates). Column 10 has the maximum of 11 partial products.
-
- **Current Implementation**:
- - Column sums computed via threshold gates: `col_sum = parity(PP_0, PP_1, ..., PP_n)`
- - Parity computed as `(ge1 AND NOT ge2) OR (ge3 AND NOT ge4) OR ...`
- - `col_bit1` = floor(sum/2) mod 2 (carry to next position)
- - `col_bit2` = floor(sum/4) mod 2 (carry to position i+2)
- - `col_bit3` = floor(sum/8) mod 2 (carry to position i+3)
- - Carry accumulator gates sum incoming carries from multiple columns
-
- **Remaining Issue**: The carry accumulator can itself produce a carry (`carry_acc_carry`) when the sum of incoming carry bits is >= 2. This secondary carry needs to propagate to position i+1, creating a cascading effect that requires either:
- 1. A proper CSA (Carry Save Adder) tree structure, or
- 2. A secondary FA chain for accumulated carries, or
- 3. Iterating until carry stabilization
-
- **Files**: `convert_to_explicit_inputs.py` lines 5350-5650 (build), lines 2400-2700 (infer)
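The column-sum scheme in the removed `float16.mul` notes can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `convert_to_explicit_inputs.py`: the helper names `ge`, `parity`, `column_height`, and `column_bits` are hypothetical, and each threshold gate is modelled as an integer comparison rather than a weight tensor.

```python
def ge(bits, k):
    """Threshold gate: fires when at least k of the inputs are 1."""
    return int(sum(bits) >= k)


def parity(bits):
    """Odd parity built as (ge1 AND NOT ge2) OR (ge3 AND NOT ge4) OR ..."""
    n = len(bits)
    return int(any(ge(bits, k) and not ge(bits, k + 1)
                   for k in range(1, n + 1, 2)))


def column_height(i, width=11):
    """Number of partial products (AND gates) in column i."""
    return min(i + 1, 2 * width - 1 - i)


def column_bits(bits):
    """Return (sum bit, carry to i+1, carry to i+2, carry to i+3)."""
    s = sum(bits)  # popcount of the column; the carries are reference arithmetic
    return parity(bits), (s // 2) % 2, (s // 4) % 2, (s // 8) % 2
```

For a full column of 11 ones the popcount is 11 (binary 1011), so `column_bits([1] * 11)` gives a sum bit plus carries into positions i+1 and i+3. Those carries are what the accumulator then has to add without dropping its own overflow.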

  ---

- #### `float16.div` -- IEEE 754 division (~1900 gates, ~5/53 tests)
-
- **Problem**: Same multi-bit carry issue as multiplication, plus potential issues in the non-restoring division algorithm.
-
- **Background**: Division uses non-restoring algorithm with 11-bit dividend and divisor. The quotient mantissa is computed iteratively, and similar column reduction issues arise.
-
- **Current Implementation**:
- - NaN output bit 9 fixed (canonical NaN = 0x7E00)
- - Column sum parity gates similar to multiplication
-
- **Remaining Issues**:
- 1. Same multi-bit carry propagation problem as multiplication
- 2. May have additional issues in division-specific logic (partial remainder computation)
-
- **Files**: `convert_to_explicit_inputs.py` lines 5700-6200 (build), lines 2700-3100 (infer)
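For reference, non-restoring division on unsigned integers can be sketched as below. This is a generic textbook form, assuming an 11-bit operand width to match the mantissa size mentioned above; the function name `nonrestoring_divide` is hypothetical and the sketch does not reproduce the repository's gate-level partial remainder logic.

```python
def nonrestoring_divide(dividend, divisor, bits=11):
    """Return (quotient, remainder) for unsigned inputs below 2**bits."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    rem = 0
    quo = 0
    for i in range(bits - 1, -1, -1):
        bit = (dividend >> i) & 1
        # Shift in the next dividend bit, then subtract the divisor if the
        # running remainder is non-negative, otherwise add it back.
        if rem >= 0:
            rem = 2 * rem + bit - divisor
        else:
            rem = 2 * rem + bit + divisor
        # The quotient bit is 1 exactly when the new remainder is non-negative.
        quo = (quo << 1) | (1 if rem >= 0 else 0)
    if rem < 0:
        rem += divisor  # single correction step at the end
    return quo, rem
```

Unlike restoring division, the remainder is allowed to go negative between steps and is corrected only once at the end, so each iteration needs one add or subtract rather than a subtract plus a conditional restore.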

  ---

- ### Potential Solutions for Carry Propagation
-
- 1. **Wallace Tree**: Replace column reduction with proper Wallace tree structure. More gates but handles arbitrary partial product counts correctly.

- 2. **Dadda Tree**: Similar to Wallace but minimizes gate count per level.
-
- 3. **Iterative Carry Resolution**: After initial FA chain, detect remaining carries and iterate until stable. Simple but slow.

- 4. **Hybrid Approach**: Use threshold gates for small columns (2-3 PPs) and proper tree reduction for larger columns.
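The tree-reduction options listed above share one idea: compress each column with 3:2 full adders until only two rows remain, then do a single carry-propagate add. The sketch below shows that idea on Python integers. It is a simplified Wallace-style reduction written for illustration; the names `full_adder`, `reduce_columns`, and `multiply` are hypothetical, and it is not the approach the commit ultimately used, which the diff does not describe.

```python
def full_adder(a, b, c):
    """3:2 compressor: three bits of weight w become a sum (w) and a carry (2w)."""
    total = a + b + c
    return total & 1, total >> 1


def reduce_columns(columns):
    """Apply 3:2 compressors until every column holds at most two bits."""
    cols = [list(c) for c in columns]
    while any(len(c) > 2 for c in cols):
        nxt = [[] for _ in range(len(cols) + 1)]
        for i, col in enumerate(cols):
            while len(col) >= 3:
                s, c = full_adder(col.pop(), col.pop(), col.pop())
                nxt[i].append(s)
                nxt[i + 1].append(c)
            nxt[i].extend(col)  # leftover bits pass through unchanged
        cols = nxt
    return cols


def multiply(a, b, width=11):
    """Multiply two width-bit integers via partial products and CSA reduction."""
    columns = [[] for _ in range(2 * width)]
    for i in range(width):
        for j in range(width):
            columns[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    rows = reduce_columns(columns)
    # The two remaining rows go through an ordinary carry-propagate add.
    x = sum(col[0] << i for i, col in enumerate(rows) if len(col) > 0)
    y = sum(col[1] << i for i, col in enumerate(rows) if len(col) > 1)
    return x + y
```

Every full adder preserves the weighted sum of its inputs, so no carry is lost regardless of column height. That is the property needed for the secondary-carry case described under Remaining Issue.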

  ---

- ## Medium Priority

- ### Extended Integer Arithmetic
- - [ ] `arithmetic.ripplecarry16bit` -- 16-bit addition
- - [ ] `arithmetic.multiplier16x16` -- 16-bit multiplication
- - [ ] `arithmetic.div16bit` -- 16-bit division
- - [ ] `arithmetic.sqrt8bit` -- integer square root
- - [ ] `arithmetic.gcd8bit` -- greatest common divisor
- - [ ] `arithmetic.lcm8bit` -- least common multiple

- ## Low Priority
-
- ### Transcendental Approximations
- - [ ] `approx.sin8bit` -- sine via CORDIC or lookup
- - [ ] `approx.cos8bit` -- cosine
- - [ ] `approx.exp8bit` -- exponential
- - [ ] `approx.log8bit` -- logarithm

- ### Pruning Experiments
- - [ ] Weight magnitude pruning study
- - [ ] Quantization to int8/int4
- - [ ] Sparse representation conversion
- - [ ] Knowledge distillation to smaller networks

- ### Documentation
- - [ ] Circuit diagrams for complex circuits (divider, multiplier)
- - [ ] Tutorial: building custom circuits
- - [ ] Tutorial: integrating with transformers

  ## Completed

- ### Floating Point Circuits
- - [x] `float16.unpack` -- extract sign, exponent, mantissa (16 gates, 63/63 tests)
- - [x] `float16.pack` -- assemble from components (16 gates, 63/63 tests)
- - [x] `float16.cmp` -- comparison a > b (14 gates, 113/113 tests)
- - [x] `float16.normalize` -- CLZ-based shift calculator (51 gates, 14/14 tests)
- - [x] `float16.add` -- IEEE 754 addition (~998 gates, 125/125 tests)
- - [x] `float16.sub` -- IEEE 754 subtraction (via add with -b, 115/115 tests)
- - [x] `float16.toint` -- float16 to int16 (401 gates, 93/93 tests)
- - [x] `float16.fromint` -- int16 to float16 (478 gates, 53/53 tests)
- - [x] `float16.neg` -- sign flip (16 gates, 58/58 tests)
- - [x] `float16.abs` -- clear sign bit (16 gates, 58/58 tests)
-
- ### Supporting Infrastructure
- - [x] `arithmetic.clz8bit` -- 8-bit count leading zeros (30 gates, 256/256 tests)
- - [x] `arithmetic.clz16bit` -- 16-bit count leading zeros (63 gates, 217/217 tests)
- - [x] Full circuit evaluation using .inputs topology
- - [x] Exhaustive testing for boolean, threshold, CLZ, float16, comparator circuits
- - [x] Automatic topological sort from signal registry
-
- ### Core Circuits
  - [x] Boolean gates (AND, OR, NOT, NAND, NOR, XOR, XNOR, IMPLIES, BIIMPLIES)
- - [x] Arithmetic adders (half, full, ripple carry 2/4/8 bit)
- - [x] Arithmetic subtraction (SUB, SBC, NEG)
- - [x] Arithmetic multiplication (2x2, 4x4, 8x8)
- - [x] Arithmetic division (8-bit with quotient and remainder)
- - [x] Comparators (>, <, >=, <=, ==)
- - [x] Shifts and rotates (ASR, ROL, ROR)
  - [x] Threshold gates (k-of-n for k=1..8)
  - [x] Modular arithmetic (mod 2-12)
- - [x] Pattern recognition (popcount, all zeros/ones, one-hot, symmetry)
  - [x] Combinational (mux, demux, encoder, decoder, barrel shifter)
- - [x] Self-documenting format with .inputs tensors
  - [x] Signal registry in safetensors metadata
 
 
 
  # Threshold Calculus TODO

+ Goal: Complete arithmetic coprocessor for LLM mathematical reasoning.

  ---

+ ## High Priority -- Core Mathematical Functions
+
+ ### Powers and Roots (float16)
+ - [ ] `float16.sqrt` -- square root via Newton-Raphson or digit-by-digit
+ - [ ] `float16.rsqrt` -- reciprocal square root (useful for normalization)
+ - [ ] `float16.pow` -- x^y for arbitrary y (via exp/ln)
+ - [ ] `float16.sq` -- x² (optimized special case)
+ - [ ] `float16.cube` -- x³ (optimized special case)
+ - [ ] `float16.cbrt` -- cube root
+
+ ### Exponentials and Logarithms (float16)
+ - [ ] `float16.exp` -- e^x via range reduction + polynomial
+ - [ ] `float16.exp2` -- 2^x (simpler, useful for pow)
+ - [ ] `float16.ln` -- natural logarithm
+ - [ ] `float16.log2` -- base-2 logarithm (extract exponent + correction)
+ - [ ] `float16.log10` -- base-10 logarithm
+
+ ### Trigonometry (float16, CORDIC)
+ - [ ] `float16.sin` -- sine
+ - [ ] `float16.cos` -- cosine
+ - [ ] `float16.tan` -- tangent (sin/cos)
+ - [ ] `float16.sincos` -- both sin and cos (CORDIC gives both)
+ - [ ] `float16.asin` -- arc sine
+ - [ ] `float16.acos` -- arc cosine
+ - [ ] `float16.atan` -- arc tangent
+ - [ ] `float16.atan2` -- two-argument arc tangent (quadrant-aware)
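The note that CORDIC gives both sine and cosine can be illustrated with a rotation-mode sketch. It uses Python floats for readability, whereas a circuit version would presumably use fixed-point shifts and adds. The function name `cordic_sincos`, the iteration count of 16, and the input range of [-pi/2, pi/2] are assumptions made for this example, not details from the TODO.

```python
import math


def cordic_sincos(theta, iterations=16):
    """Return (sin, cos) for theta in [-pi/2, pi/2] via rotation-mode CORDIC."""
    # Precomputed micro-rotation angles atan(2^-i).
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector; divide the total gain out up front.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x
```

A single pass produces both outputs, so a `sincos` circuit would cost about the same as `sin` or `cos` alone, and `tan` could reuse the pair with one division.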
+
+ ### Hyperbolic Functions (float16)
+ - [ ] `float16.sinh` -- hyperbolic sine
+ - [ ] `float16.cosh` -- hyperbolic cosine
+ - [ ] `float16.tanh` -- hyperbolic tangent (critical for ML activations)

  ---

+ ## Medium Priority -- Extended Operations
+
+ ### Rounding and Truncation (float16)
+ - [ ] `float16.floor` -- round toward -∞
+ - [ ] `float16.ceil` -- round toward +∞
+ - [ ] `float16.trunc` -- round toward zero
+ - [ ] `float16.round` -- round to nearest
+ - [ ] `float16.frac` -- fractional part
+ - [ ] `float16.fmod` -- floating-point modulo
+
+ ### Comparisons and Selection (float16)
+ - [ ] `float16.min` -- minimum of two values
+ - [ ] `float16.max` -- maximum of two values
+ - [ ] `float16.clamp` -- clamp to range [lo, hi]
+ - [ ] `float16.sign` -- sign function (-1, 0, +1)
+ - [ ] `float16.copysign` -- copy sign from y to x
+ - [ ] `float16.isnan` -- NaN test
+ - [ ] `float16.isinf` -- infinity test
+ - [ ] `float16.isfinite` -- finite test
+
+ ### Integer Arithmetic (16-bit)
+ - [ ] `arithmetic.add16` -- 16-bit addition
+ - [ ] `arithmetic.sub16` -- 16-bit subtraction
+ - [ ] `arithmetic.mul16` -- 16-bit multiplication
+ - [ ] `arithmetic.div16` -- 16-bit division with remainder
+ - [ ] `arithmetic.sqrt16` -- 16-bit integer square root
+ - [ ] `arithmetic.abs16` -- 16-bit absolute value
+
+ ### Number Theory
+ - [ ] `arithmetic.gcd` -- greatest common divisor (Euclidean)
+ - [ ] `arithmetic.lcm` -- least common multiple
+ - [ ] `arithmetic.isprime8` -- primality test (8-bit)
+ - [ ] `arithmetic.factorial8` -- factorial (8! = 40320 is the largest factorial that fits in unsigned 16-bit)
+ - [ ] `arithmetic.comb` -- binomial coefficient nCr
+ - [ ] `arithmetic.perm` -- permutation nPr

+ ---

+ ## Lower Priority -- Specialized Functions
+
+ ### ML Activation Functions (float16)
+ - [ ] `float16.relu` -- max(0, x)
+ - [ ] `float16.leaky_relu` -- x if x > 0 else αx
+ - [ ] `float16.sigmoid` -- 1/(1+e^(-x))
+ - [ ] `float16.softplus` -- ln(1+e^x)
+ - [ ] `float16.gelu` -- Gaussian error linear unit
+ - [ ] `float16.silu` -- x * sigmoid(x)
+
+ ### Constants (float16 encoded)
+ - [ ] `const.pi` -- π = 3.14159...
+ - [ ] `const.e` -- e = 2.71828...
+ - [ ] `const.phi` -- φ = 1.61803... (golden ratio)
+ - [ ] `const.sqrt2` -- √2 = 1.41421...
+ - [ ] `const.ln2` -- ln(2) = 0.69314...
+ - [ ] `const.log2e` -- log₂(e) = 1.44269...
+
+ ### Statistics (float16, multi-input)
+ - [ ] `stats.sum` -- sum of array
+ - [ ] `stats.mean` -- arithmetic mean
+ - [ ] `stats.min_array` -- minimum of array
+ - [ ] `stats.max_array` -- maximum of array
+ - [ ] `stats.variance` -- population variance
+ - [ ] `stats.stddev` -- standard deviation
+
+ ### Bit Manipulation (16-bit)
+ - [ ] `bits.popcnt16` -- population count
+ - [x] `bits.clz16` -- count leading zeros (done)
+ - [ ] `bits.ctz16` -- count trailing zeros
+ - [ ] `bits.reverse16` -- bit reversal
+ - [ ] `bits.bswap16` -- byte swap

  ---

+ ## Infrastructure

+ ### Testing
+ - [ ] Exhaustive float16 tests for new operations
+ - [ ] Edge case coverage (±0, ±inf, NaN, subnormals)
+ - [ ] Accuracy tests against reference implementations
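As a starting point for the edge-case item, the bit patterns for the listed special values can be written down directly and decoded with Python's `struct` module, whose `e` format code is IEEE 754 binary16. This is a sketch under stated assumptions: the helper names `f16_from_bits` and `f16_to_bits` and the `EDGE_CASES` table are hypothetical and not part of the repository's test suite. The NaN entry uses the canonical value 0x7E00 mentioned in the removed `float16.div` notes.

```python
import math
import struct


def f16_from_bits(bits):
    """Decode a 16-bit pattern as an IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def f16_to_bits(value):
    """Encode a Python float as a binary16 bit pattern (round to nearest)."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


# Named edge cases as raw bit patterns: sign (1) | exponent (5) | mantissa (10).
EDGE_CASES = {
    "+0": 0x0000,
    "-0": 0x8000,
    "+inf": 0x7C00,
    "-inf": 0xFC00,
    "nan": 0x7E00,
    "min_subnormal": 0x0001,
    "max_subnormal": 0x03FF,
    "min_normal": 0x0400,
    "max_finite": 0x7BFF,
}
```

Because float16 has only 65,536 bit patterns, a unary operation can be tested exhaustively by looping over `range(0x10000)` and comparing against a reference computed on `f16_from_bits(bits)`.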

+ ### Documentation
+ - [ ] Circuit diagrams for CORDIC, Newton-Raphson
+ - [ ] Tutorial: implementing new circuits
+ - [ ] Tutorial: LLM integration patterns
+ - [ ] API reference for all operations

+ ### Optimization
+ - [ ] Gate count reduction analysis
+ - [ ] Critical path optimization
+ - [ ] Weight quantization study (int8/int4)

+ ---

  ## Completed

+ ### Float16 Core Arithmetic
+ - [x] `float16.add` -- IEEE 754 addition (~998 gates)
+ - [x] `float16.sub` -- IEEE 754 subtraction
+ - [x] `float16.mul` -- IEEE 754 multiplication (~1302 gates)
+ - [x] `float16.div` -- IEEE 754 division (~1854 gates)
+ - [x] `float16.neg` -- sign flip
+ - [x] `float16.abs` -- absolute value
+ - [x] `float16.cmp` -- comparison
+
+ ### Float16 Utilities
+ - [x] `float16.unpack` -- extract sign, exponent, mantissa
+ - [x] `float16.pack` -- assemble components
+ - [x] `float16.normalize` -- CLZ-based normalization
+ - [x] `float16.toint` -- convert to int16
+ - [x] `float16.fromint` -- convert from int16
+
+ ### Integer Arithmetic (8-bit)
+ - [x] Adders (half, full, ripple carry 2/4/8 bit)
+ - [x] Subtraction, negation
+ - [x] Multiplication (2x2, 4x4, 8x8)
+ - [x] Division (8-bit with remainder)
+ - [x] Comparators (all relations)
+ - [x] CLZ (8-bit and 16-bit)
+
+ ### Logic and Patterns
  - [x] Boolean gates (AND, OR, NOT, NAND, NOR, XOR, XNOR, IMPLIES, BIIMPLIES)
  - [x] Threshold gates (k-of-n for k=1..8)
  - [x] Modular arithmetic (mod 2-12)
+ - [x] Pattern recognition (popcount, one-hot, symmetry)
  - [x] Combinational (mux, demux, encoder, decoder, barrel shifter)
+ - [x] Shifts and rotates
+
+ ### Infrastructure
+ - [x] Self-documenting .inputs tensors
  - [x] Signal registry in safetensors metadata
+ - [x] Full circuit evaluation with topological sort
+ - [x] Comprehensive test suite (7177 tests, 100% pass)