CharlesCNorton committed 65db75b (parent: b933255)

Rewrite README: cut marketing fluff, focus on float16

Files changed (1): README.md (+86 -763)
# Threshold Calculus

**Arithmetic coprocessor for LLMs, implemented as threshold logic gates.**

This is a runtime component, not a proof artifact. The circuits embed directly into transformer MLP layers as a reusable arithmetic unit. The model learns when to route through the circuits versus the standard MLP path. Inference runs in PyTorch.

Early training runs embedding these circuits into SmolLM2-360M show significant accuracy improvements on arithmetic tasks.

The repository contains an arithmetic core implemented as threshold logic gates, stored in safetensors format. Every tensor represents a neural-network weight or bias that, when combined with a Heaviside step activation function, computes exact arithmetic operations. All circuits pass exhaustive testing (7,177 tests, 100% pass rate).

---

## Table of Contents

1. [Overview](#overview)
2. [Project History](#project-history)
3. [The Pivot to Arithmetic](#the-pivot-to-arithmetic)
4. [What This Model Contains](#what-this-model-contains)
5. [How Threshold Logic Works](#how-threshold-logic-works)
6. [Circuit Catalog](#circuit-catalog)
7. [Evaluation and Verification](#evaluation-and-verification)
8. [Intended Use Cases](#intended-use-cases)
9. [Integration with Language Models](#integration-with-language-models)
10. [Pruning Experiments](#pruning-experiments)
11. [Limitations](#limitations)
12. [Future Work](#future-work)
13. [Technical Details](#technical-details)
14. [Citation](#citation)
15. [License](#license)

---

## Overview

Threshold Calculus is an arithmetic computation core built entirely from threshold logic gates. Unlike traditional digital circuits built from discrete components, this implementation encodes every gate as a single neuron with fixed weights and biases. The key insight is that threshold logic gates are computationally equivalent to single-layer perceptrons with step activation functions, so arbitrary digital circuits can be represented as neural-network weights.

The model contains 5,094 tensors totaling 575 KB. These tensors implement:

- Full 8-bit integer arithmetic (addition, subtraction, multiplication, division)
- All standard comparison operations
- Bitwise and logical operations
- Modular arithmetic (divisibility testing for mod 2 through mod 12)
- Pattern recognition primitives (popcount, leading zeros, symmetry detection)
- Threshold voting circuits (k-of-n gates, majority, minority)
- Combinational building blocks (multiplexers, demultiplexers, encoders, decoders)

Every circuit has been exhaustively tested where feasible. The 8-bit adder has been verified against all 65,536 input combinations. The 8-bit multiplier has been tested against representative samples including edge cases, powers of two, and adversarial bit patterns. The 8-bit divider produces correct quotients and remainders for all tested dividend/divisor pairs.
---

## Project History

This project began as an attempt to build a complete 8-bit CPU using threshold logic. The original goal was ambitious: create a Turing-complete computer where every logic gate, every flip-flop, every control signal was implemented as a neural network weight. The CPU would have registers, a program counter, an instruction decoder, conditional jumps, a stack, and the ability to run arbitrary programs.

The development proceeded through several phases:

### Phase 1: Boolean Foundations

We started by implementing the basic Boolean gates. AND, OR, NOT, NAND, and NOR gates are trivially implementable as single threshold neurons. A 2-input AND gate, for example, uses weights [1, 1] and bias -2, firing only when both inputs are 1. XOR and XNOR required two-layer networks because they are not linearly separable. We developed standard templates for these gates that could be instantiated throughout the design.

### Phase 2: Arithmetic Circuits

With Boolean gates in hand, we built up the arithmetic hierarchy. Half adders combine an XOR (for sum) and an AND (for carry). Full adders chain two half adders with an OR for carry propagation. Ripple carry adders chain full adders. We implemented 2-bit, 4-bit, and 8-bit variants and verified each exhaustively.

Multiplication came next. An 8x8 multiplier requires 64 partial products (each an AND gate) followed by seven stages of addition to accumulate the results. The implementation uses the standard shift-and-add architecture, resulting in hundreds of interconnected gates.

Division was the most complex arithmetic circuit. We implemented a restoring division algorithm with eight stages, each containing a comparator, a conditional subtractor, and a multiplexer to select between the subtracted and original values. The full divider contains nearly 2,000 tensors and correctly computes both quotient and remainder.

### Phase 3: The CPU Attempt

With arithmetic complete, we began building CPU infrastructure:

- **Instruction Decoder**: A 4-bit opcode decoder that activates one of 16 operation lines
- **Register File**: Four 8-bit registers with read/write multiplexing
- **Program Counter**: An 8-bit counter with increment and load capabilities
- **ALU Integration**: Routing to select between arithmetic operations based on opcode
- **Control Signals**: Jump, conditional jump, call, return, push, pop, halt
- **Flag Generation**: Zero, negative, carry, and overflow flags

The CPU grew to over 6,000 tensors. We implemented conditional jumps based on flags, subroutine calls with a stack, and began writing test programs.

### Phase 4: Scope Realization

As the CPU neared completion, we stepped back to assess the project. The CPU worked. Programs could execute. But we realized several things:

First, the complexity was substantial. Debugging required careful routing analysis. Adding new instructions meant touching many interconnected systems. The verification burden grew quadratically with features.

Second, and more importantly, we asked: what is the most valuable artifact here? The CPU is interesting as a demonstration, but its practical utility is limited. Nobody needs an 8-bit CPU implemented in neural network weights. What people do need is reliable arithmetic.

Language models notoriously struggle with arithmetic. They can discuss mathematics eloquently but fail at actual computation. A frozen, verified arithmetic layer could potentially address this gap. The arithmetic circuits we had built were the genuinely useful core. The CPU control logic was scaffolding.

---

## The Pivot to Arithmetic

We decided to extract and perfect the arithmetic core as a standalone artifact. This involved:

1. **Identifying Essential Tensors**: We cataloged every tensor by category and determined which were arithmetic-related versus CPU-specific.

2. **Removing CPU Infrastructure**: Control flow circuits (instruction decoder, program counter, jump logic, stack operations), ALU wrapper logic, and CPU manifest metadata were stripped out.

3. **Retaining Arithmetic Foundations**: All arithmetic operations, Boolean gates, threshold primitives, combinational building blocks, modular arithmetic, and pattern recognition circuits were preserved.

4. **Cleaning Residual CPU Artifacts**: Some tensors, such as the register multiplexer, had leaked into the combinational category. These were identified and removed to ensure a clean arithmetic-only core.

5. **Verification**: The stripped model was re-verified to ensure a 100% test pass rate and 100% tensor coverage.

The result is this repository: a focused arithmetic core with 5,094 tensors, every one tested and accounted for.

The CPU work is not abandoned. It will continue in the original repository (phanerozoic/8bit-threshold-computer) as an interesting research direction. But we believe the arithmetic core is the more immediately valuable contribution, and it deserves its own focused home.
---

## What This Model Contains

### File Manifest

| File | Description | Size |
|------|-------------|------|
| `arithmetic.safetensors` | Self-documenting format with explicit .inputs tensors | 1.06 MB |
| `eval.py` | Verification suite using self-documenting format | 12 KB |
| `TODO.md` | Development roadmap | 3 KB |
| `convert_to_explicit_inputs.py` | Script used to generate .inputs tensors | 32 KB |
| `tensors_arithmetic_only.txt` | Tensor manifest with shapes and values | 397 KB |

### Self-Documenting Format

The `arithmetic.safetensors` file is fully self-contained. Each gate has three tensors:

- `.weight` -- the gate's weight vector
- `.bias` -- the gate's bias
- `.inputs` -- integer tensor of signal IDs referencing input sources

The signal registry is stored in file metadata under the key `signal_registry` as a JSON object mapping IDs to signal names:
```python
from safetensors import safe_open
import json

with safe_open('arithmetic.safetensors', framework='pt') as f:
    registry = json.loads(f.metadata()['signal_registry'])

    # Get inputs for a gate
    inputs_tensor = f.get_tensor('boolean.and.inputs')
    input_signals = [registry[str(i.item())] for i in inputs_tensor]
    # Result: ['$a', '$b']
```
Signal naming conventions:

- `$name` -- external circuit input (e.g., `$a`, `$dividend[0]`)
- `#value` -- constant (e.g., `#0`, `#1`)
- `gate.path` -- output of another gate (e.g., `ha1.sum`, `stage0.cmp`)

This format eliminates the need for external routing files and makes circuits fully introspectable from the safetensors file alone.
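To show how the `.inputs` wiring drives evaluation, here is a minimal sketch using an in-memory stand-in for the tensors. The gate names, signal IDs, and weights below are illustrative inventions for the sketch; in the real file they come from the `.weight`/`.bias`/`.inputs` tensors and the `signal_registry` metadata.

```python
# Illustrative stand-in for the self-documenting format: a registry mapping
# signal IDs to names, and gates wired together by those IDs.
registry = {0: '$a', 1: '$b', 2: 'or1', 3: 'nand1'}
gates = {
    'or1':   {'weights': [1, 1],   'bias': -1, 'inputs': [0, 1]},
    'nand1': {'weights': [-1, -1], 'bias': 1,  'inputs': [0, 1]},
    'xor':   {'weights': [1, 1],   'bias': -2, 'inputs': [2, 3]},
}

def evaluate(gates, registry, external):
    """Resolve each gate's inputs by signal name and fire it with a
    Heaviside step. Assumes gates are listed in topological order."""
    values = dict(external)  # '$name' -> 0/1, then gate name -> 0/1
    for name, g in gates.items():
        xs = [values[registry[i]] for i in g['inputs']]
        s = sum(w * x for w, x in zip(g['weights'], xs)) + g['bias']
        values[name] = 1 if s >= 0 else 0
    return values

result = evaluate(gates, registry, {'$a': 1, '$b': 0})
```

The same resolve-then-fire loop generalizes to the full model once gates are sorted topologically over their `.inputs` references.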
### Tensor Statistics

- **Total tensors**: 7,634 (weights + biases + inputs)
- **Gates**: 2,540
- **Signal registry**: 3,018 signals
- **Categories**: 6 (arithmetic, boolean, combinational, modular, pattern_recognition, threshold)
- **Largest category**: arithmetic (4,659 weight/bias tensors)
- **Smallest category**: boolean (30 weight/bias tensors)

### Category Breakdown

| Category | Tensors | Description |
|----------|---------|-------------|
| arithmetic | 4,659 | Adders, subtractors, multipliers, dividers, comparators, shifts |
| modular | 226 | Divisibility testers for mod 2 through mod 12 |
| combinational | 40 | Multiplexers, demultiplexers, encoders, decoders, barrel shifter |
| threshold | 30 | k-of-n voting gates, majority, minority |
| boolean | 30 | AND, OR, NOT, NAND, NOR, XOR, XNOR, IMPLIES |
| pattern_recognition | 25 | Popcount, leading/trailing ones, symmetry, alternating patterns |

---

## How Threshold Logic Works

Threshold logic is a computational model in which each gate computes a weighted sum of its inputs and compares the result to a threshold. If the sum meets or exceeds the threshold, the gate outputs 1; otherwise, it outputs 0.

Mathematically, a threshold gate computes:

```
output = 1 if (w1*x1 + w2*x2 + ... + wn*xn + bias) >= 0 else 0
```

This is identical to a single neuron with a Heaviside step activation function:

```python
def heaviside(x):
    return 1.0 if x >= 0 else 0.0

def threshold_gate(inputs, weights, bias):
    return heaviside(sum(w * x for w, x in zip(weights, inputs)) + bias)
```

### Examples

**AND Gate**: weights = [1, 1], bias = -2
- inputs (0, 0): 0 + 0 - 2 = -2 < 0, output 0
- inputs (0, 1): 0 + 1 - 2 = -1 < 0, output 0
- inputs (1, 0): 1 + 0 - 2 = -1 < 0, output 0
- inputs (1, 1): 1 + 1 - 2 = 0 >= 0, output 1

**OR Gate**: weights = [1, 1], bias = -1
- inputs (0, 0): 0 + 0 - 1 = -1 < 0, output 0
- inputs (0, 1): 0 + 1 - 1 = 0 >= 0, output 1
- inputs (1, 0): 1 + 0 - 1 = 0 >= 0, output 1
- inputs (1, 1): 1 + 1 - 1 = 1 >= 0, output 1

**NOT Gate**: weights = [-1], bias = 0
- input 0: -1*0 + 0 = 0 >= 0, output 1
- input 1: -1*1 + 0 = -1 < 0, output 0

**3-of-5 Majority**: weights = [1, 1, 1, 1, 1], bias = -3
- Outputs 1 if and only if at least 3 of the 5 inputs are 1

### Non-Linearly Separable Functions

Some Boolean functions, notably XOR and XNOR, cannot be computed by a single threshold gate because they are not linearly separable. For these, we use two-layer networks:

**XOR**: Layer 1 computes OR and NAND in parallel. Layer 2 computes the AND of these results.
- OR fires if at least one input is 1
- NAND fires unless both inputs are 1
- AND of (OR, NAND) fires only when exactly one input is 1

This two-layer pattern is used throughout the design wherever XOR operations are needed, including in half adders, full adders, and parity circuits.
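As a sketch, the two-layer XOR and the half adder built from it can be written directly from the gate templates in the worked examples above (the function names here are illustrative; the weights match the templates):

```python
def gate(weights, bias, xs):
    # a single threshold neuron with a Heaviside step activation
    return 1 if sum(w * x for w, x in zip(weights, xs)) + bias >= 0 else 0

def xor(a, b):
    # layer 1: OR ([1,1], -1) and NAND ([-1,-1], 1) in parallel
    # layer 2: AND ([1,1], -2) of the two layer-1 outputs
    return gate([1, 1], -2, [gate([1, 1], -1, [a, b]),
                             gate([-1, -1], 1, [a, b])])

def half_adder(a, b):
    # sum = XOR(a, b); carry = AND(a, b), as described above
    return xor(a, b), gate([1, 1], -2, [a, b])
```

Chaining two of these half adders with an OR for the carries yields the full adder described in Phase 2.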
---

## Circuit Catalog

### Boolean Gates

| Circuit | Inputs | Outputs | Layers | Description |
|---------|--------|---------|--------|-------------|
| boolean.and | 2 | 1 | 1 | Logical AND |
| boolean.or | 2 | 1 | 1 | Logical OR |
| boolean.not | 1 | 1 | 1 | Logical NOT |
| boolean.nand | 2 | 1 | 1 | NOT AND |
| boolean.nor | 2 | 1 | 1 | NOT OR |
| boolean.xor | 2 | 1 | 2 | Exclusive OR |
| boolean.xnor | 2 | 1 | 2 | Exclusive NOR |
| boolean.implies | 2 | 1 | 1 | Logical implication (A implies B) |
| boolean.biimplies | 2 | 1 | 2 | Biconditional (A iff B) |

### Arithmetic: Addition

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| arithmetic.halfadder | 2 bits | sum, carry | Basic half adder |
| arithmetic.fulladder | 3 bits (a, b, cin) | sum, cout | Full adder with carry |
| arithmetic.ripplecarry2bit | 2x 2-bit | 2-bit sum, cout | 2-bit ripple carry adder |
| arithmetic.ripplecarry4bit | 2x 4-bit | 4-bit sum, cout | 4-bit ripple carry adder |
| arithmetic.ripplecarry8bit | 2x 8-bit | 8-bit sum, cout | 8-bit ripple carry adder |
| arithmetic.adc8bit | 2x 8-bit + cin | 8-bit sum, cout | Add with carry |
| arithmetic.incrementer8bit | 8-bit | 8-bit | Add 1 to input |
| arithmetic.decrementer8bit | 8-bit | 8-bit | Subtract 1 from input |

### Arithmetic: Subtraction

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| arithmetic.sub8bit | 2x 8-bit | 8-bit diff, borrow | 8-bit subtraction |
| arithmetic.sbc8bit | 2x 8-bit + bin | 8-bit diff, bout | Subtract with borrow |
| arithmetic.neg8bit | 8-bit | 8-bit | Two's complement negation |
| arithmetic.absolutedifference8bit | 2x 8-bit | 8-bit | abs(A - B) |

### Arithmetic: Multiplication

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| arithmetic.multiplier2x2 | 2x 2-bit | 4-bit product | 2x2 multiplier |
| arithmetic.multiplier4x4 | 2x 4-bit | 8-bit product | 4x4 multiplier |
| arithmetic.multiplier8x8 | 2x 8-bit | 16-bit product | 8x8 multiplier |

### Arithmetic: Division

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| arithmetic.div8bit | 8-bit dividend, 8-bit divisor | 8-bit quotient, 8-bit remainder | Full 8-bit division |

The divider uses a restoring division algorithm with 8 stages. Each stage shifts the partial remainder, compares against the divisor, conditionally subtracts, and records one quotient bit. The implementation contains nearly 2,000 tensors and is the most complex circuit in the model.

### Arithmetic: Comparison

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| arithmetic.greaterthan8bit | 2x 8-bit | 1 bit | A > B |
| arithmetic.lessthan8bit | 2x 8-bit | 1 bit | A < B |
| arithmetic.greaterorequal8bit | 2x 8-bit | 1 bit | A >= B |
| arithmetic.lessorequal8bit | 2x 8-bit | 1 bit | A <= B |
| arithmetic.equality8bit | 2x 8-bit | 1 bit | A == B |
| arithmetic.cmp8bit | 2x 8-bit | flags | Full comparison with flags |
| arithmetic.max8bit | 2x 8-bit | 8-bit | Maximum of two values |
| arithmetic.min8bit | 2x 8-bit | 8-bit | Minimum of two values |

### Arithmetic: Shifts and Rotates

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| arithmetic.asr8bit | 8-bit | 8-bit | Arithmetic shift right (sign-preserving) |
| arithmetic.rol8bit | 8-bit | 8-bit, cout | Rotate left |
| arithmetic.ror8bit | 8-bit | 8-bit, cout | Rotate right |

### Threshold Gates

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| threshold.oneoutof8 | 8 bits | 1 bit | At least 1 of 8 inputs is 1 |
| threshold.twooutof8 | 8 bits | 1 bit | At least 2 of 8 inputs are 1 |
| threshold.threeoutof8 | 8 bits | 1 bit | At least 3 of 8 inputs are 1 |
| threshold.fouroutof8 | 8 bits | 1 bit | At least 4 of 8 inputs are 1 |
| threshold.fiveoutof8 | 8 bits | 1 bit | At least 5 of 8 inputs are 1 |
| threshold.sixoutof8 | 8 bits | 1 bit | At least 6 of 8 inputs are 1 |
| threshold.sevenoutof8 | 8 bits | 1 bit | At least 7 of 8 inputs are 1 |
| threshold.alloutof8 | 8 bits | 1 bit | All 8 inputs are 1 |
| threshold.majority | n bits | 1 bit | More than half of inputs are 1 |
| threshold.minority | n bits | 1 bit | Fewer than half of inputs are 1 |

### Modular Arithmetic

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| modular.mod2 | 8-bit | 1 bit | Divisible by 2 |
| modular.mod3 | 8-bit | 1 bit | Divisible by 3 |
| modular.mod4 | 8-bit | 1 bit | Divisible by 4 |
| modular.mod5 | 8-bit | 1 bit | Divisible by 5 |
| modular.mod6 | 8-bit | 1 bit | Divisible by 6 |
| modular.mod7 | 8-bit | 1 bit | Divisible by 7 |
| modular.mod8 | 8-bit | 1 bit | Divisible by 8 |
| modular.mod9 | 8-bit | 1 bit | Divisible by 9 |
| modular.mod10 | 8-bit | 1 bit | Divisible by 10 |
| modular.mod11 | 8-bit | 1 bit | Divisible by 11 |
| modular.mod12 | 8-bit | 1 bit | Divisible by 12 |
352
- Powers of 2 (mod 2, 4, 8) use single-layer circuits that check only the relevant low bits. Other moduli use multi-layer networks that detect all sums (0-255) that are divisible by the modulus.
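To illustrate the single-layer power-of-two case: a value is divisible by 4 exactly when its two low bits are zero, which one threshold neuron can check. The weights below are chosen for the sketch, not read from the model:

```python
def mod4_divisible(bits):
    """One threshold gate over 8 input bits (bits[0] = LSB).
    Fires iff bits 0 and 1 are both zero, i.e. the value is divisible by 4."""
    weights = [-1, -1, 0, 0, 0, 0, 0, 0]  # penalize the two low bits; bias 0
    s = sum(w * b for w, b in zip(weights, bits))
    return 1 if s >= 0 else 0

def to_bits(n):
    # unpack an 8-bit value into a little-endian bit list
    return [(n >> i) & 1 for i in range(8)]
```

The non-power-of-two moduli have no such shortcut, which is why they need the multi-layer detectors described above.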
### Pattern Recognition

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| pattern_recognition.popcount | 8 bits | count | Count of 1 bits (population count) |
| pattern_recognition.allzeros | 8 bits | 1 bit | All bits are 0 |
| pattern_recognition.allones | 8 bits | 1 bit | All bits are 1 |
| pattern_recognition.onehotdetector | 8 bits | 1 bit | Exactly one bit is 1 |
| pattern_recognition.leadingones | 8 bits | count | Count of leading 1 bits |
| pattern_recognition.trailingones | 8 bits | count | Count of trailing 1 bits |
| pattern_recognition.symmetry8bit | 8 bits | 1 bit | Bit pattern is palindromic |
| pattern_recognition.alternating8bit | 8 bits | 1 bit | Bits alternate (01010101 or 10101010) |
| pattern_recognition.hammingdistance8bit | 2x 8-bit | count | Number of differing bits |

### Combinational

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| combinational.decoder3to8 | 3-bit select | 8 one-hot | 3-to-8 decoder |
| combinational.encoder8to3 | 8-bit one-hot | 3-bit | 8-to-3 priority encoder |
| combinational.multiplexer2to1 | 2 data, 1 select | 1 | 2-to-1 multiplexer |
| combinational.multiplexer4to1 | 4 data, 2 select | 1 | 4-to-1 multiplexer |
| combinational.multiplexer8to1 | 8 data, 3 select | 1 | 8-to-1 multiplexer |
| combinational.demultiplexer1to2 | 1 data, 1 select | 2 | 1-to-2 demultiplexer |
| combinational.demultiplexer1to4 | 1 data, 2 select | 4 | 1-to-4 demultiplexer |
| combinational.demultiplexer1to8 | 1 data, 3 select | 8 | 1-to-8 demultiplexer |
| combinational.barrelshifter8bit | 8-bit data, 3-bit shift | 8-bit | Barrel shifter |
| combinational.priorityencoder8bit | 8 bits | 3-bit + valid | Priority encoder |

---

## Evaluation and Verification

The model includes a comprehensive evaluation suite (`arithmetic_eval.py`) that tests every circuit exhaustively where feasible.

### Test Coverage

| Category | Tests | Method |
|----------|-------|--------|
| Boolean gates | 34 | All input combinations |
| Half/full adders | 12 | All input combinations |
| 2-bit adder | 16 | All 4x4 combinations |
| 4-bit adder | 256 | All 16x16 combinations |
| 8-bit adder | 65,536 | All 256x256 combinations |
| Comparators | 262,144 | All 256x256 combinations (4 comparators) |
| 8x8 multiplier | 357 | Strategic sample (edges, powers of 2, patterns) |
| 8-bit divider | 1,108 | Strategic sample |
| Threshold gates | 2,048 | All 256 values for each of 8 gates |
| Modular arithmetic | 2,816 | All 256 values for each of 11 moduli |
| Pattern recognition | 1,537 | Exhaustive for detectors, sampled for counters |
| Combinational | 854 | All relevant combinations |

### Running the Evaluator
```bash
python arithmetic_eval.py --model arithmetic.safetensors --device cpu
```

Output:

```
Loading model from arithmetic.safetensors...
Found 5094 tensors
Categories: ['arithmetic', 'boolean', 'combinational', 'modular', 'pattern_recognition', 'threshold']

=== BOOLEAN GATES ===
boolean.and: 4/4 [PASS]
boolean.or: 4/4 [PASS]
...

============================================================
SUMMARY
============================================================
Total: 339500/339500 (100.0000%)
Time: 136.78s

All circuits passed!

============================================================
TENSOR COVERAGE: 5094/5094 (100.00%)

All tensors tested!

Fitness: 1.000000
```
### Verification Guarantees

- **100% test pass rate**: Every test passes
- **100% tensor coverage**: Every tensor in the model is accessed during testing
- **Exhaustive where feasible**: All circuits with <= 16 input bits are tested exhaustively
- **Strategic sampling for large circuits**: The multiplier and divider use carefully chosen test vectors

---

## Intended Use Cases

### 1. Frozen Arithmetic Layer for Language Models

The primary intended use is embedding this arithmetic core as a frozen layer within a language model. The concept:

- The LLM learns to recognize when arithmetic is needed
- Trained interface layers convert token representations to binary inputs
- The frozen arithmetic layer computes the exact result
- Interface layers convert binary outputs back to token space

This separates the "knowing when to compute" problem (which LLMs can learn) from the "computing correctly" problem (which is solved by the frozen weights).

### 2. Neuromorphic Hardware

Threshold logic maps naturally to neuromorphic computing substrates. Each gate is a single neuron, and the weights are sparse and small (typically -2 to +2). This model could serve as a reference implementation for arithmetic on neuromorphic chips.

### 3. Verified Computing

Because every circuit has been exhaustively tested, this model provides a verified computing substrate. Applications requiring guaranteed correctness can use these weights with confidence.

### 4. Educational Resource

The model serves as a complete, working example of how digital logic maps to neural network weights. Students can inspect the weights, trace signal flow, and understand the correspondence between Boolean algebra and threshold logic.

### 5. Baseline for Pruning Research

The model provides a known-correct starting point for pruning and compression research. How aggressively can we prune while maintaining correctness? Which tensors are most compressible? These questions can be explored with ground truth.

---

## Integration with Language Models

We envision integration following this architecture:
```
[Token Embeddings]
        |
        v
[Transformer Layers (trainable)]
        |
        v
[Arithmetic Router (trainable)]    -- decides whether arithmetic is needed
        |
        v
[BitExtractor (trainable)]         -- converts activations to binary inputs
        |
        v
[Threshold Calculus Core (FROZEN)] -- computes exact arithmetic
        |
        v
[BitInjector (trainable)]          -- converts binary outputs back to activations
        |
        v
[Transformer Layers (trainable)]
        |
        v
[Output]
```
The key insight is that the model learns call dispatch, not computation. The trainable components learn:

- When to invoke the arithmetic circuits
- How to extract operands from the representation
- How to interpret and integrate results

The actual arithmetic is handled by frozen, verified weights that cannot drift or hallucinate.

### Interface Layer Design

The BitExtractor must learn to:

1. Identify which activation dimensions encode numerical values
2. Convert floating-point activations to 8-bit binary representations
3. Route to the appropriate arithmetic circuit

The BitInjector must learn to:

1. Interpret binary results
2. Convert back to the model's activation space
3. Integrate results with ongoing computation

These interface layers are small and trainable. The bulk of the arithmetic (5,094 tensors) remains frozen.

---

## Pruning Experiments

A key research direction is pruning. The current model uses canonical, human-designed circuits, which are not necessarily optimal as neural network representations. Several questions arise:

### Weight Magnitude Pruning

Can we zero out small weights while maintaining correctness? Initial experiments suggest that threshold logic is sensitive to weight changes because the decision boundary must be exact. A weight of 0.99 instead of 1.0 might flip outputs for edge cases.

### Structural Pruning

Can we remove entire neurons or layers? Some circuits may have redundant paths. The two-layer XOR implementation, for instance, might have alternative single-layer approximations for specific use cases.

### Knowledge Distillation

Can we train smaller networks to mimic the larger verified networks? This would trade verification for compression.

### Quantization

The current weights are stored as float32 but take values in a small set (typically -2, -1, 0, 1, 2). Aggressive quantization to int8 or even int4 should be possible with no loss.
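As a toy sanity check of the lossless-quantization claim, the gate templates quoted in this README can be rounded to integers and re-evaluated over all binary inputs (a sketch over hand-written templates; the real model's tensors are not loaded here):

```python
from itertools import product

# Gate templates from the worked examples: (float32-style weights, bias)
gates = {
    'and':     ([1.0, 1.0], -2.0),
    'or':      ([1.0, 1.0], -1.0),
    'not':     ([-1.0], 0.0),
    'maj3of5': ([1.0] * 5, -3.0),
}

def fire(weights, bias, xs):
    return 1 if sum(w * x for w, x in zip(weights, xs)) + bias >= 0 else 0

def quantize(weights, bias):
    # exact for integer-valued floats in the int8 range
    return [int(round(w)) for w in weights], int(round(bias))

lossless = True
for name, (w, b) in gates.items():
    qw, qb = quantize(w, b)
    for xs in product([0, 1], repeat=len(w)):
        if fire(w, b, xs) != fire(qw, qb, xs):
            lossless = False
```

Because the decision boundary sits exactly at zero, this check is worth re-running per gate after any real quantization pass rather than assumed.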
### Sparsity Patterns

Many weights are zero. Converting to sparse representations could significantly reduce memory and computation.

We look forward to exploring how far these compressions can be pushed while maintaining 100% correctness. The verified nature of the model provides ground truth for evaluating any compression scheme.

---

## Limitations

### Bit Width

The model implements 8-bit arithmetic. Larger operands require chaining operations through carry propagation. This is possible but requires external orchestration.
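To illustrate the chaining idea, a 16-bit add can be orchestrated from two 8-bit adds by propagating the carry. The `add8` below is an arithmetic stand-in for the `arithmetic.adc8bit` circuit, not the circuit itself:

```python
def add8(a, b, cin=0):
    # stand-in for the 8-bit add-with-carry circuit: returns (sum, carry_out)
    total = a + b + cin
    return total & 0xFF, total >> 8

def add16(a, b):
    # add the low bytes first, then feed the carry into the high-byte add
    lo, carry = add8(a & 0xFF, b & 0xFF)
    hi, carry_out = add8(a >> 8, b >> 8, carry)
    return (hi << 8) | lo, carry_out
```

The same pattern extends to 32-bit and wider operands, at the cost of one external carry hand-off per byte.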
564
-
565
- ### No Floating Point
566
-
567
- The model only supports integer arithmetic. Floating-point operations (which LLMs are frequently asked to perform) are not implemented. This is the most significant gap for practical LLM integration. Adding IEEE 754 floating-point support is a priority for future work.
568
-
569
- ### No Memory
570
-
571
- The model is purely combinational. There are no flip-flops, registers, or memory elements. State must be managed externally.
572
-
573
- ### Interface Complexity
574
-
575
- Integrating with an LLM requires training interface layers. The optimal architecture for these layers is an open research question.
576
-
577
- ### Verification Scope
578
-
579
- While we have tested exhaustively where feasible, the 8x8 multiplier and 8-bit divider use strategic sampling rather than exhaustive testing. Full exhaustive testing would require 2^16 = 65,536 tests for the multiplier and careful handling of division by zero.
580
-
581
- ---
582
 
583
  ## Roadmap
584
 
585
- Goal: Complete arithmetic coprocessor for LLM mathematical reasoning.
586
-
587
- ### Completed
588
-
589
- #### Float16 Core Arithmetic
590
- - [x] `float16.add` — IEEE 754 addition (~998 gates)
591
- - [x] `float16.sub` — IEEE 754 subtraction
592
- - [x] `float16.mul` — IEEE 754 multiplication (~1302 gates)
593
- - [x] `float16.div` — IEEE 754 division (~1854 gates)
594
- - [x] `float16.neg` — sign flip
595
- - [x] `float16.abs` — absolute value
596
- - [x] `float16.cmp` — comparison
597
-
598
- #### Float16 Utilities
599
- - [x] `float16.unpack` — extract sign, exponent, mantissa
600
- - [x] `float16.pack` — assemble components
601
- - [x] `float16.normalize` — CLZ-based normalization
602
- - [x] `float16.toint` — convert to int16
603
- - [x] `float16.fromint` — convert from int16
604
-
605
- #### Integer Arithmetic (8-bit)
606
- - [x] Adders (half, full, ripple carry 2/4/8 bit)
607
- - [x] Subtraction, negation
608
- - [x] Multiplication (2x2, 4x4, 8x8)
609
- - [x] Division (8-bit with remainder)
610
- - [x] Comparators (all relations)
611
- - [x] CLZ (8-bit and 16-bit)
612
-
613
- #### Logic and Patterns
614
- - [x] Boolean gates (AND, OR, NOT, NAND, NOR, XOR, XNOR, IMPLIES, BIIMPLIES)
615
- - [x] Threshold gates (k-of-n for k=1..8)
616
- - [x] Modular arithmetic (mod 2-12)
617
- - [x] Pattern recognition (popcount, one-hot, symmetry)
618
- - [x] Combinational (mux, demux, encoder, decoder, barrel shifter)
619
- - [x] Shifts and rotates
620
-
621
- #### Infrastructure
622
- - [x] Self-documenting .inputs tensors
623
- - [x] Signal registry in safetensors metadata
624
- - [x] Full circuit evaluation with topological sort
625
- - [x] Comprehensive test suite (7,177 tests, 100% pass)
626
 
627
- ---
 
 
 
 
628
 
629
- ### High Priority — Core Mathematical Functions
-
- #### Powers and Roots (float16)
- - [ ] `float16.sqrt` — square root via Newton-Raphson or digit-by-digit
- - [ ] `float16.rsqrt` — reciprocal square root (useful for normalization)
- - [ ] `float16.pow` — x^y for arbitrary y (via exp/ln)
- - [ ] `float16.sq` — x² (optimized special case)
- - [ ] `float16.cube` — x³ (optimized special case)
- - [ ] `float16.cbrt` — cube root
-
- #### Exponentials and Logarithms (float16)
- - [ ] `float16.exp` — e^x via range reduction + polynomial
- - [ ] `float16.exp2` — 2^x (simpler, useful for pow)
- - [ ] `float16.ln` — natural logarithm
- - [ ] `float16.log2` — base-2 logarithm (extract exponent + correction)
- - [ ] `float16.log10` — base-10 logarithm
-
- #### Trigonometry (float16, CORDIC)
- - [ ] `float16.sin` — sine
- - [ ] `float16.cos` — cosine
- - [ ] `float16.tan` — tangent (sin/cos)
- - [ ] `float16.sincos` — both sin and cos (CORDIC gives both)
- - [ ] `float16.asin` — arc sine
- - [ ] `float16.acos` — arc cosine
- - [ ] `float16.atan` — arc tangent
- - [ ] `float16.atan2` — two-argument arc tangent (quadrant-aware)
-
- #### Hyperbolic Functions (float16)
- - [ ] `float16.sinh` — hyperbolic sine
- - [ ] `float16.cosh` — hyperbolic cosine
- - [ ] `float16.tanh` — hyperbolic tangent (critical for ML activations)
-
- ---
-
- ### Medium Priority — Extended Operations
-
- #### Rounding and Truncation (float16)
- - [ ] `float16.floor` — round toward -∞
- - [ ] `float16.ceil` — round toward +∞
- - [ ] `float16.trunc` — round toward zero
- - [ ] `float16.round` — round to nearest
- - [ ] `float16.frac` — fractional part
- - [ ] `float16.fmod` — floating-point modulo
-
- #### Comparisons and Selection (float16)
- - [ ] `float16.min` — minimum of two values
- - [ ] `float16.max` — maximum of two values
- - [ ] `float16.clamp` — clamp to range [lo, hi]
- - [ ] `float16.sign` — sign function (-1, 0, +1)
- - [ ] `float16.copysign` — copy sign from y to x
- - [ ] `float16.isnan` — NaN test
- - [ ] `float16.isinf` — infinity test
- - [ ] `float16.isfinite` — finite test
-
- #### Integer Arithmetic (16-bit)
- - [ ] `arithmetic.add16` — 16-bit addition
- - [ ] `arithmetic.sub16` — 16-bit subtraction
- - [ ] `arithmetic.mul16` — 16-bit multiplication
- - [ ] `arithmetic.div16` — 16-bit division with remainder
- - [ ] `arithmetic.sqrt16` — 16-bit integer square root
- - [ ] `arithmetic.abs16` — 16-bit absolute value
-
- #### Number Theory
- - [ ] `arithmetic.gcd` — greatest common divisor (Euclidean)
- - [ ] `arithmetic.lcm` — least common multiple
- - [ ] `arithmetic.isprime8` — primality test (8-bit)
- - [ ] `arithmetic.factorial8` — factorial (8! = 40320 fits in 16-bit)
- - [ ] `arithmetic.comb` — binomial coefficient nCr
- - [ ] `arithmetic.perm` — permutation nPr
-
- ---
-
- ### Lower Priority — Specialized Functions
-
- #### ML Activation Functions (float16)
- - [ ] `float16.relu` — max(0, x)
- - [ ] `float16.leaky_relu` — x if x > 0 else αx
- - [ ] `float16.sigmoid` — 1/(1+e^(-x))
- - [ ] `float16.softplus` — ln(1+e^x)
- - [ ] `float16.gelu` — Gaussian error linear unit
- - [ ] `float16.silu` — x * sigmoid(x)
-
- #### Constants (float16 encoded)
- - [ ] `const.pi` — π = 3.14159...
- - [ ] `const.e` — e = 2.71828...
- - [ ] `const.phi` — φ = 1.61803... (golden ratio)
- - [ ] `const.sqrt2` — √2 = 1.41421...
- - [ ] `const.ln2` — ln(2) = 0.69314...
- - [ ] `const.log2e` — log₂(e) = 1.44269...
-
- #### Statistics (float16, multi-input)
- - [ ] `stats.sum` — sum of array
- - [ ] `stats.mean` — arithmetic mean
- - [ ] `stats.min_array` — minimum of array
- - [ ] `stats.max_array` — maximum of array
- - [ ] `stats.variance` — population variance
- - [ ] `stats.stddev` — standard deviation
-
- #### Bit Manipulation (16-bit)
- - [ ] `bits.popcnt16` — population count
- - [ ] `bits.clz16` — count leading zeros (done)
- - [ ] `bits.ctz16` — count trailing zeros
- - [ ] `bits.reverse16` — bit reversal
- - [ ] `bits.bswap16` — byte swap
-
- ---
-
- ### Infrastructure TODO
-
- #### Testing
- - [ ] Exhaustive float16 tests for new operations
- - [ ] Edge case coverage (±0, ±inf, NaN, subnormals)
- - [ ] Accuracy tests against reference implementations
-
- #### Documentation
- - [ ] Circuit diagrams for CORDIC, Newton-Raphson
- - [ ] Tutorial: implementing new circuits
- - [ ] Tutorial: LLM integration patterns
- - [ ] API reference for all operations
-
- #### Optimization
- - [ ] Gate count reduction analysis
- - [ ] Critical path optimization
- - [ ] Weight quantization study (int8/int4)
-
- ---
-
- ## Technical Details
-
- ### Tensor Naming Convention
-
- Tensors follow a hierarchical naming scheme:
-
- ```
- category.circuit.component.subcomponent.layer.type
- ```
-
- Examples:
- - `boolean.and.weight` -- weights for AND gate
- - `boolean.and.bias` -- bias for AND gate
- - `arithmetic.fulladder.ha1.sum.layer1.or.weight` -- first half adder, sum output, layer 1, OR gate weights
- - `arithmetic.div8bit.stage3.mux5.and0.bias` -- divider stage 3, mux for bit 5, AND gate 0, bias
-
- ### Weight Conventions
-
- - Weights are stored as 1D tensors
- - Biases are stored as scalar tensors (shape [1]) or sometimes as single floats
- - All values are float32 but only use a small discrete set of values
- - Common weight values: -2, -1, 0, 1, 2
- - Common bias values: -2, -1, 0, 1
-
- ### Activation Function
-
- All circuits assume a Heaviside step activation:
-
- ```python
- def heaviside(x):
-     return (x >= 0).float()
- ```
-
- This is critical. Using ReLU, sigmoid, or other activations will produce incorrect results.
-
- ### Routing Information
-
- The `routing.json` file contains connectivity information for complex circuits, particularly the divider. This maps gate names to their input sources, enabling correct signal propagation during evaluation.
-
- ---
-
- ## Citation
-
- If you use this work, please cite:
-
- ```bibtex
- @misc{threshold-calculus,
-   author = {Norton, Charles},
-   title = {Threshold Calculus: Verified Arithmetic Circuits as Neural Network Weights},
-   year = {2025},
-   publisher = {Hugging Face},
-   url = {https://huggingface.co/phanerozoic/threshold-calculus}
- }
- ```
-
- ---

 ## License

- This model is released under the Apache 2.0 License. You are free to use, modify, and distribute it for any purpose, including commercial applications.
-
- ---
-
- ## Acknowledgments
-
- This project builds on decades of research in threshold logic, digital design, and neural network theory. The insight that threshold gates are equivalent to perceptrons dates to the 1960s. We are grateful to the open-source communities around PyTorch, safetensors, and Hugging Face for the infrastructure that makes this work possible.
-
- ---
-
- ## Contact
-
- For questions, suggestions, or collaboration inquiries, please open an issue on this repository or contact the author through Hugging Face.
 
 # Threshold Calculus

+ Digital circuits encoded as neural network weights.

+ Each gate is a threshold logic unit: `output = step(weights · inputs + bias)`. The step function fires when the weighted sum is ≥ 0. This maps digital logic to tensor operations.

+ ## What's Here

+ | File | Description |
+ |------|-------------|
+ | `arithmetic.safetensors` | 23,494 tensors encoding 7,828 gates |
+ | `eval.py` | Test harness (206,124 tests) |
+ | `convert_to_explicit_inputs.py` | Builds tensors and infers gate connectivity |
+ | `routing.json` | Signal routing for complex circuits |

+ ## Circuits

+ **Float16 (IEEE 754)**
+ - `float16.add`, `float16.sub`, `float16.mul`, `float16.div`
+ - `float16.neg`, `float16.abs`, `float16.cmp`
+ - `float16.toint`, `float16.fromint`
+ - `float16.pack`, `float16.unpack`, `float16.normalize`

+ Handles NaN, Inf, zero, subnormals. Mantissa alignment via barrel shifter. Normalization via CLZ.
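For reference, the binary16 layout these circuits operate on can be inspected from Python's standard library (a sketch for orientation, not part of this repo; `struct`'s `'e'` format code is IEEE 754 half precision):

```python
import struct

def unpack_f16(x):
    # binary16 layout: 1 sign bit | 5 exponent bits | 10 mantissa bits
    bits, = struct.unpack('<H', struct.pack('<e', x))
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF

print(unpack_f16(1.0))           # (0, 15, 0): exponent field is biased by 15
print(unpack_f16(float('inf')))  # (0, 31, 0): all-ones exponent, zero mantissa
```

This is the same decomposition `float16.unpack` performs in gates; an all-ones exponent with a nonzero mantissa is NaN, and a zero exponent with a nonzero mantissa is a subnormal.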

+ **8-bit Integer**
+ - Adders: half, full, ripple carry (2/4/8 bit), add-with-carry
+ - Subtraction: sub8bit, sbc8bit, neg8bit
+ - Comparison: cmp8bit, equality8bit
+ - Shifts: asr8bit, rol8bit, ror8bit
+ - CLZ: 8-bit and 16-bit

+ **Modular Arithmetic**
+ - mod2 through mod12 (divisibility testing)

+ **Boolean**
+ - AND, OR, NOT, NAND, NOR, XOR, XNOR, IMPLIES, BIIMPLIES

+ **Threshold**
+ - k-of-n gates (1-of-8 through 8-of-8)
+ - majority, minority, atleastk, atmostk, exactlyk
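Each of these is a single neuron. As a sketch (function name illustrative), an at-least-k gate is just unit weights with the bias placed between k-1 and k:

```python
def at_least_k(k, xs):
    # k-of-n threshold gate: unit weights, bias -(k - 0.5)
    return 1 if sum(xs) - (k - 0.5) >= 0 else 0

print(at_least_k(3, [1, 1, 0, 1, 0]))  # majority of five inputs: 1
print(at_least_k(3, [1, 0, 0, 1, 0]))  # only two inputs set: 0
```

Majority is at-least-k with k = ⌈(n+1)/2⌉; exactly-k combines an at-least-k and an at-most-k gate.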

+ **Pattern Recognition**
+ - popcount, allzeros, allones, onehotdetector
+ - symmetry8bit, alternating8bit, hammingdistance8bit
+ - leadingones, trailingones, runlength

+ **Combinational**
+ - decoder3to8, encoder
+ - multiplexer (2/4/8 to 1), demultiplexer (1 to 2/4/8)
+ - barrelshifter8bit, priorityencoder8bit

+ ## How It Works

+ A threshold gate computes:

+ ```
+ output = 1 if (w₁x₁ + w₂x₂ + ... + wₙxₙ + bias) >= 0 else 0
+ ```

+ This is a perceptron with Heaviside step activation.

+ **AND gate**: weights = [1, 1], bias = -1.5
+ - (0,0): 0 + 0 - 1.5 = -1.5 < 0 → 0
+ - (0,1): 0 + 1 - 1.5 = -0.5 < 0 → 0
+ - (1,0): 1 + 0 - 1.5 = -0.5 < 0 → 0
+ - (1,1): 1 + 1 - 1.5 = 0.5 ≥ 0 → 1

+ **XOR** requires two layers (not linearly separable):
+ - Layer 1: OR and NAND in parallel
+ - Layer 2: AND of both outputs
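The AND table and the two-layer XOR can be checked directly (a minimal sketch; the OR and NAND weights here are standard threshold encodings, not values quoted from the safetensors):

```python
def gate(weights, bias, xs):
    # single threshold neuron with Heaviside step activation
    return 1 if sum(w * x for w, x in zip(weights, xs)) + bias >= 0 else 0

def AND(a, b):  return gate([1, 1], -1.5, [a, b])
def OR(a, b):   return gate([1, 1], -0.5, [a, b])
def NAND(a, b): return gate([-1, -1], 1.5, [a, b])

def XOR(a, b):
    # layer 1: OR and NAND in parallel; layer 2: AND of both outputs
    return AND(OR(a, b), NAND(a, b))

print([AND(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 0, 0, 1]
print([XOR(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```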

+ ## Self-Documenting Format

+ Each gate has three tensors in `arithmetic.safetensors`:
+ - `.weight` — input weights
+ - `.bias` — threshold
+ - `.inputs` — int64 tensor of signal IDs

+ Signal registry in metadata maps IDs to names:

 ```python
+ import json
 from safetensors import safe_open

 with safe_open('arithmetic.safetensors', framework='pt') as f:
     registry = json.loads(f.metadata()['signal_registry'])
+     inputs = f.get_tensor('boolean.and.inputs')
+     names = [registry[str(i.item())] for i in inputs]
+     # ['$a', '$b']
 ```

+ Signal naming:
+ - `$name` — circuit input (e.g., `$a`, `$dividend[0]`)
+ - `#0`, `#1` — constants
+ - `gate.path` — output of another gate
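Putting the three tensors together, evaluating one gate looks like this (plain-Python stand-ins for the loaded tensors; the weight and bias match the AND gate above, the signal IDs are illustrative):

```python
# stand-ins for tensors read from arithmetic.safetensors
tensors = {
    'boolean.and.weight': [1.0, 1.0],
    'boolean.and.bias':   [-1.5],
    'boolean.and.inputs': [0, 1],      # signal IDs
}
registry = {'0': '$a', '1': '$b'}      # from the safetensors metadata

def eval_gate(name, signals):
    w = tensors[name + '.weight']
    b = tensors[name + '.bias'][0]
    xs = [signals[registry[str(i)]] for i in tensors[name + '.inputs']]
    acc = sum(wi * xi for wi, xi in zip(w, xs)) + b
    return 1.0 if acc >= 0 else 0.0    # Heaviside step

print(eval_gate('boolean.and', {'$a': 1.0, '$b': 1.0}))  # 1.0
print(eval_gate('boolean.and', {'$a': 1.0, '$b': 0.0}))  # 0.0
```

Full circuits repeat this per gate in topological order, feeding each `gate.path` output into the signal map for downstream gates.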

+ ## Running Eval

 ```bash
+ python eval.py
 ```

+ Tests all circuits exhaustively. 8-bit operations test all 256 or 65,536 input combinations. Float16 tests cover special cases (NaN, Inf, ±0, subnormals) plus normal arithmetic.
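The same exhaustive pattern in miniature: build a full adder out of threshold gates and sweep every input (a sketch; eval.py's actual harness reads the gates from the safetensors rather than hard-coding them):

```python
import itertools

def gate(weights, bias, xs):
    # single threshold neuron with Heaviside step activation
    return 1 if sum(w * x for w, x in zip(weights, xs)) + bias >= 0 else 0

def xor(a, b):
    # two layers: AND of (OR, NAND)
    return gate([1, 1], -1.5,
                [gate([1, 1], -0.5, [a, b]), gate([-1, -1], 1.5, [a, b])])

def full_adder(a, b, cin):
    s = xor(xor(a, b), cin)
    cout = gate([1, 1, 1], -1.5, [a, b, cin])  # majority of three
    return s, cout

# exhaustive sweep over all 2^3 input combinations
for a, b, cin in itertools.product([0, 1], repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
```

Chaining eight of these carry-to-carry gives the ripple-carry adder; the 256 × 256 sweep for an 8-bit operation is the same loop with `repeat=16`.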

+ ## Development History

+ Started as an 8-bit CPU project: boolean gates first, then arithmetic (adders → multipliers → dividers), then CPU control logic. The CPU worked, but the arithmetic core turned out to be the useful part, so it was extracted into this repo; the full CPU lives in a separate repo (phanerozoic/8bit-threshold-computer).

+ Float16 was added later. The commit history shows the iterative process: float16.add went through multiple rounds of bug fixes for edge cases (zero handling, sign logic, normalization), and mul and div required multi-bit carry infrastructure.

 ## Roadmap

+ **Done:**
+ - Float16 core (add/sub/mul/div)
+ - Float16 utilities (pack/unpack/normalize/conversions)
+ - 8-bit integer arithmetic
+ - Boolean, threshold, modular, pattern recognition, combinational

+ **Next:**
+ - Float16 sqrt, rsqrt, pow
+ - Float16 exp, ln, log2
+ - Float16 trig (sin, cos, tan via CORDIC)
+ - Float16 tanh (ML activation)

+ **Cleanup:**
+ - Rip out 8-bit integer circuits, replace with 16-bit
+ - 8-bit was scaffolding for float16 development, not the product

 ## License

+ Apache 2.0