CharlesCNorton committed 65db75b (parent: b933255)

Rewrite README: cut marketing fluff, focus on float16

Files changed (1): README.md (+86 -763)
# Threshold Calculus

**Arithmetic coprocessor for LLMs, implemented as threshold logic gates.**

This is a runtime component, not a proof artifact. The circuits embed directly into transformer MLP layers as a reusable arithmetic unit. The model learns when to route through the circuits versus the standard MLP path. Inference runs in PyTorch.

Early training runs embedding these circuits into SmolLM2-360M show significant accuracy improvements on arithmetic tasks.

The repository contains an arithmetic core implemented as threshold logic gates, stored in safetensors format. Every tensor represents a neural-network weight or bias that, when combined with a Heaviside step activation function, computes exact arithmetic operations. All circuits pass exhaustive testing (7,177 tests, 100% pass rate).

---

## Table of Contents

1. [Overview](#overview)
2. [Project History](#project-history)
3. [The Pivot to Arithmetic](#the-pivot-to-arithmetic)
4. [What This Model Contains](#what-this-model-contains)
5. [How Threshold Logic Works](#how-threshold-logic-works)
6. [Circuit Catalog](#circuit-catalog)
7. [Evaluation and Verification](#evaluation-and-verification)
8. [Intended Use Cases](#intended-use-cases)
9. [Integration with Language Models](#integration-with-language-models)
10. [Pruning Experiments](#pruning-experiments)
11. [Limitations](#limitations)
12. [Future Work](#future-work)
13. [Technical Details](#technical-details)
14. [Citation](#citation)
15. [License](#license)

---

## Overview

Threshold Calculus is an arithmetic computation core built entirely from threshold logic gates. Unlike traditional digital circuits built from discrete components, this implementation encodes every gate as a single neuron with fixed weights and biases. The key insight is that threshold logic gates are computationally equivalent to single-layer perceptrons with step activation functions, so arbitrary digital circuits can be represented as neural-network weights.

The model contains 5,094 tensors totaling 575 KB. These tensors implement:

- Full 8-bit integer arithmetic (addition, subtraction, multiplication, division)
- All standard comparison operations
- Bitwise and logical operations
- Modular arithmetic (divisibility testing for mod 2 through mod 12)
- Pattern recognition primitives (popcount, leading zeros, symmetry detection)
- Threshold voting circuits (k-of-n gates, majority, minority)
- Combinational building blocks (multiplexers, demultiplexers, encoders, decoders)

Every circuit has been exhaustively tested where feasible. The 8-bit adder has been verified against all 65,536 input combinations. The 8-bit multiplier has been tested against representative samples including edge cases, powers of two, and adversarial bit patterns. The 8-bit divider produces correct quotients and remainders for all tested dividend/divisor pairs.
---

## Project History

This project began as an attempt to build a complete 8-bit CPU using threshold logic. The original goal was ambitious: create a Turing-complete computer where every logic gate, every flip-flop, every control signal was implemented as a neural network weight. The CPU would have registers, a program counter, an instruction decoder, conditional jumps, a stack, and the ability to run arbitrary programs.

The development proceeded through several phases:

### Phase 1: Boolean Foundations

We started by implementing the basic Boolean gates. AND, OR, NOT, NAND, and NOR gates are trivially implementable as single threshold neurons. A 2-input AND gate, for example, uses weights [1, 1] and bias -2, firing only when both inputs are 1. XOR and XNOR required two-layer networks because they are not linearly separable. We developed standard templates for these gates that could be instantiated throughout the design.

### Phase 2: Arithmetic Circuits

With Boolean gates in hand, we built up the arithmetic hierarchy. Half adders combine an XOR (for sum) and an AND (for carry). Full adders chain two half adders with an OR for carry propagation. Ripple carry adders chain full adders. We implemented 2-bit, 4-bit, and 8-bit variants and verified each exhaustively.

Multiplication came next. An 8x8 multiplier requires 64 partial products (each an AND gate) followed by seven stages of addition to accumulate the results. The implementation uses the standard shift-and-add architecture, resulting in hundreds of interconnected gates.

Division was the most complex arithmetic circuit. We implemented a restoring division algorithm with eight stages, each containing a comparator, a conditional subtractor, and a multiplexer to select between the subtracted and original values. The full divider contains nearly 2,000 tensors and correctly computes both quotient and remainder.

### Phase 3: The CPU Attempt

With arithmetic complete, we began building CPU infrastructure:

- **Instruction Decoder**: A 4-bit opcode decoder that activates one of 16 operation lines
- **Register File**: Four 8-bit registers with read/write multiplexing
- **Program Counter**: An 8-bit counter with increment and load capabilities
- **ALU Integration**: Routing to select between arithmetic operations based on opcode
- **Control Signals**: Jump, conditional jump, call, return, push, pop, halt
- **Flag Generation**: Zero, negative, carry, and overflow flags

The CPU grew to over 6,000 tensors. We implemented conditional jumps based on flags, subroutine calls with a stack, and began writing test programs.

### Phase 4: Scope Realization

As the CPU neared completion, we stepped back to assess the project. The CPU worked. Programs could execute. But we realized several things:

First, the complexity was substantial. Debugging required careful routing analysis. Adding new instructions meant touching many interconnected systems. The verification burden grew quadratically with features.

Second, and more importantly, we asked: what is the most valuable artifact here? The CPU is interesting as a demonstration, but its practical utility is limited. Nobody needs an 8-bit CPU implemented in neural network weights. What people do need is reliable arithmetic.

Language models notoriously struggle with arithmetic. They can discuss mathematics eloquently but fail at actual computation. A frozen, verified arithmetic layer could potentially address this gap. The arithmetic circuits we had built were the genuinely useful core. The CPU control logic was scaffolding.

---

## The Pivot to Arithmetic

We decided to extract and perfect the arithmetic core as a standalone artifact. This involved:

1. **Identifying Essential Tensors**: We cataloged every tensor by category and determined which were arithmetic-related versus CPU-specific.

2. **Removing CPU Infrastructure**: Control flow circuits (instruction decoder, program counter, jump logic, stack operations), ALU wrapper logic, and CPU manifest metadata were stripped out.

3. **Retaining Arithmetic Foundations**: All arithmetic operations, Boolean gates, threshold primitives, combinational building blocks, modular arithmetic, and pattern recognition circuits were preserved.

4. **Cleaning Residual CPU Artifacts**: Some tensors, such as the register multiplexer, had leaked into the combinational category. These were identified and removed to ensure a clean arithmetic-only core.

5. **Verification**: The stripped model was re-verified to ensure a 100% test pass rate and 100% tensor coverage.

The result is this repository: a focused arithmetic core with 5,094 tensors, every one tested and accounted for.

The CPU work is not abandoned. It will continue in the original repository (phanerozoic/8bit-threshold-computer) as an interesting research direction. But we believe the arithmetic core is the more immediately valuable contribution, and it deserves its own focused home.
---

## What This Model Contains

### File Manifest

| File | Description | Size |
|------|-------------|------|
| `arithmetic.safetensors` | Self-documenting format with explicit .inputs tensors | 1.06 MB |
| `eval.py` | Verification suite using self-documenting format | 12 KB |
| `TODO.md` | Development roadmap | 3 KB |
| `convert_to_explicit_inputs.py` | Script used to generate .inputs tensors | 32 KB |
| `tensors_arithmetic_only.txt` | Tensor manifest with shapes and values | 397 KB |

### Self-Documenting Format

The `arithmetic.safetensors` file is fully self-contained. Each gate has three tensors:

- `.weight` -- the gate's weight vector
- `.bias` -- the gate's bias
- `.inputs` -- integer tensor of signal IDs referencing input sources

The signal registry is stored in file metadata under the key `signal_registry` as a JSON object mapping IDs to signal names:
```python
from safetensors import safe_open
import json

with safe_open('arithmetic.safetensors', framework='pt') as f:
    registry = json.loads(f.metadata()['signal_registry'])

    # Get inputs for a gate
    inputs_tensor = f.get_tensor('boolean.and.inputs')
    input_signals = [registry[str(i.item())] for i in inputs_tensor]
    # Result: ['$a', '$b']
```
Signal naming conventions:

- `$name` -- external circuit input (e.g., `$a`, `$dividend[0]`)
- `#value` -- constant (e.g., `#0`, `#1`)
- `gate.path` -- output of another gate (e.g., `ha1.sum`, `stage0.cmp`)

This format eliminates the need for external routing files and makes circuits fully introspectable from the safetensors file alone.
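To show how the `.inputs` wiring drives evaluation, here is a minimal sketch using an in-memory stand-in for the tensors. The gate names, signal IDs, and weights below are illustrative inventions for the sketch; in the real file they come from the `.weight`/`.bias`/`.inputs` tensors and the `signal_registry` metadata.

```python
# Illustrative stand-in for the self-documenting format: a registry mapping
# signal IDs to names, and gates wired together by those IDs.
registry = {0: '$a', 1: '$b', 2: 'or1', 3: 'nand1'}
gates = {
    'or1':   {'weights': [1, 1],   'bias': -1, 'inputs': [0, 1]},
    'nand1': {'weights': [-1, -1], 'bias': 1,  'inputs': [0, 1]},
    'xor':   {'weights': [1, 1],   'bias': -2, 'inputs': [2, 3]},
}

def evaluate(gates, registry, external):
    """Resolve each gate's inputs by signal name and fire it with a
    Heaviside step. Assumes gates are listed in topological order."""
    values = dict(external)  # '$name' -> 0/1, then gate name -> 0/1
    for name, g in gates.items():
        xs = [values[registry[i]] for i in g['inputs']]
        s = sum(w * x for w, x in zip(g['weights'], xs)) + g['bias']
        values[name] = 1 if s >= 0 else 0
    return values

result = evaluate(gates, registry, {'$a': 1, '$b': 0})
```

The same resolve-then-fire loop generalizes to the full model once gates are sorted topologically over their `.inputs` references.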
### Tensor Statistics

- **Total tensors**: 7,634 (weights + biases + inputs)
- **Gates**: 2,540
- **Signal registry**: 3,018 signals
- **Categories**: 6 (arithmetic, boolean, combinational, modular, pattern_recognition, threshold)
- **Largest category**: arithmetic (4,659 weight/bias tensors)
- **Smallest category**: boolean (30 weight/bias tensors)

### Category Breakdown

| Category | Tensors | Description |
|----------|---------|-------------|
| arithmetic | 4,659 | Adders, subtractors, multipliers, dividers, comparators, shifts |
| modular | 226 | Divisibility testers for mod 2 through mod 12 |
| combinational | 40 | Multiplexers, demultiplexers, encoders, decoders, barrel shifter |
| threshold | 30 | k-of-n voting gates, majority, minority |
| boolean | 30 | AND, OR, NOT, NAND, NOR, XOR, XNOR, IMPLIES |
| pattern_recognition | 25 | Popcount, leading/trailing ones, symmetry, alternating patterns |

---

## How Threshold Logic Works

Threshold logic is a computational model in which each gate computes a weighted sum of its inputs and compares the result to a threshold. If the sum meets or exceeds the threshold, the gate outputs 1; otherwise, it outputs 0.

Mathematically, a threshold gate computes:

```
output = 1 if (w1*x1 + w2*x2 + ... + wn*xn + bias) >= 0 else 0
```

This is identical to a single neuron with a Heaviside step activation function:

```python
def heaviside(x):
    return 1.0 if x >= 0 else 0.0

def threshold_gate(inputs, weights, bias):
    return heaviside(sum(w * x for w, x in zip(weights, inputs)) + bias)
```

### Examples

**AND Gate**: weights = [1, 1], bias = -2
- inputs (0, 0): 0 + 0 - 2 = -2 < 0, output 0
- inputs (0, 1): 0 + 1 - 2 = -1 < 0, output 0
- inputs (1, 0): 1 + 0 - 2 = -1 < 0, output 0
- inputs (1, 1): 1 + 1 - 2 = 0 >= 0, output 1

**OR Gate**: weights = [1, 1], bias = -1
- inputs (0, 0): 0 + 0 - 1 = -1 < 0, output 0
- inputs (0, 1): 0 + 1 - 1 = 0 >= 0, output 1
- inputs (1, 0): 1 + 0 - 1 = 0 >= 0, output 1
- inputs (1, 1): 1 + 1 - 1 = 1 >= 0, output 1

**NOT Gate**: weights = [-1], bias = 0
- input 0: -1*0 + 0 = 0 >= 0, output 1
- input 1: -1*1 + 0 = -1 < 0, output 0

**3-of-5 Majority**: weights = [1, 1, 1, 1, 1], bias = -3
- Outputs 1 if and only if at least 3 of the 5 inputs are 1

### Non-Linearly Separable Functions

Some Boolean functions, notably XOR and XNOR, cannot be computed by a single threshold gate because they are not linearly separable. For these, we use two-layer networks:

**XOR**: Layer 1 computes OR and NAND in parallel. Layer 2 computes the AND of these results.
- OR fires if at least one input is 1
- NAND fires unless both inputs are 1
- AND of (OR, NAND) fires only when exactly one input is 1

This two-layer pattern is used throughout the design wherever XOR operations are needed, including in half adders, full adders, and parity circuits.
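As a sketch, the two-layer XOR and the half adder built from it can be written directly from the gate templates in the worked examples above (the function names here are illustrative; the weights match the templates):

```python
def gate(weights, bias, xs):
    # a single threshold neuron with a Heaviside step activation
    return 1 if sum(w * x for w, x in zip(weights, xs)) + bias >= 0 else 0

def xor(a, b):
    # layer 1: OR ([1,1], -1) and NAND ([-1,-1], 1) in parallel
    # layer 2: AND ([1,1], -2) of the two layer-1 outputs
    return gate([1, 1], -2, [gate([1, 1], -1, [a, b]),
                             gate([-1, -1], 1, [a, b])])

def half_adder(a, b):
    # sum = XOR(a, b); carry = AND(a, b), as described above
    return xor(a, b), gate([1, 1], -2, [a, b])
```

Chaining two of these half adders with an OR for the carries yields the full adder described in Phase 2.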
---

## Circuit Catalog

### Boolean Gates

| Circuit | Inputs | Outputs | Layers | Description |
|---------|--------|---------|--------|-------------|
| boolean.and | 2 | 1 | 1 | Logical AND |
| boolean.or | 2 | 1 | 1 | Logical OR |
| boolean.not | 1 | 1 | 1 | Logical NOT |
| boolean.nand | 2 | 1 | 1 | NOT AND |
| boolean.nor | 2 | 1 | 1 | NOT OR |
| boolean.xor | 2 | 1 | 2 | Exclusive OR |
| boolean.xnor | 2 | 1 | 2 | Exclusive NOR |
| boolean.implies | 2 | 1 | 1 | Logical implication (A implies B) |
| boolean.biimplies | 2 | 1 | 2 | Biconditional (A iff B) |

### Arithmetic: Addition

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| arithmetic.halfadder | 2 bits | sum, carry | Basic half adder |
| arithmetic.fulladder | 3 bits (a, b, cin) | sum, cout | Full adder with carry |
| arithmetic.ripplecarry2bit | 2x 2-bit | 2-bit sum, cout | 2-bit ripple carry adder |
| arithmetic.ripplecarry4bit | 2x 4-bit | 4-bit sum, cout | 4-bit ripple carry adder |
| arithmetic.ripplecarry8bit | 2x 8-bit | 8-bit sum, cout | 8-bit ripple carry adder |
| arithmetic.adc8bit | 2x 8-bit + cin | 8-bit sum, cout | Add with carry |
| arithmetic.incrementer8bit | 8-bit | 8-bit | Add 1 to input |
| arithmetic.decrementer8bit | 8-bit | 8-bit | Subtract 1 from input |

### Arithmetic: Subtraction

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| arithmetic.sub8bit | 2x 8-bit | 8-bit diff, borrow | 8-bit subtraction |
| arithmetic.sbc8bit | 2x 8-bit + bin | 8-bit diff, bout | Subtract with borrow |
| arithmetic.neg8bit | 8-bit | 8-bit | Two's complement negation |
| arithmetic.absolutedifference8bit | 2x 8-bit | 8-bit | abs(A - B) |

### Arithmetic: Multiplication

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| arithmetic.multiplier2x2 | 2x 2-bit | 4-bit product | 2x2 multiplier |
| arithmetic.multiplier4x4 | 2x 4-bit | 8-bit product | 4x4 multiplier |
| arithmetic.multiplier8x8 | 2x 8-bit | 16-bit product | 8x8 multiplier |

### Arithmetic: Division

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| arithmetic.div8bit | 8-bit dividend, 8-bit divisor | 8-bit quotient, 8-bit remainder | Full 8-bit division |

The divider uses a restoring division algorithm with 8 stages. Each stage shifts the partial remainder, compares against the divisor, conditionally subtracts, and records one quotient bit. The implementation contains nearly 2,000 tensors and is the most complex circuit in the model.

### Arithmetic: Comparison

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| arithmetic.greaterthan8bit | 2x 8-bit | 1 bit | A > B |
| arithmetic.lessthan8bit | 2x 8-bit | 1 bit | A < B |
| arithmetic.greaterorequal8bit | 2x 8-bit | 1 bit | A >= B |
| arithmetic.lessorequal8bit | 2x 8-bit | 1 bit | A <= B |
| arithmetic.equality8bit | 2x 8-bit | 1 bit | A == B |
| arithmetic.cmp8bit | 2x 8-bit | flags | Full comparison with flags |
| arithmetic.max8bit | 2x 8-bit | 8-bit | Maximum of two values |
| arithmetic.min8bit | 2x 8-bit | 8-bit | Minimum of two values |

### Arithmetic: Shifts and Rotates

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| arithmetic.asr8bit | 8-bit | 8-bit | Arithmetic shift right (sign-preserving) |
| arithmetic.rol8bit | 8-bit | 8-bit, cout | Rotate left |
| arithmetic.ror8bit | 8-bit | 8-bit, cout | Rotate right |

### Threshold Gates

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| threshold.oneoutof8 | 8 bits | 1 bit | At least 1 of 8 inputs is 1 |
| threshold.twooutof8 | 8 bits | 1 bit | At least 2 of 8 inputs are 1 |
| threshold.threeoutof8 | 8 bits | 1 bit | At least 3 of 8 inputs are 1 |
| threshold.fouroutof8 | 8 bits | 1 bit | At least 4 of 8 inputs are 1 |
| threshold.fiveoutof8 | 8 bits | 1 bit | At least 5 of 8 inputs are 1 |
| threshold.sixoutof8 | 8 bits | 1 bit | At least 6 of 8 inputs are 1 |
| threshold.sevenoutof8 | 8 bits | 1 bit | At least 7 of 8 inputs are 1 |
| threshold.alloutof8 | 8 bits | 1 bit | All 8 inputs are 1 |
| threshold.majority | n bits | 1 bit | More than half of inputs are 1 |
| threshold.minority | n bits | 1 bit | Fewer than half of inputs are 1 |

### Modular Arithmetic

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| modular.mod2 | 8-bit | 1 bit | Divisible by 2 |
| modular.mod3 | 8-bit | 1 bit | Divisible by 3 |
| modular.mod4 | 8-bit | 1 bit | Divisible by 4 |
| modular.mod5 | 8-bit | 1 bit | Divisible by 5 |
| modular.mod6 | 8-bit | 1 bit | Divisible by 6 |
| modular.mod7 | 8-bit | 1 bit | Divisible by 7 |
| modular.mod8 | 8-bit | 1 bit | Divisible by 8 |
| modular.mod9 | 8-bit | 1 bit | Divisible by 9 |
| modular.mod10 | 8-bit | 1 bit | Divisible by 10 |
| modular.mod11 | 8-bit | 1 bit | Divisible by 11 |
| modular.mod12 | 8-bit | 1 bit | Divisible by 12 |
352
- Powers of 2 (mod 2, 4, 8) use single-layer circuits that check only the relevant low bits. Other moduli use multi-layer networks that detect all sums (0-255) that are divisible by the modulus.
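To illustrate the single-layer power-of-two case: a value is divisible by 4 exactly when its two low bits are zero, which one threshold neuron can check. The weights below are chosen for the sketch, not read from the model:

```python
def mod4_divisible(bits):
    """One threshold gate over 8 input bits (bits[0] = LSB).
    Fires iff bits 0 and 1 are both zero, i.e. the value is divisible by 4."""
    weights = [-1, -1, 0, 0, 0, 0, 0, 0]  # penalize the two low bits; bias 0
    s = sum(w * b for w, b in zip(weights, bits))
    return 1 if s >= 0 else 0

def to_bits(n):
    # unpack an 8-bit value into a little-endian bit list
    return [(n >> i) & 1 for i in range(8)]
```

The non-power-of-two moduli have no such shortcut, which is why they need the multi-layer detectors described above.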
### Pattern Recognition

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| pattern_recognition.popcount | 8 bits | count | Count of 1 bits (population count) |
| pattern_recognition.allzeros | 8 bits | 1 bit | All bits are 0 |
| pattern_recognition.allones | 8 bits | 1 bit | All bits are 1 |
| pattern_recognition.onehotdetector | 8 bits | 1 bit | Exactly one bit is 1 |
| pattern_recognition.leadingones | 8 bits | count | Count of leading 1 bits |
| pattern_recognition.trailingones | 8 bits | count | Count of trailing 1 bits |
| pattern_recognition.symmetry8bit | 8 bits | 1 bit | Bit pattern is palindromic |
| pattern_recognition.alternating8bit | 8 bits | 1 bit | Bits alternate (01010101 or 10101010) |
| pattern_recognition.hammingdistance8bit | 2x 8-bit | count | Number of differing bits |

### Combinational

| Circuit | Inputs | Outputs | Description |
|---------|--------|---------|-------------|
| combinational.decoder3to8 | 3-bit select | 8 one-hot | 3-to-8 decoder |
| combinational.encoder8to3 | 8-bit one-hot | 3-bit | 8-to-3 priority encoder |
| combinational.multiplexer2to1 | 2 data, 1 select | 1 | 2-to-1 multiplexer |
| combinational.multiplexer4to1 | 4 data, 2 select | 1 | 4-to-1 multiplexer |
| combinational.multiplexer8to1 | 8 data, 3 select | 1 | 8-to-1 multiplexer |
| combinational.demultiplexer1to2 | 1 data, 1 select | 2 | 1-to-2 demultiplexer |
| combinational.demultiplexer1to4 | 1 data, 2 select | 4 | 1-to-4 demultiplexer |
| combinational.demultiplexer1to8 | 1 data, 3 select | 8 | 1-to-8 demultiplexer |
| combinational.barrelshifter8bit | 8-bit data, 3-bit shift | 8-bit | Barrel shifter |
| combinational.priorityencoder8bit | 8 bits | 3-bit + valid | Priority encoder |

---

## Evaluation and Verification

The model includes a comprehensive evaluation suite (`arithmetic_eval.py`) that tests every circuit exhaustively where feasible.

### Test Coverage

| Category | Tests | Method |
|----------|-------|--------|
| Boolean gates | 34 | All input combinations |
| Half/full adders | 12 | All input combinations |
| 2-bit adder | 16 | All 4x4 combinations |
| 4-bit adder | 256 | All 16x16 combinations |
| 8-bit adder | 65,536 | All 256x256 combinations |
| Comparators | 262,144 | All 256x256 combinations (4 comparators) |
| 8x8 multiplier | 357 | Strategic sample (edges, powers of 2, patterns) |
| 8-bit divider | 1,108 | Strategic sample |
| Threshold gates | 2,048 | All 256 values for each of 8 gates |
| Modular arithmetic | 2,816 | All 256 values for each of 11 moduli |
| Pattern recognition | 1,537 | Exhaustive for detectors, sampled for counters |
| Combinational | 854 | All relevant combinations |

### Running the Evaluator
```bash
python arithmetic_eval.py --model arithmetic.safetensors --device cpu
```

Output:

```
Loading model from arithmetic.safetensors...
Found 5094 tensors
Categories: ['arithmetic', 'boolean', 'combinational', 'modular', 'pattern_recognition', 'threshold']

=== BOOLEAN GATES ===
boolean.and: 4/4 [PASS]
boolean.or: 4/4 [PASS]
...

============================================================
SUMMARY
============================================================
Total: 339500/339500 (100.0000%)
Time: 136.78s

All circuits passed!

============================================================
TENSOR COVERAGE: 5094/5094 (100.00%)

All tensors tested!

Fitness: 1.000000
```
### Verification Guarantees

- **100% test pass rate**: Every test passes
- **100% tensor coverage**: Every tensor in the model is accessed during testing
- **Exhaustive where feasible**: All circuits with <= 16 input bits are tested exhaustively
- **Strategic sampling for large circuits**: The multiplier and divider use carefully chosen test vectors

---

## Intended Use Cases

### 1. Frozen Arithmetic Layer for Language Models

The primary intended use is embedding this arithmetic core as a frozen layer within a language model. The concept:

- The LLM learns to recognize when arithmetic is needed
- Trained interface layers convert token representations to binary inputs
- The frozen arithmetic layer computes the exact result
- Interface layers convert binary outputs back to token space

This separates the "knowing when to compute" problem (which LLMs can learn) from the "computing correctly" problem (which is solved by the frozen weights).

### 2. Neuromorphic Hardware

Threshold logic maps naturally to neuromorphic computing substrates. Each gate is a single neuron, and the weights are sparse and small (typically -2 to +2). This model could serve as a reference implementation for arithmetic on neuromorphic chips.

### 3. Verified Computing

Because every circuit has been exhaustively tested, this model provides a verified computing substrate. Applications requiring guaranteed correctness can use these weights with confidence.

### 4. Educational Resource

The model serves as a complete, working example of how digital logic maps to neural network weights. Students can inspect the weights, trace signal flow, and understand the correspondence between Boolean algebra and threshold logic.

### 5. Baseline for Pruning Research

The model provides a known-correct starting point for pruning and compression research. How aggressively can we prune while maintaining correctness? Which tensors are most compressible? These questions can be explored with ground truth.

---

## Integration with Language Models

We envision integration following this architecture:
```
[Token Embeddings]
        |
        v
[Transformer Layers (trainable)]
        |
        v
[Arithmetic Router (trainable)]    -- decides whether arithmetic is needed
        |
        v
[BitExtractor (trainable)]         -- converts activations to binary inputs
        |
        v
[Threshold Calculus Core (FROZEN)] -- computes exact arithmetic
        |
        v
[BitInjector (trainable)]          -- converts binary outputs back to activations
        |
        v
[Transformer Layers (trainable)]
        |
        v
[Output]
```
The key insight is that the model learns call dispatch, not computation. The trainable components learn:

- When to invoke the arithmetic circuits
- How to extract operands from the representation
- How to interpret and integrate results

The actual arithmetic is handled by frozen, verified weights that cannot drift or hallucinate.

### Interface Layer Design

The BitExtractor must learn to:

1. Identify which activation dimensions encode numerical values
2. Convert floating-point activations to 8-bit binary representations
3. Route to the appropriate arithmetic circuit

The BitInjector must learn to:

1. Interpret binary results
2. Convert back to the model's activation space
3. Integrate results with ongoing computation

These interface layers are small and trainable. The bulk of the arithmetic (5,094 tensors) remains frozen.

---

## Pruning Experiments

A key research direction is pruning. The current model uses canonical, human-designed circuits, which are not necessarily optimal as neural network representations. Several questions arise:

### Weight Magnitude Pruning

Can we zero out small weights while maintaining correctness? Initial experiments suggest that threshold logic is sensitive to weight changes because the decision boundary must be exact. A weight of 0.99 instead of 1.0 might flip outputs for edge cases.

### Structural Pruning

Can we remove entire neurons or layers? Some circuits may have redundant paths. The two-layer XOR implementation, for instance, might have alternative single-layer approximations for specific use cases.

### Knowledge Distillation

Can we train smaller networks to mimic the larger verified networks? This would trade verification for compression.

### Quantization

The current weights are stored as float32 but take values in a small set (typically -2, -1, 0, 1, 2). Aggressive quantization to int8 or even int4 should be possible with no loss.
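As a toy sanity check of the lossless-quantization claim, the gate templates quoted in this README can be rounded to integers and re-evaluated over all binary inputs (a sketch over hand-written templates; the real model's tensors are not loaded here):

```python
from itertools import product

# Gate templates from the worked examples: (float32-style weights, bias)
gates = {
    'and':     ([1.0, 1.0], -2.0),
    'or':      ([1.0, 1.0], -1.0),
    'not':     ([-1.0], 0.0),
    'maj3of5': ([1.0] * 5, -3.0),
}

def fire(weights, bias, xs):
    return 1 if sum(w * x for w, x in zip(weights, xs)) + bias >= 0 else 0

def quantize(weights, bias):
    # exact for integer-valued floats in the int8 range
    return [int(round(w)) for w in weights], int(round(bias))

lossless = True
for name, (w, b) in gates.items():
    qw, qb = quantize(w, b)
    for xs in product([0, 1], repeat=len(w)):
        if fire(w, b, xs) != fire(qw, qb, xs):
            lossless = False
```

Because the decision boundary sits exactly at zero, this check is worth re-running per gate after any real quantization pass rather than assumed.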
### Sparsity Patterns

Many weights are zero. Converting to sparse representations could significantly reduce memory and computation.

We look forward to exploring how far these compressions can be pushed while maintaining 100% correctness. The verified nature of the model provides ground truth for evaluating any compression scheme.

---

## Limitations

### Bit Width

The model implements 8-bit arithmetic. Larger operands require chaining operations through carry propagation. This is possible but requires external orchestration.
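To illustrate the chaining idea, a 16-bit add can be orchestrated from two 8-bit adds by propagating the carry. The `add8` below is an arithmetic stand-in for the `arithmetic.adc8bit` circuit, not the circuit itself:

```python
def add8(a, b, cin=0):
    # stand-in for the 8-bit add-with-carry circuit: returns (sum, carry_out)
    total = a + b + cin
    return total & 0xFF, total >> 8

def add16(a, b):
    # add the low bytes first, then feed the carry into the high-byte add
    lo, carry = add8(a & 0xFF, b & 0xFF)
    hi, carry_out = add8(a >> 8, b >> 8, carry)
    return (hi << 8) | lo, carry_out
```

The same pattern extends to 32-bit and wider operands, at the cost of one external carry hand-off per byte.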
564
-
565
- ### No Floating Point
566
-
567
- The model only supports integer arithmetic. Floating-point operations (which LLMs are frequently asked to perform) are not implemented. This is the most significant gap for practical LLM integration. Adding IEEE 754 floating-point support is a priority for future work.
568
-
569
- ### No Memory
570
-
571
- The model is purely combinational. There are no flip-flops, registers, or memory elements. State must be managed externally.
572
-
573
- ### Interface Complexity
574
-
575
- Integrating with an LLM requires training interface layers. The optimal architecture for these layers is an open research question.
576
-
577
- ### Verification Scope
578
-
579
- While we have tested exhaustively where feasible, the 8x8 multiplier and 8-bit divider use strategic sampling rather than exhaustive testing. Full exhaustive testing would require 2^16 = 65,536 tests for the multiplier and careful handling of division by zero.
580
-
581
- ---
582
 
583
  ## Roadmap
584
 
585
- Goal: Complete arithmetic coprocessor for LLM mathematical reasoning.
586
-
587
- ### Completed
588
-
589
- #### Float16 Core Arithmetic
590
- - [x] `float16.add` — IEEE 754 addition (~998 gates)
591
- - [x] `float16.sub` — IEEE 754 subtraction
592
- - [x] `float16.mul` — IEEE 754 multiplication (~1302 gates)
593
- - [x] `float16.div` — IEEE 754 division (~1854 gates)
594
- - [x] `float16.neg` — sign flip
595
- - [x] `float16.abs` — absolute value
596
- - [x] `float16.cmp` — comparison
597
-
598
- #### Float16 Utilities
599
- - [x] `float16.unpack` — extract sign, exponent, mantissa
600
- - [x] `float16.pack` — assemble components
601
- - [x] `float16.normalize` — CLZ-based normalization
602
- - [x] `float16.toint` — convert to int16
603
- - [x] `float16.fromint` — convert from int16
604
-
605
- #### Integer Arithmetic (8-bit)
606
- - [x] Adders (half, full, ripple carry 2/4/8 bit)
607
- - [x] Subtraction, negation
608
- - [x] Multiplication (2x2, 4x4, 8x8)
609
- - [x] Division (8-bit with remainder)
610
- - [x] Comparators (all relations)
611
- - [x] CLZ (8-bit and 16-bit)
612
-
613
- #### Logic and Patterns
614
- - [x] Boolean gates (AND, OR, NOT, NAND, NOR, XOR, XNOR, IMPLIES, BIIMPLIES)
615
- - [x] Threshold gates (k-of-n for k=1..8)
616
- - [x] Modular arithmetic (mod 2-12)
617
- - [x] Pattern recognition (popcount, one-hot, symmetry)
618
- - [x] Combinational (mux, demux, encoder, decoder, barrel shifter)
619
- - [x] Shifts and rotates
620
-
621
- #### Infrastructure
622
- - [x] Self-documenting .inputs tensors
623
- - [x] Signal registry in safetensors metadata
624
- - [x] Full circuit evaluation with topological sort
625
- - [x] Comprehensive test suite (7,177 tests, 100% pass)
626
 
627
- ---
 
 
 
 
628
 
629
- ### High Priority — Core Mathematical Functions
-
- #### Powers and Roots (float16)
- - [ ] `float16.sqrt` — square root via Newton-Raphson or digit-by-digit
- - [ ] `float16.rsqrt` — reciprocal square root (useful for normalization)
- - [ ] `float16.pow` — x^y for arbitrary y (via exp/ln)
- - [ ] `float16.sq` — x² (optimized special case)
- - [ ] `float16.cube` — x³ (optimized special case)
- - [ ] `float16.cbrt` — cube root
-
- #### Exponentials and Logarithms (float16)
- - [ ] `float16.exp` — e^x via range reduction + polynomial
- - [ ] `float16.exp2` — 2^x (simpler, useful for pow)
- - [ ] `float16.ln` — natural logarithm
- - [ ] `float16.log2` — base-2 logarithm (extract exponent + correction)
- - [ ] `float16.log10` — base-10 logarithm
-
- #### Trigonometry (float16, CORDIC)
- - [ ] `float16.sin` — sine
- - [ ] `float16.cos` — cosine
- - [ ] `float16.tan` — tangent (sin/cos)
- - [ ] `float16.sincos` — both sin and cos (CORDIC gives both)
- - [ ] `float16.asin` — arc sine
- - [ ] `float16.acos` — arc cosine
- - [ ] `float16.atan` — arc tangent
- - [ ] `float16.atan2` — two-argument arc tangent (quadrant-aware)
-
- #### Hyperbolic Functions (float16)
- - [ ] `float16.sinh` — hyperbolic sine
- - [ ] `float16.cosh` — hyperbolic cosine
- - [ ] `float16.tanh` — hyperbolic tangent (critical for ML activations)
-
- ---
-
- ### Medium Priority — Extended Operations
-
- #### Rounding and Truncation (float16)
- - [ ] `float16.floor` — round toward -∞
- - [ ] `float16.ceil` — round toward +∞
- - [ ] `float16.trunc` — round toward zero
- - [ ] `float16.round` — round to nearest
- - [ ] `float16.frac` — fractional part
- - [ ] `float16.fmod` — floating-point modulo
-
- #### Comparisons and Selection (float16)
- - [ ] `float16.min` — minimum of two values
- - [ ] `float16.max` — maximum of two values
- - [ ] `float16.clamp` — clamp to range [lo, hi]
- - [ ] `float16.sign` — sign function (-1, 0, +1)
- - [ ] `float16.copysign` — copy sign from y to x
- - [ ] `float16.isnan` — NaN test
- - [ ] `float16.isinf` — infinity test
- - [ ] `float16.isfinite` — finite test
-
- #### Integer Arithmetic (16-bit)
- - [ ] `arithmetic.add16` — 16-bit addition
- - [ ] `arithmetic.sub16` — 16-bit subtraction
- - [ ] `arithmetic.mul16` — 16-bit multiplication
- - [ ] `arithmetic.div16` — 16-bit division with remainder
- - [ ] `arithmetic.sqrt16` — 16-bit integer square root
- - [ ] `arithmetic.abs16` — 16-bit absolute value
-
- #### Number Theory
- - [ ] `arithmetic.gcd` — greatest common divisor (Euclidean)
- - [ ] `arithmetic.lcm` — least common multiple
- - [ ] `arithmetic.isprime8` — primality test (8-bit)
- - [ ] `arithmetic.factorial8` — factorial (8! = 40320 fits in 16-bit)
- - [ ] `arithmetic.comb` — binomial coefficient nCr
- - [ ] `arithmetic.perm` — permutation nPr
-
- ---
-
- ### Lower Priority — Specialized Functions
-
- #### ML Activation Functions (float16)
- - [ ] `float16.relu` — max(0, x)
- - [ ] `float16.leaky_relu` — x if x > 0 else αx
- - [ ] `float16.sigmoid` — 1/(1+e^(-x))
- - [ ] `float16.softplus` — ln(1+e^x)
- - [ ] `float16.gelu` — Gaussian error linear unit
- - [ ] `float16.silu` — x * sigmoid(x)
-
- #### Constants (float16 encoded)
- - [ ] `const.pi` — π = 3.14159...
- - [ ] `const.e` — e = 2.71828...
- - [ ] `const.phi` — φ = 1.61803... (golden ratio)
- - [ ] `const.sqrt2` — √2 = 1.41421...
- - [ ] `const.ln2` — ln(2) = 0.69314...
- - [ ] `const.log2e` — log₂(e) = 1.44269...
-
- #### Statistics (float16, multi-input)
- - [ ] `stats.sum` — sum of array
- - [ ] `stats.mean` — arithmetic mean
- - [ ] `stats.min_array` — minimum of array
- - [ ] `stats.max_array` — maximum of array
- - [ ] `stats.variance` — population variance
- - [ ] `stats.stddev` — standard deviation
-
- #### Bit Manipulation (16-bit)
- - [ ] `bits.popcnt16` — population count
- - [ ] `bits.clz16` — count leading zeros (done)
- - [ ] `bits.ctz16` — count trailing zeros
- - [ ] `bits.reverse16` — bit reversal
- - [ ] `bits.bswap16` — byte swap
-
- ---
-
- ### Infrastructure TODO
-
- #### Testing
- - [ ] Exhaustive float16 tests for new operations
- - [ ] Edge case coverage (±0, ±inf, NaN, subnormals)
- - [ ] Accuracy tests against reference implementations
-
- #### Documentation
- - [ ] Circuit diagrams for CORDIC, Newton-Raphson
- - [ ] Tutorial: implementing new circuits
- - [ ] Tutorial: LLM integration patterns
- - [ ] API reference for all operations
-
- #### Optimization
- - [ ] Gate count reduction analysis
- - [ ] Critical path optimization
- - [ ] Weight quantization study (int8/int4)
-
- ---
-
- ## Technical Details
-
- ### Tensor Naming Convention
-
- Tensors follow a hierarchical naming scheme:
-
- ```
- category.circuit.component.subcomponent.layer.type
- ```
-
- Examples:
- - `boolean.and.weight` -- weights for AND gate
- - `boolean.and.bias` -- bias for AND gate
- - `arithmetic.fulladder.ha1.sum.layer1.or.weight` -- first half adder, sum output, layer 1, OR gate weights
- - `arithmetic.div8bit.stage3.mux5.and0.bias` -- divider stage 3, mux for bit 5, AND gate 0, bias
-
- ### Weight Conventions
-
- - Weights are stored as 1D tensors
- - Biases are stored as scalar tensors (shape [1]) or sometimes as single floats
- - All values are float32 but only use a small discrete set of values
- - Common weight values: -2, -1, 0, 1, 2
- - Common bias values: -2, -1, 0, 1
-
- ### Activation Function
-
- All circuits assume a Heaviside step activation:
-
- ```python
- def heaviside(x):
-     return (x >= 0).float()
- ```
-
- This is critical. Using ReLU, sigmoid, or other activations will produce incorrect results.
-
- ### Routing Information
-
- The `routing.json` file contains connectivity information for complex circuits, particularly the divider. This maps gate names to their input sources, enabling correct signal propagation during evaluation.
-
- ---
-
- ## Citation
-
- If you use this work, please cite:
-
- ```bibtex
- @misc{threshold-calculus,
-   author = {Norton, Charles},
-   title = {Threshold Calculus: Verified Arithmetic Circuits as Neural Network Weights},
-   year = {2025},
-   publisher = {Hugging Face},
-   url = {https://huggingface.co/phanerozoic/threshold-calculus}
- }
- ```
-
- ---

 ## License

- This model is released under the Apache 2.0 License. You are free to use, modify, and distribute it for any purpose, including commercial applications.
-
- ---
-
- ## Acknowledgments
-
- This project builds on decades of research in threshold logic, digital design, and neural network theory. The insight that threshold gates are equivalent to perceptrons dates to the 1960s. We are grateful to the open-source communities around PyTorch, safetensors, and Hugging Face for the infrastructure that makes this work possible.
-
- ---
-
- ## Contact
-
- For questions, suggestions, or collaboration inquiries, please open an issue on this repository or contact the author through Hugging Face.
 
 # Threshold Calculus

+ Digital circuits encoded as neural network weights.

+ Each gate is a threshold logic unit: `output = step(weights · inputs + bias)`. The step function fires when the weighted sum is ≥ 0. This maps digital logic to tensor operations.

+ ## What's Here

+ | File | Description |
+ |------|-------------|
+ | `arithmetic.safetensors` | 23,494 tensors encoding 7,828 gates |
+ | `eval.py` | Test harness (206,124 tests) |
+ | `convert_to_explicit_inputs.py` | Builds tensors and infers gate connectivity |
+ | `routing.json` | Signal routing for complex circuits |

+ ## Circuits

+ **Float16 (IEEE 754)**
+ - `float16.add`, `float16.sub`, `float16.mul`, `float16.div`
+ - `float16.neg`, `float16.abs`, `float16.cmp`
+ - `float16.toint`, `float16.fromint`
+ - `float16.pack`, `float16.unpack`, `float16.normalize`

+ Handles NaN, Inf, zero, subnormals. Mantissa alignment via barrel shifter. Normalization via CLZ.
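For reference, the binary16 layout these circuits operate on can be inspected from Python's standard library (a sketch for orientation, not part of this repo; `struct`'s `'e'` format code is IEEE 754 half precision):

```python
import struct

def unpack_f16(x):
    # binary16 layout: 1 sign bit | 5 exponent bits | 10 mantissa bits
    bits, = struct.unpack('<H', struct.pack('<e', x))
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF

print(unpack_f16(1.0))           # (0, 15, 0): exponent field is biased by 15
print(unpack_f16(float('inf')))  # (0, 31, 0): all-ones exponent, zero mantissa
```

This is the same decomposition `float16.unpack` performs in gates; an all-ones exponent with a nonzero mantissa is NaN, and a zero exponent with a nonzero mantissa is a subnormal.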

+ **8-bit Integer**
+ - Adders: half, full, ripple carry (2/4/8 bit), add-with-carry
+ - Subtraction: sub8bit, sbc8bit, neg8bit
+ - Comparison: cmp8bit, equality8bit
+ - Shifts: asr8bit, rol8bit, ror8bit
+ - CLZ: 8-bit and 16-bit

+ **Modular Arithmetic**
+ - mod2 through mod12 (divisibility testing)

+ **Boolean**
+ - AND, OR, NOT, NAND, NOR, XOR, XNOR, IMPLIES, BIIMPLIES

+ **Threshold**
+ - k-of-n gates (1-of-8 through 8-of-8)
+ - majority, minority, atleastk, atmostk, exactlyk
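Each of these is a single neuron. As a sketch (function name illustrative), an at-least-k gate is just unit weights with the bias placed between k-1 and k:

```python
def at_least_k(k, xs):
    # k-of-n threshold gate: unit weights, bias -(k - 0.5)
    return 1 if sum(xs) - (k - 0.5) >= 0 else 0

print(at_least_k(3, [1, 1, 0, 1, 0]))  # majority of five inputs: 1
print(at_least_k(3, [1, 0, 0, 1, 0]))  # only two inputs set: 0
```

Majority is at-least-k with k = ⌈(n+1)/2⌉; exactly-k combines an at-least-k and an at-most-k gate.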

+ **Pattern Recognition**
+ - popcount, allzeros, allones, onehotdetector
+ - symmetry8bit, alternating8bit, hammingdistance8bit
+ - leadingones, trailingones, runlength

+ **Combinational**
+ - decoder3to8, encoder
+ - multiplexer (2/4/8 to 1), demultiplexer (1 to 2/4/8)
+ - barrelshifter8bit, priorityencoder8bit

+ ## How It Works

+ A threshold gate computes:

+ ```
+ output = 1 if (w₁x₁ + w₂x₂ + ... + wₙxₙ + bias) >= 0 else 0
+ ```

+ This is a perceptron with Heaviside step activation.

+ **AND gate**: weights = [1, 1], bias = -1.5
+ - (0,0): 0 + 0 - 1.5 = -1.5 < 0 → 0
+ - (0,1): 0 + 1 - 1.5 = -0.5 < 0 → 0
+ - (1,0): 1 + 0 - 1.5 = -0.5 < 0 → 0
+ - (1,1): 1 + 1 - 1.5 = 0.5 ≥ 0 → 1

+ **XOR** requires two layers (not linearly separable):
+ - Layer 1: OR and NAND in parallel
+ - Layer 2: AND of both outputs
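The AND table and the two-layer XOR can be checked directly (a minimal sketch; the OR and NAND weights here are standard threshold encodings, not values quoted from the safetensors):

```python
def gate(weights, bias, xs):
    # single threshold neuron with Heaviside step activation
    return 1 if sum(w * x for w, x in zip(weights, xs)) + bias >= 0 else 0

def AND(a, b):  return gate([1, 1], -1.5, [a, b])
def OR(a, b):   return gate([1, 1], -0.5, [a, b])
def NAND(a, b): return gate([-1, -1], 1.5, [a, b])

def XOR(a, b):
    # layer 1: OR and NAND in parallel; layer 2: AND of both outputs
    return AND(OR(a, b), NAND(a, b))

print([AND(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 0, 0, 1]
print([XOR(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```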

+ ## Self-Documenting Format

+ Each gate has three tensors in `arithmetic.safetensors`:
+ - `.weight` — input weights
+ - `.bias` — threshold
+ - `.inputs` — int64 tensor of signal IDs

+ Signal registry in metadata maps IDs to names:

 ```python
+ import json
 from safetensors import safe_open

 with safe_open('arithmetic.safetensors', framework='pt') as f:
     registry = json.loads(f.metadata()['signal_registry'])
+     inputs = f.get_tensor('boolean.and.inputs')
+     names = [registry[str(i.item())] for i in inputs]
+     # ['$a', '$b']
 ```

+ Signal naming:
+ - `$name` — circuit input (e.g., `$a`, `$dividend[0]`)
+ - `#0`, `#1` — constants
+ - `gate.path` — output of another gate
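Putting the three tensors together, evaluating one gate looks like this (plain-Python stand-ins for the loaded tensors; the weight and bias match the AND gate above, the signal IDs are illustrative):

```python
# stand-ins for tensors read from arithmetic.safetensors
tensors = {
    'boolean.and.weight': [1.0, 1.0],
    'boolean.and.bias':   [-1.5],
    'boolean.and.inputs': [0, 1],      # signal IDs
}
registry = {'0': '$a', '1': '$b'}      # from the safetensors metadata

def eval_gate(name, signals):
    w = tensors[name + '.weight']
    b = tensors[name + '.bias'][0]
    xs = [signals[registry[str(i)]] for i in tensors[name + '.inputs']]
    acc = sum(wi * xi for wi, xi in zip(w, xs)) + b
    return 1.0 if acc >= 0 else 0.0    # Heaviside step

print(eval_gate('boolean.and', {'$a': 1.0, '$b': 1.0}))  # 1.0
print(eval_gate('boolean.and', {'$a': 1.0, '$b': 0.0}))  # 0.0
```

Full circuits repeat this per gate in topological order, feeding each `gate.path` output into the signal map for downstream gates.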

+ ## Running Eval

 ```bash
+ python eval.py
 ```

+ Tests all circuits exhaustively. 8-bit operations test all 256 or 65,536 input combinations. Float16 tests cover special cases (NaN, Inf, ±0, subnormals) plus normal arithmetic.
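The same exhaustive pattern in miniature: build a full adder out of threshold gates and sweep every input (a sketch; eval.py's actual harness reads the gates from the safetensors rather than hard-coding them):

```python
import itertools

def gate(weights, bias, xs):
    # single threshold neuron with Heaviside step activation
    return 1 if sum(w * x for w, x in zip(weights, xs)) + bias >= 0 else 0

def xor(a, b):
    # two layers: AND of (OR, NAND)
    return gate([1, 1], -1.5,
                [gate([1, 1], -0.5, [a, b]), gate([-1, -1], 1.5, [a, b])])

def full_adder(a, b, cin):
    s = xor(xor(a, b), cin)
    cout = gate([1, 1, 1], -1.5, [a, b, cin])  # majority of three
    return s, cout

# exhaustive sweep over all 2^3 input combinations
for a, b, cin in itertools.product([0, 1], repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
```

Chaining eight of these carry-to-carry gives the ripple-carry adder; the 256 × 256 sweep for an 8-bit operation is the same loop with `repeat=16`.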

+ ## Development History

+ Started as an 8-bit CPU project: boolean gates first, then arithmetic (adders → multipliers → dividers), then CPU control logic. The CPU worked, but the arithmetic core turned out to be the useful part, so it was extracted into this repo; the full CPU lives in a separate repo (phanerozoic/8bit-threshold-computer).

+ Float16 was added later. The commit history shows the iterative process: float16.add went through multiple rounds of bug fixes for edge cases (zero handling, sign logic, normalization), and mul and div required multi-bit carry infrastructure.

 ## Roadmap

+ **Done:**
+ - Float16 core (add/sub/mul/div)
+ - Float16 utilities (pack/unpack/normalize/conversions)
+ - 8-bit integer arithmetic
+ - Boolean, threshold, modular, pattern recognition, combinational

+ **Next:**
+ - Float16 sqrt, rsqrt, pow
+ - Float16 exp, ln, log2
+ - Float16 trig (sin, cos, tan via CORDIC)
+ - Float16 tanh (ML activation)

+ **Cleanup:**
+ - Rip out 8-bit integer circuits, replace with 16-bit
+ - 8-bit was scaffolding for float16 development, not the product

 ## License

+ Apache 2.0