---
license: mit
tags:
  - threshold-logic
  - neuromorphic
  - computer-architecture
  - turing-complete
  - loihi
  - truenorth
  - akida
---

8bit-threshold-computer

A Turing-complete 8-bit CPU implemented entirely as threshold logic gates.

Every logic gate is a threshold neuron: output = 1 if (Ξ£ wα΅’xα΅’ + b) β‰₯ 0 else 0

Tensors:    11,581
Parameters: 8,290,134 (full CPU) / 32,397 (pure ALU for LLM)

What Is This?

A complete 8-bit processor where every operationβ€”from Boolean logic to arithmetic to control flowβ€”is implemented using only weighted sums and step functions. No traditional gates.

| Component | Specification |
|-----------|---------------|
| Registers | 4 × 8-bit general purpose |
| Memory | Configurable: 0B (pure ALU) to 64KB (full CPU) |
| ALU | 16 operations (ADD, SUB, AND, OR, XOR, NOT, SHL, SHR, MUL, DIV, INC, DEC, NEG, ROL, ROR, CMP) |
| Flags | Zero, Negative, Carry, Overflow |
| Control | JMP, JZ, JNZ, JC, JNC, JN, JP, JV, JNV, CALL, RET, PUSH, POP |

Turing complete. Verified with loops, conditionals, recursion, and self-modification.


Execution Model

A self-contained, autonomous computational machine:

  • Pure tensor computation: State in, state out
  • Frozen circuits: Integer weights, Heaviside activation
  • ACT execution: Internal loop until HALT
  • No external orchestration: One forward pass = complete program execution
            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
            β”‚      Initial State          β”‚
            β”‚  [PC|Regs|Flags|Memory...]  β”‚
            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          β–Ό
            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
            β”‚                             β”‚
            β”‚   Threshold Circuit Layer   β”‚
            β”‚                             β”‚
            β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
            β”‚  β”‚   Fetch: PC β†’ Instr   β”‚  β”‚
            β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€  β”‚
            β”‚  β”‚   Decode: Opcode/Ops  β”‚  β”‚
            β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€  β”‚
            β”‚  β”‚   Execute: ALU/Mem    β”‚  β”‚
            β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€  β”‚
            β”‚  β”‚   Writeback: Results  β”‚  β”‚
            β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€  β”‚
            β”‚  β”‚   PC Update           β”‚  β”‚
            β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
            β”‚              β”‚              β”‚
            β”‚         β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”         β”‚
            β”‚         β”‚ HALTED? β”‚         β”‚
            β”‚         β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜         β”‚
            β”‚              β”‚              β”‚
            β”‚         no ──┴── yes        β”‚
            β”‚          β”‚       β”‚          β”‚
            β”‚          β–Ό       β–Ό          β”‚
            β”‚       [loop]   [exit]       β”‚
            β”‚                             β”‚
            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          β–Ό
            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
            β”‚       Final State           β”‚
            β”‚  [PC|Regs|Flags|Memory...]  β”‚
            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Instruction Set

| Opcode | Mnemonic | Operation |
|--------|----------|-----------|
| 0x0 | ADD | R[d] = R[a] + R[b] |
| 0x1 | SUB | R[d] = R[a] - R[b] |
| 0x2 | AND | R[d] = R[a] & R[b] |
| 0x3 | OR | R[d] = R[a] \| R[b] |
| 0x4 | XOR | R[d] = R[a] ^ R[b] |
| 0x5 | SHL | R[d] = R[a] << 1 |
| 0x6 | SHR | R[d] = R[a] >> 1 |
| 0x7 | MUL | R[d] = R[a] * R[b] |
| 0x8 | DIV | R[d] = R[a] / R[b] |
| 0x9 | CMP | flags = R[a] - R[b] |
| 0xA | LOAD | R[d] = M[addr] |
| 0xB | STORE | M[addr] = R[s] |
| 0xC | JMP | PC = addr |
| 0xD | Jcc | PC = addr if cond (imm8[2:0]: 0=Z, 1=NZ, 2=C, 3=NC, 4=N, 5=P, 6=V, 7=NV) |
| 0xE | CALL | push PC; PC = addr |
| 0xF | HALT | stop execution |

Design Principles

  1. Autonomy: Machine runs without external logic
  2. Purity: forward(state) β†’ state', no side effects
  3. Transparency: All weights inspectable, all operations traceable
  4. Universality: Turing complete, runs arbitrary programs

Background

Threshold Logic

A threshold gate computes a Boolean function by taking a weighted sum of binary inputs and comparing it to a threshold. If the sum meets or exceeds the threshold, the output is 1; otherwise, 0. This can be expressed as a neuron with Heaviside step activation: output = 1 if (Ξ£ wα΅’xα΅’ + b) β‰₯ 0 else 0, where weights wα΅’ and bias b are integers.

Threshold gates are strictly more powerful than standard Boolean gates. A single threshold gate can compute any linearly separable Boolean functionβ€”this includes AND, OR, NAND, NOR, and many others that require multiple levels of conventional gates. Functions that are not linearly separable (such as XOR or parity) require multiple threshold gates arranged in layers.
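
As a quick standalone check (plain Python, independent of the repository's weights), a single neuron with weights [1, 1, 1] and bias -2 computes 3-input majority, a function that takes several conventional gates:

from itertools import product

def gate(w, b, xs):
    return 1 if sum(wi * xi for wi, xi in zip(w, xs)) + b >= 0 else 0

# Majority-of-3 is linearly separable: one threshold gate suffices.
for xs in product([0, 1], repeat=3):
    assert gate([1, 1, 1], -2, xs) == int(sum(xs) >= 2)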

Historical Development

Warren McCulloch and Walter Pitts introduced the threshold neuron model in 1943, proving that networks of such neurons could compute any Boolean function. This work preceded both the perceptron and modern neural networks, establishing the theoretical foundation for neural computation.

The 1960s saw significant development in threshold logic synthesis. Researchers including Saburo Muroga, Robert McNaughton, and Michael Dertouzos developed algebraic methods for determining whether a Boolean function could be implemented as a single threshold gate, and if so, how to calculate appropriate weights. This work produced systematic techniques for threshold gate design but focused on individual gates rather than complete systems.

Frank Rosenblatt's Mark I Perceptron (1957-1960) implemented threshold neurons in hardware using potentiometers for weights, but it was a pattern classifier that learned its weights through trainingβ€”the final weight configurations were not published. Bernard Widrow's ADALINE and MADALINE systems (1960-1963) similarly used adaptive threshold elements with weights learned via the LMS algorithm.

Hava Siegelmann and Eduardo Sontag proved in the 1990s that recurrent neural networks are Turing complete. Their construction, however, relied on continuous sigmoid activation functions with infinite precisionβ€”not the discrete step function used in threshold logic. Other theoretical work on neural Turing machines and differentiable computers followed similar patterns: proving computational universality using continuous, differentiable activations suitable for gradient-based training.

Neuromorphic Hardware

Modern neuromorphic processors implement large arrays of configurable threshold-like neurons in silicon:

Intel Loihi (2017) provides 128 neuromorphic cores with programmable synaptic weights, spike-based communication, and on-chip learning. The architecture supports integer weights and configurable neuron dynamics.

IBM TrueNorth (2014) integrates one million neurons and 256 million synapses in a 4096-core array. Each neurosynaptic core implements 256 neurons with configurable weights and thresholds. The chip was designed as an alternative to von Neumann architecture rather than an implementation of one.

BrainChip Akida (2021) targets edge deployment with event-based processing and integer weights. The architecture supports standard neural network operations mapped onto neuromorphic primitives.

SpiNNaker (University of Manchester) uses ARM processor cores to simulate spiking neural networks at scale. The platform has hosted various neural models but is simulation-based rather than native neuromorphic silicon.

Despite the availability of these platforms, published work has focused on neural network inference, sensory processing, and pattern recognition. A 2024 paper demonstrated basic logic gates, adders, and decoders on SpiNNaker and Dynap-SE1, describing this as "a first step toward the construction of a spiking computer"β€”the implementation lacked instruction fetch, program counter, memory systems, and control logic.

This Implementation

The weights in this repository implement a complete 8-bit computer: registers, ALU with 16 operations, status flags, conditional branching, subroutine calls, stack operations, and memory access. Every component is built from threshold neurons with integer weights. The weight configurations are published in safetensors format for direct loading and deployment.


Circuit Categories

| Category | Circuits | Examples |
|----------|----------|----------|
| Boolean | 9 | AND, OR, NOT, NAND, NOR, XOR, XNOR, IMPLIES, BIIMPLIES |
| Arithmetic | 18 | Half/full adder, 2/4/8-bit ripple carry, comparators |
| ALU | 3 | 8-bit ALU, control decoder, flag computation |
| Combinational | 10 | MUX (2:1, 4:1, 8:1), DEMUX, encoders, decoders |
| Control Flow | 16 | JMP, conditional jumps, CALL, RET, PUSH, POP |
| Error Detection | 11 | Parity (XOR tree), checksum, CRC, Hamming |
| Modular | 11 | Divisibility by 2-12 (multi-layer for non-powers-of-2) |
| Threshold | 13 | k-of-n gates, majority, minority, exactly-k |
| Pattern | 10 | Popcount, leading/trailing ones, symmetry |
| Memory | 3 | N-bit addr decoder, 2^N×8 read mux, write cells (configurable, packed) |

Usage

import torch
from safetensors.torch import load_file

tensors = load_file("neural_computer.safetensors")

def heaviside(x):
    return (x >= 0).float()

# AND gate: fires when both inputs are 1
w = tensors['boolean.and.weight']  # [1, 1]
b = tensors['boolean.and.bias']    # [-2]

for a, b_in in [(0,0), (0,1), (1,0), (1,1)]:
    inp = torch.tensor([a, b_in], dtype=torch.float32)
    out = heaviside(inp @ w + b)
    print(f"AND({a}, {b_in}) = {int(out.item())}")

State Tensor Layout

All multi-bit fields are MSB-first (index 0 is the most-significant bit).

[ PC[N] | IR[16] | R0[8] R1[8] R2[8] R3[8] | FLAGS[4] | SP[N] | CTRL[4] | MEM[2^N][8] ]

Where N = address bits (configurable: 0-16).

Flags are ordered as: Z, N, C, V.

Control bits are ordered as: HALT, MEM_WE, MEM_RE, RESERVED.

| Memory Profile | Addr Bits | Memory Size | State Bits |
|----------------|-----------|-------------|------------|
| Full CPU | 16 | 64KB | 524,376 |
| Reduced | 12 | 4KB | 32,856 |
| Scratchpad | 8 | 256B | 2,104 |
| Registers | 4 | 16B | 184 |
| Pure ALU | 0 | 0B | 56 |

Instruction Encoding (16-bit)

All instruction bits are MSB-first.

15..12  11..10  9..8  7..0
opcode  rd      rs    imm8

Interpretation:

  • R-type: rd = rd op rs (imm8 ignored).
  • I-type: rd = op rd, imm8 (rs ignored).
  • Address-extended: LOAD, STORE, JMP, JZ, CALL consume the next word as a 16-bit address (big-endian). imm8 is reserved, and the PC skips 4 bytes when the jump is not taken.
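
For illustration, a hypothetical encoder (not shipped in the repository) that packs a word under this layout:

def encode(opcode, rd=0, rs=0, imm8=0):
    # opcode -> bits 15..12, rd -> 11..10, rs -> 9..8, imm8 -> 7..0
    return (opcode << 12) | (rd << 10) | (rs << 8) | (imm8 & 0xFF)

word = encode(0x0, rd=1, rs=2)  # R-type ADD: R1 = R1 + R2
assert word == 0x0600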

Verification

python eval.py
python threshold_cpu.py

Verification Status

| Category | Status | Notes |
|----------|--------|-------|
| Boolean gates | Exhaustively tested | All 2^n input combinations |
| Arithmetic | Exhaustively tested | Full 8-bit range |
| ALU | Exhaustively tested | All operations, all inputs |
| Control flow | Exhaustively tested | Branch/jump conditions |
| Threshold | Exhaustively tested | k-of-n, majority, etc. |
| Modular (mod 3,5,6,7,9,10,11,12) | Exhaustively tested | Multi-layer, hand-constructed |
| Parity | Exhaustively tested | XOR tree, hand-constructed |
| Modular (mod 2,4,8) | Exhaustively tested | Single-layer, trivial |

The modular arithmetic circuits for non-powers-of-2 and the parity circuits were hand-constructed because:

  • Divisibility by 3, 5, etc. is not linearly separable in binary
  • 8-bit parity (XOR of all bits) requires a tree of XOR gates

All circuits pass exhaustive testing over their full input domains.
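
The same exhaustive sweep is easy to reproduce for any single-layer gate. A small harness sketch (exhaustive_check is a hypothetical helper, reusing the loading code from Usage):

from itertools import product
import torch

def exhaustive_check(w, b, ref_fn, n_inputs):
    # Sweep all 2^n binary input combinations against a reference function.
    for bits in product([0.0, 1.0], repeat=n_inputs):
        x = torch.tensor(bits)
        out = int((x @ w + b >= 0).item())
        assert out == ref_fn(*bits), f"mismatch at {bits}"

# e.g. the AND gate from the Usage section:
# exhaustive_check(w, b, lambda p, q: int(p == 1.0 and q == 1.0), 2)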


Tensor Naming Convention

{category}.{circuit}[.{layer}][.{component}].{weight|bias}

Examples:
  boolean.and.weight
  boolean.xor.layer1.neuron1.weight
  arithmetic.ripplecarry8bit.fa7.ha2.sum.layer1.or.weight
  modular.mod5.layer2.eq3.weight
  error_detection.paritychecker8bit.stage2.xor1.layer1.nand.bias

Memory circuits are stored as packed tensors to keep the safetensors header size manageable
(e.g., `memory.addr_decode.weight`, `memory.read.and.weight`, `memory.write.and_old.weight`).
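
To browse the namespace without loading every tensor, a small utility sketch:

from collections import Counter
from safetensors import safe_open

with safe_open("neural_computer.safetensors", framework="pt") as f:
    counts = Counter(name.split(".")[0] for name in f.keys())

print(counts)  # tensor count per category: boolean, arithmetic, alu, ...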

Hardware Compatibility

All weights are integers. All activations are Heaviside step. Designed for:

  • Intel Loihi β€” Neuromorphic research chip
  • IBM TrueNorth β€” 1M neuron chip
  • BrainChip Akida β€” Edge neuromorphic processor

LLM Integration

The threshold circuits can be embedded into transformer MLP layers to give LLMs exact arithmetic capability.

For LLM integration, use --memory-profile none to generate a pure ALU model (~32K params) without memory circuits.

Core Thesis

Standard LLMs fail at arithmetic because they are interpolators: they approximate functions over their training distribution rather than compute exact results. A 360M-parameter model trained on internet text has seen "127 + 128 = 255" rarely, if ever, so it guesses by pattern matching.

We solve this by embedding frozen, proven-correct arithmetic circuits directly into the transformer's MLP layers. The circuits use threshold logic (weighted sums + step activation), which is structurally compatible with neural network layers. We train only the interface layers that learn to:

  1. Extract operands from token embeddings
  2. Route computation through the circuits
  3. Inject results back into the residual stream

The model learns call dispatch, not arithmetic. The arithmetic is already solved.

Architecture

Standard MLP block with parallel circuit path:

x ──┬── MLP path ────────────────┬── + ── output
    β”‚                            β”‚
    └── BitExtractor ── Circuit ─┴── BitInjector
                          β”‚
                       Router (learned weighting)

Augmented MLP forward pass:

def forward(self, x):  # x: [batch, seq, d_model]; F = torch.nn.functional
    # Original MLP path (unchanged)
    mlp_out = self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

    # Circuit path (new)
    a_bits, b_bits = self.bit_extractor(x)       # [batch, seq, 8] each
    result_bits, carry = self.circuits.add_8bit(a_bits, b_bits)
    flags = self.compute_flags(result_bits, carry)
    circuit_delta = self.bit_injector(result_bits, flags)

    # Routing: softmax over [mlp, circuit]
    route_weights = self.router(x)  # [batch, seq, 2]

    # Combine: the circuit contribution is gated by its routing weight
    return mlp_out + route_weights[..., 1:2] * circuit_delta

Threshold Logic Fundamentals

A threshold gate computes:

output = 1  if  (Ξ£ wα΅’xα΅’ + b) β‰₯ 0
         0  otherwise

Example gates:

AND: w=[1,1], b=-2
  AND(0,0) = H(-2) = 0
  AND(1,1) = H(0)  = 1

OR: w=[1,1], b=-1
  OR(0,1) = H(0) = 1
  OR(1,1) = H(1) = 1

XOR: requires 2 layers (not linearly separable)
  Layer 1: OR + NAND
  Layer 2: AND

Full adder = 2 half-adders + carry OR, ~4 threshold layers. 8-bit ripple carry = 8 chained full adders, ~32 threshold layers.
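
A standalone check of the two-layer XOR construction, using the gate weights listed above:

def H(x):  # Heaviside step
    return 1 if x >= 0 else 0

def xor(a, b):
    or_out = H(a + b - 1)            # OR:   w=[1,1],   b=-1
    nand_out = H(-a - b + 1)         # NAND: w=[-1,-1], b=+1
    return H(or_out + nand_out - 2)  # AND:  w=[1,1],   b=-2

assert [xor(a, b) for a, b in [(0,0), (0,1), (1,0), (1,1)]] == [0, 1, 1, 0]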

Interface Layers (Trainable)

BitExtractor β€” Maps embedding β†’ two 8-bit operands:

class BitExtractor(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.proj = nn.Linear(d_model, 16)  # two operands x 8 bits

    def forward(self, x):
        logits = self.proj(x)
        bits = heaviside(logits)  # HeavisideSTE.apply; see Gradient Flow below
        return bits[..., :8], bits[..., 8:]

BitInjector β€” Maps result bits β†’ embedding delta:

class BitInjector(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.proj = nn.Linear(16, d_model)
        self.scale = nn.Parameter(torch.tensor(0.1))  # keep the delta small at init

    def forward(self, result_bits, flags):
        combined = torch.cat([result_bits, flags], dim=-1)
        return self.proj(combined) * self.scale

Router β€” Decides when to use circuits:

class Router(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, 64), nn.ReLU(),
            nn.Linear(64, 2), nn.Softmax(dim=-1)
        )

    def forward(self, x):
        return self.net(x)

Trainable Parameters

For SmolLM2-360M (d_model=960), augmenting 11 layers:

| Component | Params/Layer |
|-----------|--------------|
| BitExtractor | 15,376 |
| BitInjector | 16,321 |
| Router | 61,698 |
| OpSelector | ~31,000 |
| Total | ~124,395 |

11 layers Γ— 124,395 = ~1.37M trainable parameters (0.38% of model)

Gradient Flow

Heaviside has zero gradient almost everywhere, so we train through it with the Straight-Through Estimator (STE):

class HeavisideSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return (x >= 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # pass through unchanged
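
Binding the step through the custom op makes the interface layers trainable end to end:

heaviside = HeavisideSTE.apply  # as used in BitExtractor.forward

x = torch.randn(4, requires_grad=True)
heaviside(x).sum().backward()
print(x.grad)  # all ones: the step's zero gradient is replaced by identity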

Training Strategy

  1. Data: Generate 8-bit arithmetic problems exhaustively (256Γ—256 = 65,536 unique)
  2. Loss: Cross-entropy on answer tokens only (prompt masked with -100)
  3. Optimizer: AdamW on interface params only, lr=1e-4
  4. Curriculum: Single-digit β†’ two-digit β†’ full 8-bit β†’ adversarial (127+128, 255+1)
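
A minimal sketch of step 1, assuming a plain-text prompt format and 8-bit wraparound semantics (both are illustrative choices, not necessarily the exact training recipe):

def make_addition_dataset():
    examples = []
    for a in range(256):
        for b in range(256):
            prompt = f"{a} + {b} = "
            answer = str((a + b) & 0xFF)       # 8-bit result; overflow raises the carry flag
            examples.append((prompt, answer))  # loss applies to the answer tokens only
    return examples  # 256 x 256 = 65,536 unique problems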

Inference

At inference, Heaviside is a true step function, with no approximation. If the BitExtractor extracts the operands correctly, the circuit outputs the correct result. Circuit computation adds roughly 5-10% latency overhead.

Target Performance

| Model | Baseline | Target |
|-------|----------|--------|
| SmolLM2-360M | ~5-10% | >95% |

Once trained, the interface generalizes to all 65,536 8-bit additions: nothing is memorized, because the circuits compute the result.

Extension Roadmap

  1. Parenthetical expressions ((5 + 3) Γ— 2 = 16) β€” Explicit grouping overrides precedence. Parser must recognize parens and build correct tree. Evaluation proceeds innermost-out. Adds complexity to extraction layer.

  2. 16-bit operations (0-65535) — Chain two 8-bit circuits with carry propagation. ADD16: low = ADD8(A_lo, B_lo); high = ADD8(A_hi, B_hi, carry_in = carry out of the low stage). MUL16: four partial products + shift-add. Doubles operand extraction width (see the sketch after this list).

  3. Floating point arithmetic β€” IEEE 754-style with separate circuits for mantissa and exponent. ADD: align exponents, add mantissas, renormalize. MUL: add exponents, multiply mantissas. Requires sign handling, overflow detection, and rounding logic.
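
A behavioral sketch of the 16-bit carry chain from item 2, with integer stand-ins for the threshold circuits (not the gate netlist):

def add_8bit(a, b, carry_in=0):
    # stand-in for the 8-bit ripple-carry threshold circuit
    total = a + b + carry_in
    return total & 0xFF, total >> 8  # (sum, carry_out)

def add_16bit(a, b):
    lo, carry = add_8bit(a & 0xFF, b & 0xFF)
    hi, carry_out = add_8bit(a >> 8, b >> 8, carry_in=carry)
    return (hi << 8) | lo, carry_out

assert add_16bit(40000, 30000) == (70000 & 0xFFFF, 1)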

Completed Extensions

  • 3-operand addition (15 + 27 + 33 = 75) β€” arithmetic.add3_8bit chains two 8-bit ripple carry stages. 16 full adders, 144 gates, 240 test cases verified.

  • Order of operations (5 + 3 Γ— 2 = 11) β€” arithmetic.expr_add_mul computes A + (B Γ— C) using shift-add multiplication then addition. 64 AND gates + 64 full adders, 73 test cases verified.
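
In integer terms, arithmetic.expr_add_mul behaves like the following sketch (a behavioral model with 8-bit wraparound, not the gate netlist):

def mul_8bit(a, b):
    acc = 0
    for i in range(8):
        if (b >> i) & 1:                   # an AND-gate row selects the partial product
            acc = (acc + (a << i)) & 0xFF  # full-adder rows accumulate it
    return acc

def expr_add_mul(a, b, c):
    return (a + mul_8bit(b, c)) & 0xFF  # A + (B * C): multiplication binds first

assert expr_add_mul(5, 3, 2) == 11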


Files

| File | Description |
|------|-------------|
| neural_computer.safetensors | 11,581 tensors, 8,290,134 parameters (full CPU) |
| threshold_cpu.py | CPU state, reference cycle, threshold runtime |
| eval.py | Unified evaluation suite (6,441 tests, GPU-batched) |
| build.py | Build tools with configurable memory partitioning |
| prune_weights.py | Weight magnitude pruning |

Build Tool Usage

# Full CPU (64KB memory, default)
python build.py memory --apply

# LLM integration profiles
python build.py --memory-profile none memory --apply       # Pure ALU (32K params)
python build.py --memory-profile registers memory --apply  # 16-byte register file
python build.py --memory-profile scratchpad memory --apply # 256-byte scratchpad

# Custom memory size
python build.py --addr-bits 6 memory --apply  # 64 bytes (2^6)

# Regenerate ALU and input metadata
python build.py alu --apply
python build.py inputs --apply
python build.py all --apply  # memory + alu + inputs

Memory profiles:

| Profile | Addr Bits | Memory | Memory Params | Total Params |
|---------|-----------|--------|---------------|--------------|
| none | 0 | 0B | 0 | ~32K |
| registers | 4 | 16B | ~2K | ~34K |
| scratchpad | 8 | 256B | ~30K | ~63K |
| reduced | 12 | 4KB | ~516K | ~549K |
| full | 16 | 64KB | ~8.26M | ~8.29M |

Citation

@misc{8bit-threshold-computer,
  title={8bit-threshold-computer: A Turing-Complete Threshold Logic CPU},
  author={Norton, Charles},
  year={2026},
  howpublished={Hugging Face},
  url={https://huggingface.co/phanerozoic/8bit-threshold-computer}
}

License

MIT


References

  1. McCulloch & Pitts (1943). "A Logical Calculus of the Ideas Immanent in Nervous Activity"
  2. Muroga (1971). "Threshold Logic and Its Applications"
  3. Siegelmann & Sontag (1995). "On the Computational Power of Neural Nets"
  4. Bengio et al. (2013). "Estimating or Propagating Gradients Through Stochastic Neurons"
  5. Ma et al. (2024). "The Era of 1-bit LLMs" (BitNet b1.58)
  6. HuggingFace (2024). "SmolLM2: Small Language Models"