File size: 34,996 Bytes
bbff33a 486f47d bbff33a 486f47d 1f44a34 bbff33a 486f47d 80624ac 1f44a34 80624ac 1f44a34 486f47d 1f44a34 4feb9eb 1f44a34 4feb9eb 80624ac 4feb9eb 2bbc3fe dda1489 2bbc3fe 65da55b bbff33a 4feb9eb bbff33a 56aed4a f90261b 4feb9eb bbff33a 486f47d 56aed4a 486f47d bbff33a 486f47d 3a9b35a 486f47d bbff33a 4feb9eb bbff33a 486f47d bbff33a 56aed4a f90261b 56aed4a f90261b 56aed4a f90261b 56aed4a 486f47d bbff33a dda1489 f90261b bbff33a 486f47d bbff33a 486f47d bbff33a 56aed4a 65da55b 4feb9eb bbff33a 65da55b 4feb9eb bbff33a 4feb9eb bbff33a 56aed4a 4feb9eb 80624ac bbff33a 80624ac 56aed4a f90261b 56aed4a 470f0a9 56aed4a 470f0a9 56aed4a 470f0a9 56aed4a 470f0a9 56aed4a 470f0a9 56aed4a 470f0a9 56aed4a 470f0a9 56aed4a 470f0a9 56aed4a 470f0a9 56aed4a 470f0a9 56aed4a 470f0a9 56aed4a 470f0a9 56aed4a 470f0a9 56aed4a 470f0a9 56aed4a 470f0a9 56aed4a ef9f9e5 a2cd2fc 084c69c a2cd2fc 084c69c a2cd2fc 1f44a34 084c69c 470f0a9 a2cd2fc 1f44a34 a2cd2fc 1f44a34 a2cd2fc 470f0a9 084c69c 1f44a34 470f0a9 1f44a34 470f0a9 1f44a34 470f0a9 084c69c 1f44a34 ef9f9e5 1f44a34 ef9f9e5 1f44a34 ef9f9e5 470f0a9 1f44a34 ef9f9e5 56aed4a 1f44a34 ef9f9e5 1f44a34 ef9f9e5 1f44a34 7c967c0 1f44a34 7c967c0 1f44a34 7c967c0 1f44a34 df088a9 1f44a34 56aed4a 1f44a34 80624ac 1f44a34 470f0a9 1f44a34 470f0a9 1f44a34 470f0a9 1f44a34 80624ac 1f44a34 f90261b 3a9b35a f90261b 1f44a34 3a9b35a f90261b 1f44a34 3a9b35a f90261b 3a9b35a f90261b 1f44a34 f90261b 3a9b35a f90261b 80624ac 486f47d bbff33a f90261b 80624ac bbff33a 486f47d 80624ac 56aed4a 486f47d 56aed4a 470f0a9 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 |
---
license: mit
tags:
- threshold-logic
- neuromorphic
- computer-architecture
- turing-complete
- loihi
- truenorth
- akida
---
# 8bit-threshold-computer
**A Turing-complete CPU implemented entirely as threshold logic gates, with 8-bit and 32-bit ALU support.**
Every logic gate is a threshold neuron: `output = 1 if (Ξ£ wα΅’xα΅’ + b) β₯ 0 else 0`
```
8-bit CPU: 8,290,134 params (full) / 32,397 params (pure ALU)
32-bit ALU: 202,869 params (1KB scratch memory)
```
---
## What Is This?
A complete processor where every operationβfrom Boolean logic to arithmetic to control flowβis implemented using only weighted sums and step functions. No traditional gates.
| Component | 8-bit CPU | 32-bit ALU |
|-----------|-----------|------------|
| Registers | 4 Γ 8-bit | N/A (pure computation) |
| Memory | 0Bβ64KB configurable | 1KB scratch |
| ALU | 16 ops @ 8-bit | ADD, SUB, MUL, DIV, CMP, bitwise, shifts |
| Precision | 0β255 | 0β4,294,967,295 |
| Flags | Z, N, C, V | Carry/overflow |
| Control | Full ISA | Stateless |
**Turing complete.** The 8-bit CPU is verified with loops, conditionals, recursion, and self-modification. The 32-bit ALU extends arithmetic to practical ranges (0β4B) where 8-bit (0β255) is insufficient.
---
## Execution Model
A self-contained, autonomous computational machine:
- **Pure tensor computation**: State in, state out
- **Frozen circuits**: Integer weights, Heaviside activation
- **ACT execution**: Internal loop until HALT
- **No external orchestration**: One forward pass = complete program execution
```
βββββββββββββββββββββββββββββββ
β Initial State β
β [PC|Regs|Flags|Memory...] β
βββββββββββββββ¬ββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββ
β β
β Threshold Circuit Layer β
β β
β βββββββββββββββββββββββββ β
β β Fetch: PC β Instr β β
β βββββββββββββββββββββββββ€ β
β β Decode: Opcode/Ops β β
β βββββββββββββββββββββββββ€ β
β β Execute: ALU/Mem β β
β βββββββββββββββββββββββββ€ β
β β Writeback: Results β β
β βββββββββββββββββββββββββ€ β
β β PC Update β β
β βββββββββββββ¬ββββββββββββ β
β β β
β ββββββΌβββββ β
β β HALTED? β β
β ββββββ¬βββββ β
β β β
β no βββ΄ββ yes β
β β β β
β βΌ βΌ β
β [loop] [exit] β
β β
βββββββββββββββ¬ββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββ
β Final State β
β [PC|Regs|Flags|Memory...] β
βββββββββββββββββββββββββββββββ
```
### Instruction Set
| Opcode | Mnemonic | Operation |
|--------|----------|-----------|
| 0x0 | ADD | R[d] = R[a] + R[b] |
| 0x1 | SUB | R[d] = R[a] - R[b] |
| 0x2 | AND | R[d] = R[a] & R[b] |
| 0x3 | OR | R[d] = R[a] \| R[b] |
| 0x4 | XOR | R[d] = R[a] ^ R[b] |
| 0x5 | SHL | R[d] = R[a] << 1 |
| 0x6 | SHR | R[d] = R[a] >> 1 |
| 0x7 | MUL | R[d] = R[a] * R[b] |
| 0x8 | DIV | R[d] = R[a] / R[b] |
| 0x9 | CMP | flags = R[a] - R[b] |
| 0xA | LOAD | R[d] = M[addr] |
| 0xB | STORE | M[addr] = R[s] |
| 0xC | JMP | PC = addr |
| 0xD | Jcc | PC = addr if cond (imm8[2:0]: 0=Z,1=NZ,2=C,3=NC,4=N,5=P,6=V,7=NV) |
| 0xE | CALL | push PC; PC = addr |
| 0xF | HALT | stop execution |
### Design Principles
1. **Autonomy**: Machine runs without external logic
2. **Purity**: forward(state) β state', no side effects
3. **Transparency**: All weights inspectable, all operations traceable
4. **Universality**: Turing complete, runs arbitrary programs
---
## Background
### Threshold Logic
A threshold gate computes a Boolean function by taking a weighted sum of binary inputs and comparing it to a threshold. If the sum meets or exceeds the threshold, the output is 1; otherwise, 0. This can be expressed as a neuron with Heaviside step activation: `output = 1 if (Ξ£ wα΅’xα΅’ + b) β₯ 0 else 0`, where weights `wα΅’` and bias `b` are integers.
Threshold gates are strictly more powerful than standard Boolean gates. A single threshold gate can compute any linearly separable Boolean functionβthis includes AND, OR, NAND, NOR, and many others that require multiple levels of conventional gates. Functions that are not linearly separable (such as XOR or parity) require multiple threshold gates arranged in layers.
### Historical Development
Warren McCulloch and Walter Pitts introduced the threshold neuron model in 1943, proving that networks of such neurons could compute any Boolean function. This work preceded both the perceptron and modern neural networks, establishing the theoretical foundation for neural computation.
The 1960s saw significant development in threshold logic synthesis. Researchers including Saburo Muroga, Robert McNaughton, and Michael Dertouzos developed algebraic methods for determining whether a Boolean function could be implemented as a single threshold gate, and if so, how to calculate appropriate weights. This work produced systematic techniques for threshold gate design but focused on individual gates rather than complete systems.
Frank Rosenblatt's Mark I Perceptron (1957-1960) implemented threshold neurons in hardware using potentiometers for weights, but it was a pattern classifier that learned its weights through trainingβthe final weight configurations were not published. Bernard Widrow's ADALINE and MADALINE systems (1960-1963) similarly used adaptive threshold elements with weights learned via the LMS algorithm.
Hava Siegelmann and Eduardo Sontag proved in the 1990s that recurrent neural networks are Turing complete. Their construction, however, relied on continuous sigmoid activation functions with infinite precisionβnot the discrete step function used in threshold logic. Other theoretical work on neural Turing machines and differentiable computers followed similar patterns: proving computational universality using continuous, differentiable activations suitable for gradient-based training.
### Neuromorphic Hardware
Modern neuromorphic processors implement large arrays of configurable threshold-like neurons in silicon:
**Intel Loihi** (2017) provides 128 neuromorphic cores with programmable synaptic weights, spike-based communication, and on-chip learning. The architecture supports integer weights and configurable neuron dynamics.
**IBM TrueNorth** (2014) integrates one million neurons and 256 million synapses in a 4096-core array. Each neurosynaptic core implements 256 neurons with configurable weights and thresholds. The chip was designed as an alternative to von Neumann architecture rather than an implementation of one.
**BrainChip Akida** (2021) targets edge deployment with event-based processing and integer weights. The architecture supports standard neural network operations mapped onto neuromorphic primitives.
**SpiNNaker** (University of Manchester) uses ARM processor cores to simulate spiking neural networks at scale. The platform has hosted various neural models but is simulation-based rather than native neuromorphic silicon.
Despite the availability of these platforms, published work has focused on neural network inference, sensory processing, and pattern recognition. A 2024 paper demonstrated basic logic gates, adders, and decoders on SpiNNaker and Dynap-SE1, describing this as "a first step toward the construction of a spiking computer"βthe implementation lacked instruction fetch, program counter, memory systems, and control logic.
### This Implementation
The weights in this repository implement a complete 8-bit computer: registers, ALU with 16 operations, status flags, conditional branching, subroutine calls, stack operations, and memory access. Every component is built from threshold neurons with integer weights. The weight configurations are published in safetensors format for direct loading and deployment.
---
## Circuit Categories
| Category | Circuits | Examples |
|----------|----------|----------|
| Boolean | 9 | AND, OR, NOT, NAND, NOR, XOR, XNOR, IMPLIES, BIIMPLIES |
| Arithmetic | 18 | Half/full adder, 2/4/8-bit ripple carry, comparators |
| ALU | 3 | 8-bit ALU, control decoder, flag computation |
| Combinational | 10 | MUX (2:1, 4:1, 8:1), DEMUX, encoders, decoders |
| Control Flow | 16 | JMP, conditional jumps, CALL, RET, PUSH, POP |
| Error Detection | 11 | Parity (XOR tree), checksum, CRC, Hamming |
| Modular | 11 | Divisibility by 2-12 (multi-layer for non-powers-of-2) |
| Threshold | 13 | k-of-n gates, majority, minority, exactly-k |
| Pattern | 10 | Popcount, leading/trailing ones, symmetry |
| Memory | 3 | N-bit addr decoder, 2^NΓ8 read mux, write cells (configurable, packed) |
---
## Usage
```python
import torch
from safetensors.torch import load_file
tensors = load_file("neural_computer8.safetensors")
def heaviside(x):
return (x >= 0).float()
# AND gate: fires when both inputs are 1
w = tensors['boolean.and.weight'] # [1, 1]
b = tensors['boolean.and.bias'] # [-2]
for a, b_in in [(0,0), (0,1), (1,0), (1,1)]:
inp = torch.tensor([a, b_in], dtype=torch.float32)
out = heaviside(inp @ w + b)
print(f"AND({a}, {b_in}) = {int(out.item())}")
```
---
## State Tensor Layout
All multi-bit fields are **MSB-first** (index 0 is the most-significant bit).
```
[ PC[N] | IR[16] | R0[8] R1[8] R2[8] R3[8] | FLAGS[4] | SP[N] | CTRL[4] | MEM[2^N][8] ]
```
Where N = address bits (configurable: 0-16).
Flags are ordered as: `Z, N, C, V`.
Control bits are ordered as: `HALT, MEM_WE, MEM_RE, RESERVED`.
| Memory Profile | Addr Bits | Memory Size | State Bits |
|----------------|-----------|-------------|------------|
| Full CPU | 16 | 64KB | 524,376 |
| Reduced | 12 | 4KB | 32,856 |
| Scratchpad | 8 | 256B | 2,104 |
| Registers | 4 | 16B | 184 |
| Pure ALU | 0 | 0B | 56 |
---
## Instruction Encoding (16-bit)
All instruction bits are **MSB-first**.
```
15..12 11..10 9..8 7..0
opcode rd rs imm8
```
Interpretation:
- **R-type**: `rd = rd op rs` (imm8 ignored).
- **I-type**: `rd = op rd, imm8` (rs ignored).
- **Address-extended**: `LOAD`, `STORE`, `JMP`, `JZ`, `CALL` consume the next word as a 16-bit address (big-endian). `imm8` is reserved, and the PC skips 4 bytes when the jump is not taken.
---
## Verification
```bash
python eval.py
python threshold_cpu.py
```
### Verification Status
| Category | Status | Notes |
|----------|--------|-------|
| Boolean gates | Exhaustively tested | All 2^n input combinations |
| Arithmetic | Exhaustively tested | Full 8-bit range |
| ALU | Exhaustively tested | All operations, all inputs |
| Control flow | Exhaustively tested | Branch/jump conditions |
| Threshold | Exhaustively tested | k-of-n, majority, etc. |
| Modular (mod 3,5,6,7,9,10,11,12) | Exhaustively tested | Multi-layer, hand-constructed |
| Parity | Exhaustively tested | XOR tree, hand-constructed |
| Modular (mod 2,4,8) | Exhaustively tested | Single-layer, trivial |
The modular arithmetic circuits for non-powers-of-2 and the parity circuits were hand-constructed because:
- Divisibility by 3, 5, etc. is **not linearly separable** in binary
- 8-bit parity (XOR of all bits) requires a tree of XOR gates
All circuits pass exhaustive testing over their full input domains.
---
## Tensor Naming Convention
```
{category}.{circuit}[.{layer}][.{component}].{weight|bias}
Examples:
boolean.and.weight
boolean.xor.layer1.neuron1.weight
arithmetic.ripplecarry8bit.fa7.ha2.sum.layer1.or.weight
modular.mod5.layer2.eq3.weight
error_detection.paritychecker8bit.stage2.xor1.layer1.nand.bias
Memory circuits are stored as packed tensors to keep the safetensors header size manageable
(e.g., `memory.addr_decode.weight`, `memory.read.and.weight`, `memory.write.and_old.weight`).
```
---
## Hardware Compatibility
All weights are integers. All activations are Heaviside step. Designed for:
- **Intel Loihi** β Neuromorphic research chip
- **IBM TrueNorth** β 1M neuron chip
- **BrainChip Akida** β Edge neuromorphic processor
---
## LLM Integration
The threshold circuits can be embedded into transformer MLP layers to give LLMs exact arithmetic capability.
**For LLM integration, use `--memory-profile none` to generate a pure ALU model (~32K params) without memory circuits.**
### Core Thesis
Standard LLMs fail at arithmetic because they're interpolatorsβthey approximate functions over training distributions rather than compute exact results. A 360M parameter model trained on internet text has seen "127 + 128 = 255" zero or few times, so it guesses based on pattern matching.
We solve this by embedding **frozen, proven-correct arithmetic circuits** directly into the transformer's MLP layers. The circuits use threshold logic (weighted sums + step activation), which is structurally compatible with neural network layers. We train only the **interface layers** that learn to:
1. Extract operands from token embeddings
2. Route computation through the circuits
3. Inject results back into the residual stream
The model learns **call dispatch**, not arithmetic. The arithmetic is already solved.
### Target Model: SmolLM2-360M-Instruct
We use HuggingFace's SmolLM2-360M-Instruct as our base model. See [`llm_integration/SMOLLM2_ARCHITECTURE.md`](llm_integration/SMOLLM2_ARCHITECTURE.md) for the complete technical analysis.
| Property | Value |
|----------|-------|
| Parameters | 361.82M |
| Hidden Dimension | **960** (matches extractor input) |
| Layers | 32 transformer blocks |
| Attention | 15 query heads, 5 KV heads (GQA) |
| MLP | SwiGLU (960β2560β960) |
| Position Encoding | RoPE (theta=100k, max 8192) |
**Key insight**: The hidden dimension of 960 exactly matches our extractor requirementsβno projection layer needed.
**Tokenization**: Digits are tokenized individually (`"47 + 86"` β `['4', '7', ' +', ' ', '8', '6']`), with digit token IDs following `token_id = 32 + digit_value`. This enables position-based operand extraction.
**Hidden State Extraction**: Layer 31 (final, pre-LM-head) provides well-normalized representations (std=1.34) ideal for bit extraction. All 33 hidden state outputs are available (embedding + 32 layers).
### Architecture
Standard MLP block with parallel circuit path:
```
x βββ¬ββ MLP path βββββββββββββββββ¬ββ + ββ output
β β
βββ BitExtractor ββ Circuit ββ΄ββ BitInjector
β
Router (learned weighting)
```
Augmented MLP forward pass:
```python
def forward(x): # x: [batch, seq, d_model=960]
# Original MLP path (unchanged)
mlp_out = self.down_proj(silu(self.gate_proj(x)) * self.up_proj(x))
# Circuit path (new)
a_bits, b_bits = self.bit_extractor(x) # [batch, seq, 8] each
result_bits, carry = self.circuits.add_8bit(a_bits, b_bits)
flags = self.compute_flags(result_bits, carry)
circuit_delta = self.bit_injector(result_bits, flags)
# Routing
route_weights = self.router(x) # [batch, seq, 2] softmax
# Combine
return mlp_out + route_weights[..., 1:2] * circuit_delta
```
### Threshold Logic Fundamentals
A threshold gate computes:
```
output = 1 if (Ξ£ wα΅’xα΅’ + b) β₯ 0
0 otherwise
```
Example gates:
```
AND: w=[1,1], b=-2
AND(0,0) = H(-2) = 0
AND(1,1) = H(0) = 1
OR: w=[1,1], b=-1
OR(0,1) = H(0) = 1
OR(1,1) = H(1) = 1
XOR: requires 2 layers (not linearly separable)
Layer 1: OR + NAND
Layer 2: AND
```
Full adder = 2 half-adders + carry OR, ~4 threshold layers.
8-bit ripple carry = 8 chained full adders, ~32 threshold layers.
### Interface Layers (Trainable)
**Extractor** β Extracts operands and operation from LLM hidden states:
```python
class Extractor(nn.Module):
"""Attention pooling + per-bit extraction networks."""
def __init__(self, hidden_dim=960):
self.attention_pool = AttentionPooling(hidden_dim, num_heads=4)
self.a_extractor = MultiHeadBitExtractor(hidden_dim) # 8 separate bit networks
self.b_extractor = MultiHeadBitExtractor(hidden_dim)
self.op_router = nn.Sequential(
nn.Linear(hidden_dim, 256), nn.GELU(),
nn.Linear(256, 6) # 6 operations
)
def forward(self, hidden_states, attention_mask):
pooled = self.attention_pool(hidden_states, attention_mask) # (batch, 960)
a_bits, _ = self.a_extractor(pooled) # (batch, 8)
b_bits, _ = self.b_extractor(pooled) # (batch, 8)
op_logits = self.op_router(pooled) # (batch, 6)
return a_bits, b_bits, op_logits
```
**MultiHeadBitExtractor** β 8 specialized networks, one per bit:
```python
class MultiHeadBitExtractor(nn.Module):
def __init__(self, hidden_dim=960):
self.bit_extractors = nn.ModuleList([
nn.Sequential(nn.Linear(hidden_dim, 128), nn.GELU(), nn.Linear(128, 1))
for _ in range(8)
])
def forward(self, x):
logits = torch.cat([ext(x) for ext in self.bit_extractors], dim=-1)
soft = torch.sigmoid(logits)
hard = heaviside_ste(logits)
return hard - soft.detach() + soft, logits # STE
```
**AttentionPooling** β Learns which token positions matter:
```python
class AttentionPooling(nn.Module):
"""CLS-token style pooling with learned attention."""
def __init__(self, hidden_dim=960, num_heads=4):
self.cls_token = nn.Parameter(torch.randn(1, 1, hidden_dim) * 0.02)
self.query = nn.Linear(hidden_dim, hidden_dim)
self.key = nn.Linear(hidden_dim, hidden_dim)
self.value = nn.Linear(hidden_dim, hidden_dim)
```
### Trainable Parameters
For SmolLM2-360M (hidden_dim=960):
| Component | Parameters | Description |
|-----------|------------|-------------|
| AttentionPooling | ~3.7M | 4-head attention over sequence |
| MultiHeadBitExtractor (Γ2) | ~245K each | 8 per-bit MLPs for A and B |
| OpRouter | ~246K | 960β256β6 MLP |
| **Extractor Total** | ~4.4M | Full extraction module |
**Alternative architectures**:
- `PositionExtractor`: ~1.5M (position-specific, no attention)
- `DigitExtractor`: ~1.2M (predicts digits 0-9 instead of bits)
With `--unfreeze_layers 4`: Adds ~39.3M trainable params (top 4 transformer layers).
### Gradient Flow
Heaviside has zero gradient almost everywhere. We use **Straight-Through Estimator (STE)**:
```python
class HeavisideSTE(torch.autograd.Function):
@staticmethod
def forward(ctx, x):
return (x >= 0).float()
@staticmethod
def backward(ctx, grad_output):
return grad_output # pass through unchanged
```
### Training Strategy
1. **Data**: Random 8-bit arithmetic problems (operands 0-255, 6 operations)
2. **Loss**: Multi-component BCE + CE
- `result_loss`: BCE on output bits vs expected
- `a_loss`, `b_loss`: BCE on extracted bits vs ground truth (2Γ weight)
- `op_loss`: CE on operation classification
3. **Optimizer**: AdamW, lr=3e-4, gradient clipping at 1.0
4. **Curriculum**: Epoch-based range expansion (0-9 β 0-99 β 0-255)
5. **Batching**: 256-4096 samples per batch (VRAM-dependent)
```bash
# Example training commands
python train.py --mode router --epochs 100 # Sanity check
python train.py --mode llm --epochs 100 --batch_size 256 # Frozen LLM
python train.py --mode llm --unfreeze_layers 4 --batch_size 4096 # Fine-tune top layers
```
### Inference
At inference, Heaviside is true step functionβno approximation. If the Extractor correctly identifies operands, the circuit **will** output the correct result.
### Target Performance
| Condition | Configuration | Accuracy |
|-----------|---------------|----------|
| Control | Vanilla SmolLM2-360M | 11.90% |
| Circuits only | Ground truth bits | 100.00% |
| Experimental | LLM + Extractor + Circuits | **Target: 100%** |
The interface generalizes to **all** 65,536 8-bit additions once trainedβno memorization, the circuits compute.
### LLM Integration: Proof of Concept (In Progress)
Before proceeding with architectural extensions, we are validating the core thesis: that frozen threshold circuits can provide exact arithmetic capability to language models that otherwise fail at computation.
#### Baseline Evaluation
We evaluated SmolLM2-360M-Instruct on randomized 8-bit arithmetic using a generous answer extraction protocol. The model was prompted with a system message instructing it to output only numeric answers, and we accepted any correct number found in the output (first number, last number, or word-to-number conversion).
| Operation | SmolLM2-360M Accuracy | Notes |
|-----------|----------------------|-------|
| Addition (A + B) | 35.92% | Best performance, still fails 2/3 |
| Subtraction (A - B) | 17.72% | Poor handling of borrowing |
| Multiplication (A Γ B) | **1.25%** | Near-total failure |
| Greater Than (A > B) | 14.37% | Often echoes expression |
| Less Than (A < B) | 4.31% | Often echoes expression |
| Equality (A == B) | 0.28% | Near-total failure |
| **Overall Fitness** | **11.90%** | 238/2000 correct |
**Methodology**: 2000 randomized test cases with operands uniformly sampled from [0, 255]. Ground truth computed as 8-bit arithmetic (matching the threshold circuit specification). Batch size 64, greedy decoding (temperature=0).
**Key Observations**:
- Multiplication accuracy (1.25%) is essentially random guessing over the output space
- Comparison operations fail because the model often echoes the expression rather than evaluating it
- Even additionβthe simplest operationβfails nearly two-thirds of the time on 8-bit operands
- Performance degrades sharply as operand magnitude increases (edge cases like 127+128 are almost never correct)
These results establish the **control condition** for our experiment.
#### Experimental Design
| Condition | Model Configuration | Target Fitness |
|-----------|---------------------|----------------|
| **Control** | Vanilla SmolLM2-360M-Instruct | 11.90% (measured) |
| **Experimental** | SmolLM2-360M + Frozen ThresholdALU + Trained Interface | **100%** |
The experimental condition adds:
1. **BitEncoder** (trainable): Projects hidden states β 24 bits (3 Γ 8-bit operands)
2. **OpRouter** (trainable): Selects which circuit to activate based on context
3. **BitDecoder** (trainable): Projects 8-bit result β hidden state delta
4. **ThresholdALU** (frozen): The verified circuits from this repository
**Training Signal**: The fitness function itself. We do not provide answer supervisionβthe model must learn to correctly encode operands and route to circuits such that the frozen circuits produce correct outputs. This is possible because the circuits are proven correct; the interface layers need only learn the encoding/routing protocol.
**Success Criterion**: If the experimental condition achieves 100% fitness on randomized arithmetic while the control remains at ~12%, this demonstrates:
1. The frozen threshold circuits provide exact computation
2. Neural interface layers can learn to use discrete computational substrates
3. Small language models can achieve perfect arithmetic via architectural augmentation rather than scale
#### Progress
**Stage 1: Circuit Validation β COMPLETE**
The frozen threshold circuits achieve 100% accuracy when given correctly formatted bit inputs:
| Test | Result |
|------|--------|
| DirectCircuitModel (ground truth bits) | 100.00% on 10,000 random cases |
| All operations (ADD, SUB, MUL, GT, LT, EQ) | 100.00% each |
This confirms the circuits compute correctly. However, this was already established by `eval.py`.
**Stage 2: LLM Baseline β COMPLETE**
SmolLM2-360M-Instruct baseline on randomized 8-bit arithmetic:
| Operation | Accuracy |
|-----------|----------|
| Addition | 35.92% |
| Subtraction | 17.72% |
| Multiplication | 1.25% |
| Comparisons | 0.28β14.37% |
| **Overall** | **11.90%** |
Head-to-head on 50 random cases: SmolLM2 got 7/50 (14%), circuits got 50/50 (100%).
**Stage 3: LLM Integration β IN PROGRESS**
The challenge: train an interface that extracts operands and operations from natural language (not from pre-formatted bit inputs).
```
"47 + 86"
β
[Language Model / Extractor]
β
[a_bits, b_bits, op_logits]
β
[Frozen threshold circuits]
β
[Result bits] β 133
```
**SmolLM2 Approach** (`llm_integration/`):
Initial experiments used SmolLM2-360M-Instruct as the language understanding backbone.
| Mode | Description | Status |
|------|-------------|--------|
| `--mode router` | Train OpRouter with ground truth bits | 100% achieved |
| `--mode interface` | Train BitEncoder + OpRouter | Ready |
| `--mode llm` | Train from LLM hidden states | Explored |
**LLM Mode Options**:
- `--unfreeze_layers N`: Fine-tune top N transformer layers
- `--extract_layer N`: Extract from intermediate layer (-1 = final)
- `--position_extract`: Position-specific extraction (uses token positions)
- `--digit_pred`: Predict digits (0-9) instead of bits
**Extraction Architectures** (`model.py`):
- `Extractor`: Attention pooling + per-bit MLPs
- `PositionExtractor`: Position-aware (operand A from positions 0-2, B from 5-7)
- `DigitExtractor`: Predicts 3 digits per operand, converts to bits
- `HybridExtractor`: Digit lookup + MLP fallback for word inputs
**Curriculum Learning**: Training progresses 0-9 β 0-99 β 0-255 over epochs.
**Observations**: SmolLM2 integration proved challengingβ360M parameters of pre-trained representations largely irrelevant to arithmetic parsing, high VRAM requirements, and gradient conflicts between frozen circuits and pre-trained weights.
**Pivot: From-Scratch Extractor**
Given that the task is fundamentally simpleβparse `(a, b, op)` from structured textβa lightweight purpose-built model may be more appropriate than adapting a general LLM.
```
"one thousand plus two thousand"
β
[Char-level tokenizer: ~40 tokens]
β
[Small transformer: ~1-5M params]
β
[3 heads: a_value, b_value, op_idx]
β
[Frozen 32-bit threshold circuits]
β
3000
```
**Design principles**:
- **Minimal Python**: All parsing logic learned in weights, not hardcoded
- **Character-level input**: No word tokenization; model learns "forty seven" = 47
- **From-scratch training**: No pre-trained weights to conflict with
- **32-bit target**: Practical arithmetic range (0β4,294,967,295)
**Planned architecture**:
- Vocab: ~40 chars (a-z, 0-9, space, operators)
- Embedding: 40 Γ 128d
- Encoder: 2-3 transformer layers
- Output heads: `a_classifier`, `b_classifier`, `op_classifier`
- Total: ~1-5M params (vs 360M for SmolLM2)
This approach treats the problem as what it is: a structured parsing task where the frozen circuits handle all computation. The extractor need only learn the mapping from text to operandsβno world knowledge required.
#### Proof of Concept Scope
- **32-bit operands** (0β4,294,967,295)
- **Six operations**: ADD, SUB, MUL, GT, LT, EQ
- **Structured input**: Digits ("1000 + 2000") and number words ("one thousand plus two thousand")
**Current Status**:
- Circuit validation: Complete (100% on 8-bit operations)
- 32-bit circuits: Built and tested (adder verified on 1M+2M=3M, etc.)
- LLM baseline: Measured (11.90% - establishes control condition)
- SmolLM2 integration: Infrastructure complete, training explored
- From-scratch extractor: Design phase
### Extension Roadmap
#### Completed
1. **32-bit operations (0β4,294,967,295)** β Full 32-bit ALU implemented via `--bits 32` flag:
- 32-bit ripple carry adder (32 chained full adders) β **verified**
- 32-bit subtractor (NOT + adder with carry-in)
- 32-bit multiplication (1024 partial product ANDs)
- 32-bit division (32 restoring stages)
- 32-bit comparators (GT, LT, GE, LE, EQ)
- 32-bit bitwise ops (AND, OR, XOR, NOT)
- 32-bit shifts (SHL, SHR), INC, DEC, NEG
**Known issue**: Single-layer 32-bit comparators use weights up to 2Β³ΒΉ, which exceeds float32 mantissa precision (24 bits). Comparisons between large numbers differing only in low bits may fail. Fix planned: cascaded byte-wise comparison (compare MSB first, if equal compare next byte, etc.).
2. **3-operand addition (15 + 27 + 33 = 75)** β `arithmetic.add3_8bit` chains two 8-bit ripple carry stages. 16 full adders, 144 gates, 240 test cases verified.
3. **Order of operations (5 + 3 Γ 2 = 11)** β `arithmetic.expr_add_mul` computes A + (B Γ C) using shift-add multiplication then addition. 64 AND gates + 64 full adders, 73 test cases verified.
#### Planned
1. **Cascaded 32-bit comparators** β Replace single-layer weighted comparison with multi-layer byte-wise cascade. Each byte comparison uses 8-bit weights (max 128), well within float32 precision. Hardware-accurate and extensible to 64-bit, 128-bit, etc.
2. **Parenthetical expressions ((5 + 3) Γ 2 = 16)** β Explicit grouping overrides precedence. Parser must recognize parens and build correct tree. Evaluation proceeds innermost-out.
3. **Multi-operation chains (a + b - c Γ d)** β Sequential dispatch through multiple circuits with intermediate result routing. Requires state management in interface layers.
4. **Floating point arithmetic** β IEEE 754-style with separate circuits for mantissa and exponent. ADD: align exponents, add mantissas, renormalize. MUL: add exponents, multiply mantissas.
5. **Full CPU integration** β Enable memory access circuits for stateful computation. Allows multi-step algorithms executed entirely within threshold logic.
---
## Build Tool
Output filenames are auto-generated from configuration:
```
Format: neural_{alu|computer}{BITS}[_{MEMORY}].safetensors
Examples:
neural_alu8.safetensors # 8-bit, no memory
neural_alu32.safetensors # 32-bit, no memory
neural_computer8.safetensors # 8-bit, full memory (default)
neural_computer32.safetensors # 32-bit, full memory
neural_computer8_small.safetensors # 8-bit, 1KB memory
neural_computer32_small.safetensors # 32-bit, 1KB memory
neural_computer8_addr12.safetensors # 8-bit, custom 4KB (2^12 bytes)
```
```bash
# 8-bit CPU (default)
python build.py --apply all # -> neural_computer8.safetensors
python build.py -m none --apply all # -> neural_alu8.safetensors
python build.py -m scratchpad --apply all # -> neural_computer8_scratchpad.safetensors
# 16-bit ALU
python build.py --bits 16 --apply all # -> neural_computer16.safetensors
python build.py --bits 16 -m none --apply all # -> neural_alu16.safetensors
# 32-bit ALU
python build.py --bits 32 -m small --apply all # -> neural_computer32_small.safetensors
python build.py --bits 32 -m none --apply all # -> neural_alu32.safetensors
# Custom address width
python build.py --bits 16 -a 6 --apply all # -> neural_computer16_addr6.safetensors
```
**Bit widths** (`--bits`):
| Width | Range | Use Case |
|-------|-------|----------|
| 8 | 0β255 | Full CPU, legacy |
| 16 | 0β65,535 | Extended arithmetic |
| 32 | 0β4,294,967,295 | Practical arithmetic |
**Memory profiles** (`-m`):
| Profile | Size | Addr Bits | Filename Suffix | Params | Use Case |
|---------|------|-----------|-----------------|--------|----------|
| `none` | 0B | β | (uses `alu`) | ~32K | Pure ALU |
| `registers` | 16B | 4 | `_registers` | ~34K | Minimal state |
| `scratchpad` | 256B | 8 | `_scratchpad` | ~63K | 8-bit scratch |
| `small` | 1KB | 10 | `_small` | ~123K | 32-bit scratch |
| `reduced` | 4KB | 12 | `_reduced` | ~549K | Small programs |
| `full` | 64KB | 16 | (none) | ~8.29M | Full CPU |
**Custom address width** (`-a N`): Memory size = 2^N bytes, suffix = `_addrN`
---
## Citation
```bibtex
@misc{8bit-threshold-computer,
title={8bit-threshold-computer: A Turing-Complete Threshold Logic CPU},
author={Norton, Charles},
year={2026},
howpublished={Hugging Face},
url={https://huggingface.co/phanerozoic/8bit-threshold-computer}
}
```
---
## License
MIT
---
## References
1. McCulloch & Pitts (1943). "A Logical Calculus of Ideas Immanent in Nervous Activity"
2. Muroga (1971). "Threshold Logic and Its Applications"
3. Siegelmann & Sontag (1995). "On the Computational Power of Neural Nets"
4. Bengio et al. (2013). "Estimating or Propagating Gradients Through Stochastic Neurons"
5. Ma et al. (2024). "The Era of 1-bit LLMs" (BitNet b1.58)
6. HuggingFace (2024). "SmolLM2: Small Language Models" β [Model Card](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct)
7. Vaswani et al. (2017). "Attention Is All You Need" β Transformer architecture
8. Su et al. (2021). "RoFormer: Enhanced Transformer with Rotary Position Embedding" β RoPE
|