---

license: mit
tags:
- threshold-logic
- neuromorphic
- computer-architecture
- turing-complete
- loihi
- truenorth
- akida
---


# 8bit-threshold-computer

**A Turing-complete CPU implemented entirely as threshold logic gates, with 8-bit and 32-bit ALU support.**

Every logic gate is a threshold neuron: `output = 1 if (Σ wᵢxᵢ + b) ≥ 0 else 0`

```
8-bit CPU:   8,290,134 params (full) / 32,397 params (pure ALU)
32-bit ALU:  202,869 params (1KB scratch memory)
```

---

## What Is This?

A complete processor where every operation, from Boolean logic to arithmetic to control flow, is implemented using only weighted sums and step functions. No traditional gates.

| Component | 8-bit CPU | 32-bit ALU |
|-----------|-----------|------------|
| Registers | 4 × 8-bit | N/A (pure computation) |
| Memory | 0B–64KB configurable | 1KB scratch |
| ALU | 16 ops @ 8-bit | ADD, SUB, MUL, DIV, CMP, bitwise, shifts |
| Precision | 0–255 | 0–4,294,967,295 |
| Flags | Z, N, C, V | Carry/overflow |
| Control | Full ISA | Stateless |

**Turing complete.** The 8-bit CPU is verified with loops, conditionals, recursion, and self-modification. The 32-bit ALU extends arithmetic to practical ranges (0–4B) where 8-bit (0–255) is insufficient.

---

## Execution Model

A self-contained, autonomous computational machine:
- **Pure tensor computation**: State in, state out
- **Frozen circuits**: Integer weights, Heaviside activation
- **ACT execution**: Internal loop until HALT
- **No external orchestration**: One forward pass = complete program execution

```
            ┌─────────────────────────────┐
            │      Initial State          │
            │  [PC|Regs|Flags|Memory...]  │
            └─────────────┬───────────────┘
                          ▼
            ┌─────────────────────────────┐
            │                             │
            │   Threshold Circuit Layer   │
            │                             │
            │  ┌───────────────────────┐  │
            │  │   Fetch: PC → Instr   │  │
            │  ├───────────────────────┤  │
            │  │   Decode: Opcode/Ops  │  │
            │  ├───────────────────────┤  │
            │  │   Execute: ALU/Mem    │  │
            │  ├───────────────────────┤  │
            │  │   Writeback: Results  │  │
            │  ├───────────────────────┤  │
            │  │   PC Update           │  │
            │  └───────────┬───────────┘  │
            │              │              │
            │         ┌────▼────┐         │
            │         │ HALTED? │         │
            │         └────┬────┘         │
            │              │              │
            │         no ──┴── yes        │
            │          │       │          │
            │          ▼       ▼          │
            │       [loop]   [exit]       │
            │                             │
            └─────────────┬───────────────┘
                          ▼
            ┌─────────────────────────────┐
            │       Final State           │
            │  [PC|Regs|Flags|Memory...]  │
            └─────────────────────────────┘
```

### Instruction Set

| Opcode | Mnemonic | Operation |
|--------|----------|-----------|
| 0x0 | ADD | R[d] = R[a] + R[b] |
| 0x1 | SUB | R[d] = R[a] - R[b] |
| 0x2 | AND | R[d] = R[a] & R[b] |
| 0x3 | OR | R[d] = R[a] \| R[b] |
| 0x4 | XOR | R[d] = R[a] ^ R[b] |
| 0x5 | SHL | R[d] = R[a] << 1 |
| 0x6 | SHR | R[d] = R[a] >> 1 |
| 0x7 | MUL | R[d] = R[a] * R[b] |
| 0x8 | DIV | R[d] = R[a] / R[b] |
| 0x9 | CMP | flags = R[a] - R[b] |
| 0xA | LOAD | R[d] = M[addr] |
| 0xB | STORE | M[addr] = R[s] |
| 0xC | JMP | PC = addr |
| 0xD | Jcc | PC = addr if cond (imm8[2:0]: 0=Z,1=NZ,2=C,3=NC,4=N,5=P,6=V,7=NV) |
| 0xE | CALL | push PC; PC = addr |
| 0xF | HALT | stop execution |

### Design Principles

1. **Autonomy**: Machine runs without external logic
2. **Purity**: forward(state) → state', no side effects
3. **Transparency**: All weights inspectable, all operations traceable
4. **Universality**: Turing complete, runs arbitrary programs
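
The driver loop implied by these principles can be sketched in a few lines. This is an illustration only: `step` stands in for one forward pass of the threshold circuit layer and `halt_bit` for reading the HALT control bit (CTRL[0]) out of the state vector; neither name is defined by this repository.

```python
# Hypothetical helpers: `step` = one forward pass of the circuit,
# `halt_bit` = reads CTRL[0] (HALT) from the flat state vector.
def run(step, halt_bit, state, max_cycles=10_000):
    for _ in range(max_cycles):
        state = step(state)           # pure: state in, state out
        if halt_bit(state) >= 0.5:    # stop once the HALT bit is set
            break
    return state
```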

---

## Background

### Threshold Logic

A threshold gate computes a Boolean function by taking a weighted sum of binary inputs and comparing it to a threshold. If the sum meets or exceeds the threshold, the output is 1; otherwise, 0. This can be expressed as a neuron with Heaviside step activation: `output = 1 if (Σ wᵢxᵢ + b) ≥ 0 else 0`, where weights `wᵢ` and bias `b` are integers.

Threshold gates are strictly more powerful than standard Boolean gates. A single threshold gate can compute any linearly separable Boolean function, including AND, OR, NAND, NOR, and many others that require multiple levels of conventional gates. Functions that are not linearly separable (such as XOR or parity) require multiple threshold gates arranged in layers.
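
For example, 3-input majority, which needs several conventional gates, is a single threshold gate with weights `[1, 1, 1]` and bias `-2`. The check below is a minimal standalone sketch, independent of the repository's weight files:

```python
import itertools

w, b = [1, 1, 1], -2  # majority-of-3 as a single threshold gate

for bits in itertools.product([0, 1], repeat=3):
    fired = int(sum(wi * xi for wi, xi in zip(w, bits)) + b >= 0)
    assert fired == int(sum(bits) >= 2)  # agrees with majority on all 8 inputs
```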

### Historical Development

Warren McCulloch and Walter Pitts introduced the threshold neuron model in 1943, proving that networks of such neurons could compute any Boolean function. This work preceded both the perceptron and modern neural networks, establishing the theoretical foundation for neural computation.

The 1960s saw significant development in threshold logic synthesis. Researchers including Saburo Muroga, Robert McNaughton, and Michael Dertouzos developed algebraic methods for determining whether a Boolean function could be implemented as a single threshold gate, and if so, how to calculate appropriate weights. This work produced systematic techniques for threshold gate design but focused on individual gates rather than complete systems.

Frank Rosenblatt's Mark I Perceptron (1957-1960) implemented threshold neurons in hardware using potentiometers for weights, but it was a pattern classifier that learned its weights through training; the final weight configurations were not published. Bernard Widrow's ADALINE and MADALINE systems (1960-1963) similarly used adaptive threshold elements with weights learned via the LMS algorithm.

Hava Siegelmann and Eduardo Sontag proved in the 1990s that recurrent neural networks are Turing complete. Their construction, however, relied on continuous sigmoid activation functions with infinite precision, not the discrete step function used in threshold logic. Other theoretical work on neural Turing machines and differentiable computers followed similar patterns: proving computational universality using continuous, differentiable activations suitable for gradient-based training.

### Neuromorphic Hardware

Modern neuromorphic processors implement large arrays of configurable threshold-like neurons in silicon:

**Intel Loihi** (2017) provides 128 neuromorphic cores with programmable synaptic weights, spike-based communication, and on-chip learning. The architecture supports integer weights and configurable neuron dynamics.

**IBM TrueNorth** (2014) integrates one million neurons and 256 million synapses in a 4096-core array. Each neurosynaptic core implements 256 neurons with configurable weights and thresholds. The chip was designed as an alternative to von Neumann architecture rather than an implementation of one.

**BrainChip Akida** (2021) targets edge deployment with event-based processing and integer weights. The architecture supports standard neural network operations mapped onto neuromorphic primitives.

**SpiNNaker** (University of Manchester) uses ARM processor cores to simulate spiking neural networks at scale. The platform has hosted various neural models but is simulation-based rather than native neuromorphic silicon.

Despite the availability of these platforms, published work has focused on neural network inference, sensory processing, and pattern recognition. A 2024 paper demonstrated basic logic gates, adders, and decoders on SpiNNaker and Dynap-SE1, describing this as "a first step toward the construction of a spiking computer"; the implementation lacked instruction fetch, program counter, memory systems, and control logic.

### This Implementation

The weights in this repository implement a complete 8-bit computer: registers, ALU with 16 operations, status flags, conditional branching, subroutine calls, stack operations, and memory access. Every component is built from threshold neurons with integer weights. The weight configurations are published in safetensors format for direct loading and deployment.

---

## Circuit Categories

| Category | Circuits | Examples |
|----------|----------|----------|
| Boolean | 9 | AND, OR, NOT, NAND, NOR, XOR, XNOR, IMPLIES, BIIMPLIES |
| Arithmetic | 18 | Half/full adder, 2/4/8-bit ripple carry, comparators |
| ALU | 3 | 8-bit ALU, control decoder, flag computation |
| Combinational | 10 | MUX (2:1, 4:1, 8:1), DEMUX, encoders, decoders |
| Control Flow | 16 | JMP, conditional jumps, CALL, RET, PUSH, POP |
| Error Detection | 11 | Parity (XOR tree), checksum, CRC, Hamming |
| Modular | 11 | Divisibility by 2-12 (multi-layer for non-powers-of-2) |
| Threshold | 13 | k-of-n gates, majority, minority, exactly-k |
| Pattern | 10 | Popcount, leading/trailing ones, symmetry |
| Memory | 3 | N-bit addr decoder, 2^N×8 read mux, write cells (configurable, packed) |

---

## Usage

```python
import torch
from safetensors.torch import load_file

tensors = load_file("neural_computer8.safetensors")

def heaviside(x):
    return (x >= 0).float()

# AND gate: fires when both inputs are 1
w = tensors['boolean.and.weight']  # [1, 1]
b = tensors['boolean.and.bias']    # [-2]

for a, b_in in [(0,0), (0,1), (1,0), (1,1)]:
    inp = torch.tensor([a, b_in], dtype=torch.float32)
    out = heaviside(inp @ w + b)
    print(f"AND({a}, {b_in}) = {int(out.item())}")
```

---

## State Tensor Layout

All multi-bit fields are **MSB-first** (index 0 is the most-significant bit).

```

[ PC[N] | IR[16] | R0[8] R1[8] R2[8] R3[8] | FLAGS[4] | SP[N] | CTRL[4] | MEM[2^N][8] ]

```

Where N = address bits (configurable: 0-16).

Flags are ordered as: `Z, N, C, V`.

Control bits are ordered as: `HALT, MEM_WE, MEM_RE, RESERVED`.

| Memory Profile | Addr Bits | Memory Size | State Bits |
|----------------|-----------|-------------|------------|
| Full CPU | 16 | 64KB | 524,376 |
| Reduced | 12 | 4KB | 32,856 |
| Scratchpad | 8 | 256B | 2,104 |
| Registers | 4 | 16B | 184 |
| Pure ALU | 0 | 0B | 56 |

---

## Instruction Encoding (16-bit)

All instruction bits are **MSB-first**.

```
15..12  11..10  9..8  7..0
opcode  rd      rs    imm8
```

Interpretation:
- **R-type**: `rd = rd op rs` (imm8 ignored).
- **I-type**: `rd = op rd, imm8` (rs ignored).
- **Address-extended**: `LOAD`, `STORE`, `JMP`, `JZ`, `CALL` consume the next word as a 16-bit address (big-endian). `imm8` is reserved, and the PC skips 4 bytes when the jump is not taken.
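
As a sketch of this field packing (the repository does not document an assembler; the helper below is illustrative only):

```python
def encode(opcode, rd, rs, imm8=0):
    """Pack a 16-bit instruction: opcode[15:12] rd[11:10] rs[9:8] imm8[7:0]."""
    assert 0 <= opcode < 16 and 0 <= rd < 4 and 0 <= rs < 4 and 0 <= imm8 < 256
    return (opcode << 12) | (rd << 10) | (rs << 8) | imm8

add_r1_r2 = encode(0x0, rd=1, rs=2)   # R-type: R1 = R1 + R2
halt      = encode(0xF, 0, 0)         # HALT
print(f"{add_r1_r2:016b} {halt:016b}")
```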

---

## Verification

```bash

python eval.py

python threshold_cpu.py

```

### Verification Status

| Category | Status | Notes |
|----------|--------|-------|
| Boolean gates | Exhaustively tested | All 2^n input combinations |
| Arithmetic | Exhaustively tested | Full 8-bit range |
| ALU | Exhaustively tested | All operations, all inputs |
| Control flow | Exhaustively tested | Branch/jump conditions |
| Threshold | Exhaustively tested | k-of-n, majority, etc. |
| Modular (mod 3,5,6,7,9,10,11,12) | Exhaustively tested | Multi-layer, hand-constructed |
| Parity | Exhaustively tested | XOR tree, hand-constructed |
| Modular (mod 2,4,8) | Exhaustively tested | Single-layer, trivial |

The modular arithmetic circuits for non-powers-of-2 and the parity circuits were hand-constructed because:
- Divisibility by 3, 5, etc. is **not linearly separable** in binary
- 8-bit parity (XOR of all bits) requires a tree of XOR gates

All circuits pass exhaustive testing over their full input domains.
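
As an illustration of what exhaustive testing means for a combinational circuit, the sketch below checks a single-layer gate from the weight file against a reference function over every input combination (the gate names follow the tensor naming convention in the next section; the helper itself is not part of the repository):

```python
import itertools
import torch
from safetensors.torch import load_file

tensors = load_file("neural_computer8.safetensors")

def check_gate(name, reference, n_inputs):
    """Compare a single-layer threshold gate against `reference` on all 2^n inputs."""
    w, b = tensors[f"{name}.weight"], tensors[f"{name}.bias"]
    for bits in itertools.product([0.0, 1.0], repeat=n_inputs):
        x = torch.tensor(bits)
        out = int((x @ w + b >= 0).item())
        assert out == reference(*(int(v) for v in bits)), (name, bits)

check_gate("boolean.and", lambda a, b: a & b, 2)
check_gate("boolean.or",  lambda a, b: a | b, 2)
```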

---

## Tensor Naming Convention

```
{category}.{circuit}[.{layer}][.{component}].{weight|bias}

Examples:
  boolean.and.weight
  boolean.xor.layer1.neuron1.weight
  arithmetic.ripplecarry8bit.fa7.ha2.sum.layer1.or.weight
  modular.mod5.layer2.eq3.weight
  error_detection.paritychecker8bit.stage2.xor1.layer1.nand.bias
```

Memory circuits are stored as packed tensors to keep the safetensors header size manageable (e.g., `memory.addr_decode.weight`, `memory.read.and.weight`, `memory.write.and_old.weight`).
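
To see the convention in practice, the short script below (illustrative, not part of the repository) groups the tensor names in a weight file by their top-level category:

```python
from collections import defaultdict
from safetensors.torch import load_file

tensors = load_file("neural_computer8.safetensors")

by_category = defaultdict(list)
for name in tensors:
    by_category[name.split(".")[0]].append(name)

for category, names in sorted(by_category.items()):
    print(f"{category:16s} {len(names):5d} tensors   e.g. {sorted(names)[0]}")
```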

---

## Hardware Compatibility

All weights are integers. All activations are Heaviside step. Designed for:

- **Intel Loihi** – Neuromorphic research chip
- **IBM TrueNorth** – 1M neuron chip
- **BrainChip Akida** – Edge neuromorphic processor

---

## LLM Integration

The threshold circuits can be embedded into transformer MLP layers to give LLMs exact arithmetic capability.

**For LLM integration, use `--memory-profile none` to generate a pure ALU model (~32K params) without memory circuits.**

### Core Thesis

Standard LLMs fail at arithmetic because they're interpolators: they approximate functions over training distributions rather than compute exact results. A 360M parameter model trained on internet text has seen "127 + 128 = 255" rarely, if ever, so it guesses based on pattern matching.

We solve this by embedding **frozen, proven-correct arithmetic circuits** directly into the transformer's MLP layers. The circuits use threshold logic (weighted sums + step activation), which is structurally compatible with neural network layers. We train only the **interface layers** that learn to:

1. Extract operands from token embeddings
2. Route computation through the circuits
3. Inject results back into the residual stream

The model learns **call dispatch**, not arithmetic. The arithmetic is already solved.

### Target Model: SmolLM2-360M-Instruct

We use HuggingFace's SmolLM2-360M-Instruct as our base model. See [`llm_integration/SMOLLM2_ARCHITECTURE.md`](llm_integration/SMOLLM2_ARCHITECTURE.md) for the complete technical analysis.

| Property | Value |
|----------|-------|
| Parameters | 361.82M |
| Hidden Dimension | **960** (matches extractor input) |
| Layers | 32 transformer blocks |
| Attention | 15 query heads, 5 KV heads (GQA) |
| MLP | SwiGLU (960→2560→960) |
| Position Encoding | RoPE (theta=100k, max 8192) |

**Key insight**: The hidden dimension of 960 exactly matches our extractor requirements; no projection layer needed.

**Tokenization**: Digits are tokenized individually (`"47 + 86"` → `['4', '7', ' +', ' ', '8', '6']`), with digit token IDs following `token_id = 32 + digit_value`. This enables position-based operand extraction.

**Hidden State Extraction**: Layer 31 (final, pre-LM-head) provides well-normalized representations (std=1.34) ideal for bit extraction. All 33 hidden state outputs are available (embedding + 32 layers).

### Architecture

Standard MLP block with parallel circuit path:

```
x ──┬── MLP path ────────────────┬── + ── output
    │                            │
    └── BitExtractor ── Circuit ─┴── BitInjector
                          │
                       Router (learned weighting)
```

Augmented MLP forward pass:

```python
def forward(self, x):  # x: [batch, seq, d_model=960]
    # Original MLP path (unchanged)
    mlp_out = self.down_proj(silu(self.gate_proj(x)) * self.up_proj(x))

    # Circuit path (new)
    a_bits, b_bits = self.bit_extractor(x)       # [batch, seq, 8] each
    result_bits, carry = self.circuits.add_8bit(a_bits, b_bits)
    flags = self.compute_flags(result_bits, carry)
    circuit_delta = self.bit_injector(result_bits, flags)

    # Routing
    route_weights = self.router(x)  # [batch, seq, 2] softmax

    # Combine
    return mlp_out + route_weights[..., 1:2] * circuit_delta
```

### Threshold Logic Fundamentals

A threshold gate computes:

```
output = 1  if  (Σ wᵢxᵢ + b) ≥ 0
         0  otherwise
```

Example gates:

```
AND: w=[1,1], b=-2
  AND(0,0) = H(-2) = 0
  AND(1,1) = H(0)  = 1

OR: w=[1,1], b=-1
  OR(0,1) = H(0) = 1
  OR(1,1) = H(1) = 1

XOR: requires 2 layers (not linearly separable)
  Layer 1: OR + NAND
  Layer 2: AND
```

Full adder = 2 half-adders + carry OR, ~4 threshold layers.
8-bit ripple carry = 8 chained full adders, ~32 threshold layers.
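
A minimal sketch of that construction, using the gate weights listed above (XOR built as AND(OR, NAND), two half adders chained, carries OR-ed). It mirrors the described structure in plain Python rather than loading the repository's tensors:

```python
def H(x):                         # Heaviside step
    return 1 if x >= 0 else 0

def AND(a, b):  return H(a + b - 2)
def OR(a, b):   return H(a + b - 1)
def NAND(a, b): return H(1 - a - b)
def XOR(a, b):  return AND(OR(a, b), NAND(a, b))   # 2 threshold layers

def half_adder(a, b):
    return XOR(a, b), AND(a, b)                    # (sum, carry)

def full_adder(a, b, cin):
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, OR(c1, c2)                           # (sum, carry out)

for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin     # matches binary addition
```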

### Interface Layers (Trainable)

**Extractor** – Extracts operands and operation from LLM hidden states:

```python
class Extractor(nn.Module):
    """Attention pooling + per-bit extraction networks."""

    def __init__(self, hidden_dim=960):
        super().__init__()
        self.attention_pool = AttentionPooling(hidden_dim, num_heads=4)
        self.a_extractor = MultiHeadBitExtractor(hidden_dim)  # 8 separate bit networks
        self.b_extractor = MultiHeadBitExtractor(hidden_dim)
        self.op_router = nn.Sequential(
            nn.Linear(hidden_dim, 256), nn.GELU(),
            nn.Linear(256, 6)  # 6 operations
        )

    def forward(self, hidden_states, attention_mask):
        pooled = self.attention_pool(hidden_states, attention_mask)  # (batch, 960)
        a_bits, _ = self.a_extractor(pooled)  # (batch, 8)
        b_bits, _ = self.b_extractor(pooled)  # (batch, 8)
        op_logits = self.op_router(pooled)    # (batch, 6)
        return a_bits, b_bits, op_logits
```

**MultiHeadBitExtractor** – 8 specialized networks, one per bit:

```python
class MultiHeadBitExtractor(nn.Module):
    def __init__(self, hidden_dim=960):
        super().__init__()
        self.bit_extractors = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden_dim, 128), nn.GELU(), nn.Linear(128, 1))
            for _ in range(8)
        ])

    def forward(self, x):
        logits = torch.cat([ext(x) for ext in self.bit_extractors], dim=-1)
        soft = torch.sigmoid(logits)
        hard = heaviside_ste(logits)
        return hard - soft.detach() + soft, logits  # STE
```

**AttentionPooling** – Learns which token positions matter:

```python
class AttentionPooling(nn.Module):
    """CLS-token style pooling with learned attention."""

    def __init__(self, hidden_dim=960, num_heads=4):
        super().__init__()
        self.cls_token = nn.Parameter(torch.randn(1, 1, hidden_dim) * 0.02)
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)
        self.value = nn.Linear(hidden_dim, hidden_dim)
```

### Trainable Parameters

For SmolLM2-360M (hidden_dim=960):

| Component | Parameters | Description |
|-----------|------------|-------------|
| AttentionPooling | ~3.7M | 4-head attention over sequence |
| MultiHeadBitExtractor (×2) | ~245K each | 8 per-bit MLPs for A and B |
| OpRouter | ~246K | 960→256→6 MLP |
| **Extractor Total** | ~4.4M | Full extraction module |

**Alternative architectures**:
- `PositionExtractor`: ~1.5M (position-specific, no attention)
- `DigitExtractor`: ~1.2M (predicts digits 0-9 instead of bits)

With `--unfreeze_layers 4`: Adds ~39.3M trainable params (top 4 transformer layers).

### Gradient Flow

Heaviside has zero gradient almost everywhere. We use **Straight-Through Estimator (STE)**:

```python
class HeavisideSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return (x >= 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # pass through unchanged
```
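
The `heaviside_ste` used inside `MultiHeadBitExtractor` above is then just the functional form of this op:

```python
heaviside_ste = HeavisideSTE.apply
```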

### Training Strategy

1. **Data**: Random 8-bit arithmetic problems (operands 0-255, 6 operations)
2. **Loss**: Multi-component BCE + CE (sketched after this list)
   - `result_loss`: BCE on output bits vs expected
   - `a_loss`, `b_loss`: BCE on extracted bits vs ground truth (2× weight)
   - `op_loss`: CE on operation classification
3. **Optimizer**: AdamW, lr=3e-4, gradient clipping at 1.0
4. **Curriculum**: Epoch-based range expansion (0-9 → 0-99 → 0-255)
5. **Batching**: 256-4096 samples per batch (VRAM-dependent)
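
A minimal sketch of the multi-component loss from step 2; the tensor names and the 2× operand weighting follow the description above, everything else (shapes, clamping) is an assumption:

```python
import torch.nn.functional as F

def interface_loss(result_bits, a_bits, b_bits, op_logits, target):
    eps = 1e-6  # circuit outputs are exact 0/1, so clamp before BCE
    result_loss = F.binary_cross_entropy(result_bits.clamp(eps, 1 - eps), target["result_bits"])
    a_loss = F.binary_cross_entropy(a_bits.clamp(eps, 1 - eps), target["a_bits"])
    b_loss = F.binary_cross_entropy(b_bits.clamp(eps, 1 - eps), target["b_bits"])
    op_loss = F.cross_entropy(op_logits, target["op_idx"])
    return result_loss + 2.0 * (a_loss + b_loss) + op_loss
```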

```bash

# Example training commands

python train.py --mode router --epochs 100                    # Sanity check

python train.py --mode llm --epochs 100 --batch_size 256      # Frozen LLM

python train.py --mode llm --unfreeze_layers 4 --batch_size 4096  # Fine-tune top layers

```

### Inference

At inference, Heaviside is a true step function; there is no approximation. If the Extractor correctly identifies operands, the circuit **will** output the correct result.

### Target Performance

| Condition | Configuration | Accuracy |
|-----------|---------------|----------|
| Control | Vanilla SmolLM2-360M | 11.90% |
| Circuits only | Ground truth bits | 100.00% |
| Experimental | LLM + Extractor + Circuits | **Target: 100%** |

The interface generalizes to **all** 65,536 8-bit additions once trained: no memorization, the circuits compute.

### LLM Integration: Proof of Concept (In Progress)

Before proceeding with architectural extensions, we are validating the core thesis: that frozen threshold circuits can provide exact arithmetic capability to language models that otherwise fail at computation.

#### Baseline Evaluation

We evaluated SmolLM2-360M-Instruct on randomized 8-bit arithmetic using a generous answer extraction protocol. The model was prompted with a system message instructing it to output only numeric answers, and we accepted any correct number found in the output (first number, last number, or word-to-number conversion).

| Operation | SmolLM2-360M Accuracy | Notes |
|-----------|----------------------|-------|
| Addition (A + B) | 35.92% | Best performance, still fails 2/3 |
| Subtraction (A - B) | 17.72% | Poor handling of borrowing |
| Multiplication (A × B) | **1.25%** | Near-total failure |
| Greater Than (A > B) | 14.37% | Often echoes expression |
| Less Than (A < B) | 4.31% | Often echoes expression |
| Equality (A == B) | 0.28% | Near-total failure |
| **Overall Fitness** | **11.90%** | 238/2000 correct |

**Methodology**: 2000 randomized test cases with operands uniformly sampled from [0, 255]. Ground truth computed as 8-bit arithmetic (matching the threshold circuit specification). Batch size 64, greedy decoding (temperature=0).
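
A sketch of the kind of lenient answer matching described above; the regex and the small word-to-number table are illustrative, not the exact evaluation code:

```python
import re

WORDS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

def is_correct(model_output, expected):
    """Accept the answer if any number found in the output equals the expected value."""
    numbers = [int(n) for n in re.findall(r"-?\d+", model_output)]
    numbers += [v for w, v in WORDS.items() if w in model_output.lower()]
    return expected in numbers
```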

**Key Observations**:
- Multiplication accuracy (1.25%) is essentially random guessing over the output space
- Comparison operations fail because the model often echoes the expression rather than evaluating it
- Even addition, the simplest operation, fails nearly two-thirds of the time on 8-bit operands
- Performance degrades sharply as operand magnitude increases (edge cases like 127+128 are almost never correct)

These results establish the **control condition** for our experiment.

#### Experimental Design

| Condition | Model Configuration | Target Fitness |
|-----------|---------------------|----------------|
| **Control** | Vanilla SmolLM2-360M-Instruct | 11.90% (measured) |
| **Experimental** | SmolLM2-360M + Frozen ThresholdALU + Trained Interface | **100%** |

The experimental condition adds:
1. **BitEncoder** (trainable): Projects hidden states → 24 bits (3 × 8-bit operands)
2. **OpRouter** (trainable): Selects which circuit to activate based on context
3. **BitDecoder** (trainable): Projects 8-bit result → hidden state delta
4. **ThresholdALU** (frozen): The verified circuits from this repository

**Training Signal**: The fitness function itself. We do not provide answer supervision; the model must learn to correctly encode operands and route to circuits such that the frozen circuits produce correct outputs. This is possible because the circuits are proven correct; the interface layers need only learn the encoding/routing protocol.

**Success Criterion**: If the experimental condition achieves 100% fitness on randomized arithmetic while the control remains at ~12%, this demonstrates:
1. The frozen threshold circuits provide exact computation
2. Neural interface layers can learn to use discrete computational substrates
3. Small language models can achieve perfect arithmetic via architectural augmentation rather than scale

#### Progress

**Stage 1: Circuit Validation β€” COMPLETE**

The frozen threshold circuits achieve 100% accuracy when given correctly formatted bit inputs:

| Test | Result |
|------|--------|
| DirectCircuitModel (ground truth bits) | 100.00% on 10,000 random cases |
| All operations (ADD, SUB, MUL, GT, LT, EQ) | 100.00% each |

This confirms the circuits compute correctly. However, this was already established by `eval.py`.

**Stage 2: LLM Baseline β€” COMPLETE**

SmolLM2-360M-Instruct baseline on randomized 8-bit arithmetic:

| Operation | Accuracy |
|-----------|----------|
| Addition | 35.92% |
| Subtraction | 17.72% |
| Multiplication | 1.25% |
| Comparisons | 0.28–14.37% |
| **Overall** | **11.90%** |

Head-to-head on 50 random cases: SmolLM2 got 7/50 (14%), circuits got 50/50 (100%).

**Stage 3: LLM Integration β€” IN PROGRESS**

The challenge: train an interface that extracts operands and operations from natural language (not from pre-formatted bit inputs).

```

"47 + 86"

    ↓

[Language Model / Extractor]

    ↓

[a_bits, b_bits, op_logits]

    ↓

[Frozen threshold circuits]

    ↓

[Result bits] → 133

```

**SmolLM2 Approach** (`llm_integration/`):

Initial experiments used SmolLM2-360M-Instruct as the language understanding backbone.

| Mode | Description | Status |
|------|-------------|--------|
| `--mode router` | Train OpRouter with ground truth bits | 100% achieved |
| `--mode interface` | Train BitEncoder + OpRouter | Ready |
| `--mode llm` | Train from LLM hidden states | Explored |

**LLM Mode Options**:
- `--unfreeze_layers N`: Fine-tune top N transformer layers
- `--extract_layer N`: Extract from intermediate layer (-1 = final)
- `--position_extract`: Position-specific extraction (uses token positions)
- `--digit_pred`: Predict digits (0-9) instead of bits

**Extraction Architectures** (`model.py`):
- `Extractor`: Attention pooling + per-bit MLPs
- `PositionExtractor`: Position-aware (operand A from positions 0-2, B from 5-7)
- `DigitExtractor`: Predicts 3 digits per operand, converts to bits
- `HybridExtractor`: Digit lookup + MLP fallback for word inputs

**Curriculum Learning**: Training progresses 0-9 → 0-99 → 0-255 over epochs.

**Observations**: SmolLM2 integration proved challenging: the 360M parameters of pre-trained representations are largely irrelevant to arithmetic parsing, VRAM requirements are high, and gradients conflict between the frozen circuits and the pre-trained weights.

**Pivot: From-Scratch Extractor**

Given that the task is fundamentally simple (parse `(a, b, op)` from structured text), a lightweight purpose-built model may be more appropriate than adapting a general LLM.

```

"one thousand plus two thousand"

    ↓

[Char-level tokenizer: ~40 tokens]

    ↓

[Small transformer: ~1-5M params]

    ↓

[3 heads: a_value, b_value, op_idx]

    ↓

[Frozen 32-bit threshold circuits]

    ↓

3000

```

**Design principles**:
- **Minimal Python**: All parsing logic learned in weights, not hardcoded
- **Character-level input**: No word tokenization; model learns "forty seven" = 47
- **From-scratch training**: No pre-trained weights to conflict with
- **32-bit target**: Practical arithmetic range (0–4,294,967,295)

**Planned architecture**:
- Vocab: ~40 chars (a-z, 0-9, space, operators)
- Embedding: 40 × 128d
- Encoder: 2-3 transformer layers
- Output heads: `a_classifier`, `b_classifier`, `op_classifier`
- Total: ~1-5M params (vs 360M for SmolLM2)

This approach treats the problem as what it is: a structured parsing task where the frozen circuits handle all computation. The extractor need only learn the mapping from text to operands; no world knowledge is required.
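
A minimal sketch of such an extractor under the assumptions listed above. The dimensions, the mean pooling, and the head design (32 bit logits per operand rather than value classifiers) are illustrative choices, not a finalized design:

```python
import torch
import torch.nn as nn

class CharExtractor(nn.Module):
    def __init__(self, vocab_size=40, d_model=128, n_layers=2, n_ops=6, max_len=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, dim_feedforward=256,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.a_head = nn.Linear(d_model, 32)    # 32 bit logits for operand A
        self.b_head = nn.Linear(d_model, 32)    # 32 bit logits for operand B
        self.op_head = nn.Linear(d_model, n_ops)

    def forward(self, char_ids):                 # char_ids: [batch, seq]
        h = self.embed(char_ids) + self.pos[:, :char_ids.size(1)]
        h = self.encoder(h)                      # [batch, seq, d_model]
        pooled = h.mean(dim=1)
        return self.a_head(pooled), self.b_head(pooled), self.op_head(pooled)
```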

#### Proof of Concept Scope

- **32-bit operands** (0–4,294,967,295)
- **Six operations**: ADD, SUB, MUL, GT, LT, EQ
- **Structured input**: Digits ("1000 + 2000") and number words ("one thousand plus two thousand")

**Current Status**:
- Circuit validation: Complete (100% on 8-bit operations)
- 32-bit circuits: Built and tested (adder verified on 1M+2M=3M, etc.)
- LLM baseline: Measured (11.90% - establishes control condition)
- SmolLM2 integration: Infrastructure complete, training explored
- From-scratch extractor: Design phase

### Extension Roadmap

#### Completed

1. **32-bit operations (0–4,294,967,295)** – Full 32-bit ALU implemented via the `--bits 32` flag:
   - 32-bit ripple carry adder (32 chained full adders) – **verified**
   - 32-bit subtractor (NOT + adder with carry-in)
   - 32-bit multiplication (1024 partial product ANDs)
   - 32-bit division (32 restoring stages)
   - 32-bit comparators (GT, LT, GE, LE, EQ)
   - 32-bit bitwise ops (AND, OR, XOR, NOT)
   - 32-bit shifts (SHL, SHR), INC, DEC, NEG

   **Known issue**: Single-layer 32-bit comparators use weights up to 2³¹, which exceeds float32 mantissa precision (24 bits). Comparisons between large numbers differing only in low bits may fail. Fix planned: cascaded byte-wise comparison (compare MSB first, if equal compare next byte, etc.). See the snippet after this list.

2. **3-operand addition (15 + 27 + 33 = 75)** – `arithmetic.add3_8bit` chains two 8-bit ripple carry stages. 16 full adders, 144 gates, 240 test cases verified.

3. **Order of operations (5 + 3 × 2 = 11)** – `arithmetic.expr_add_mul` computes A + (B × C) using shift-add multiplication then addition. 64 AND gates + 64 full adders, 73 test cases verified.
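
The precision issue noted under item 1 is easy to reproduce: a float32 cannot represent 2³¹ + 1 exactly, so a single-layer comparator whose most-significant weight is 2³¹ cannot see a difference confined to the low bits.

```python
import numpy as np

big = np.float32(2 ** 31)
print(big + np.float32(1) == big)   # True: the +1 is lost below the 24-bit mantissa
```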

#### Planned

1. **Cascaded 32-bit comparators** – Replace single-layer weighted comparison with a multi-layer byte-wise cascade. Each byte comparison uses 8-bit weights (max 128), well within float32 precision. Hardware-accurate and extensible to 64-bit, 128-bit, etc.

2. **Parenthetical expressions ((5 + 3) × 2 = 16)** – Explicit grouping overrides precedence. The parser must recognize parens and build the correct tree. Evaluation proceeds innermost-out.

3. **Multi-operation chains (a + b - c × d)** – Sequential dispatch through multiple circuits with intermediate result routing. Requires state management in the interface layers.

4. **Floating point arithmetic** – IEEE 754-style with separate circuits for mantissa and exponent. ADD: align exponents, add mantissas, renormalize. MUL: add exponents, multiply mantissas.

5. **Full CPU integration** – Enable memory access circuits for stateful computation. Allows multi-step algorithms executed entirely within threshold logic.

---

## Build Tool

Output filenames are auto-generated from configuration:

```

Format: neural_{alu|computer}{BITS}[_{MEMORY}].safetensors



Examples:

  neural_alu8.safetensors               # 8-bit, no memory

  neural_alu32.safetensors              # 32-bit, no memory

  neural_computer8.safetensors          # 8-bit, full memory (default)

  neural_computer32.safetensors         # 32-bit, full memory

  neural_computer8_small.safetensors    # 8-bit, 1KB memory

  neural_computer32_small.safetensors   # 32-bit, 1KB memory

  neural_computer8_addr12.safetensors   # 8-bit, custom 4KB (2^12 bytes)

```

```bash

# 8-bit CPU (default)

python build.py --apply all                                # -> neural_computer8.safetensors

python build.py -m none --apply all                        # -> neural_alu8.safetensors

python build.py -m scratchpad --apply all                  # -> neural_computer8_scratchpad.safetensors



# 16-bit ALU

python build.py --bits 16 --apply all                      # -> neural_computer16.safetensors

python build.py --bits 16 -m none --apply all              # -> neural_alu16.safetensors



# 32-bit ALU

python build.py --bits 32 -m small --apply all             # -> neural_computer32_small.safetensors

python build.py --bits 32 -m none --apply all              # -> neural_alu32.safetensors



# Custom address width

python build.py --bits 16 -a 6 --apply all                 # -> neural_computer16_addr6.safetensors

```

**Bit widths** (`--bits`):

| Width | Range | Use Case |
|-------|-------|----------|
| 8 | 0–255 | Full CPU, legacy |
| 16 | 0–65,535 | Extended arithmetic |
| 32 | 0–4,294,967,295 | Practical arithmetic |

**Memory profiles** (`-m`):

| Profile | Size | Addr Bits | Filename Suffix | Params | Use Case |
|---------|------|-----------|-----------------|--------|----------|
| `none` | 0B | – | (uses `alu`) | ~32K | Pure ALU |
| `registers` | 16B | 4 | `_registers` | ~34K | Minimal state |
| `scratchpad` | 256B | 8 | `_scratchpad` | ~63K | 8-bit scratch |
| `small` | 1KB | 10 | `_small` | ~123K | 32-bit scratch |
| `reduced` | 4KB | 12 | `_reduced` | ~549K | Small programs |
| `full` | 64KB | 16 | (none) | ~8.29M | Full CPU |

**Custom address width** (`-a N`): Memory size = 2^N bytes, suffix = `_addrN`

---

## Citation

```bibtex
@misc{8bit-threshold-computer,
  title={8bit-threshold-computer: A Turing-Complete Threshold Logic CPU},
  author={Norton, Charles},
  year={2026},
  howpublished={Hugging Face},
  url={https://huggingface.co/phanerozoic/8bit-threshold-computer}
}
```

---

## License

MIT

---

## References

1. McCulloch & Pitts (1943). "A Logical Calculus of Ideas Immanent in Nervous Activity"
2. Muroga (1971). "Threshold Logic and Its Applications"
3. Siegelmann & Sontag (1995). "On the Computational Power of Neural Nets"
4. Bengio et al. (2013). "Estimating or Propagating Gradients Through Stochastic Neurons"
5. Ma et al. (2024). "The Era of 1-bit LLMs" (BitNet b1.58)
6. HuggingFace (2024). "SmolLM2: Small Language Models" β€” [Model Card](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct)
7. Vaswani et al. (2017). "Attention Is All You Need" β€” Transformer architecture
8. Su et al. (2021). "RoFormer: Enhanced Transformer with Rotary Position Embedding" β€” RoPE