Cleanup pass: README accuracy, dead-code removal, CPU coverage
README updates:
- Lead with the actual landed state: every weight is in {-1, 0, 1}
- Hardware compatibility section adds FPGA mappability and explains
why ternary weights collapse evaluation to popcount + bias (no
multipliers needed); reframes neuromorphic targets accordingly
- Verification table replaced with honest coverage labels: 8-bit
arithmetic and ALU primitives are strategic-sampling, not exhaustive;
16/32-bit are extreme-value sampling; only Boolean, control flow,
threshold k-of-n, modular, parity, pattern, and combinational are
truly exhaustive. CPU integration testing called out as a separate
category covered by test_cpu.py
eval.py: remove _test_comparators_nbits_legacy (unreachable after
bit-cascade migration) and the legacy multi-layer fallback inside
_test_modular (dead now that all modular detectors use bit-cascade
equality on freshly-built variants). The single-layer path remains
because mod 2/4/8 still legitimately use it.
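A rough illustration of why the power-of-two moduli can stay single-layer: a value is divisible by 2^k exactly when its k low bits are all zero, and one gate with weight -1 on those bits and bias 0 detects that. This is only a sketch; the helper names and the MSB-first bit order are assumptions made for the example, not the repository's tensor layout.

```python
def heaviside(x: int) -> int:
    """Step activation: 1 if x >= 0 else 0."""
    return 1 if x >= 0 else 0


def mod_pow2_gate(k: int, width: int = 8):
    """Build (weights, bias) for a hypothetical 'divisible by 2**k' detector.

    Weights are listed MSB-first; only the k low bits get weight -1.
    """
    weights = [0] * (width - k) + [-1] * k
    return weights, 0


def evaluate(value: int, weights, bias: int) -> int:
    """Apply one threshold gate to the bits of `value` (MSB-first)."""
    width = len(weights)
    bits = [(value >> (width - 1 - i)) & 1 for i in range(width)]
    return heaviside(sum(w * x for w, x in zip(weights, bits)) + bias)


if __name__ == "__main__":
    weights, bias = mod_pow2_gate(2)  # divisibility by 4
    print([v for v in range(16) if evaluate(v, weights, bias)])
```

Any set low bit pushes the weighted sum below zero, so the gate fires only on multiples of 2^k. Other moduli have no such one-gate form under ternary weights, which is consistent with them taking the bit-cascade equality path.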
CPU program suite gains two programs:
- div_via_repeated_sub: while A >= B { A -= B; quotient += 1 }
exercises CMP + JNC + SUB + ADD loop, then cross-checks the result
against the on-chip DIV opcode (0x8) on the same inputs
- bitwise_chain: AND -> OR -> XOR -> SHL -> SHR pipeline with stored
intermediates so any single-op regression is caught immediately
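For reference, the repeated-subtraction loop can be modelled in plain Python. This is a sketch under the assumption, taken from the program's own comments, that CMP sets carry on no-borrow; `cmp_carry` and `div_by_repeated_sub` are illustrative names, not functions in cpu_programs.py.

```python
def cmp_carry(a: int, b: int) -> int:
    """Carry flag after an 8-bit CMP: 1 means no borrow, i.e. a >= b."""
    return 1 if (a & 0xFF) >= (b & 0xFF) else 0


def div_by_repeated_sub(a: int, b: int):
    """Mirror the assembled loop: CMP, JNC done, SUB, ADD, JMP loop."""
    if b == 0:
        raise ValueError("divisor must be non-zero (the loop would not terminate)")
    a &= 0xFF
    quotient = 0
    while cmp_carry(a, b):                # JNC leaves the loop once carry is clear
        a = (a - b) & 0xFF                # SUB: running remainder
        quotient = (quotient + 1) & 0xFF  # ADD: count one subtraction
    return quotient, a                    # a now holds the remainder


if __name__ == "__main__":
    print(div_by_repeated_sub(100, 7))
```

The model agrees with Python's `divmod` for every 8-bit dividend and non-zero divisor, which is the property the on-chip cross-check against DIV relies on.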
eval_all.py's GenericThresholdCPU.step previously only handled
opcodes 0x0/0x1/0x7/0x9/0xA-0xF; opcodes 0x2-0x6 (bitwise + shifts)
and 0x8 (DIV) fell through to NOP. Added the missing handlers; CPU
program suite reports 9/9 PASS on every memory profile.
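The added handlers amount to the following behaviour, shown here as a standalone sketch. The function name `alu_step` and the `(result, carry)` return shape are assumptions for illustration; the opcode numbers, the shift carry rules, and the 0xFF divide-by-zero result are taken from the eval_all.py hunk.

```python
def alu_step(opcode: int, a: int, b: int):
    """Return (result, carry) for opcodes 0x2-0x6 and 0x8.

    carry is None when the operation leaves the carry flag untouched.
    """
    a &= 0xFF
    b &= 0xFF
    if opcode == 0x2:  # AND
        return a & b, None
    if opcode == 0x3:  # OR
        return a | b, None
    if opcode == 0x4:  # XOR
        return a ^ b, None
    if opcode == 0x5:  # SHL by 1: bit 7 shifts out into carry
        return (a << 1) & 0xFF, 1 if (a & 0x80) else 0
    if opcode == 0x6:  # SHR by 1: bit 0 shifts out into carry
        return a >> 1, a & 0x1
    if opcode == 0x8:  # DIV: 0xFF on divide by zero
        return (a // b) if b != 0 else 0xFF, None
    raise ValueError(f"opcode {opcode:#x} is not covered by this sketch")


if __name__ == "__main__":
    print(alu_step(0x5, 0x65, 0), alu_step(0x8, 100, 7))
```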
- README.md +20 -16
- cpu_programs.py +121 -0
- eval.py +5 -242
- eval_all.py +14 -0
|
@@ -18,7 +18,7 @@ A Turing-complete CPU implemented entirely as threshold logic gates. Every gate,
|
|
| 18 |
output = 1 if (Σ wᵢ·xᵢ + b) ≥ 0 else 0
|
| 19 |
```
|
| 20 |
|
| 21 |
-
|
| 22 |
|
| 23 |
The repository ships eighteen prebuilt configurations spanning three data-path widths (8, 16, 32 bits) and six memory sizes (0 B to 64 KB). The canonical file at the repo root is the largest of these: a 32-bit data path with a 64 KB address space and ~8.47 M parameters.
|
| 24 |
|
|
@@ -261,20 +261,23 @@ Most tensors fit in `int8`; comparator weights and a few wide single-layer thres
|
|
| 261 |
|
| 262 |
## Verification
|
| 263 |
|
| 264 |
-
| Category |
|
| 265 |
-
|----------|--------|-------|
|
| 266 |
| Boolean gates | exhaustive | all 2^n input combinations |
|
| 267 |
-
| Arithmetic |
|
| 268 |
-
|
|
| 269 |
-
|
|
| 270 |
-
|
|
| 271 |
-
|
|
| 272 |
-
|
|
| 273 |
-
|
|
|
|
|
|
|
|
|
|
|
| 274 |
|
| 275 |
-
|
| 276 |
|
| 277 |
-
`eval_all.py` runs the unified suite. Exit code is the number of failing variants (0 means all pass).
|
| 278 |
|
| 279 |
---
|
| 280 |
|
|
@@ -327,11 +330,12 @@ The weights in this repository implement a complete CPU: registers, ALU with 16
|
|
| 327 |
|
| 328 |
## Hardware compatibility
|
| 329 |
|
| 330 |
-
All weights are
|
| 331 |
|
| 332 |
-
- **
|
| 333 |
-
- **
|
| 334 |
-
- **
|
|
|
|
| 335 |
|
| 336 |
---
|
| 337 |
|
|
|
|
| 18 |
output = 1 if (Σ wᵢ·xᵢ + b) ≥ 0 else 0
|
| 19 |
```
|
| 20 |
|
| 21 |
+
**Every weight in the file is in {-1, 0, 1}.** Biases are integers. Activations are the Heaviside step. Nothing else. The library was originally built with positional weights up to ±2³¹ for wide single-layer comparators; those have all been replaced with bit-cascaded multi-layer equivalents that use only ternary weights and small integer biases. Threshold-gate evaluation reduces to a popcount minus a popcount plus a bias, which is exactly what neuromorphic chips and FPGAs natively support.
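As a minimal sketch of that reduction (not the repository's evaluator), a ternary gate over bit-packed inputs can be evaluated with two popcounts. The mask representation and the helper names below are assumptions made for the example.

```python
def masks_from_weights(weights):
    """Split a ternary weight list into (+1 mask, -1 mask); bit i maps to weights[i]."""
    plus = sum(1 << i for i, w in enumerate(weights) if w == 1)
    minus = sum(1 << i for i, w in enumerate(weights) if w == -1)
    return plus, minus


def gate_popcount(inputs: int, plus_mask: int, minus_mask: int, bias: int) -> int:
    """Evaluate one gate: popcount(+1 inputs) - popcount(-1 inputs) + bias >= 0."""
    pos = bin(inputs & plus_mask).count("1")
    neg = bin(inputs & minus_mask).count("1")
    return 1 if pos - neg + bias >= 0 else 0


def gate_dot(bits, weights, bias: int) -> int:
    """Reference form: ordinary weighted sum followed by the step function."""
    return 1 if sum(w * x for w, x in zip(weights, bits)) + bias >= 0 else 0


if __name__ == "__main__":
    plus, minus = masks_from_weights([1, 1])  # 2-input AND: weights (1, 1), bias -2
    print([gate_popcount(x, plus, minus, -2) for x in range(4)])
```

Because every weight is -1, 0, or +1, the two forms agree on every input, and the popcount form needs no multiplier.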
|
| 22 |
|
| 23 |
The repository ships eighteen prebuilt configurations spanning three data-path widths (8, 16, 32 bits) and six memory sizes (0 B to 64 KB). The canonical file at the repo root is the largest of these: a 32-bit data path with a 64 KB address space and ~8.47 M parameters.
|
| 24 |
|
|
|
|
| 261 |
|
| 262 |
## Verification
|
| 263 |
|
| 264 |
+
| Category | Coverage | Notes |
|
| 265 |
+
|----------|----------|-------|
|
| 266 |
| Boolean gates | exhaustive | all 2^n input combinations |
|
| 267 |
+
| Arithmetic (8-bit) | strategic sampling | edge values + diagonal pairs; ~50 cases per circuit |
|
| 268 |
+
| Arithmetic (16/32-bit) | strategic sampling | extreme values + targeted bit patterns |
|
| 269 |
+
| ALU primitives (8/16/32-bit) | strategic sampling | edge inputs per operation |
|
| 270 |
+
| Control flow | exhaustive | all 2^3 input combinations per Jcc |
|
| 271 |
+
| Threshold k-of-n | exhaustive | all 256 8-bit input patterns |
|
| 272 |
+
| Modular (all moduli, 8-bit input) | exhaustive | every value in [0, 255] |
|
| 273 |
+
| Parity | exhaustive | every value in [0, 255] |
|
| 274 |
+
| Pattern recognition | exhaustive | every value in [0, 255] |
|
| 275 |
+
| Combinational logic | exhaustive | full input space per gate |
|
| 276 |
+
| CPU integration | program-level | seven assembled programs (Fibonacci, sum, sort, self-modifying JMP, all eight Jcc, CALL stack push, MUL vs repeated ADD) plus division by repeated subtraction cross-checked against the DIV opcode and a bitwise pipeline (AND/OR/XOR/SHL/SHR) |
|
| 277 |
|
| 278 |
+
The 8-bit arithmetic and ALU tests use strategic sampling rather than the full 65,536-case sweep: exhaustive coverage at 8 bits is feasible but was judged unnecessary because the circuits are constructed gate-by-gate. The 16-bit and 32-bit arithmetic tests sample edge cases only; exhaustive coverage at those widths is infeasible without specialized hardware.
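To make the size difference concrete, here is a hypothetical sampler next to the exhaustive sweep. The choice of edge values and of "diagonal" pairs is an assumption for illustration and is not the selection eval.py actually uses.

```python
def strategic_pairs(width: int = 8):
    """Hypothetical sampler: edge values crossed with each other, plus diagonal pairs."""
    top = (1 << width) - 1
    edges = [0, 1, top >> 1, (top >> 1) + 1, top - 1, top]
    pairs = {(a, b) for a in edges for b in edges}
    # Diagonal: both operands equal to the same one-hot value.
    pairs |= {(1 << i, 1 << i) for i in range(width)}
    return sorted(pairs)


def exhaustive_pairs(width: int = 8):
    """Every (a, b) operand pair at the given width."""
    n = 1 << width
    return ((a, b) for a in range(n) for b in range(n))


if __name__ == "__main__":
    print(len(strategic_pairs(8)), sum(1 for _ in exhaustive_pairs(8)))
```

At 8 bits the sampler yields a few dozen pairs against 65,536 for the full sweep; at 16 bits the full sweep would already be 2^32 pairs per circuit.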
|
| 279 |
|
| 280 |
+
`eval_all.py` runs the unified suite. Exit code is the number of failing variants (0 means all pass). `test_cpu.py` runs the CPU program suite against a chosen variant.
|
| 281 |
|
| 282 |
---
|
| 283 |
|
|
|
|
| 330 |
|
| 331 |
## Hardware compatibility
|
| 332 |
|
| 333 |
+
All weights are in {-1, 0, 1}, all activations are Heaviside step, and every gate is a single weighted sum followed by a sign test. This eliminates multipliers entirely: each gate evaluation is a popcount of `+1`-weighted inputs minus a popcount of `-1`-weighted inputs plus an integer bias. The circuits are intended to deploy directly on:
|
| 334 |
|
| 335 |
+
- **FPGA**: every gate maps to a small LUT cluster (or a popcount tree of LUT4/LUT6 + carry chain). Ternary weight storage compresses to 2 bits per weight; routing collapses to bit selection.
|
| 336 |
+
- **Intel Loihi**: integer weights and Heaviside threshold neurons are the native primitive. Ternary fits well within Loihi's 8-bit weight range.
|
| 337 |
+
- **IBM TrueNorth**: configurable threshold per neurosynaptic core; ternary weights and small biases are within the supported range.
|
| 338 |
+
- **BrainChip Akida**: edge-oriented integer-weight inference; ternary weights fit cleanly.
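The 2-bits-per-weight figure can be sketched as a simple packing scheme. The code assignment below (00 for 0, 01 for +1, 10 for -1) and the function names are assumptions for the example; the repository does not specify an on-device encoding here.

```python
_ENCODE = {0: 0b00, 1: 0b01, -1: 0b10}  # 0b11 is left unused
_DECODE = {code: w for w, code in _ENCODE.items()}


def pack_ternary(weights) -> bytes:
    """Pack ternary weights four to a byte, lowest bit pair first."""
    out = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        out[i // 4] |= _ENCODE[w] << (2 * (i % 4))
    return bytes(out)


def unpack_ternary(data: bytes, count: int):
    """Recover the first `count` weights from packed bytes."""
    return [_DECODE[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(count)]


if __name__ == "__main__":
    packed = pack_ternary([1, -1, 0, 0, 1])
    print(len(packed), unpack_ternary(packed, 5))
```

Under this encoding an `int8` weight tensor shrinks by a factor of four, and the +1 and -1 bit planes can be read off directly as the two popcount masks.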
|
| 339 |
|
| 340 |
---
|
| 341 |
|
|
@@ -506,6 +506,125 @@ def cross_check_mul(mem_size: int = 256) -> ProgramResult:
|
|
| 506 |
return mem, expected, 80, f"MUL vs repeated ADD: {A_VAL} * {B_VAL} = {expected_product}"
|
| 507 |
|
| 508 |
|
| 509 |
SUITE = [
|
| 510 |
("fib", lambda mem_size: fib(11, mem_size)),
|
| 511 |
("sum_n", lambda mem_size: sum_n(10, mem_size)),
|
|
@@ -514,5 +633,7 @@ SUITE = [
|
|
| 514 |
("call_pushes_pc", lambda mem_size: call_pushes_pc(mem_size)),
|
| 515 |
("bubble_sort_4", lambda mem_size: bubble_sort_4(mem_size)),
|
| 516 |
("cross_check_mul", lambda mem_size: cross_check_mul(mem_size)),
|
| 517 |
]
|
| 518 |
|
|
|
|
| 506 |
return mem, expected, 80, f"MUL vs repeated ADD: {A_VAL} * {B_VAL} = {expected_product}"
|
| 507 |
|
| 508 |
|
| 509 |
+
def div_via_repeated_sub(mem_size: int = 256) -> ProgramResult:
|
| 510 |
+
"""Compute floor(A/B) and (A mod B) by repeated subtraction.
|
| 511 |
+
|
| 512 |
+
Loop: while A >= B { A -= B; quotient += 1 }
|
| 513 |
+
Uses CMP + JNC (carry is set on no-borrow, so JNC exits once A < B), SUB, ADD, JMP, STORE, HALT.
|
| 514 |
+
|
| 515 |
+
Cross-checked against the on-chip 8-bit DIV opcode (0x8) via a
|
| 516 |
+
second pass that uses DIV directly. Both quotients written to OUT
|
| 517 |
+
locations; the test checks each against the expected quotient.
|
| 518 |
+
"""
|
| 519 |
+
A_VAL = 100
|
| 520 |
+
B_VAL = 7
|
| 521 |
+
expected_q = A_VAL // B_VAL # 14
|
| 522 |
+
expected_r = A_VAL % B_VAL # 2
|
| 523 |
+
|
| 524 |
+
a = Asm(mem_size)
|
| 525 |
+
|
| 526 |
+
a.org(0)
|
| 527 |
+
# ---- Repeated-subtraction division ----
|
| 528 |
+
a.load(0, "A") # R0 = A (will become remainder)
|
| 529 |
+
a.load(1, "B") # R1 = B (divisor)
|
| 530 |
+
a.load(2, "ZERO") # R2 = 0 (will become quotient)
|
| 531 |
+
a.load(3, "ONE") # R3 = 1 (increment)
|
| 532 |
+
|
| 533 |
+
a.label("loop")
|
| 534 |
+
a.cmp(0, 1) # CMP R0, R1; carry=1 (no-borrow) iff R0 >= R1
|
| 535 |
+
a.jnc("done") # if R0 < R1 (carry=0), exit loop
|
| 536 |
+
a.sub(0, 1) # R0 -= B
|
| 537 |
+
a.add(2, 3) # quotient += 1
|
| 538 |
+
a.jmp("loop")
|
| 539 |
+
|
| 540 |
+
a.label("done")
|
| 541 |
+
a.store(2, "OUT_Q_RPT") # quotient via repeated sub
|
| 542 |
+
a.store(0, "OUT_R_RPT") # remainder via repeated sub
|
| 543 |
+
|
| 544 |
+
# ---- Direct DIV opcode for cross-check ----
|
| 545 |
+
a.load(0, "A")
|
| 546 |
+
a.load(1, "B")
|
| 547 |
+
a.dw(_enc(0x8, 0, 1, 0)) # DIV R0, R1 -> R0 = R0 / R1 (8-bit DIV)
|
| 548 |
+
a.store(0, "OUT_Q_DIV")
|
| 549 |
+
a.halt()
|
| 550 |
+
|
| 551 |
+
a.org(0x80)
|
| 552 |
+
a.label("A"); a.db(A_VAL)
|
| 553 |
+
a.label("B"); a.db(B_VAL)
|
| 554 |
+
a.label("ZERO"); a.db(0)
|
| 555 |
+
a.label("ONE"); a.db(1)
|
| 556 |
+
a.label("OUT_Q_RPT"); a.db(0)
|
| 557 |
+
a.label("OUT_R_RPT"); a.db(0)
|
| 558 |
+
a.label("OUT_Q_DIV"); a.db(0)
|
| 559 |
+
|
| 560 |
+
mem = a.assemble()
|
| 561 |
+
expected = {
|
| 562 |
+
a.labels["OUT_Q_RPT"]: expected_q,
|
| 563 |
+
a.labels["OUT_R_RPT"]: expected_r,
|
| 564 |
+
a.labels["OUT_Q_DIV"]: expected_q,
|
| 565 |
+
}
|
| 566 |
+
return mem, expected, 4 * (A_VAL // B_VAL + 4) + 12, (
|
| 567 |
+
f"{A_VAL} / {B_VAL}: quotient {expected_q} (repeated SUB) "
|
| 568 |
+
f"matches DIV opcode result; remainder {expected_r}"
|
| 569 |
+
)
|
| 570 |
+
|
| 571 |
+
|
| 572 |
+
def bitwise_chain(mem_size: int = 256) -> ProgramResult:
|
| 573 |
+
"""Run a chain of bitwise ops and verify each intermediate value.
|
| 574 |
+
|
| 575 |
+
Sequence:
|
| 576 |
+
R0 = A & B (AND)
|
| 577 |
+
R0 = R0 | C (OR)
|
| 578 |
+
R0 = R0 ^ D (XOR)
|
| 579 |
+
R0 = R0 << 1 (SHL)
|
| 580 |
+
R0 = R0 >> 1 (SHR)
|
| 581 |
+
Stores R0 after each step. Verifies all intermediate values to
|
| 582 |
+
catch any single-op regression.
|
| 583 |
+
"""
|
| 584 |
+
A = 0xCC # 11001100
|
| 585 |
+
B = 0xF0 # 11110000
|
| 586 |
+
C = 0x0F # 00001111
|
| 587 |
+
D = 0xAA # 10101010
|
| 588 |
+
|
| 589 |
+
s1 = A & B # 0xC0
|
| 590 |
+
s2 = s1 | C # 0xCF
|
| 591 |
+
s3 = s2 ^ D # 0x65
|
| 592 |
+
s4 = (s3 << 1) & 0xFF # 0xCA
|
| 593 |
+
s5 = s4 >> 1 # 0x65
|
| 594 |
+
|
| 595 |
+
a = Asm(mem_size)
|
| 596 |
+
a.org(0)
|
| 597 |
+
a.load(0, "A"); a.load(1, "B"); a.and_(0, 1); a.store(0, "S1")
|
| 598 |
+
a.load(1, "C"); a.or_(0, 1); a.store(0, "S2")
|
| 599 |
+
a.load(1, "D"); a.xor(0, 1); a.store(0, "S3")
|
| 600 |
+
a.shl(0); a.store(0, "S4")
|
| 601 |
+
a.shr(0); a.store(0, "S5")
|
| 602 |
+
a.halt()
|
| 603 |
+
|
| 604 |
+
a.org(0x80)
|
| 605 |
+
a.label("A"); a.db(A)
|
| 606 |
+
a.label("B"); a.db(B)
|
| 607 |
+
a.label("C"); a.db(C)
|
| 608 |
+
a.label("D"); a.db(D)
|
| 609 |
+
a.label("S1"); a.db(0)
|
| 610 |
+
a.label("S2"); a.db(0)
|
| 611 |
+
a.label("S3"); a.db(0)
|
| 612 |
+
a.label("S4"); a.db(0)
|
| 613 |
+
a.label("S5"); a.db(0)
|
| 614 |
+
|
| 615 |
+
mem = a.assemble()
|
| 616 |
+
expected = {
|
| 617 |
+
a.labels["S1"]: s1,
|
| 618 |
+
a.labels["S2"]: s2,
|
| 619 |
+
a.labels["S3"]: s3,
|
| 620 |
+
a.labels["S4"]: s4,
|
| 621 |
+
a.labels["S5"]: s5,
|
| 622 |
+
}
|
| 623 |
+
return mem, expected, 30, (
|
| 624 |
+
f"bitwise chain AND/OR/XOR/SHL/SHR -> {s1:#x},{s2:#x},{s3:#x},{s4:#x},{s5:#x}"
|
| 625 |
+
)
|
| 626 |
+
|
| 627 |
+
|
| 628 |
SUITE = [
|
| 629 |
("fib", lambda mem_size: fib(11, mem_size)),
|
| 630 |
("sum_n", lambda mem_size: sum_n(10, mem_size)),
|
|
|
|
| 633 |
("call_pushes_pc", lambda mem_size: call_pushes_pc(mem_size)),
|
| 634 |
("bubble_sort_4", lambda mem_size: bubble_sort_4(mem_size)),
|
| 635 |
("cross_check_mul", lambda mem_size: cross_check_mul(mem_size)),
|
| 636 |
+
("div_via_repeated_sub", lambda mem_size: div_via_repeated_sub(mem_size)),
|
| 637 |
+
("bitwise_chain", lambda mem_size: bitwise_chain(mem_size)),
|
| 638 |
]
|
| 639 |
|
|
@@ -1884,198 +1884,6 @@ class BatchedFitnessEvaluator:
|
|
| 1884 |
|
| 1885 |
return scores, total
|
| 1886 |
|
| 1887 |
-
# Legacy single-layer/byte-cascade path retained for backwards-compat with
|
| 1888 |
-
# variants built before the bit-cascade migration. Unused on freshly-built
|
| 1889 |
-
# variants but kept to avoid surprises if someone loads an older file.
|
| 1890 |
-
def _test_comparators_nbits_legacy(self, pop: Dict, bits: int, debug: bool) -> Tuple[torch.Tensor, int]:
|
| 1891 |
-
pop_size = next(iter(pop.values())).shape[0]
|
| 1892 |
-
scores = torch.zeros(pop_size, device=self.device)
|
| 1893 |
-
total = 0
|
| 1894 |
-
if bits == 32:
|
| 1895 |
-
comp_a = self.comp32_a
|
| 1896 |
-
comp_b = self.comp32_b
|
| 1897 |
-
elif bits == 16:
|
| 1898 |
-
comp_a = self.comp_a.clamp(0, 65535)
|
| 1899 |
-
comp_b = self.comp_b.clamp(0, 65535)
|
| 1900 |
-
else:
|
| 1901 |
-
comp_a = self.comp_a
|
| 1902 |
-
comp_b = self.comp_b
|
| 1903 |
-
num_tests = len(comp_a)
|
| 1904 |
-
if bits <= 16:
|
| 1905 |
-
a_bits = torch.stack([((comp_a >> (bits - 1 - i)) & 1).float() for i in range(bits)], dim=1)
|
| 1906 |
-
b_bits = torch.stack([((comp_b >> (bits - 1 - i)) & 1).float() for i in range(bits)], dim=1)
|
| 1907 |
-
inputs = torch.cat([a_bits, b_bits], dim=1)
|
| 1908 |
-
|
| 1909 |
-
comparators = [
|
| 1910 |
-
(f'arithmetic.greaterthan{bits}bit', lambda a, b: a > b),
|
| 1911 |
-
(f'arithmetic.greaterorequal{bits}bit', lambda a, b: a >= b),
|
| 1912 |
-
(f'arithmetic.lessthan{bits}bit', lambda a, b: a < b),
|
| 1913 |
-
(f'arithmetic.lessorequal{bits}bit', lambda a, b: a <= b),
|
| 1914 |
-
]
|
| 1915 |
-
|
| 1916 |
-
for name, op in comparators:
|
| 1917 |
-
try:
|
| 1918 |
-
expected = torch.tensor([1.0 if op(a.item(), b.item()) else 0.0
|
| 1919 |
-
for a, b in zip(comp_a, comp_b)], device=self.device)
|
| 1920 |
-
w = pop[f'{name}.weight']
|
| 1921 |
-
b = pop[f'{name}.bias']
|
| 1922 |
-
out = heaviside(inputs @ w.view(pop_size, -1).T + b.view(pop_size))
|
| 1923 |
-
correct = (out == expected.unsqueeze(1)).float().sum(0)
|
| 1924 |
-
failures = []
|
| 1925 |
-
if pop_size == 1:
|
| 1926 |
-
for i in range(num_tests):
|
| 1927 |
-
if out[i, 0].item() != expected[i].item():
|
| 1928 |
-
failures.append(([int(comp_a[i].item()), int(comp_b[i].item())],
|
| 1929 |
-
expected[i].item(), out[i, 0].item()))
|
| 1930 |
-
self._record(name, int(correct[0].item()), num_tests, failures)
|
| 1931 |
-
if debug:
|
| 1932 |
-
r = self.results[-1]
|
| 1933 |
-
print(f" {r.name}: {r.passed}/{r.total} {'PASS' if r.success else 'FAIL'}")
|
| 1934 |
-
scores += correct
|
| 1935 |
-
total += num_tests
|
| 1936 |
-
except KeyError:
|
| 1937 |
-
pass
|
| 1938 |
-
|
| 1939 |
-
prefix = f'arithmetic.equality{bits}bit'
|
| 1940 |
-
try:
|
| 1941 |
-
expected = torch.tensor([1.0 if a.item() == b.item() else 0.0
|
| 1942 |
-
for a, b in zip(comp_a, comp_b)], device=self.device)
|
| 1943 |
-
w_geq = pop[f'{prefix}.layer1.geq.weight']
|
| 1944 |
-
b_geq = pop[f'{prefix}.layer1.geq.bias']
|
| 1945 |
-
w_leq = pop[f'{prefix}.layer1.leq.weight']
|
| 1946 |
-
b_leq = pop[f'{prefix}.layer1.leq.bias']
|
| 1947 |
-
h_geq = heaviside(inputs @ w_geq.view(pop_size, -1).T + b_geq.view(pop_size))
|
| 1948 |
-
h_leq = heaviside(inputs @ w_leq.view(pop_size, -1).T + b_leq.view(pop_size))
|
| 1949 |
-
hidden = torch.stack([h_geq, h_leq], dim=-1)
|
| 1950 |
-
w2 = pop[f'{prefix}.layer2.weight']
|
| 1951 |
-
b2 = pop[f'{prefix}.layer2.bias']
|
| 1952 |
-
out = heaviside((hidden * w2.view(pop_size, 1, 2)).sum(-1) + b2.view(pop_size))
|
| 1953 |
-
correct = (out == expected.unsqueeze(1)).float().sum(0)
|
| 1954 |
-
failures = []
|
| 1955 |
-
if pop_size == 1:
|
| 1956 |
-
for i in range(num_tests):
|
| 1957 |
-
if out[i, 0].item() != expected[i].item():
|
| 1958 |
-
failures.append(([int(comp_a[i].item()), int(comp_b[i].item())],
|
| 1959 |
-
expected[i].item(), out[i, 0].item()))
|
| 1960 |
-
self._record(prefix, int(correct[0].item()), num_tests, failures)
|
| 1961 |
-
if debug:
|
| 1962 |
-
r = self.results[-1]
|
| 1963 |
-
print(f" {r.name}: {r.passed}/{r.total} {'PASS' if r.success else 'FAIL'}")
|
| 1964 |
-
scores += correct
|
| 1965 |
-
total += num_tests
|
| 1966 |
-
except KeyError:
|
| 1967 |
-
pass
|
| 1968 |
-
else:
|
| 1969 |
-
num_bytes = bits // 8
|
| 1970 |
-
prefix = f"arithmetic.cmp{bits}bit"
|
| 1971 |
-
|
| 1972 |
-
byte_gt = []
|
| 1973 |
-
byte_lt = []
|
| 1974 |
-
byte_eq = []
|
| 1975 |
-
|
| 1976 |
-
for b in range(num_bytes):
|
| 1977 |
-
start_bit = b * 8
|
| 1978 |
-
a_byte = torch.stack([((comp_a >> (bits - 1 - start_bit - i)) & 1).float() for i in range(8)], dim=1)
|
| 1979 |
-
b_byte = torch.stack([((comp_b >> (bits - 1 - start_bit - i)) & 1).float() for i in range(8)], dim=1)
|
| 1980 |
-
byte_input = torch.cat([a_byte, b_byte], dim=1)
|
| 1981 |
-
|
| 1982 |
-
w_gt = pop[f'{prefix}.byte{b}.gt.weight'].view(pop_size, -1)
|
| 1983 |
-
b_gt = pop[f'{prefix}.byte{b}.gt.bias'].view(pop_size)
|
| 1984 |
-
byte_gt.append(heaviside(byte_input @ w_gt.T + b_gt))
|
| 1985 |
-
|
| 1986 |
-
w_lt = pop[f'{prefix}.byte{b}.lt.weight'].view(pop_size, -1)
|
| 1987 |
-
b_lt = pop[f'{prefix}.byte{b}.lt.bias'].view(pop_size)
|
| 1988 |
-
byte_lt.append(heaviside(byte_input @ w_lt.T + b_lt))
|
| 1989 |
-
|
| 1990 |
-
w_geq = pop[f'{prefix}.byte{b}.eq.geq.weight'].view(pop_size, -1)
|
| 1991 |
-
b_geq = pop[f'{prefix}.byte{b}.eq.geq.bias'].view(pop_size)
|
| 1992 |
-
w_leq = pop[f'{prefix}.byte{b}.eq.leq.weight'].view(pop_size, -1)
|
| 1993 |
-
b_leq = pop[f'{prefix}.byte{b}.eq.leq.bias'].view(pop_size)
|
| 1994 |
-
h_geq = heaviside(byte_input @ w_geq.T + b_geq)
|
| 1995 |
-
h_leq = heaviside(byte_input @ w_leq.T + b_leq)
|
| 1996 |
-
w_and = pop[f'{prefix}.byte{b}.eq.and.weight'].view(pop_size, -1)
|
| 1997 |
-
b_and = pop[f'{prefix}.byte{b}.eq.and.bias'].view(pop_size)
|
| 1998 |
-
eq_inp = torch.stack([h_geq, h_leq], dim=-1)
|
| 1999 |
-
byte_eq.append(heaviside((eq_inp * w_and).sum(-1) + b_and))
|
| 2000 |
-
|
| 2001 |
-
cascade_gt = []
|
| 2002 |
-
cascade_lt = []
|
| 2003 |
-
for b in range(num_bytes):
|
| 2004 |
-
if b == 0:
|
| 2005 |
-
cascade_gt.append(byte_gt[0])
|
| 2006 |
-
cascade_lt.append(byte_lt[0])
|
| 2007 |
-
else:
|
| 2008 |
-
eq_stack = torch.stack(byte_eq[:b], dim=-1)
|
| 2009 |
-
w_all_eq = pop[f'{prefix}.cascade.gt.stage{b}.all_eq.weight'].view(pop_size, -1)
|
| 2010 |
-
b_all_eq = pop[f'{prefix}.cascade.gt.stage{b}.all_eq.bias'].view(pop_size)
|
| 2011 |
-
all_eq_gt = heaviside((eq_stack * w_all_eq).sum(-1) + b_all_eq)
|
| 2012 |
-
w_and = pop[f'{prefix}.cascade.gt.stage{b}.and.weight'].view(pop_size, -1)
|
| 2013 |
-
b_and = pop[f'{prefix}.cascade.gt.stage{b}.and.bias'].view(pop_size)
|
| 2014 |
-
stage_inp = torch.stack([all_eq_gt, byte_gt[b]], dim=-1)
|
| 2015 |
-
cascade_gt.append(heaviside((stage_inp * w_and).sum(-1) + b_and))
|
| 2016 |
-
|
| 2017 |
-
w_all_eq_lt = pop[f'{prefix}.cascade.lt.stage{b}.all_eq.weight'].view(pop_size, -1)
|
| 2018 |
-
b_all_eq_lt = pop[f'{prefix}.cascade.lt.stage{b}.all_eq.bias'].view(pop_size)
|
| 2019 |
-
all_eq_lt = heaviside((eq_stack * w_all_eq_lt).sum(-1) + b_all_eq_lt)
|
| 2020 |
-
w_and_lt = pop[f'{prefix}.cascade.lt.stage{b}.and.weight'].view(pop_size, -1)
|
| 2021 |
-
b_and_lt = pop[f'{prefix}.cascade.lt.stage{b}.and.bias'].view(pop_size)
|
| 2022 |
-
stage_inp_lt = torch.stack([all_eq_lt, byte_lt[b]], dim=-1)
|
| 2023 |
-
cascade_lt.append(heaviside((stage_inp_lt * w_and_lt).sum(-1) + b_and_lt))
|
| 2024 |
-
|
| 2025 |
-
gt_stack = torch.stack(cascade_gt, dim=-1)
|
| 2026 |
-
w_gt_or = pop[f'arithmetic.greaterthan{bits}bit.weight'].view(pop_size, -1)
|
| 2027 |
-
b_gt_or = pop[f'arithmetic.greaterthan{bits}bit.bias'].view(pop_size)
|
| 2028 |
-
gt_out = heaviside((gt_stack * w_gt_or).sum(-1) + b_gt_or)
|
| 2029 |
-
|
| 2030 |
-
lt_stack = torch.stack(cascade_lt, dim=-1)
|
| 2031 |
-
w_lt_or = pop[f'arithmetic.lessthan{bits}bit.weight'].view(pop_size, -1)
|
| 2032 |
-
b_lt_or = pop[f'arithmetic.lessthan{bits}bit.bias'].view(pop_size)
|
| 2033 |
-
lt_out = heaviside((lt_stack * w_lt_or).sum(-1) + b_lt_or)
|
| 2034 |
-
|
| 2035 |
-
w_not_lt = pop[f'arithmetic.greaterorequal{bits}bit.not_lt.weight'].view(pop_size, -1)
|
| 2036 |
-
b_not_lt = pop[f'arithmetic.greaterorequal{bits}bit.not_lt.bias'].view(pop_size)
|
| 2037 |
-
not_lt = heaviside(lt_out.unsqueeze(-1) @ w_not_lt.T + b_not_lt).squeeze(-1)
|
| 2038 |
-
w_ge = pop[f'arithmetic.greaterorequal{bits}bit.weight'].view(pop_size, -1)
|
| 2039 |
-
b_ge = pop[f'arithmetic.greaterorequal{bits}bit.bias'].view(pop_size)
|
| 2040 |
-
ge_out = heaviside(not_lt.unsqueeze(-1) @ w_ge.T + b_ge).squeeze(-1)
|
| 2041 |
-
|
| 2042 |
-
w_not_gt = pop[f'arithmetic.lessorequal{bits}bit.not_gt.weight'].view(pop_size, -1)
|
| 2043 |
-
b_not_gt = pop[f'arithmetic.lessorequal{bits}bit.not_gt.bias'].view(pop_size)
|
| 2044 |
-
not_gt = heaviside(gt_out.unsqueeze(-1) @ w_not_gt.T + b_not_gt).squeeze(-1)
|
| 2045 |
-
w_le = pop[f'arithmetic.lessorequal{bits}bit.weight'].view(pop_size, -1)
|
| 2046 |
-
b_le = pop[f'arithmetic.lessorequal{bits}bit.bias'].view(pop_size)
|
| 2047 |
-
le_out = heaviside(not_gt.unsqueeze(-1) @ w_le.T + b_le).squeeze(-1)
|
| 2048 |
-
|
| 2049 |
-
eq_stack = torch.stack(byte_eq, dim=-1)
|
| 2050 |
-
w_eq_all = pop[f'arithmetic.equality{bits}bit.weight'].view(pop_size, -1)
|
| 2051 |
-
b_eq_all = pop[f'arithmetic.equality{bits}bit.bias'].view(pop_size)
|
| 2052 |
-
eq_out = heaviside((eq_stack * w_eq_all).sum(-1) + b_eq_all)
|
| 2053 |
-
|
| 2054 |
-
for name, out, op in [
|
| 2055 |
-
(f'arithmetic.greaterthan{bits}bit', gt_out, lambda a, b: a > b),
|
| 2056 |
-
(f'arithmetic.greaterorequal{bits}bit', ge_out, lambda a, b: a >= b),
|
| 2057 |
-
(f'arithmetic.lessthan{bits}bit', lt_out, lambda a, b: a < b),
|
| 2058 |
-
(f'arithmetic.lessorequal{bits}bit', le_out, lambda a, b: a <= b),
|
| 2059 |
-
(f'arithmetic.equality{bits}bit', eq_out, lambda a, b: a == b),
|
| 2060 |
-
]:
|
| 2061 |
-
expected = torch.tensor([1.0 if op(a.item(), b.item()) else 0.0
|
| 2062 |
-
for a, b in zip(comp_a, comp_b)], device=self.device)
|
| 2063 |
-
correct = (out == expected.unsqueeze(1)).float().sum(0)
|
| 2064 |
-
failures = []
|
| 2065 |
-
if pop_size == 1:
|
| 2066 |
-
for i in range(num_tests):
|
| 2067 |
-
if out[i, 0].item() != expected[i].item():
|
| 2068 |
-
failures.append(([int(comp_a[i].item()), int(comp_b[i].item())],
|
| 2069 |
-
expected[i].item(), out[i, 0].item()))
|
| 2070 |
-
self._record(name, int(correct[0].item()), num_tests, failures)
|
| 2071 |
-
if debug:
|
| 2072 |
-
r = self.results[-1]
|
| 2073 |
-
print(f" {r.name}: {r.passed}/{r.total} {'PASS' if r.success else 'FAIL'}")
|
| 2074 |
-
scores += correct
|
| 2075 |
-
total += num_tests
|
| 2076 |
-
|
| 2077 |
-
return scores, total
|
| 2078 |
-
|
| 2079 |
def _test_subtractor_nbits(self, pop: Dict, bits: int, debug: bool) -> Tuple[torch.Tensor, int]:
|
| 2080 |
"""Test N-bit subtractor circuit (A - B)."""
|
| 2081 |
pop_size = next(iter(pop.values())).shape[0]
|
|
@@ -2540,11 +2348,9 @@ class BatchedFitnessEvaluator:
|
|
| 2540 |
def _test_modular(self, pop: Dict, mod: int, debug: bool) -> Tuple[torch.Tensor, int]:
|
| 2541 |
"""Test modular divisibility circuit.
|
| 2542 |
|
| 2543 |
-
|
| 2544 |
-
|
| 2545 |
-
|
| 2546 |
-
2. Single-layer (powers of 2): {prefix}.weight directly applied
|
| 2547 |
-
3. Legacy layer1.geq/leq + layer2.eq + layer3.or (multi-layer non-ternary)
|
| 2548 |
"""
|
| 2549 |
pop_size = next(iter(pop.values())).shape[0]
|
| 2550 |
prefix = f'modular.mod{mod}'
|
|
@@ -2553,7 +2359,7 @@ class BatchedFitnessEvaluator:
|
|
| 2553 |
expected = ((self.mod_test % mod) == 0).float()
|
| 2554 |
out = None
|
| 2555 |
|
| 2556 |
-
#
|
| 2557 |
multiples = list(range(0, 256, mod))
|
| 2558 |
if (multiples
|
| 2559 |
and f'{prefix}.eq.k{multiples[0]}.all.weight' in pop
|
|
@@ -2578,56 +2384,13 @@ class BatchedFitnessEvaluator:
|
|
| 2578 |
except (KeyError, RuntimeError):
|
| 2579 |
out = None
|
| 2580 |
|
| 2581 |
-
#
|
| 2582 |
if out is None:
|
| 2583 |
try:
|
| 2584 |
w = pop[f'{prefix}.weight']
|
| 2585 |
b = pop[f'{prefix}.bias']
|
| 2586 |
out = heaviside(inputs @ w.view(pop_size, -1).T + b.view(pop_size))
|
| 2587 |
except (KeyError, RuntimeError):
|
| 2588 |
-
out = None
|
| 2589 |
-
|
| 2590 |
-
# 3. Legacy multi-layer fallback
|
| 2591 |
-
if out is None:
|
| 2592 |
-
try:
|
| 2593 |
-
geq_outputs = {}
|
| 2594 |
-
leq_outputs = {}
|
| 2595 |
-
i = 0
|
| 2596 |
-
while True:
|
| 2597 |
-
found = False
|
| 2598 |
-
if f'{prefix}.layer1.geq{i}.weight' in pop:
|
| 2599 |
-
w = pop[f'{prefix}.layer1.geq{i}.weight'].view(pop_size, -1)
|
| 2600 |
-
b = pop[f'{prefix}.layer1.geq{i}.bias'].view(pop_size)
|
| 2601 |
-
geq_outputs[i] = heaviside(inputs @ w.T + b)
|
| 2602 |
-
found = True
|
| 2603 |
-
if f'{prefix}.layer1.leq{i}.weight' in pop:
|
| 2604 |
-
w = pop[f'{prefix}.layer1.leq{i}.weight'].view(pop_size, -1)
|
| 2605 |
-
b = pop[f'{prefix}.layer1.leq{i}.bias'].view(pop_size)
|
| 2606 |
-
leq_outputs[i] = heaviside(inputs @ w.T + b)
|
| 2607 |
-
found = True
|
| 2608 |
-
if not found:
|
| 2609 |
-
break
|
| 2610 |
-
i += 1
|
| 2611 |
-
|
| 2612 |
-
if not geq_outputs and not leq_outputs:
|
| 2613 |
-
return torch.zeros(pop_size, device=self.device), 0
|
| 2614 |
-
|
| 2615 |
-
eq_outputs = []
|
| 2616 |
-
i = 0
|
| 2617 |
-
while f'{prefix}.layer2.eq{i}.weight' in pop:
|
| 2618 |
-
w = pop[f'{prefix}.layer2.eq{i}.weight'].view(pop_size, -1)
|
| 2619 |
-
b = pop[f'{prefix}.layer2.eq{i}.bias'].view(pop_size)
|
| 2620 |
-
eq_in = torch.stack([geq_outputs.get(i, torch.zeros(256, pop_size, device=self.device)),
|
| 2621 |
-
leq_outputs.get(i, torch.zeros(256, pop_size, device=self.device))], dim=-1)
|
| 2622 |
-
eq_outputs.append(heaviside((eq_in * w).sum(-1) + b))
|
| 2623 |
-
i += 1
|
| 2624 |
-
if not eq_outputs:
|
| 2625 |
-
return torch.zeros(pop_size, device=self.device), 0
|
| 2626 |
-
eq_stack = torch.stack(eq_outputs, dim=-1)
|
| 2627 |
-
w3 = pop[f'{prefix}.layer3.or.weight'].view(pop_size, -1)
|
| 2628 |
-
b3 = pop[f'{prefix}.layer3.or.bias'].view(pop_size)
|
| 2629 |
-
out = heaviside((eq_stack * w3).sum(-1) + b3)
|
| 2630 |
-
except Exception:
|
| 2631 |
return torch.zeros(pop_size, device=self.device), 0
|
| 2632 |
|
| 2633 |
correct = (out == expected.unsqueeze(1)).float().sum(0)
|
|
|
|
| 1884 |
|
| 1885 |
return scores, total
|
| 1886 |
|
| 1887 |
def _test_subtractor_nbits(self, pop: Dict, bits: int, debug: bool) -> Tuple[torch.Tensor, int]:
|
| 1888 |
"""Test N-bit subtractor circuit (A - B)."""
|
| 1889 |
pop_size = next(iter(pop.values())).shape[0]
|
|
|
|
| 2348 |
def _test_modular(self, pop: Dict, mod: int, debug: bool) -> Tuple[torch.Tensor, int]:
|
| 2349 |
"""Test modular divisibility circuit.
|
| 2350 |
|
| 2351 |
+
Two structures: mod 3/5/6/7/9/10/11/12 use bit-cascade equality
|
| 2352 |
+
per multiple of N (`{prefix}.eq.k{k}.*` + final OR at `{prefix}`).
|
| 2353 |
+
mod 2/4/8 use a single-layer ternary detector at `{prefix}` directly.
|
| 2354 |
"""
|
| 2355 |
pop_size = next(iter(pop.values())).shape[0]
|
| 2356 |
prefix = f'modular.mod{mod}'
|
|
|
|
| 2359 |
expected = ((self.mod_test % mod) == 0).float()
|
| 2360 |
out = None
|
| 2361 |
|
| 2362 |
+
# Bit-cascade equality structure (non-power-of-2 moduli)
|
| 2363 |
multiples = list(range(0, 256, mod))
|
| 2364 |
if (multiples
|
| 2365 |
and f'{prefix}.eq.k{multiples[0]}.all.weight' in pop
|
|
|
|
| 2384 |
except (KeyError, RuntimeError):
|
| 2385 |
out = None
|
| 2386 |
|
| 2387 |
+
# Single-layer ternary detector (powers of 2)
|
| 2388 |
if out is None:
|
| 2389 |
try:
|
| 2390 |
w = pop[f'{prefix}.weight']
|
| 2391 |
b = pop[f'{prefix}.bias']
|
| 2392 |
out = heaviside(inputs @ w.view(pop_size, -1).T + b.view(pop_size))
|
| 2393 |
except (KeyError, RuntimeError):
|
| 2394 |
return torch.zeros(pop_size, device=self.device), 0
|
| 2395 |
|
| 2396 |
correct = (out == expected.unsqueeze(1)).float().sum(0)
|
|
@@ -330,8 +330,22 @@ class GenericThresholdCPU:
|
|
| 330 |
elif opcode == 0x1:
|
| 331 |
result, carry = self.alu.sub8(a, b)
|
| 332 |
overflow = 1 if (((a ^ b) & (a ^ result)) & 0x80) else 0
|
| 333 |
elif opcode == 0x7:
|
| 334 |
result = self.alu.mul8(a, b)
|
| 335 |
elif opcode == 0x9:
|
| 336 |
r2, carry = self.alu.sub8(a, b)
|
| 337 |
z = 1 if r2 == 0 else 0
|
|
|
|
| 330 |
elif opcode == 0x1:
|
| 331 |
result, carry = self.alu.sub8(a, b)
|
| 332 |
overflow = 1 if (((a ^ b) & (a ^ result)) & 0x80) else 0
|
| 333 |
+
elif opcode == 0x2: # AND
|
| 334 |
+
result = a & b
|
| 335 |
+
elif opcode == 0x3: # OR
|
| 336 |
+
result = a | b
|
| 337 |
+
elif opcode == 0x4: # XOR
|
| 338 |
+
result = a ^ b
|
| 339 |
+
elif opcode == 0x5: # SHL by 1 (8-bit)
|
| 340 |
+
result = (a << 1) & 0xFF
|
| 341 |
+
carry = 1 if (a & 0x80) else 0
|
| 342 |
+
elif opcode == 0x6: # SHR by 1
|
| 343 |
+
result = a >> 1
|
| 344 |
+
carry = a & 0x1
|
| 345 |
elif opcode == 0x7:
|
| 346 |
result = self.alu.mul8(a, b)
|
| 347 |
+
elif opcode == 0x8: # DIV (sets R[d] = R[d] / R[s]; 0xFF on divide by zero)
|
| 348 |
+
result = (a // b) if b != 0 else 0xFF
|
| 349 |
elif opcode == 0x9:
|
| 350 |
r2, carry = self.alu.sub8(a, b)
|
| 351 |
z = 1 if r2 == 0 else 0
|