Sync packed memory + 16-bit addressing

Files changed:
- .gitattributes +1 -0
- README.md +17 -15
- cpu/cycle.py +22 -11
- cpu/state.py +3 -3
- cpu/threshold_cpu.py +435 -0
- eval/build_memory.py +69 -28
- eval/comprehensive_eval.py +155 -54
- eval/cpu_cycle_test.py +44 -13
- eval/iron_eval.py +110 -22
- neural_computer.safetensors +2 -2
- routing.json +0 -0
- routing/generate_routing.py +32 -43
- routing/routing.json +0 -0
- routing/routing_schema.md +88 -23
- routing/validate_packed_memory.py +117 -0
- tensors.txt +0 -0
- todo.md +27 -26
.gitattributes CHANGED

@@ -36,3 +36,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 __pycache__/iron_eval.cpython-312.pyc filter=lfs diff=lfs merge=lfs -text
 __pycache__/iron_eval.cpython-311.pyc filter=lfs diff=lfs merge=lfs -text
 eval/__pycache__/comprehensive_eval.cpython-312.pyc filter=lfs diff=lfs merge=lfs -text
+tensors.txt filter=lfs diff=lfs merge=lfs -text
README.md CHANGED

@@ -17,8 +17,8 @@ tags:
 Every logic gate is a threshold neuron: `output = 1 if (Σ wᵢxᵢ + b) ≥ 0 else 0`

 ```
-Tensors:
-Parameters:
+Tensors: 6,296
+Parameters: 8,267,667
 ```

 ---

@@ -30,7 +30,7 @@ A complete 8-bit processor where every operation—from Boolean logic to arithme
 | Component | Specification |
 |-----------|---------------|
 | Registers | 4 × 8-bit general purpose |
-| Memory |
+| Memory | 64KB addressable |
 | ALU | 16 operations (ADD, SUB, AND, OR, XOR, NOT, SHL, SHR, INC, DEC, CMP, NEG, PASS, ZERO, ONES, NOP) |
 | Flags | Zero, Negative, Carry, Overflow |
 | Control | JMP, JZ, JNZ, JC, JNC, JN, JP, JV, JNV, CALL, RET, PUSH, POP |

@@ -90,7 +90,7 @@ The weights in this repository implement a complete 8-bit computer: registers, A
 | Modular | 11 | Divisibility by 2-12 (multi-layer for non-powers-of-2) |
 | Threshold | 13 | k-of-n gates, majority, minority, exactly-k |
 | Pattern | 10 | Popcount, leading/trailing ones, symmetry |
-| Memory | 3 |
+| Memory | 3 | 16-bit addr decoder, 65536x8 read mux, write cell update (packed) |

 ---

@@ -122,14 +122,14 @@ for a, b_in in [(0,0), (0,1), (1,0), (1,1)]:
 All multi-bit fields are **MSB-first** (index 0 is the most-significant bit).

 ```
-[ PC[
+[ PC[16] | IR[16] | R0[8] R1[8] R2[8] R3[8] | FLAGS[4] | SP[16] | CTRL[4] | MEM[65536][8] ]
 ```

 Flags are ordered as: `Z, N, C, V`.

 Control bits are ordered as: `HALT, MEM_WE, MEM_RE, RESERVED`.

-Total state size: `
+Total state size: `524376` bits.

 ---

@@ -145,8 +145,7 @@ opcode rd rs imm8
 Interpretation:
 - **R-type**: `rd = rd op rs` (imm8 ignored).
 - **I-type**: `rd = op rd, imm8` (rs ignored).
-- **
-- **LOAD/STORE**: `imm8` is the absolute memory address.
+- **Address-extended**: `LOAD`, `STORE`, `JMP`, `JZ`, `CALL` consume the next word as a 16-bit address (big-endian). `imm8` is reserved, and the PC skips 4 bytes when the jump is not taken.

 ---

@@ -185,12 +184,15 @@ All circuits pass exhaustive testing over their full input domains.
 ```
 {category}.{circuit}[.{layer}][.{component}].{weight|bias}

 Examples:
 boolean.and.weight
 boolean.xor.layer1.neuron1.weight
 arithmetic.ripplecarry8bit.fa7.ha2.sum.layer1.or.weight
 modular.mod5.layer2.eq3.weight
 error_detection.paritychecker8bit.stage2.xor1.layer1.nand.bias
+
+Memory circuits are stored as packed tensors to keep the safetensors header size manageable
+(e.g., `memory.addr_decode.weight`, `memory.read.and.weight`, `memory.write.and_old.weight`).
 ```

 ---

@@ -209,7 +211,7 @@ All weights are integers. All activations are Heaviside step. Designed for:

 | File | Description |
 |------|-------------|
-| `neural_computer.safetensors` |
+| `neural_computer.safetensors` | 6,296 tensors, 8,267,667 parameters |
 | `iron_eval.py` | Comprehensive test suite |
 | `prune_weights.py` | Weight optimization tool |
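The gate equation in the README can be sanity-checked with plain integers; a minimal sketch (the weights below are illustrative choices, not the repository's stored tensors):

```python
def gate(weights, bias):
    # output = 1 if (sum w_i * x_i + b) >= 0 else 0  -- Heaviside threshold neuron
    return lambda *x: 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0

AND = gate([1, 1], -2)
OR = gate([1, 1], -1)
NAND = gate([-1, -1], 1)

def XOR(a, b):
    # Two-layer threshold XOR: AND(OR(a, b), NAND(a, b))
    return AND(OR(a, b), NAND(a, b))

for a in (0, 1):
    for b in (0, 1):
        assert AND(a, b) == (a & b)
        assert OR(a, b) == (a | b)
        assert XOR(a, b) == (a ^ b)
```

The two-layer XOR mirrors the `layer1.or` / `layer1.nand` / `layer2` decomposition visible in the tensor names above.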
cpu/cycle.py CHANGED

@@ -50,14 +50,22 @@ def step(state: CPUState) -> CPUState:

     # Fetch: two bytes, big-endian
     hi = s.mem[s.pc]
-    lo = s.mem[(s.pc + 1) &
+    lo = s.mem[(s.pc + 1) & 0xFFFF]
     s.ir = ((hi & 0xFF) << 8) | (lo & 0xFF)
-    next_pc = (s.pc + 2) &
+    next_pc = (s.pc + 2) & 0xFFFF

     opcode, rd, rs, imm8 = decode_ir(s.ir)
     a = s.regs[rd]
     b = s.regs[rs]

+    addr16 = None
+    next_pc_ext = next_pc
+    if opcode in (0xA, 0xB, 0xC, 0xD, 0xE):
+        addr_hi = s.mem[next_pc]
+        addr_lo = s.mem[(next_pc + 1) & 0xFFFF]
+        addr16 = ((addr_hi & 0xFF) << 8) | (addr_lo & 0xFF)
+        next_pc_ext = (next_pc + 2) & 0xFFFF
+
     write_result = True
     result = a
     carry = 0

@@ -94,23 +102,26 @@ def step(state: CPUState) -> CPUState:
         result, carry, overflow = _alu_sub(a, b)
         write_result = False
     elif opcode == 0xA:  # LOAD
-        result = s.mem[
+        result = s.mem[addr16]
     elif opcode == 0xB:  # STORE
-        s.mem[
+        s.mem[addr16] = b & 0xFF
         write_result = False
     elif opcode == 0xC:  # JMP
-        s.pc =
+        s.pc = addr16 & 0xFFFF
         write_result = False
     elif opcode == 0xD:  # JZ
         if s.flags[0] == 1:
-            s.pc =
+            s.pc = addr16 & 0xFFFF
         else:
-            s.pc =
+            s.pc = next_pc_ext
         write_result = False
     elif opcode == 0xE:  # CALL
-
-        s.
-        s.
+        ret_addr = next_pc_ext & 0xFFFF
+        s.sp = (s.sp - 1) & 0xFFFF
+        s.mem[s.sp] = (ret_addr >> 8) & 0xFF
+        s.sp = (s.sp - 1) & 0xFFFF
+        s.mem[s.sp] = ret_addr & 0xFF
+        s.pc = addr16 & 0xFFFF
         write_result = False
     elif opcode == 0xF:  # HALT
         s.ctrl[0] = 1

@@ -123,7 +134,7 @@ def step(state: CPUState) -> CPUState:
         s.regs[rd] = result & 0xFF

     if opcode not in (0xC, 0xD, 0xE):
-        s.pc =
+        s.pc = next_pc_ext

     return s
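The fetch path and address-extension rule in this diff can be exercised standalone; a hypothetical sketch operating on a plain `bytearray` rather than the repository's `CPUState` (the `fetch` helper is named here for illustration):

```python
def decode_ir(ir):
    # opcode[4] | rd[2] | rs[2] | imm8[8], packed in one 16-bit word
    return (ir >> 12) & 0xF, (ir >> 10) & 0x3, (ir >> 8) & 0x3, ir & 0xFF

def fetch(mem, pc):
    # Two-byte big-endian instruction fetch with 16-bit PC wraparound
    ir = ((mem[pc] & 0xFF) << 8) | (mem[(pc + 1) & 0xFFFF] & 0xFF)
    next_pc = (pc + 2) & 0xFFFF
    opcode = (ir >> 12) & 0xF
    addr16 = None
    if opcode in (0xA, 0xB, 0xC, 0xD, 0xE):  # address-extended opcodes
        addr16 = ((mem[next_pc] & 0xFF) << 8) | (mem[(next_pc + 1) & 0xFFFF] & 0xFF)
        next_pc = (next_pc + 2) & 0xFFFF  # PC advances 4 bytes in total
    return ir, opcode, addr16, next_pc

mem = bytearray(65536)
mem[0:4] = bytes([0xA0, 0x00, 0x12, 0x34])  # LOAD r0, followed by address word 0x1234
```

With this program image, `fetch(mem, 0)` yields IR `0xA000`, opcode `0xA`, the extension address `0x1234`, and a next PC of 4.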
cpu/state.py CHANGED

@@ -11,14 +11,14 @@ from typing import List

 FLAG_NAMES = ["Z", "N", "C", "V"]
 CTRL_NAMES = ["HALT", "MEM_WE", "MEM_RE", "RESERVED"]

-PC_BITS =
+PC_BITS = 16
 IR_BITS = 16
 REG_BITS = 8
 REG_COUNT = 4
 FLAG_BITS = 4
-SP_BITS =
+SP_BITS = 16
 CTRL_BITS = 4
-MEM_BYTES =
+MEM_BYTES = 65536
 MEM_BITS = MEM_BYTES * 8

 STATE_BITS = PC_BITS + IR_BITS + (REG_BITS * REG_COUNT) + FLAG_BITS + SP_BITS + CTRL_BITS + MEM_BITS
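As a cross-check, the constants above reproduce the 524,376-bit total state size quoted in the README:

```python
PC_BITS = 16
IR_BITS = 16
REG_BITS = 8
REG_COUNT = 4
FLAG_BITS = 4
SP_BITS = 16
CTRL_BITS = 4
MEM_BYTES = 65536
MEM_BITS = MEM_BYTES * 8  # 524288 bits of memory

# 16 + 16 + 32 + 4 + 16 + 4 + 524288
STATE_BITS = PC_BITS + IR_BITS + (REG_BITS * REG_COUNT) + FLAG_BITS + SP_BITS + CTRL_BITS + MEM_BITS
print(STATE_BITS)  # → 524376
```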
cpu/threshold_cpu.py ADDED

@@ -0,0 +1,435 @@
+"""
+Threshold-weight runtime for the 8-bit CPU.
+
+Implements a reference cycle using the frozen circuit weights for core ALU ops.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+from typing import List, Tuple
+
+import torch
+from safetensors.torch import load_file
+
+from .state import CPUState, pack_state, unpack_state, REG_BITS, PC_BITS, MEM_BYTES
+
+
+def heaviside(x: torch.Tensor) -> torch.Tensor:
+    return (x >= 0).float()
+
+
+def int_to_bits_msb(value: int, width: int) -> List[int]:
+    return [(value >> (width - 1 - i)) & 1 for i in range(width)]
+
+
+def bits_to_int_msb(bits: List[int]) -> int:
+    value = 0
+    for bit in bits:
+        value = (value << 1) | int(bit)
+    return value
+
+
+def bits_msb_to_lsb(bits: List[int]) -> List[int]:
+    return list(reversed(bits))
+
+
+DEFAULT_MODEL_PATH = Path(__file__).resolve().parent.parent / "neural_computer.safetensors"
+
+
+class ThresholdALU:
+    def __init__(self, model_path: str, device: str = "cpu") -> None:
+        self.device = device
+        self.tensors = {k: v.float().to(device) for k, v in load_file(model_path).items()}
+
+    def _get(self, name: str) -> torch.Tensor:
+        return self.tensors[name]
+
+    def _eval_gate(self, weight_key: str, bias_key: str, inputs: List[float]) -> float:
+        w = self._get(weight_key)
+        b = self._get(bias_key)
+        inp = torch.tensor(inputs, device=self.device)
+        return heaviside((inp * w).sum() + b).item()
+
+    def _eval_xor(self, prefix: str, inputs: List[float]) -> float:
+        inp = torch.tensor(inputs, device=self.device)
+        w_or = self._get(f"{prefix}.layer1.or.weight")
+        b_or = self._get(f"{prefix}.layer1.or.bias")
+        w_nand = self._get(f"{prefix}.layer1.nand.weight")
+        b_nand = self._get(f"{prefix}.layer1.nand.bias")
+        w2 = self._get(f"{prefix}.layer2.weight")
+        b2 = self._get(f"{prefix}.layer2.bias")
+
+        h_or = heaviside((inp * w_or).sum() + b_or).item()
+        h_nand = heaviside((inp * w_nand).sum() + b_nand).item()
+        hidden = torch.tensor([h_or, h_nand], device=self.device)
+        return heaviside((hidden * w2).sum() + b2).item()
+
+    def _eval_full_adder(self, prefix: str, a: float, b: float, cin: float) -> Tuple[float, float]:
+        ha1_sum = self._eval_xor(f"{prefix}.ha1.sum", [a, b])
+        ha1_carry = self._eval_gate(f"{prefix}.ha1.carry.weight", f"{prefix}.ha1.carry.bias", [a, b])
+
+        ha2_sum = self._eval_xor(f"{prefix}.ha2.sum", [ha1_sum, cin])
+        ha2_carry = self._eval_gate(
+            f"{prefix}.ha2.carry.weight", f"{prefix}.ha2.carry.bias", [ha1_sum, cin]
+        )
+
+        cout = self._eval_gate(f"{prefix}.carry_or.weight", f"{prefix}.carry_or.bias", [ha1_carry, ha2_carry])
+        return ha2_sum, cout
+
+    def add(self, a: int, b: int) -> Tuple[int, int, int]:
+        a_bits = bits_msb_to_lsb(int_to_bits_msb(a, REG_BITS))
+        b_bits = bits_msb_to_lsb(int_to_bits_msb(b, REG_BITS))
+
+        carry = 0.0
+        sum_bits: List[int] = []
+        for bit in range(REG_BITS):
+            sum_bit, carry = self._eval_full_adder(
+                f"arithmetic.ripplecarry8bit.fa{bit}", float(a_bits[bit]), float(b_bits[bit]), carry
+            )
+            sum_bits.append(int(sum_bit))
+
+        result = bits_to_int_msb(list(reversed(sum_bits)))
+        carry_out = int(carry)
+        overflow = 1 if (((a ^ result) & (b ^ result)) & 0x80) else 0
+        return result, carry_out, overflow
+
+    def sub(self, a: int, b: int) -> Tuple[int, int, int]:
+        a_bits = bits_msb_to_lsb(int_to_bits_msb(a, REG_BITS))
+        b_bits = bits_msb_to_lsb(int_to_bits_msb(b, REG_BITS))
+
+        carry = 1.0  # two's complement carry-in
+        sum_bits: List[int] = []
+        for bit in range(REG_BITS):
+            notb = self._eval_gate(
+                f"arithmetic.sub8bit.notb{bit}.weight",
+                f"arithmetic.sub8bit.notb{bit}.bias",
+                [float(b_bits[bit])],
+            )
+
+            xor1 = self._eval_xor(f"arithmetic.sub8bit.fa{bit}.xor1", [float(a_bits[bit]), notb])
+            xor2 = self._eval_xor(f"arithmetic.sub8bit.fa{bit}.xor2", [xor1, carry])
+
+            and1 = self._eval_gate(
+                f"arithmetic.sub8bit.fa{bit}.and1.weight",
+                f"arithmetic.sub8bit.fa{bit}.and1.bias",
+                [float(a_bits[bit]), notb],
+            )
+            and2 = self._eval_gate(
+                f"arithmetic.sub8bit.fa{bit}.and2.weight",
+                f"arithmetic.sub8bit.fa{bit}.and2.bias",
+                [xor1, carry],
+            )
+            carry = self._eval_gate(
+                f"arithmetic.sub8bit.fa{bit}.or_carry.weight",
+                f"arithmetic.sub8bit.fa{bit}.or_carry.bias",
+                [and1, and2],
+            )
+
+            sum_bits.append(int(xor2))
+
+        result = bits_to_int_msb(list(reversed(sum_bits)))
+        carry_out = int(carry)
+        overflow = 1 if (((a ^ b) & (a ^ result)) & 0x80) else 0
+        return result, carry_out, overflow
+
+    def bitwise_and(self, a: int, b: int) -> int:
+        a_bits = int_to_bits_msb(a, REG_BITS)
+        b_bits = int_to_bits_msb(b, REG_BITS)
+        w = self._get("alu.alu8bit.and.weight")
+        bias = self._get("alu.alu8bit.and.bias")
+
+        out_bits = []
+        for bit in range(REG_BITS):
+            inp = torch.tensor([float(a_bits[bit]), float(b_bits[bit])], device=self.device)
+            out = heaviside((inp * w[bit * 2:bit * 2 + 2]).sum() + bias[bit]).item()
+            out_bits.append(int(out))
+
+        return bits_to_int_msb(out_bits)
+
+    def bitwise_or(self, a: int, b: int) -> int:
+        a_bits = int_to_bits_msb(a, REG_BITS)
+        b_bits = int_to_bits_msb(b, REG_BITS)
+        w = self._get("alu.alu8bit.or.weight")
+        bias = self._get("alu.alu8bit.or.bias")
+
+        out_bits = []
+        for bit in range(REG_BITS):
+            inp = torch.tensor([float(a_bits[bit]), float(b_bits[bit])], device=self.device)
+            out = heaviside((inp * w[bit * 2:bit * 2 + 2]).sum() + bias[bit]).item()
+            out_bits.append(int(out))
+
+        return bits_to_int_msb(out_bits)
+
+    def bitwise_not(self, a: int) -> int:
+        a_bits = int_to_bits_msb(a, REG_BITS)
+        w = self._get("alu.alu8bit.not.weight")
+        bias = self._get("alu.alu8bit.not.bias")
+
+        out_bits = []
+        for bit in range(REG_BITS):
+            inp = torch.tensor([float(a_bits[bit])], device=self.device)
+            out = heaviside((inp * w[bit]).sum() + bias[bit]).item()
+            out_bits.append(int(out))
+
+        return bits_to_int_msb(out_bits)
+
+    def bitwise_xor(self, a: int, b: int) -> int:
+        a_bits = int_to_bits_msb(a, REG_BITS)
+        b_bits = int_to_bits_msb(b, REG_BITS)
+
+        w_or = self._get("alu.alu8bit.xor.layer1.or.weight")
+        b_or = self._get("alu.alu8bit.xor.layer1.or.bias")
+        w_nand = self._get("alu.alu8bit.xor.layer1.nand.weight")
+        b_nand = self._get("alu.alu8bit.xor.layer1.nand.bias")
+        w2 = self._get("alu.alu8bit.xor.layer2.weight")
+        b2 = self._get("alu.alu8bit.xor.layer2.bias")
+
+        out_bits = []
+        for bit in range(REG_BITS):
+            inp = torch.tensor([float(a_bits[bit]), float(b_bits[bit])], device=self.device)
+            h_or = heaviside((inp * w_or[bit * 2:bit * 2 + 2]).sum() + b_or[bit])
+            h_nand = heaviside((inp * w_nand[bit * 2:bit * 2 + 2]).sum() + b_nand[bit])
+            hidden = torch.stack([h_or, h_nand])
+            out = heaviside((hidden * w2[bit * 2:bit * 2 + 2]).sum() + b2[bit]).item()
+            out_bits.append(int(out))
+
+        return bits_to_int_msb(out_bits)
+
+
+class ThresholdCPU:
+    def __init__(self, model_path: str | Path = DEFAULT_MODEL_PATH, device: str = "cpu") -> None:
+        self.device = device
+        self.alu = ThresholdALU(str(model_path), device=device)
+
+    @staticmethod
+    def decode_ir(ir: int) -> Tuple[int, int, int, int]:
+        opcode = (ir >> 12) & 0xF
+        rd = (ir >> 10) & 0x3
+        rs = (ir >> 8) & 0x3
+        imm8 = ir & 0xFF
+        return opcode, rd, rs, imm8
+
+    @staticmethod
+    def flags_from_result(result: int, carry: int, overflow: int) -> List[int]:
+        z = 1 if result == 0 else 0
+        n = 1 if (result & 0x80) else 0
+        c = 1 if carry else 0
+        v = 1 if overflow else 0
+        return [z, n, c, v]
+
+    def _addr_decode(self, addr: int) -> torch.Tensor:
+        bits = torch.tensor(int_to_bits_msb(addr, PC_BITS), device=self.device, dtype=torch.float32)
+        w = self.alu._get("memory.addr_decode.weight")
+        b = self.alu._get("memory.addr_decode.bias")
+        return heaviside((w * bits).sum(dim=1) + b)
+
+    def _memory_read(self, mem: List[int], addr: int) -> int:
+        sel = self._addr_decode(addr)
+        mem_bits = torch.tensor(
+            [int_to_bits_msb(byte, REG_BITS) for byte in mem],
+            device=self.device,
+            dtype=torch.float32,
+        )
+        and_w = self.alu._get("memory.read.and.weight")
+        and_b = self.alu._get("memory.read.and.bias")
+        or_w = self.alu._get("memory.read.or.weight")
+        or_b = self.alu._get("memory.read.or.bias")
+
+        out_bits: List[int] = []
+        for bit in range(REG_BITS):
+            inp = torch.stack([mem_bits[:, bit], sel], dim=1)
+            and_out = heaviside((inp * and_w[bit]).sum(dim=1) + and_b[bit])
+            out_bit = heaviside((and_out * or_w[bit]).sum() + or_b[bit]).item()
+            out_bits.append(int(out_bit))
+
+        return bits_to_int_msb(out_bits)
+
+    def _memory_write(self, mem: List[int], addr: int, value: int) -> List[int]:
+        sel = self._addr_decode(addr)
+        data_bits = torch.tensor(int_to_bits_msb(value, REG_BITS), device=self.device, dtype=torch.float32)
+        mem_bits = torch.tensor(
+            [int_to_bits_msb(byte, REG_BITS) for byte in mem],
+            device=self.device,
+            dtype=torch.float32,
+        )
+
+        sel_w = self.alu._get("memory.write.sel.weight")
+        sel_b = self.alu._get("memory.write.sel.bias")
+        nsel_w = self.alu._get("memory.write.nsel.weight").squeeze(1)
+        nsel_b = self.alu._get("memory.write.nsel.bias")
+        and_old_w = self.alu._get("memory.write.and_old.weight")
+        and_old_b = self.alu._get("memory.write.and_old.bias")
+        and_new_w = self.alu._get("memory.write.and_new.weight")
+        and_new_b = self.alu._get("memory.write.and_new.bias")
+        or_w = self.alu._get("memory.write.or.weight")
+        or_b = self.alu._get("memory.write.or.bias")
+
+        we = torch.ones_like(sel)
+        sel_inp = torch.stack([sel, we], dim=1)
+        write_sel = heaviside((sel_inp * sel_w).sum(dim=1) + sel_b)
+        nsel = heaviside((write_sel * nsel_w) + nsel_b)
+
+        new_mem_bits = torch.zeros((MEM_BYTES, REG_BITS), device=self.device)
+        for bit in range(REG_BITS):
+            old_bit = mem_bits[:, bit]
+            data_bit = data_bits[bit].expand(MEM_BYTES)
+            inp_old = torch.stack([old_bit, nsel], dim=1)
+            inp_new = torch.stack([data_bit, write_sel], dim=1)
+
+            and_old = heaviside((inp_old * and_old_w[:, bit]).sum(dim=1) + and_old_b[:, bit])
+            and_new = heaviside((inp_new * and_new_w[:, bit]).sum(dim=1) + and_new_b[:, bit])
+            or_inp = torch.stack([and_old, and_new], dim=1)
+            out_bit = heaviside((or_inp * or_w[:, bit]).sum(dim=1) + or_b[:, bit])
+            new_mem_bits[:, bit] = out_bit
+
+        return [bits_to_int_msb([int(b) for b in new_mem_bits[i].tolist()]) for i in range(MEM_BYTES)]
+
+    def _conditional_jump_byte(self, prefix: str, pc_byte: int, target_byte: int, flag: int) -> int:
+        pc_bits = int_to_bits_msb(pc_byte, REG_BITS)
+        target_bits = int_to_bits_msb(target_byte, REG_BITS)
+
+        out_bits: List[int] = []
+        for bit in range(REG_BITS):
+            not_sel = self.alu._eval_gate(
+                f"{prefix}.bit{bit}.not_sel.weight",
+                f"{prefix}.bit{bit}.not_sel.bias",
+                [float(flag)],
+            )
+            and_a = self.alu._eval_gate(
+                f"{prefix}.bit{bit}.and_a.weight",
+                f"{prefix}.bit{bit}.and_a.bias",
+                [float(pc_bits[bit]), not_sel],
+            )
+            and_b = self.alu._eval_gate(
+                f"{prefix}.bit{bit}.and_b.weight",
+                f"{prefix}.bit{bit}.and_b.bias",
+                [float(target_bits[bit]), float(flag)],
+            )
+            out_bit = self.alu._eval_gate(
+                f"{prefix}.bit{bit}.or.weight",
+                f"{prefix}.bit{bit}.or.bias",
+                [and_a, and_b],
+            )
+            out_bits.append(int(out_bit))
+
+        return bits_to_int_msb(out_bits)
+
+    def step(self, state: CPUState) -> CPUState:
+        if state.ctrl[0] == 1:  # HALT
+            return state.copy()
+
+        s = state.copy()
+
+        # Fetch: two bytes, big-endian
+        hi = self._memory_read(s.mem, s.pc)
+        lo = self._memory_read(s.mem, (s.pc + 1) & 0xFFFF)
+        s.ir = ((hi & 0xFF) << 8) | (lo & 0xFF)
+        next_pc = (s.pc + 2) & 0xFFFF
+
+        opcode, rd, rs, imm8 = self.decode_ir(s.ir)
+        a = s.regs[rd]
+        b = s.regs[rs]
+
+        addr16 = None
+        next_pc_ext = next_pc
+        if opcode in (0xA, 0xB, 0xC, 0xD, 0xE):
+            addr_hi = self._memory_read(s.mem, next_pc)
+            addr_lo = self._memory_read(s.mem, (next_pc + 1) & 0xFFFF)
+            addr16 = ((addr_hi & 0xFF) << 8) | (addr_lo & 0xFF)
+            next_pc_ext = (next_pc + 2) & 0xFFFF
+
+        write_result = True
+        result = a
+        carry = 0
+        overflow = 0
+
+        if opcode == 0x0:  # ADD
+            result, carry, overflow = self.alu.add(a, b)
+        elif opcode == 0x1:  # SUB
+            result, carry, overflow = self.alu.sub(a, b)
+        elif opcode == 0x2:  # AND
+            result = self.alu.bitwise_and(a, b)
+        elif opcode == 0x3:  # OR
+            result = self.alu.bitwise_or(a, b)
+        elif opcode == 0x4:  # XOR
+            result = self.alu.bitwise_xor(a, b)
+        elif opcode == 0x5:  # SHL
+            carry = 1 if (a & 0x80) else 0
+            result = (a << 1) & 0xFF
+        elif opcode == 0x6:  # SHR
+            carry = 1 if (a & 0x01) else 0
+            result = (a >> 1) & 0xFF
+        elif opcode == 0x7:  # MUL
+            full = a * b
+            result = full & 0xFF
+            carry = 1 if full > 0xFF else 0
+        elif opcode == 0x8:  # DIV
+            if b == 0:
+                result = 0
+                carry = 1
+                overflow = 1
+            else:
+                result = (a // b) & 0xFF
+        elif opcode == 0x9:  # CMP
+            result, carry, overflow = self.alu.sub(a, b)
+            write_result = False
+        elif opcode == 0xA:  # LOAD
+            result = self._memory_read(s.mem, addr16)
+        elif opcode == 0xB:  # STORE
+            s.mem = self._memory_write(s.mem, addr16, b & 0xFF)
+            write_result = False
+        elif opcode == 0xC:  # JMP
+            s.pc = addr16 & 0xFFFF
+            write_result = False
+        elif opcode == 0xD:  # JZ
+            hi_pc = self._conditional_jump_byte(
+                "control.jz",
+                (next_pc_ext >> 8) & 0xFF,
+                (addr16 >> 8) & 0xFF,
+                s.flags[0],
+            )
+            lo_pc = self._conditional_jump_byte(
+                "control.jz",
+                next_pc_ext & 0xFF,
+                addr16 & 0xFF,
+                s.flags[0],
+            )
+            s.pc = ((hi_pc & 0xFF) << 8) | (lo_pc & 0xFF)
+            write_result = False
+        elif opcode == 0xE:  # CALL
+            ret_addr = next_pc_ext & 0xFFFF
+            s.sp = (s.sp - 1) & 0xFFFF
+            s.mem = self._memory_write(s.mem, s.sp, (ret_addr >> 8) & 0xFF)
+            s.sp = (s.sp - 1) & 0xFFFF
+            s.mem = self._memory_write(s.mem, s.sp, ret_addr & 0xFF)
+            s.pc = addr16 & 0xFFFF
+            write_result = False
+        elif opcode == 0xF:  # HALT
+            s.ctrl[0] = 1
+            write_result = False
+
+        if opcode <= 0x9 or opcode == 0xA:
+            s.flags = self.flags_from_result(result, carry, overflow)
+
+        if write_result:
+            s.regs[rd] = result & 0xFF
+
+        if opcode not in (0xC, 0xD, 0xE):
+            s.pc = next_pc_ext
+
+        return s
+
+    def run_until_halt(self, state: CPUState, max_cycles: int = 256) -> Tuple[CPUState, int]:
+        s = state.copy()
+        for i in range(max_cycles):
+            if s.ctrl[0] == 1:
+                return s, i
+            s = self.step(s)
+        return s, max_cycles
+
+    def forward(self, state_bits: torch.Tensor, max_cycles: int = 256) -> torch.Tensor:
+        bits_list = [int(b) for b in state_bits.detach().cpu().flatten().tolist()]
+        state = unpack_state(bits_list)
+        final, _ = self.run_until_halt(state, max_cycles=max_cycles)
+        return torch.tensor(pack_state(final), dtype=torch.float32)
eval/build_memory.py
CHANGED
@@ -1,5 +1,5 @@
 """
-Generate memory and fetch/load/store buffers for the 8-bit threshold computer.
+Generate 64KB memory circuits and fetch/load/store buffers for the 8-bit threshold computer.
 Updates neural_computer.safetensors and tensors.txt in-place.
 """
@@ -16,6 +16,9 @@ from safetensors.torch import save_file
 MODEL_PATH = Path(__file__).resolve().parent.parent / "neural_computer.safetensors"
 MANIFEST_PATH = Path(__file__).resolve().parent.parent / "tensors.txt"
 
+ADDR_BITS = 16
+MEM_BYTES = 1 << ADDR_BITS
+
 
 def load_tensors(path: Path) -> Dict[str, torch.Tensor]:
     tensors: Dict[str, torch.Tensor] = {}
@@ -34,32 +37,59 @@ def add_gate(tensors: Dict[str, torch.Tensor], name: str, weight: Iterable[float
     tensors[b_key] = torch.tensor(list(bias), dtype=torch.float32)
 
 
-def
-    for
+def drop_prefixes(tensors: Dict[str, torch.Tensor], prefixes: List[str]) -> None:
+    for key in list(tensors.keys()):
+        if any(key.startswith(prefix) for prefix in prefixes):
+            del tensors[key]
+
+
+def add_decoder(tensors: Dict[str, torch.Tensor]) -> None:
+    weights = torch.empty((MEM_BYTES, ADDR_BITS), dtype=torch.float32)
+    bias = torch.empty((MEM_BYTES,), dtype=torch.float32)
+    for addr in range(MEM_BYTES):
+        bits = [(addr >> (ADDR_BITS - 1 - i)) & 1 for i in range(ADDR_BITS)]  # MSB-first
+        weights[addr] = torch.tensor([1.0 if bit == 1 else -1.0 for bit in bits], dtype=torch.float32)
+        bias[addr] = -float(sum(bits))
+    tensors["memory.addr_decode.weight"] = weights
+    tensors["memory.addr_decode.bias"] = bias
 
 
 def add_memory_read_mux(tensors: Dict[str, torch.Tensor]) -> None:
-    # AND
+    # Packed AND/OR weights for read mux.
+    and_weight = torch.ones((8, MEM_BYTES, 2), dtype=torch.float32)
+    and_bias = torch.full((8, MEM_BYTES), -2.0, dtype=torch.float32)
+    or_weight = torch.ones((8, MEM_BYTES), dtype=torch.float32)
+    or_bias = torch.full((8,), -1.0, dtype=torch.float32)
+    tensors["memory.read.and.weight"] = and_weight
+    tensors["memory.read.and.bias"] = and_bias
+    tensors["memory.read.or.weight"] = or_weight
+    tensors["memory.read.or.bias"] = or_bias
 
 
 def add_memory_write_cells(tensors: Dict[str, torch.Tensor]) -> None:
-    #
+    # Packed write gate weights.
+    sel_weight = torch.ones((MEM_BYTES, 2), dtype=torch.float32)
+    sel_bias = torch.full((MEM_BYTES,), -2.0, dtype=torch.float32)
+    nsel_weight = torch.full((MEM_BYTES, 1), -1.0, dtype=torch.float32)
+    nsel_bias = torch.zeros((MEM_BYTES,), dtype=torch.float32)
+
+    and_old_weight = torch.ones((MEM_BYTES, 8, 2), dtype=torch.float32)
+    and_old_bias = torch.full((MEM_BYTES, 8), -2.0, dtype=torch.float32)
+    and_new_weight = torch.ones((MEM_BYTES, 8, 2), dtype=torch.float32)
+    and_new_bias = torch.full((MEM_BYTES, 8), -2.0, dtype=torch.float32)
+    or_weight = torch.ones((MEM_BYTES, 8, 2), dtype=torch.float32)
+    or_bias = torch.full((MEM_BYTES, 8), -1.0, dtype=torch.float32)
+
+    tensors["memory.write.sel.weight"] = sel_weight
+    tensors["memory.write.sel.bias"] = sel_bias
+    tensors["memory.write.nsel.weight"] = nsel_weight
+    tensors["memory.write.nsel.bias"] = nsel_bias
+    tensors["memory.write.and_old.weight"] = and_old_weight
+    tensors["memory.write.and_old.bias"] = and_old_bias
+    tensors["memory.write.and_new.weight"] = and_new_weight
+    tensors["memory.write.and_new.bias"] = and_new_bias
+    tensors["memory.write.or.weight"] = or_weight
+    tensors["memory.write.or.bias"] = or_bias
 
 
 def add_fetch_load_store_buffers(tensors: Dict[str, torch.Tensor]) -> None:
@@ -69,16 +99,15 @@ def add_fetch_load_store_buffers(tensors: Dict[str, torch.Tensor]) -> None:
     for bit in range(8):
         add_gate(tensors, f"control.load.bit{bit}", [1.0], [-1.0])
         add_gate(tensors, f"control.store.bit{bit}", [1.0], [-1.0])
+    for bit in range(ADDR_BITS):
         add_gate(tensors, f"control.mem_addr.bit{bit}", [1.0], [-1.0])
 
 
 def update_manifest(tensors: Dict[str, torch.Tensor]) -> None:
-    #
-        return
-    tensors[key] = torch.tensor([2.0], dtype=torch.float32)
+    # Update manifest constants to reflect 16-bit address space.
+    tensors["manifest.memory_bytes"] = torch.tensor([float(MEM_BYTES)], dtype=torch.float32)
+    tensors["manifest.pc_width"] = torch.tensor([float(ADDR_BITS)], dtype=torch.float32)
+    tensors["manifest.version"] = torch.tensor([3.0], dtype=torch.float32)
 
 
 def write_manifest(path: Path, tensors: Dict[str, torch.Tensor]) -> None:
@@ -94,7 +123,19 @@ def write_manifest(path: Path, tensors: Dict[str, torch.Tensor]) -> None:
 
 def main() -> None:
     tensors = load_tensors(MODEL_PATH)
+    drop_prefixes(
+        tensors,
+        [
+            "memory.addr_decode.",
+            "memory.read.",
+            "memory.write.",
+            "control.fetch.ir.",
+            "control.load.",
+            "control.store.",
+            "control.mem_addr.",
+        ],
+    )
+    add_decoder(tensors)
     add_memory_read_mux(tensors)
     add_memory_write_cells(tensors)
     add_fetch_load_store_buffers(tensors)
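The `add_decoder` routine above builds one threshold neuron per address: +1 weights on one-bits, -1 on zero-bits, and a bias of minus the popcount, so a row fires only on an exact address match. A scaled-down, dependency-free sketch of the same construction (4 address bits instead of 16; the names here are illustrative stand-ins, not the repo's tensors):

```python
ADDR_BITS = 4                      # scaled-down stand-in for the 16-bit decoder
MEM_BYTES = 1 << ADDR_BITS

def addr_to_bits(addr):
    # MSB-first bit expansion, mirroring the decoder construction.
    return [(addr >> (ADDR_BITS - 1 - i)) & 1 for i in range(ADDR_BITS)]

# One threshold neuron per address: +1 weight where the address bit is 1,
# -1 where it is 0, bias = -(number of one bits).
rows = []
for addr in range(MEM_BYTES):
    bits = addr_to_bits(addr)
    weights = [1.0 if b else -1.0 for b in bits]
    bias = -float(sum(bits))
    rows.append((weights, bias))

def decode(addr):
    # Fire rule: 1 if sum(w * x) + b >= 0 else 0
    x = addr_to_bits(addr)
    return [1 if sum(w * xi for w, xi in zip(ws, x)) + b >= 0 else 0
            for ws, b in rows]

sel = decode(0b1010)               # one-hot select line for address 10
```

Any mismatched bit drops the weighted sum at least one below the popcount, so exactly one row crosses its threshold for every input address.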
eval/comprehensive_eval.py
CHANGED
@@ -1900,12 +1900,12 @@ class CircuitEvaluator:
             ('manifest.alu_operations', 16),
             ('manifest.flags', 4),
             ('manifest.instruction_width', 16),
-            ('manifest.memory_bytes',
-            ('manifest.pc_width',
+            ('manifest.memory_bytes', 65536),
+            ('manifest.pc_width', 16),
             ('manifest.register_width', 8),
             ('manifest.registers', 4),
             ('manifest.turing_complete', 1),
-            ('manifest.version',
+            ('manifest.version', 3),
         ]
 
         failures = []
@@ -2200,61 +2200,79 @@ class CircuitEvaluator:
     # MEMORY CIRCUITS
     # =========================================================================
 
-    def
-        """Test
+    def test_memory_decoder_16to65536(self) -> TestResult:
+        """Test 16-to-65536 address decoder with full-address coverage."""
         failures = []
         passed = 0
+        mem_size = 1 << 16
+        total = mem_size * 2
 
+        w_all = self.reg.get('memory.addr_decode.weight')
+        b_all = self.reg.get('memory.addr_decode.bias')
+
+        for addr in range(mem_size):
+            addr_bits = torch.tensor([(addr >> (15 - i)) & 1 for i in range(16)],
                                      device=self.device, dtype=torch.float32)
 
+            out_idx = addr
+            w = w_all[out_idx]
+            b = b_all[out_idx]
+            output = heaviside((addr_bits * w).sum() + b).item()
+            expected = 1.0
+            if output == expected:
+                passed += 1
+            elif len(failures) < 20:
+                failures.append(((addr, out_idx), expected, output))
+
+            out_idx = (addr + 1) & 0xFFFF
+            w = w_all[out_idx]
+            b = b_all[out_idx]
+            output = heaviside((addr_bits * w).sum() + b).item()
+            expected = 0.0
+            if output == expected:
+                passed += 1
+            elif len(failures) < 20:
+                failures.append(((addr, out_idx), expected, output))
 
         return TestResult('memory.addr_decode', passed, total, failures)
 
     def test_memory_read_mux(self) -> TestResult:
-        """Test
+        """Test 64KB memory read mux for a few representative addresses."""
         failures = []
         passed = 0
         total = 0
 
+        mem_size = 1 << 16
+        mem = [(addr * 37) & 0xFF for addr in range(mem_size)]
+        test_addrs = [0x0000, 0x1234, 0xFFFF]
+
+        dec_w = self.reg.get('memory.addr_decode.weight')
+        dec_b = self.reg.get('memory.addr_decode.bias')
+        and_w = self.reg.get('memory.read.and.weight')
+        and_b = self.reg.get('memory.read.and.bias')
+        or_w = self.reg.get('memory.read.or.weight')
+        or_b = self.reg.get('memory.read.or.bias')
 
         for addr in test_addrs:
-            addr_bits = torch.tensor([(addr >> (
+            addr_bits = torch.tensor([(addr >> (15 - i)) & 1 for i in range(16)],
                                      device=self.device, dtype=torch.float32)
 
             selects = []
-            for out_idx in range(
-                selects.append(heaviside((addr_bits * w).sum() + b).item())
+            for out_idx in range(mem_size):
+                output = heaviside((addr_bits * dec_w[out_idx]).sum() + dec_b[out_idx]).item()
+                selects.append(output)
 
             for bit in range(8):
                 and_vals = []
-                for out_idx in range(
+                for out_idx in range(mem_size):
                     mem_bit = float((mem[out_idx] >> (7 - bit)) & 1)
                     inp = torch.tensor([mem_bit, selects[out_idx]], device=self.device)
-                    w =
-                    b =
+                    w = and_w[bit, out_idx]
+                    b = and_b[bit, out_idx]
                     and_vals.append(heaviside((inp * w).sum() + b).item())
 
                 or_inp = torch.tensor(and_vals, device=self.device)
-                b_or = self.reg.get(f'memory.read.bit{bit}.or.bias')
-                output = heaviside((or_inp * w_or).sum() + b_or).item()
+                output = heaviside((or_inp * or_w[bit]).sum() + or_b[bit]).item()
                 expected = float((mem[addr] >> (7 - bit)) & 1)
 
                 total += 1
@@ -2271,49 +2289,58 @@ class CircuitEvaluator:
         passed = 0
         total = 0
 
+        mem_size = 1 << 16
+        mem = [(addr * 13 + 7) & 0xFF for addr in range(mem_size)]
         test_cases = [
             (0xA5, 42, 1.0),
-            (0x3C,
+            (0x3C, 0xBEEF, 0.0),
         ]
 
+        dec_w = self.reg.get('memory.addr_decode.weight')
+        dec_b = self.reg.get('memory.addr_decode.bias')
+        sel_w = self.reg.get('memory.write.sel.weight')
+        sel_b = self.reg.get('memory.write.sel.bias')
+        nsel_w = self.reg.get('memory.write.nsel.weight')
+        nsel_b = self.reg.get('memory.write.nsel.bias')
+        and_old_w = self.reg.get('memory.write.and_old.weight')
+        and_old_b = self.reg.get('memory.write.and_old.bias')
+        and_new_w = self.reg.get('memory.write.and_new.weight')
+        and_new_b = self.reg.get('memory.write.and_new.bias')
+        or_w = self.reg.get('memory.write.or.weight')
+        or_b = self.reg.get('memory.write.or.bias')
+
         for write_data, write_addr, write_en in test_cases:
-            addr_bits = torch.tensor([(write_addr >> (
+            addr_bits = torch.tensor([(write_addr >> (15 - i)) & 1 for i in range(16)],
                                      device=self.device, dtype=torch.float32)
 
-                decodes.append(heaviside((addr_bits * w).sum() + b).item())
+            sample_addrs = [write_addr, (write_addr + 1) & 0xFFFF, 0x0000, 0xFFFF]
+            decodes = {}
+            for out_idx in sample_addrs:
+                decodes[out_idx] = heaviside((addr_bits * dec_w[out_idx]).sum() + dec_b[out_idx]).item()
 
-            for out_idx in
+            for out_idx in sample_addrs:
                 sel_inp = torch.tensor([decodes[out_idx], write_en], device=self.device)
-                b_sel = self.reg.get(f'memory.write.sel.addr{out_idx}.bias')
-                sel = heaviside((sel_inp * w_sel).sum() + b_sel).item()
+                sel = heaviside((sel_inp * sel_w[out_idx]).sum() + sel_b[out_idx]).item()
 
-                b_nsel = self.reg.get(f'memory.write.nsel.addr{out_idx}.bias')
-                nsel = heaviside(sel * w_nsel + b_nsel).item()
+                nsel = heaviside(sel * nsel_w[out_idx] + nsel_b[out_idx]).item()
 
                 for bit in range(8):
                     old_bit = float((mem[out_idx] >> (7 - bit)) & 1)
                     data_bit = float((write_data >> (7 - bit)) & 1)
 
                     inp_old = torch.tensor([old_bit, nsel], device=self.device)
-                    w_old =
-                    b_old =
+                    w_old = and_old_w[out_idx, bit]
+                    b_old = and_old_b[out_idx, bit]
                    and_old = heaviside((inp_old * w_old).sum() + b_old).item()
 
                     inp_new = torch.tensor([data_bit, sel], device=self.device)
-                    w_new =
-                    b_new =
+                    w_new = and_new_w[out_idx, bit]
+                    b_new = and_new_b[out_idx, bit]
                     and_new = heaviside((inp_new * w_new).sum() + b_new).item()
 
                     inp_or = torch.tensor([and_old, and_new], device=self.device)
-                    w_or =
-                    b_or =
+                    w_or = or_w[out_idx, bit]
+                    b_or = or_b[out_idx, bit]
                     output = heaviside((inp_or * w_or).sum() + b_or).item()
 
                     expected = data_bit if (write_en == 1.0 and out_idx == write_addr) else old_bit
@@ -2339,15 +2366,88 @@ class CircuitEvaluator:
             passed += 2
 
         for bit in range(8):
-            for name in ['control.load', 'control.store'
+            for name in ['control.load', 'control.store']:
                 total += 2
                 if self.reg.has(f'{name}.bit{bit}.weight'):
                     self.reg.get(f'{name}.bit{bit}.weight')
                     self.reg.get(f'{name}.bit{bit}.bias')
                     passed += 2
 
+        for bit in range(16):
+            total += 2
+            if self.reg.has(f'control.mem_addr.bit{bit}.weight'):
+                self.reg.get(f'control.mem_addr.bit{bit}.weight')
+                self.reg.get(f'control.mem_addr.bit{bit}.bias')
+                passed += 2
+
         return TestResult('control.fetch_load_store', passed, total, [])
 
+    def test_packed_memory_routing(self) -> TestResult:
+        """Validate packed memory tensor routing and shapes."""
+        failures = []
+        passed = 0
+        total = 0
+
+        circuits = ["memory.addr_decode", "memory.read", "memory.write"]
+        routing = self.routing_eval.routing.get("circuits", {})
+        routing_keys = set()
+
+        for circuit in circuits:
+            total += 1
+            if circuit not in routing:
+                failures.append((circuit, "routing", "missing"))
+                continue
+            passed += 1
+            internal = routing[circuit].get("internal", {})
+            for value in internal.values():
+                if isinstance(value, list):
+                    routing_keys.update(value)
+
+        total += 1
+        if routing_keys and all(key for key in routing_keys):
+            passed += 1
+        else:
+            failures.append(("packed_keys", "non-empty", "empty"))
+
+        mem_bytes = int(self.reg.get("manifest.memory_bytes").item()) if self.reg.has("manifest.memory_bytes") else 65536
+        pc_width = int(self.reg.get("manifest.pc_width").item()) if self.reg.has("manifest.pc_width") else 16
+        reg_width = int(self.reg.get("manifest.register_width").item()) if self.reg.has("manifest.register_width") else 8
+
+        expected_shapes = {
+            "memory.addr_decode.weight": (mem_bytes, pc_width),
+            "memory.addr_decode.bias": (mem_bytes,),
+            "memory.read.and.weight": (reg_width, mem_bytes, 2),
+            "memory.read.and.bias": (reg_width, mem_bytes),
+            "memory.read.or.weight": (reg_width, mem_bytes),
+            "memory.read.or.bias": (reg_width,),
+            "memory.write.sel.weight": (mem_bytes, 2),
+            "memory.write.sel.bias": (mem_bytes,),
+            "memory.write.nsel.weight": (mem_bytes, 1),
+            "memory.write.nsel.bias": (mem_bytes,),
+            "memory.write.and_old.weight": (mem_bytes, reg_width, 2),
+            "memory.write.and_old.bias": (mem_bytes, reg_width),
+            "memory.write.and_new.weight": (mem_bytes, reg_width, 2),
+            "memory.write.and_new.bias": (mem_bytes, reg_width),
+            "memory.write.or.weight": (mem_bytes, reg_width, 2),
+            "memory.write.or.bias": (mem_bytes, reg_width),
+        }
+
+        for key, expected in expected_shapes.items():
+            total += 1
+            if key not in routing_keys:
+                failures.append((key, "routing_ref", "missing"))
+                continue
+            if not self.reg.has(key):
+                failures.append((key, "tensor_exists", "missing"))
+                continue
+            actual = tuple(self.reg.get(key).shape)
+            if actual == expected:
+                passed += 1
+            else:
+                failures.append((key, expected, actual))
+
+        return TestResult('memory.packed_routing', passed, total, failures)
+
     # =========================================================================
     # ARITHMETIC - ADDITIONAL CIRCUITS
     # =========================================================================
@@ -3010,10 +3110,11 @@ class ComprehensiveEvaluator:
         # Memory
         if verbose:
            print("\n=== MEMORY ===")
-        self._run_test(self.evaluator.
+        self._run_test(self.evaluator.test_memory_decoder_16to65536, verbose)
         self._run_test(self.evaluator.test_memory_read_mux, verbose)
         self._run_test(self.evaluator.test_memory_write_cells, verbose)
         self._run_test(self.evaluator.test_control_fetch_load_store, verbose)
+        self._run_test(self.evaluator.test_packed_memory_routing, verbose)
 
         # Error detection
         if verbose:
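The read-mux test above exercises a two-layer threshold circuit: per-address AND gates (`step(mem_bit + select - 2)`) followed by an OR reduction (`step(sum - 1)`). A toy stand-in with 4 addresses shows the same logic without the 64K tensors (all names here are illustrative):

```python
def step(x):
    # Threshold activation: 1 if x >= 0 else 0
    return 1 if x >= 0 else 0

def read_bit(mem_bits, selects):
    # Layer 1: AND each stored bit with its decoder select line.
    and_vals = [step(m + s - 2.0) for m, s in zip(mem_bits, selects)]
    # Layer 2: OR-reduce across all addresses.
    return step(sum(and_vals) - 1.0)

mem_bits = [0, 1, 1, 0]                              # one bit plane of a 4-address toy memory
one_hot = lambda i: [1 if j == i else 0 for j in range(4)]
```

Because the select vector is one-hot, at most one AND gate fires, and the OR layer passes that gate's value through as the read-out bit.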
eval/cpu_cycle_test.py
CHANGED
@@ -7,8 +7,11 @@ from pathlib import Path
 
 sys.path.append(str(Path(__file__).resolve().parent.parent))
 
+import torch
+
 from cpu.cycle import run_until_halt
-from cpu.state import CPUState
+from cpu.state import CPUState, pack_state, unpack_state
+from cpu.threshold_cpu import ThresholdCPU
 
 
 def encode(opcode: int, rd: int, rs: int, imm8: int) -> int:
@@ -16,28 +19,36 @@ def encode(opcode: int, rd: int, rs: int, imm8: int) -> int:
 
 
 def write_instr(mem, addr, instr):
-    mem[addr &
-    mem[(addr + 1) &
+    mem[addr & 0xFFFF] = (instr >> 8) & 0xFF
+    mem[(addr + 1) & 0xFFFF] = instr & 0xFF
+
+
+def write_addr(mem, addr, value):
+    mem[addr & 0xFFFF] = (value >> 8) & 0xFF
+    mem[(addr + 1) & 0xFFFF] = value & 0xFF
 
 
 def main() -> None:
-    mem = [0] *
+    mem = [0] * 65536
 
-    write_instr(mem,
-    write_instr(mem,
-    write_instr(mem,
+    write_instr(mem, 0x0000, encode(0xA, 0, 0, 0x00))  # LOAD R0, [addr]
+    write_addr(mem, 0x0002, 0x0100)
+    write_instr(mem, 0x0004, encode(0xA, 1, 0, 0x00))  # LOAD R1, [addr]
+    write_addr(mem, 0x0006, 0x0101)
+    write_instr(mem, 0x0008, encode(0x0, 0, 1, 0x00))  # ADD R0, R1
+    write_instr(mem, 0x000A, encode(0xB, 0, 0, 0x00))  # STORE R0 -> [addr]
+    write_addr(mem, 0x000C, 0x0102)
+    write_instr(mem, 0x000E, encode(0xF, 0, 0, 0x00))  # HALT
 
-    mem[
-    mem[
+    mem[0x0100] = 5
+    mem[0x0101] = 7
 
     state = CPUState(
         pc=0,
         ir=0,
         regs=[0, 0, 0, 0],
         flags=[0, 0, 0, 0],
-        sp=
+        sp=0xFFFE,
         ctrl=[0, 0, 0, 0],
         mem=mem,
     )
@@ -46,9 +57,29 @@ def main() -> None:
 
     assert final.ctrl[0] == 1, "HALT flag not set"
     assert final.regs[0] == 12, f"R0 expected 12, got {final.regs[0]}"
-    assert final.mem[
+    assert final.mem[0x0102] == 12, f"MEM[0x0102] expected 12, got {final.mem[0x0102]}"
     assert cycles <= 10, f"Unexpected cycle count: {cycles}"
 
+    # Threshold-weight runtime should match reference behavior.
+    threshold_cpu = ThresholdCPU()
+    t_final, t_cycles = threshold_cpu.run_until_halt(state, max_cycles=20)
+
+    assert t_final.ctrl[0] == 1, "Threshold HALT flag not set"
+    assert t_final.regs[0] == final.regs[0], f"Threshold R0 mismatch: {t_final.regs[0]} != {final.regs[0]}"
+    assert t_final.mem[0x0102] == final.mem[0x0102], (
+        f"Threshold MEM[0x0102] mismatch: {t_final.mem[0x0102]} != {final.mem[0x0102]}"
+    )
+    assert t_cycles == cycles, f"Threshold cycle count mismatch: {t_cycles} != {cycles}"
+
+    # Validate forward() state I/O.
+    bits = torch.tensor(pack_state(state), dtype=torch.float32)
+    out_bits = threshold_cpu.forward(bits, max_cycles=20)
+    out_state = unpack_state([int(b) for b in out_bits.tolist()])
+    assert out_state.regs[0] == final.regs[0], f"Forward R0 mismatch: {out_state.regs[0]} != {final.regs[0]}"
+    assert out_state.mem[0x0102] == final.mem[0x0102], (
+        f"Forward MEM[0x0102] mismatch: {out_state.mem[0x0102]} != {final.mem[0x0102]}"
+    )
+
     print("cpu_cycle_test: ok")
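The test program writes each 16-bit instruction big-endian into byte memory. The body of `encode` is not shown in this diff; the sketch below uses a hypothetical field layout (4-bit opcode, 2-bit rd, 2-bit rs, 8-bit immediate) that is consistent with the manifest's 16-bit instruction width and four registers, purely as an illustration:

```python
def encode(opcode: int, rd: int, rs: int, imm8: int) -> int:
    # Hypothetical packing: 4-bit opcode | 2-bit rd | 2-bit rs | 8-bit immediate.
    return ((opcode & 0xF) << 12) | ((rd & 0x3) << 10) | ((rs & 0x3) << 8) | (imm8 & 0xFF)

def write_instr(mem, addr, instr):
    # Big-endian split into two consecutive memory bytes, as in the test above.
    mem[addr & 0xFFFF] = (instr >> 8) & 0xFF
    mem[(addr + 1) & 0xFFFF] = instr & 0xFF

mem = [0] * 65536
write_instr(mem, 0x0000, encode(0xF, 0, 0, 0x00))  # HALT at address 0
```

Under this layout, `encode(0x0, 0, 1, 0x00)` (ADD R0, R1) yields 0x0100, matching the two-byte fetch the 16-bit PC steps over.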
eval/iron_eval.py
CHANGED
@@ -8,9 +8,11 @@ GPU-optimized for population-based evolution.
 Target: ~40GB VRAM on RTX 6000 Ada (4M population)
 """
 
-import
+import json
+import os
+import torch
+from typing import Dict, Tuple
+from safetensors import safe_open
 
 
 def load_model(base_path: str = ".") -> Dict[str, torch.Tensor]:
@@ -32,10 +34,20 @@ class BatchedFitnessEvaluator:
     GPU-batched fitness evaluator. Tests ALL circuits comprehensively.
     """
 
-    def __init__(self, device='cuda'):
-        self.device = device
-        self.
+    def __init__(self, device='cuda'):
+        self.device = device
+        self.routing = self._load_routing()
+        self._setup_tests()
+
+    def _load_routing(self) -> Dict:
+        """Load routing.json for packed memory validation."""
+        root = os.path.dirname(os.path.dirname(__file__))
+        path = os.path.join(root, "routing.json")
+        if os.path.exists(path):
+            with open(path, "r", encoding="utf-8") as fh:
+                return json.load(fh)
+        return {"circuits": {}}
+
     def _setup_tests(self):
         """Pre-compute all test vectors."""
         d = self.device
@@ -3146,10 +3158,10 @@ class BatchedFitnessEvaluator:
 
         return scores, total_tests
 
-    def _test_manifest(self, pop: Dict, debug: bool = False) -> Tuple[torch.Tensor, int]:
-        """
-        MANIFEST - Verify manifest values are preserved.
-        """
+    def _test_manifest(self, pop: Dict, debug: bool = False) -> Tuple[torch.Tensor, int]:
         pop_size = next(iter(pop.values())).shape[0]
         scores = torch.zeros(pop_size, device=self.device)
         total_tests = 0
@@ -3158,12 +3170,12 @@ class BatchedFitnessEvaluator:
             ('manifest.alu_operations', 16),
             ('manifest.flags', 4),
             ('manifest.instruction_width', 16),
-            ('manifest.memory_bytes',
-            ('manifest.pc_width',
+            ('manifest.memory_bytes', 65536),
+            ('manifest.pc_width', 16),
             ('manifest.register_width', 8),
             ('manifest.registers', 4),
             ('manifest.turing_complete', 1),
-            ('manifest.version',
+            ('manifest.version', 3),
         ]
 
         for tensor_name, expected_value in manifest_tensors:
@@ -3175,7 +3187,79 @@ class BatchedFitnessEvaluator:
         if debug and pop_size == 1:
             print(f"  Manifest: {int(scores[0].item())}/{total_tests}")
 
-        return scores, total_tests
 
     def _test_equality_circuit(self, pop: Dict, debug: bool = False) -> Tuple[torch.Tensor, int]:
         """
@@ -3328,13 +3412,17 @@ class BatchedFitnessEvaluator:
         total_scores += incdec_scores
         total_tests += incdec_tests
 
-        manifest_scores, manifest_tests = self._test_manifest(pop, debug)
-        total_scores += manifest_scores
-        total_tests += manifest_tests
-
-        total_scores +=
-        total_tests +=
 
         minmax_scores, minmax_tests = self._test_minmax_circuits(pop, debug)
         total_scores += minmax_scores
         total_tests += minmax_tests
def _test_manifest(self, pop: Dict, debug: bool = False) -> Tuple[torch.Tensor, int]:
|
| 3162 |
+
"""
|
| 3163 |
+
MANIFEST - Verify manifest values are preserved.
|
| 3164 |
+
"""
|
| 3165 |
pop_size = next(iter(pop.values())).shape[0]
|
| 3166 |
scores = torch.zeros(pop_size, device=self.device)
|
| 3167 |
total_tests = 0
|
|
|
|
| 3170 |
('manifest.alu_operations', 16),
|
| 3171 |
('manifest.flags', 4),
|
| 3172 |
('manifest.instruction_width', 16),
|
| 3173 |
+
('manifest.memory_bytes', 65536),
|
| 3174 |
+
('manifest.pc_width', 16),
|
| 3175 |
('manifest.register_width', 8),
|
| 3176 |
('manifest.registers', 4),
|
| 3177 |
('manifest.turing_complete', 1),
|
| 3178 |
+
('manifest.version', 3),
|
| 3179 |
]
|
| 3180 |
|
| 3181 |
for tensor_name, expected_value in manifest_tensors:
|
|
|
|
| 3187 |
if debug and pop_size == 1:
|
| 3188 |
print(f" Manifest: {int(scores[0].item())}/{total_tests}")
|
| 3189 |
|
| 3190 |
+
return scores, total_tests
|
| 3191 |
+
|
| 3192 |
+
def _test_packed_memory_routing(self, pop: Dict, debug: bool = False) -> Tuple[torch.Tensor, int]:
|
| 3193 |
+
"""
|
| 3194 |
+
PACKED MEMORY ROUTING - Validate routing references and tensor shapes.
|
| 3195 |
+
"""
|
| 3196 |
+
pop_size = next(iter(pop.values())).shape[0]
|
| 3197 |
+
scores = torch.zeros(pop_size, device=self.device)
|
| 3198 |
+
total_tests = 0
|
| 3199 |
+
|
| 3200 |
+
routing = self.routing.get("circuits", {})
|
| 3201 |
+
circuits = ["memory.addr_decode", "memory.read", "memory.write"]
|
| 3202 |
+
routing_keys = set()
|
| 3203 |
+
|
| 3204 |
+
for circuit in circuits:
|
| 3205 |
+
total_tests += 1
|
| 3206 |
+
if circuit not in routing:
|
| 3207 |
+
continue
|
| 3208 |
+
scores += 1
|
| 3209 |
+
internal = routing[circuit].get("internal", {})
|
| 3210 |
+
for value in internal.values():
|
| 3211 |
+
if isinstance(value, list):
|
| 3212 |
+
routing_keys.update(value)
|
| 3213 |
+
|
| 3214 |
+
total_tests += 1
|
| 3215 |
+
if routing_keys and all(key for key in routing_keys):
|
| 3216 |
+
scores += 1
|
| 3217 |
+
|
| 3218 |
+
if "manifest.memory_bytes" in pop:
|
| 3219 |
+
mem_bytes = int(pop["manifest.memory_bytes"][0].item())
|
| 3220 |
+
else:
|
| 3221 |
+
mem_bytes = 65536
|
| 3222 |
+
if "manifest.pc_width" in pop:
|
| 3223 |
+
pc_width = int(pop["manifest.pc_width"][0].item())
|
| 3224 |
+
else:
|
| 3225 |
+
pc_width = 16
|
| 3226 |
+
if "manifest.register_width" in pop:
|
| 3227 |
+
reg_width = int(pop["manifest.register_width"][0].item())
|
| 3228 |
+
else:
|
| 3229 |
+
reg_width = 8
|
| 3230 |
+
|
| 3231 |
+
expected_shapes = {
|
| 3232 |
+
"memory.addr_decode.weight": (pop_size, mem_bytes, pc_width),
|
| 3233 |
+
"memory.addr_decode.bias": (pop_size, mem_bytes),
|
| 3234 |
+
"memory.read.and.weight": (pop_size, reg_width, mem_bytes, 2),
|
| 3235 |
+
"memory.read.and.bias": (pop_size, reg_width, mem_bytes),
|
| 3236 |
+
"memory.read.or.weight": (pop_size, reg_width, mem_bytes),
|
| 3237 |
+
"memory.read.or.bias": (pop_size, reg_width),
|
| 3238 |
+
"memory.write.sel.weight": (pop_size, mem_bytes, 2),
|
| 3239 |
+
"memory.write.sel.bias": (pop_size, mem_bytes),
|
| 3240 |
+
"memory.write.nsel.weight": (pop_size, mem_bytes, 1),
|
| 3241 |
+
"memory.write.nsel.bias": (pop_size, mem_bytes),
|
| 3242 |
+
"memory.write.and_old.weight": (pop_size, mem_bytes, reg_width, 2),
|
| 3243 |
+
"memory.write.and_old.bias": (pop_size, mem_bytes, reg_width),
|
| 3244 |
+
"memory.write.and_new.weight": (pop_size, mem_bytes, reg_width, 2),
|
| 3245 |
+
"memory.write.and_new.bias": (pop_size, mem_bytes, reg_width),
|
| 3246 |
+
"memory.write.or.weight": (pop_size, mem_bytes, reg_width, 2),
|
| 3247 |
+
"memory.write.or.bias": (pop_size, mem_bytes, reg_width),
|
| 3248 |
+
}
|
| 3249 |
+
|
| 3250 |
+
for key, expected in expected_shapes.items():
|
| 3251 |
+
total_tests += 1
|
| 3252 |
+
if key not in routing_keys:
|
| 3253 |
+
continue
|
| 3254 |
+
if key not in pop:
|
| 3255 |
+
continue
|
| 3256 |
+
if tuple(pop[key].shape) == expected:
|
| 3257 |
+
scores += 1
|
| 3258 |
+
|
| 3259 |
+
if debug and pop_size == 1:
|
| 3260 |
+
print(f" Packed Memory Routing: {int(scores[0].item())}/{total_tests}")
|
| 3261 |
+
|
| 3262 |
+
return scores, total_tests
|
| 3263 |
|
| 3264 |
def _test_equality_circuit(self, pop: Dict, debug: bool = False) -> Tuple[torch.Tensor, int]:
|
| 3265 |
"""
|
|
|
|
| 3412 |
total_scores += incdec_scores
|
| 3413 |
total_tests += incdec_tests
|
| 3414 |
|
| 3415 |
+
manifest_scores, manifest_tests = self._test_manifest(pop, debug)
|
| 3416 |
+
total_scores += manifest_scores
|
| 3417 |
+
total_tests += manifest_tests
|
| 3418 |
+
|
| 3419 |
+
packed_scores, packed_tests = self._test_packed_memory_routing(pop, debug)
|
| 3420 |
+
total_scores += packed_scores
|
| 3421 |
+
total_tests += packed_tests
|
| 3422 |
+
|
| 3423 |
+
eq_scores, eq_tests = self._test_equality_circuit(pop, debug)
|
| 3424 |
+
total_scores += eq_scores
|
| 3425 |
+
total_tests += eq_tests
|
| 3426 |
|
| 3427 |
minmax_scores, minmax_tests = self._test_minmax_circuits(pop, debug)
|
| 3428 |
total_scores += minmax_scores
|
neural_computer.safetensors
CHANGED

version https://git-lfs.github.com/spec/v1
oid sha256:ba0c0e7e6286bc5a55d66ecbda8a1d43084a72e6a960d898b268fb6558c473a4
size 33725820
routing.json
CHANGED

The diff for this file is too large to render. See raw diff.
routing/generate_routing.py
CHANGED

@@ -5,7 +5,10 @@ Maps each gate to its input sources.

    import json
    from safetensors import safe_open
    from collections import defaultdict

    ADDR_BITS = 16
    MEM_BYTES = 1 << ADDR_BITS

    def get_all_gates(tensors_path):
        """Extract all unique gate paths from tensors file."""

@@ -423,12 +426,12 @@ def generate_manifest_routing():

        'manifest.alu_operations': {'type': 'constant', 'value': 16},
        'manifest.flags': {'type': 'constant', 'value': 4},
        'manifest.instruction_width': {'type': 'constant', 'value': 16},
        'manifest.memory_bytes': {'type': 'constant', 'value': 65536},
        'manifest.pc_width': {'type': 'constant', 'value': 16},
        'manifest.register_width': {'type': 'constant', 'value': 8},
        'manifest.registers': {'type': 'constant', 'value': 4},
        'manifest.turing_complete': {'type': 'constant', 'value': 1},
        'manifest.version': {'type': 'constant', 'value': 3}
    }

@@ -1032,9 +1035,9 @@ def generate_control_routing():

        'internal': internal_store
    }

    internal_mem_addr = {f'bit{bit}': [f'$addr[{bit}]'] for bit in range(ADDR_BITS)}
    routing['control.mem_addr'] = {
        'inputs': [f'$addr[0:{ADDR_BITS - 1}]'],
        'type': 'buffer',
        'internal': internal_mem_addr
    }

@@ -1043,52 +1046,38 @@ def generate_control_routing():

    def generate_memory_routing():
        """Generate routing for packed memory decoder, read mux, and write cell update."""
        routing = {}

        routing['memory.addr_decode'] = {
            'inputs': [f'$addr[0:{ADDR_BITS - 1}]'],
            'type': 'decoder_packed',
            'internal': {
                'weight': ['memory.addr_decode.weight'],
                'bias': ['memory.addr_decode.bias'],
            }
        }

        routing['memory.read'] = {
            'inputs': [f'$mem[0:{MEM_BYTES - 1}][0:7]', f'$sel[0:{MEM_BYTES - 1}]'],
            'type': 'read_mux_packed',
            'internal': {
                'and': ['memory.read.and.weight', 'memory.read.and.bias'],
                'or': ['memory.read.or.weight', 'memory.read.or.bias'],
            },
            'outputs': {f'bit{bit}': f'bit{bit}' for bit in range(8)}
        }

        routing['memory.write'] = {
            'inputs': [f'$mem[0:{MEM_BYTES - 1}][0:7]', '$write_data[0:7]', f'$sel[0:{MEM_BYTES - 1}]', '$we'],
            'type': 'write_mux_packed',
            'internal': {
                'sel': ['memory.write.sel.weight', 'memory.write.sel.bias'],
                'nsel': ['memory.write.nsel.weight', 'memory.write.nsel.bias'],
                'and_old': ['memory.write.and_old.weight', 'memory.write.and_old.bias'],
                'and_new': ['memory.write.and_new.weight', 'memory.write.and_new.bias'],
                'or': ['memory.write.or.weight', 'memory.write.or.bias'],
            }
        }

        return routing
routing/routing.json
CHANGED

The diff for this file is too large to render. See raw diff.
routing/routing_schema.md
CHANGED

@@ -37,8 +37,11 @@ The routing file (`routing.json`) defines how gates are interconnected.

6. **Memory indexing**: `"$mem[addr][bit]"` or `"$sel[addr]"` - Addressed memory bit or one-hot select line
   - Example: `"$mem[42][3]"` (addr 42, bit 3), `"$sel[42]"`

7. **Packed memory tensors**: For 64KB memory, routing uses packed tensor blocks instead of per-gate entries.
   - Example: `memory.addr_decode.weight`, `memory.read.and.weight`, `memory.write.and_old.weight`

## Circuit Types

### Single-Layer Gates
Gates with just `.weight` and `.bias`:

@@ -77,30 +80,92 @@ Complex circuits with sub-components:

}
```

### Bit-Indexed Circuits
Circuits operating on multi-bit values:
```json
"arithmetic.ripplecarry8bit": {
  "external_inputs": ["$a[0:7]", "$b[0:7]"],
  "gates": {
    "fa0": {"inputs": ["$a[0]", "$b[0]", "#0"], "type": "fulladder"},
    "fa1": {"inputs": ["$a[1]", "$b[1]", "fa0.cout"], "type": "fulladder"},
    ...
  }
}
```

### Packed Memory Circuits
64KB memory routing uses packed tensors to avoid exploding the header size. The routing entry
declares a packed type and lists the tensor blocks used for the operation.

```json
"memory.addr_decode": {
  "inputs": ["$addr[0:15]"],
  "type": "decoder_packed",
  "internal": {
    "weight": ["memory.addr_decode.weight"],
    "bias": ["memory.addr_decode.bias"]
  }
}

"memory.read": {
  "inputs": ["$mem[0:65535][0:7]", "$sel[0:65535]"],
  "type": "read_mux_packed",
  "internal": {
    "and": ["memory.read.and.weight", "memory.read.and.bias"],
    "or": ["memory.read.or.weight", "memory.read.or.bias"]
  },
  "outputs": { "bit0": "bit0", "bit1": "bit1", "bit2": "bit2", "bit3": "bit3",
               "bit4": "bit4", "bit5": "bit5", "bit6": "bit6", "bit7": "bit7" }
}

"memory.write": {
  "inputs": ["$mem[0:65535][0:7]", "$write_data[0:7]", "$sel[0:65535]", "$we"],
  "type": "write_mux_packed",
  "internal": {
    "sel": ["memory.write.sel.weight", "memory.write.sel.bias"],
    "nsel": ["memory.write.nsel.weight", "memory.write.nsel.bias"],
    "and_old": ["memory.write.and_old.weight", "memory.write.and_old.bias"],
    "and_new": ["memory.write.and_new.weight", "memory.write.and_new.bias"],
    "or": ["memory.write.or.weight", "memory.write.or.bias"]
  }
}
```

Packed tensor mapping (shapes assume 16-bit address, 8-bit data):
- `memory.addr_decode.weight`: [65536, 16]
- `memory.addr_decode.bias`: [65536]
- `memory.read.and.weight`: [8, 65536, 2]
- `memory.read.and.bias`: [8, 65536]
- `memory.read.or.weight`: [8, 65536]
- `memory.read.or.bias`: [8]
- `memory.write.sel.weight`: [65536, 2]
- `memory.write.sel.bias`: [65536]
- `memory.write.nsel.weight`: [65536, 1]
- `memory.write.nsel.bias`: [65536]
- `memory.write.and_old.weight`: [65536, 8, 2]
- `memory.write.and_old.bias`: [65536, 8]
- `memory.write.and_new.weight`: [65536, 8, 2]
- `memory.write.and_new.bias`: [65536, 8]
- `memory.write.or.weight`: [65536, 8, 2]
- `memory.write.or.bias`: [65536, 8]

Semantics are the same as the unrolled circuits, but computed in bulk:
- decode: `sel[i] = H(sum(addr_bits * weight[i]) + bias[i])`
- read: `bit[b] = H(sum(H([mem_bit, sel] * and_w[b,i] + and_b[b,i]) * or_w[b]) + or_b[b])`
- write: `new_bit = H(H([old_bit, nsel] * and_old_w + and_old_b) + H([data_bit, sel] * and_new_w + and_new_b) - 1)`

## Naming Conventions

- External inputs: `$name` or `$name[bit]`
- Constants: `#0`, `#1`
- Internal gates: relative path from circuit root
- Outputs: named in `outputs` section

## Validation Rules

1. Every gate in routing must exist in tensors file
2. Every tensor must have routing entry
3. Input count must match weight dimensions
4. No circular dependencies (DAG only)
5. All referenced sources must exist
6. Packed memory circuits are valid when the packed tensor blocks exist and match the expected shapes
routing/validate_packed_memory.py
ADDED

    """
    Validate packed memory tensor references in routing.json against safetensors.
    """

    from __future__ import annotations

    import argparse
    import json
    import sys
    from pathlib import Path
    from typing import Dict, Iterable, List, Tuple

    from safetensors import safe_open


    def _load_json(path: Path) -> Dict:
        with path.open("r", encoding="utf-8") as fh:
            return json.load(fh)


    def _get_scalar_tensor(f, name: str, default: int) -> int:
        if name not in f.keys():
            return default
        tensor = f.get_tensor(name)
        return int(tensor.item())


    def _gather_internal_keys(routing: Dict, circuit_name: str) -> List[str]:
        circuit = routing.get("circuits", {}).get(circuit_name)
        if circuit is None:
            return []
        internal = circuit.get("internal", {})
        keys: List[str] = []
        for value in internal.values():
            if isinstance(value, list):
                keys.extend(value)
        return keys


    def _shape_matches(actual: Iterable[int], expected: Iterable[int]) -> bool:
        return tuple(actual) == tuple(expected)


    def main() -> int:
        parser = argparse.ArgumentParser(description="Validate packed memory routing tensors.")
        parser.add_argument(
            "--routing",
            type=Path,
            default=Path(__file__).resolve().parent / "routing.json",
            help="Path to routing.json",
        )
        parser.add_argument(
            "--model",
            type=Path,
            default=Path(__file__).resolve().parent.parent / "neural_computer.safetensors",
            help="Path to neural_computer.safetensors",
        )
        args = parser.parse_args()

        routing = _load_json(args.routing)
        routing_keys = set()
        for name in ("memory.addr_decode", "memory.read", "memory.write"):
            routing_keys.update(_gather_internal_keys(routing, name))

        missing_routing = [k for k in routing_keys if not k]
        if missing_routing:
            print("routing.json contains empty packed tensor entries.", file=sys.stderr)
            return 1

        with safe_open(str(args.model), framework="pt") as f:
            mem_bytes = _get_scalar_tensor(f, "manifest.memory_bytes", 65536)
            pc_width = _get_scalar_tensor(f, "manifest.pc_width", 16)
            reg_width = _get_scalar_tensor(f, "manifest.register_width", 8)

            expected_shapes: Dict[str, Tuple[int, ...]] = {
                "memory.addr_decode.weight": (mem_bytes, pc_width),
                "memory.addr_decode.bias": (mem_bytes,),
                "memory.read.and.weight": (reg_width, mem_bytes, 2),
                "memory.read.and.bias": (reg_width, mem_bytes),
                "memory.read.or.weight": (reg_width, mem_bytes),
                "memory.read.or.bias": (reg_width,),
                "memory.write.sel.weight": (mem_bytes, 2),
                "memory.write.sel.bias": (mem_bytes,),
                "memory.write.nsel.weight": (mem_bytes, 1),
                "memory.write.nsel.bias": (mem_bytes,),
                "memory.write.and_old.weight": (mem_bytes, reg_width, 2),
                "memory.write.and_old.bias": (mem_bytes, reg_width),
                "memory.write.and_new.weight": (mem_bytes, reg_width, 2),
                "memory.write.and_new.bias": (mem_bytes, reg_width),
                "memory.write.or.weight": (mem_bytes, reg_width, 2),
                "memory.write.or.bias": (mem_bytes, reg_width),
            }

            errors = []
            for key, expected in expected_shapes.items():
                if key not in routing_keys:
                    errors.append(f"routing.json missing key: {key}")
                    continue
                if key not in f.keys():
                    errors.append(f"safetensors missing key: {key}")
                    continue
                actual = f.get_tensor(key).shape
                if not _shape_matches(actual, expected):
                    errors.append(f"{key} shape {tuple(actual)} != {expected}")

        if errors:
            print("Packed memory validation failed:", file=sys.stderr)
            for err in errors:
                print(f"  - {err}", file=sys.stderr)
            return 1

        print("Packed memory routing validation: ok")
        return 0


    if __name__ == "__main__":
        raise SystemExit(main())
tensors.txt
CHANGED

The diff for this file is too large to render. See raw diff.
todo.md
CHANGED

@@ -56,54 +56,55 @@ The machine runs. Callers just provide initial state and collect results.

### State Tensor Layout
```
┌─────────┬──────────┬────────┬────────┬────────────────┐
│ PC [16] │ Regs[32] │Flags[4]│Ctrl[4] │ Memory [N × 8] │
└─────────┴──────────┴────────┴────────┴────────────────┘
16 + 32 + 4 + 4 + N × 8 bits
```

### Memory Hierarchy
| Level | Size | Tensors | Access |
|-------|------|---------|--------|
| Registers | 4 × 8-bit | Direct wiring | Immediate |
| Main memory | 64KB | ~1.6M | 16-bit addressed |

### Full 64KB Configuration
- Address space: 0x0000 - 0xFFFF
- Routing circuits: ~1.64M tensors
- State tensor: 88 + 524,288 = 524,376 bits per instance

## Phase 1: Memory Infrastructure

64KB memory circuits are implemented and pass comprehensive eval.

| Component | Description | Tensors | Status |
|-----------|-------------|---------|--------|
| Address Decoder 16-bit | 16-bit → 65536 one-hot | 2 (packed) | Done |
| Memory Read MUX 64K | 65536-to-1 × 8 bits | 4 (packed) | Done |
| Memory Write Demux | Route write to address | 4 (packed) | Done |
| Memory Cell Logic | Conditional update | 6 (packed) | Done |

## Phase 2: Execution Engine

| Component | Description | Status |
|-----------|-------------|--------|
| Instruction Fetch | PC → Memory → IR | Done |
| Operand Fetch | Decode → Register/Memory Read | Done |
| ALU Dispatch | Opcode → Operation Select | Done |
| Result Writeback | Route to destination | Done |
| Flag Update | Compute Z/N/C/V | Done |
| PC Advance | Increment or Jump | Done |
| Halt Detection | HALT opcode → stop | Done |

## Phase 3: ACT Integration

Threshold runtime available in cpu/threshold_cpu.py (cycle + ACT loop + state I/O).

| Component | Description | Status |
|-----------|-------------|--------|
| Cycle Block | All Phase 2 as single layer | Done |
| Recurrence Wrapper | Loop until halt signal | Done |
| Max Cycles Guard | Prevent infinite loops | Done |
| State I/O | Pack/unpack state tensor | Done |

## Instruction Set

@@ -119,11 +120,11 @@ The machine runs. Callers just provide initial state and collect results.

| 0x7 | MUL | R[d] = R[a] * R[b] | Done |
| 0x8 | DIV | R[d] = R[a] / R[b] | Done |
| 0x9 | CMP | flags = R[a] - R[b] | Done |
| 0xA | LOAD | R[d] = M[addr] | Done |
| 0xB | STORE | M[addr] = R[s] | Done |
| 0xC | JMP | PC = addr | Done |
| 0xD | JZ/JNZ | PC = addr if flag | Done |
| 0xE | CALL | push PC; PC = addr | Done |
| 0xF | HALT | stop execution | Done |

## Completed Circuits

@@ -151,8 +152,8 @@ The machine runs. Callers just provide initial state and collect results.

- Comparators, threshold gates
- Conditional jumps

**Current: 6,296 tensors (packed memory)**
**Parameters: 8,267,667**

## Applications
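The todo's state tensor layout (PC, registers, flags, control, memory as one flat bit vector) can be packed and unpacked with ordinary bit arithmetic. A minimal sketch, assuming only the fields shown in the diagram and an LSB-first bit order (the bit order and a 56-bit header are assumptions of this sketch, not confirmed by the repository):

```python
# Sketch of packing/unpacking the flat state vector:
# PC[16] | Regs[4 x 8] | Flags[4] | Ctrl[4] | Memory[N x 8], LSB first.
def pack_state(pc, regs, flags, ctrl, mem):
    bits = [(pc >> i) & 1 for i in range(16)]
    for r in regs:                        # 4 registers x 8 bits
        bits += [(r >> i) & 1 for i in range(8)]
    bits += flags + ctrl                  # 4 + 4 raw bits
    for byte in mem:                      # N memory bytes x 8 bits
        bits += [(byte >> i) & 1 for i in range(8)]
    return bits

def unpack_state(bits, n_bytes):
    pc = sum(b << i for i, b in enumerate(bits[:16]))
    regs = [sum(bits[16 + 8 * r + i] << i for i in range(8)) for r in range(4)]
    flags, ctrl = bits[48:52], bits[52:56]
    mem = [sum(bits[56 + 8 * a + i] << i for i in range(8)) for a in range(n_bytes)]
    return pc, regs, flags, ctrl, mem

state = pack_state(0x1234, [1, 2, 3, 4], [0, 1, 0, 0], [1, 0, 0, 0], [0xAB] * 4)
assert unpack_state(state, 4) == (0x1234, [1, 2, 3, 4], [0, 1, 0, 0], [1, 0, 0, 0], [0xAB] * 4)
```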