phanerozoic committed on
Commit 9734d14 · verified · 1 Parent(s): acd53bb

Sync packed memory + 16-bit addressing

.gitattributes CHANGED
@@ -36,3 +36,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 __pycache__/iron_eval.cpython-312.pyc filter=lfs diff=lfs merge=lfs -text
 __pycache__/iron_eval.cpython-311.pyc filter=lfs diff=lfs merge=lfs -text
 eval/__pycache__/comprehensive_eval.cpython-312.pyc filter=lfs diff=lfs merge=lfs -text
+tensors.txt filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -17,8 +17,8 @@ tags:
 Every logic gate is a threshold neuron: `output = 1 if (Σ wᵢxᵢ + b) ≥ 0 else 0`
 
 ```
-Tensors: 24,200
-Parameters: 40,323
+Tensors: 6,296
+Parameters: 8,267,667
 ```
 
 ---
@@ -30,7 +30,7 @@ A complete 8-bit processor where every operation—from Boolean logic to arithmetic
 | Component | Specification |
 |-----------|---------------|
 | Registers | 4 × 8-bit general purpose |
-| Memory | 256 bytes addressable |
+| Memory | 64KB addressable |
 | ALU | 16 operations (ADD, SUB, AND, OR, XOR, NOT, SHL, SHR, INC, DEC, CMP, NEG, PASS, ZERO, ONES, NOP) |
 | Flags | Zero, Negative, Carry, Overflow |
 | Control | JMP, JZ, JNZ, JC, JNC, JN, JP, JV, JNV, CALL, RET, PUSH, POP |
@@ -90,7 +90,7 @@ The weights in this repository implement a complete 8-bit computer: registers, A
 | Modular | 11 | Divisibility by 2-12 (multi-layer for non-powers-of-2) |
 | Threshold | 13 | k-of-n gates, majority, minority, exactly-k |
 | Pattern | 10 | Popcount, leading/trailing ones, symmetry |
-| Memory | 3 | 8-bit addr decoder, 256x8 read mux, write cell update |
+| Memory | 3 | 16-bit addr decoder, 65536x8 read mux, write cell update (packed) |
 
 ---
 
@@ -122,14 +122,14 @@ for a, b_in in [(0,0), (0,1), (1,0), (1,1)]:
 All multi-bit fields are **MSB-first** (index 0 is the most-significant bit).
 
 ```
-[ PC[8] | IR[16] | R0[8] R1[8] R2[8] R3[8] | FLAGS[4] | SP[8] | CTRL[4] | MEM[256][8] ]
+[ PC[16] | IR[16] | R0[8] R1[8] R2[8] R3[8] | FLAGS[4] | SP[16] | CTRL[4] | MEM[65536][8] ]
 ```
 
 Flags are ordered as: `Z, N, C, V`.
 
 Control bits are ordered as: `HALT, MEM_WE, MEM_RE, RESERVED`.
 
-Total state size: `2120` bits.
+Total state size: `524376` bits.
 
 ---
 
@@ -145,8 +145,7 @@ opcode rd rs imm8
 Interpretation:
 - **R-type**: `rd = rd op rs` (imm8 ignored).
 - **I-type**: `rd = op rd, imm8` (rs ignored).
-- **Jumps/Calls**: `imm8` is the absolute target address.
-- **LOAD/STORE**: `imm8` is the absolute memory address.
+- **Address-extended**: `LOAD`, `STORE`, `JMP`, `JZ`, `CALL` consume the next word as a 16-bit address (big-endian). `imm8` is reserved, and the PC skips 4 bytes when the jump is not taken.
 
 ---
 
@@ -185,12 +184,15 @@ All circuits pass exhaustive testing over their full input domains.
 ```
 {category}.{circuit}[.{layer}][.{component}].{weight|bias}
 
-Examples:
-boolean.and.weight
-boolean.xor.layer1.neuron1.weight
-arithmetic.ripplecarry8bit.fa7.ha2.sum.layer1.or.weight
-modular.mod5.layer2.eq3.weight
-error_detection.paritychecker8bit.stage2.xor1.layer1.nand.bias
+Examples:
+boolean.and.weight
+boolean.xor.layer1.neuron1.weight
+arithmetic.ripplecarry8bit.fa7.ha2.sum.layer1.or.weight
+modular.mod5.layer2.eq3.weight
+error_detection.paritychecker8bit.stage2.xor1.layer1.nand.bias
+
+Memory circuits are stored as packed tensors to keep the safetensors header size manageable
+(e.g., `memory.addr_decode.weight`, `memory.read.and.weight`, `memory.write.and_old.weight`).
 ```
 
 ---
@@ -209,7 +211,7 @@ All weights are integers. All activations are Heaviside step. Designed for:
 
 | File | Description |
 |------|-------------|
-| `neural_computer.safetensors` | 24,200 tensors, 40,323 parameters |
+| `neural_computer.safetensors` | 6,296 tensors, 8,267,667 parameters |
 | `iron_eval.py` | Comprehensive test suite |
 | `prune_weights.py` | Weight optimization tool |
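The new instruction format above can be sketched in plain Python. This is an illustrative model of the 16-bit word (`opcode[4] | rd[2] | rs[2] | imm8[8]`) and the address-extended 4-byte form; `encode`, `decode`, and `assemble_load` are hypothetical helper names, not functions from the repo.

```python
def encode(opcode: int, rd: int = 0, rs: int = 0, imm8: int = 0) -> int:
    # opcode[4] | rd[2] | rs[2] | imm8[8], MSB-first
    return ((opcode & 0xF) << 12) | ((rd & 0x3) << 10) | ((rs & 0x3) << 8) | (imm8 & 0xFF)

def decode(ir: int):
    return (ir >> 12) & 0xF, (ir >> 10) & 0x3, (ir >> 8) & 0x3, ir & 0xFF

def assemble_load(rd: int, addr16: int) -> bytes:
    # LOAD (opcode 0xA) is address-extended: the instruction word is
    # followed by a big-endian 16-bit address, 4 bytes in total.
    ir = encode(0xA, rd=rd)
    return bytes([(ir >> 8) & 0xFF, ir & 0xFF, (addr16 >> 8) & 0xFF, addr16 & 0xFF])

print(decode(encode(0x0, rd=1, rs=2)))   # (0, 1, 2, 0)  -> ADD r1, r2
print(assemble_load(3, 0xBEEF).hex())    # ac00beef
```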
cpu/cycle.py CHANGED
@@ -50,14 +50,22 @@ def step(state: CPUState) -> CPUState:
 
     # Fetch: two bytes, big-endian
     hi = s.mem[s.pc]
-    lo = s.mem[(s.pc + 1) & 0xFF]
+    lo = s.mem[(s.pc + 1) & 0xFFFF]
     s.ir = ((hi & 0xFF) << 8) | (lo & 0xFF)
-    next_pc = (s.pc + 2) & 0xFF
+    next_pc = (s.pc + 2) & 0xFFFF
 
     opcode, rd, rs, imm8 = decode_ir(s.ir)
     a = s.regs[rd]
     b = s.regs[rs]
 
+    addr16 = None
+    next_pc_ext = next_pc
+    if opcode in (0xA, 0xB, 0xC, 0xD, 0xE):
+        addr_hi = s.mem[next_pc]
+        addr_lo = s.mem[(next_pc + 1) & 0xFFFF]
+        addr16 = ((addr_hi & 0xFF) << 8) | (addr_lo & 0xFF)
+        next_pc_ext = (next_pc + 2) & 0xFFFF
+
     write_result = True
     result = a
     carry = 0
@@ -94,23 +102,26 @@ def step(state: CPUState) -> CPUState:
         result, carry, overflow = _alu_sub(a, b)
         write_result = False
     elif opcode == 0xA:  # LOAD
-        result = s.mem[imm8]
+        result = s.mem[addr16]
     elif opcode == 0xB:  # STORE
-        s.mem[imm8] = b & 0xFF
+        s.mem[addr16] = b & 0xFF
         write_result = False
     elif opcode == 0xC:  # JMP
-        s.pc = imm8 & 0xFF
+        s.pc = addr16 & 0xFFFF
         write_result = False
     elif opcode == 0xD:  # JZ
         if s.flags[0] == 1:
-            s.pc = imm8 & 0xFF
+            s.pc = addr16 & 0xFFFF
         else:
-            s.pc = next_pc
+            s.pc = next_pc_ext
         write_result = False
     elif opcode == 0xE:  # CALL
-        s.sp = (s.sp - 1) & 0xFF
-        s.mem[s.sp] = next_pc
-        s.pc = imm8 & 0xFF
+        ret_addr = next_pc_ext & 0xFFFF
+        s.sp = (s.sp - 1) & 0xFFFF
+        s.mem[s.sp] = (ret_addr >> 8) & 0xFF
+        s.sp = (s.sp - 1) & 0xFFFF
+        s.mem[s.sp] = ret_addr & 0xFF
+        s.pc = addr16 & 0xFFFF
         write_result = False
     elif opcode == 0xF:  # HALT
         s.ctrl[0] = 1
@@ -123,7 +134,7 @@ def step(state: CPUState) -> CPUState:
         s.regs[rd] = result & 0xFF
 
     if opcode not in (0xC, 0xD, 0xE):
-        s.pc = next_pc
+        s.pc = next_pc_ext
 
     return s
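The fetch path in the hunk above can be modeled standalone, using a plain list as memory and no threshold circuits, to show how the extended opcodes consume a trailing big-endian address word. This mirrors the committed logic but is a simplified sketch, not the repo's code.

```python
EXTENDED = {0xA, 0xB, 0xC, 0xD, 0xE}  # LOAD, STORE, JMP, JZ, CALL

def fetch(mem, pc):
    # Fetch the 2-byte instruction word, big-endian.
    hi, lo = mem[pc], mem[(pc + 1) & 0xFFFF]
    ir = ((hi & 0xFF) << 8) | (lo & 0xFF)
    next_pc = (pc + 2) & 0xFFFF
    opcode = (ir >> 12) & 0xF
    addr16 = None
    if opcode in EXTENDED:
        # Address-extended ops consume two more bytes as a 16-bit address.
        addr16 = ((mem[next_pc] & 0xFF) << 8) | (mem[(next_pc + 1) & 0xFFFF] & 0xFF)
        next_pc = (next_pc + 2) & 0xFFFF
    return ir, addr16, next_pc

mem = [0] * 65536
mem[0:4] = [0xC0, 0x00, 0x12, 0x34]   # JMP 0x1234 (opcode 0xC)
ir, addr16, next_pc = fetch(mem, 0)
print(hex(addr16), next_pc)           # 0x1234 4
```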
cpu/state.py CHANGED
@@ -11,14 +11,14 @@ from typing import List
 FLAG_NAMES = ["Z", "N", "C", "V"]
 CTRL_NAMES = ["HALT", "MEM_WE", "MEM_RE", "RESERVED"]
 
-PC_BITS = 8
+PC_BITS = 16
 IR_BITS = 16
 REG_BITS = 8
 REG_COUNT = 4
 FLAG_BITS = 4
-SP_BITS = 8
+SP_BITS = 16
 CTRL_BITS = 4
-MEM_BYTES = 256
+MEM_BYTES = 65536
 MEM_BITS = MEM_BYTES * 8
 
 STATE_BITS = PC_BITS + IR_BITS + (REG_BITS * REG_COUNT) + FLAG_BITS + SP_BITS + CTRL_BITS + MEM_BITS
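A quick sanity check that the new constants reproduce the `524376`-bit state size quoted in the README, recomputing `STATE_BITS` with the post-commit values:

```python
# Post-commit constants: 16-bit PC/SP, 64KB memory.
PC_BITS, IR_BITS, REG_BITS, REG_COUNT = 16, 16, 8, 4
FLAG_BITS, SP_BITS, CTRL_BITS = 4, 16, 4
MEM_BYTES = 65536
MEM_BITS = MEM_BYTES * 8

STATE_BITS = PC_BITS + IR_BITS + (REG_BITS * REG_COUNT) + FLAG_BITS + SP_BITS + CTRL_BITS + MEM_BITS
print(STATE_BITS)  # 524376
```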
cpu/threshold_cpu.py ADDED
@@ -0,0 +1,435 @@
+"""
+Threshold-weight runtime for the 8-bit CPU.
+
+Implements a reference cycle using the frozen circuit weights for core ALU ops.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+from typing import List, Tuple
+
+import torch
+from safetensors.torch import load_file
+
+from .state import CPUState, pack_state, unpack_state, REG_BITS, PC_BITS, MEM_BYTES
+
+
+def heaviside(x: torch.Tensor) -> torch.Tensor:
+    return (x >= 0).float()
+
+
+def int_to_bits_msb(value: int, width: int) -> List[int]:
+    return [(value >> (width - 1 - i)) & 1 for i in range(width)]
+
+
+def bits_to_int_msb(bits: List[int]) -> int:
+    value = 0
+    for bit in bits:
+        value = (value << 1) | int(bit)
+    return value
+
+
+def bits_msb_to_lsb(bits: List[int]) -> List[int]:
+    return list(reversed(bits))
+
+
+DEFAULT_MODEL_PATH = Path(__file__).resolve().parent.parent / "neural_computer.safetensors"
+
+
+class ThresholdALU:
+    def __init__(self, model_path: str, device: str = "cpu") -> None:
+        self.device = device
+        self.tensors = {k: v.float().to(device) for k, v in load_file(model_path).items()}
+
+    def _get(self, name: str) -> torch.Tensor:
+        return self.tensors[name]
+
+    def _eval_gate(self, weight_key: str, bias_key: str, inputs: List[float]) -> float:
+        w = self._get(weight_key)
+        b = self._get(bias_key)
+        inp = torch.tensor(inputs, device=self.device)
+        return heaviside((inp * w).sum() + b).item()
+
+    def _eval_xor(self, prefix: str, inputs: List[float]) -> float:
+        inp = torch.tensor(inputs, device=self.device)
+        w_or = self._get(f"{prefix}.layer1.or.weight")
+        b_or = self._get(f"{prefix}.layer1.or.bias")
+        w_nand = self._get(f"{prefix}.layer1.nand.weight")
+        b_nand = self._get(f"{prefix}.layer1.nand.bias")
+        w2 = self._get(f"{prefix}.layer2.weight")
+        b2 = self._get(f"{prefix}.layer2.bias")
+
+        h_or = heaviside((inp * w_or).sum() + b_or).item()
+        h_nand = heaviside((inp * w_nand).sum() + b_nand).item()
+        hidden = torch.tensor([h_or, h_nand], device=self.device)
+        return heaviside((hidden * w2).sum() + b2).item()
+
+    def _eval_full_adder(self, prefix: str, a: float, b: float, cin: float) -> Tuple[float, float]:
+        ha1_sum = self._eval_xor(f"{prefix}.ha1.sum", [a, b])
+        ha1_carry = self._eval_gate(f"{prefix}.ha1.carry.weight", f"{prefix}.ha1.carry.bias", [a, b])
+
+        ha2_sum = self._eval_xor(f"{prefix}.ha2.sum", [ha1_sum, cin])
+        ha2_carry = self._eval_gate(
+            f"{prefix}.ha2.carry.weight", f"{prefix}.ha2.carry.bias", [ha1_sum, cin]
+        )
+
+        cout = self._eval_gate(f"{prefix}.carry_or.weight", f"{prefix}.carry_or.bias", [ha1_carry, ha2_carry])
+        return ha2_sum, cout
+
+    def add(self, a: int, b: int) -> Tuple[int, int, int]:
+        a_bits = bits_msb_to_lsb(int_to_bits_msb(a, REG_BITS))
+        b_bits = bits_msb_to_lsb(int_to_bits_msb(b, REG_BITS))
+
+        carry = 0.0
+        sum_bits: List[int] = []
+        for bit in range(REG_BITS):
+            sum_bit, carry = self._eval_full_adder(
+                f"arithmetic.ripplecarry8bit.fa{bit}", float(a_bits[bit]), float(b_bits[bit]), carry
+            )
+            sum_bits.append(int(sum_bit))
+
+        result = bits_to_int_msb(list(reversed(sum_bits)))
+        carry_out = int(carry)
+        overflow = 1 if (((a ^ result) & (b ^ result)) & 0x80) else 0
+        return result, carry_out, overflow
+
+    def sub(self, a: int, b: int) -> Tuple[int, int, int]:
+        a_bits = bits_msb_to_lsb(int_to_bits_msb(a, REG_BITS))
+        b_bits = bits_msb_to_lsb(int_to_bits_msb(b, REG_BITS))
+
+        carry = 1.0  # two's complement carry-in
+        sum_bits: List[int] = []
+        for bit in range(REG_BITS):
+            notb = self._eval_gate(
+                f"arithmetic.sub8bit.notb{bit}.weight",
+                f"arithmetic.sub8bit.notb{bit}.bias",
+                [float(b_bits[bit])],
+            )
+
+            xor1 = self._eval_xor(f"arithmetic.sub8bit.fa{bit}.xor1", [float(a_bits[bit]), notb])
+            xor2 = self._eval_xor(f"arithmetic.sub8bit.fa{bit}.xor2", [xor1, carry])
+
+            and1 = self._eval_gate(
+                f"arithmetic.sub8bit.fa{bit}.and1.weight",
+                f"arithmetic.sub8bit.fa{bit}.and1.bias",
+                [float(a_bits[bit]), notb],
+            )
+            and2 = self._eval_gate(
+                f"arithmetic.sub8bit.fa{bit}.and2.weight",
+                f"arithmetic.sub8bit.fa{bit}.and2.bias",
+                [xor1, carry],
+            )
+            carry = self._eval_gate(
+                f"arithmetic.sub8bit.fa{bit}.or_carry.weight",
+                f"arithmetic.sub8bit.fa{bit}.or_carry.bias",
+                [and1, and2],
+            )
+
+            sum_bits.append(int(xor2))
+
+        result = bits_to_int_msb(list(reversed(sum_bits)))
+        carry_out = int(carry)
+        overflow = 1 if (((a ^ b) & (a ^ result)) & 0x80) else 0
+        return result, carry_out, overflow
+
+    def bitwise_and(self, a: int, b: int) -> int:
+        a_bits = int_to_bits_msb(a, REG_BITS)
+        b_bits = int_to_bits_msb(b, REG_BITS)
+        w = self._get("alu.alu8bit.and.weight")
+        bias = self._get("alu.alu8bit.and.bias")
+
+        out_bits = []
+        for bit in range(REG_BITS):
+            inp = torch.tensor([float(a_bits[bit]), float(b_bits[bit])], device=self.device)
+            out = heaviside((inp * w[bit * 2:bit * 2 + 2]).sum() + bias[bit]).item()
+            out_bits.append(int(out))
+
+        return bits_to_int_msb(out_bits)
+
+    def bitwise_or(self, a: int, b: int) -> int:
+        a_bits = int_to_bits_msb(a, REG_BITS)
+        b_bits = int_to_bits_msb(b, REG_BITS)
+        w = self._get("alu.alu8bit.or.weight")
+        bias = self._get("alu.alu8bit.or.bias")
+
+        out_bits = []
+        for bit in range(REG_BITS):
+            inp = torch.tensor([float(a_bits[bit]), float(b_bits[bit])], device=self.device)
+            out = heaviside((inp * w[bit * 2:bit * 2 + 2]).sum() + bias[bit]).item()
+            out_bits.append(int(out))
+
+        return bits_to_int_msb(out_bits)
+
+    def bitwise_not(self, a: int) -> int:
+        a_bits = int_to_bits_msb(a, REG_BITS)
+        w = self._get("alu.alu8bit.not.weight")
+        bias = self._get("alu.alu8bit.not.bias")
+
+        out_bits = []
+        for bit in range(REG_BITS):
+            inp = torch.tensor([float(a_bits[bit])], device=self.device)
+            out = heaviside((inp * w[bit]).sum() + bias[bit]).item()
+            out_bits.append(int(out))
+
+        return bits_to_int_msb(out_bits)
+
+    def bitwise_xor(self, a: int, b: int) -> int:
+        a_bits = int_to_bits_msb(a, REG_BITS)
+        b_bits = int_to_bits_msb(b, REG_BITS)
+
+        w_or = self._get("alu.alu8bit.xor.layer1.or.weight")
+        b_or = self._get("alu.alu8bit.xor.layer1.or.bias")
+        w_nand = self._get("alu.alu8bit.xor.layer1.nand.weight")
+        b_nand = self._get("alu.alu8bit.xor.layer1.nand.bias")
+        w2 = self._get("alu.alu8bit.xor.layer2.weight")
+        b2 = self._get("alu.alu8bit.xor.layer2.bias")
+
+        out_bits = []
+        for bit in range(REG_BITS):
+            inp = torch.tensor([float(a_bits[bit]), float(b_bits[bit])], device=self.device)
+            h_or = heaviside((inp * w_or[bit * 2:bit * 2 + 2]).sum() + b_or[bit])
+            h_nand = heaviside((inp * w_nand[bit * 2:bit * 2 + 2]).sum() + b_nand[bit])
+            hidden = torch.stack([h_or, h_nand])
+            out = heaviside((hidden * w2[bit * 2:bit * 2 + 2]).sum() + b2[bit]).item()
+            out_bits.append(int(out))
+
+        return bits_to_int_msb(out_bits)
+
+
+class ThresholdCPU:
+    def __init__(self, model_path: str | Path = DEFAULT_MODEL_PATH, device: str = "cpu") -> None:
+        self.device = device
+        self.alu = ThresholdALU(str(model_path), device=device)
+
+    @staticmethod
+    def decode_ir(ir: int) -> Tuple[int, int, int, int]:
+        opcode = (ir >> 12) & 0xF
+        rd = (ir >> 10) & 0x3
+        rs = (ir >> 8) & 0x3
+        imm8 = ir & 0xFF
+        return opcode, rd, rs, imm8
+
+    @staticmethod
+    def flags_from_result(result: int, carry: int, overflow: int) -> List[int]:
+        z = 1 if result == 0 else 0
+        n = 1 if (result & 0x80) else 0
+        c = 1 if carry else 0
+        v = 1 if overflow else 0
+        return [z, n, c, v]
+
+    def _addr_decode(self, addr: int) -> torch.Tensor:
+        bits = torch.tensor(int_to_bits_msb(addr, PC_BITS), device=self.device, dtype=torch.float32)
+        w = self.alu._get("memory.addr_decode.weight")
+        b = self.alu._get("memory.addr_decode.bias")
+        return heaviside((w * bits).sum(dim=1) + b)
+
+    def _memory_read(self, mem: List[int], addr: int) -> int:
+        sel = self._addr_decode(addr)
+        mem_bits = torch.tensor(
+            [int_to_bits_msb(byte, REG_BITS) for byte in mem],
+            device=self.device,
+            dtype=torch.float32,
+        )
+        and_w = self.alu._get("memory.read.and.weight")
+        and_b = self.alu._get("memory.read.and.bias")
+        or_w = self.alu._get("memory.read.or.weight")
+        or_b = self.alu._get("memory.read.or.bias")
+
+        out_bits: List[int] = []
+        for bit in range(REG_BITS):
+            inp = torch.stack([mem_bits[:, bit], sel], dim=1)
+            and_out = heaviside((inp * and_w[bit]).sum(dim=1) + and_b[bit])
+            out_bit = heaviside((and_out * or_w[bit]).sum() + or_b[bit]).item()
+            out_bits.append(int(out_bit))
+
+        return bits_to_int_msb(out_bits)
+
+    def _memory_write(self, mem: List[int], addr: int, value: int) -> List[int]:
+        sel = self._addr_decode(addr)
+        data_bits = torch.tensor(int_to_bits_msb(value, REG_BITS), device=self.device, dtype=torch.float32)
+        mem_bits = torch.tensor(
+            [int_to_bits_msb(byte, REG_BITS) for byte in mem],
+            device=self.device,
+            dtype=torch.float32,
+        )
+
+        sel_w = self.alu._get("memory.write.sel.weight")
+        sel_b = self.alu._get("memory.write.sel.bias")
+        nsel_w = self.alu._get("memory.write.nsel.weight").squeeze(1)
+        nsel_b = self.alu._get("memory.write.nsel.bias")
+        and_old_w = self.alu._get("memory.write.and_old.weight")
+        and_old_b = self.alu._get("memory.write.and_old.bias")
+        and_new_w = self.alu._get("memory.write.and_new.weight")
+        and_new_b = self.alu._get("memory.write.and_new.bias")
+        or_w = self.alu._get("memory.write.or.weight")
+        or_b = self.alu._get("memory.write.or.bias")
+
+        we = torch.ones_like(sel)
+        sel_inp = torch.stack([sel, we], dim=1)
+        write_sel = heaviside((sel_inp * sel_w).sum(dim=1) + sel_b)
+        nsel = heaviside((write_sel * nsel_w) + nsel_b)
+
+        new_mem_bits = torch.zeros((MEM_BYTES, REG_BITS), device=self.device)
+        for bit in range(REG_BITS):
+            old_bit = mem_bits[:, bit]
+            data_bit = data_bits[bit].expand(MEM_BYTES)
+            inp_old = torch.stack([old_bit, nsel], dim=1)
+            inp_new = torch.stack([data_bit, write_sel], dim=1)
+
+            and_old = heaviside((inp_old * and_old_w[:, bit]).sum(dim=1) + and_old_b[:, bit])
+            and_new = heaviside((inp_new * and_new_w[:, bit]).sum(dim=1) + and_new_b[:, bit])
+            or_inp = torch.stack([and_old, and_new], dim=1)
+            out_bit = heaviside((or_inp * or_w[:, bit]).sum(dim=1) + or_b[:, bit])
+            new_mem_bits[:, bit] = out_bit
+
+        return [bits_to_int_msb([int(b) for b in new_mem_bits[i].tolist()]) for i in range(MEM_BYTES)]
+
+    def _conditional_jump_byte(self, prefix: str, pc_byte: int, target_byte: int, flag: int) -> int:
+        pc_bits = int_to_bits_msb(pc_byte, REG_BITS)
+        target_bits = int_to_bits_msb(target_byte, REG_BITS)
+
+        out_bits: List[int] = []
+        for bit in range(REG_BITS):
+            not_sel = self.alu._eval_gate(
+                f"{prefix}.bit{bit}.not_sel.weight",
+                f"{prefix}.bit{bit}.not_sel.bias",
+                [float(flag)],
+            )
+            and_a = self.alu._eval_gate(
+                f"{prefix}.bit{bit}.and_a.weight",
+                f"{prefix}.bit{bit}.and_a.bias",
+                [float(pc_bits[bit]), not_sel],
+            )
+            and_b = self.alu._eval_gate(
+                f"{prefix}.bit{bit}.and_b.weight",
+                f"{prefix}.bit{bit}.and_b.bias",
+                [float(target_bits[bit]), float(flag)],
+            )
+            out_bit = self.alu._eval_gate(
+                f"{prefix}.bit{bit}.or.weight",
+                f"{prefix}.bit{bit}.or.bias",
+                [and_a, and_b],
+            )
+            out_bits.append(int(out_bit))
+
+        return bits_to_int_msb(out_bits)
+
+    def step(self, state: CPUState) -> CPUState:
+        if state.ctrl[0] == 1:  # HALT
+            return state.copy()
+
+        s = state.copy()
+
+        # Fetch: two bytes, big-endian
+        hi = self._memory_read(s.mem, s.pc)
+        lo = self._memory_read(s.mem, (s.pc + 1) & 0xFFFF)
+        s.ir = ((hi & 0xFF) << 8) | (lo & 0xFF)
+        next_pc = (s.pc + 2) & 0xFFFF
+
+        opcode, rd, rs, imm8 = self.decode_ir(s.ir)
+        a = s.regs[rd]
+        b = s.regs[rs]
+
+        addr16 = None
+        next_pc_ext = next_pc
+        if opcode in (0xA, 0xB, 0xC, 0xD, 0xE):
+            addr_hi = self._memory_read(s.mem, next_pc)
+            addr_lo = self._memory_read(s.mem, (next_pc + 1) & 0xFFFF)
+            addr16 = ((addr_hi & 0xFF) << 8) | (addr_lo & 0xFF)
+            next_pc_ext = (next_pc + 2) & 0xFFFF
+
+        write_result = True
+        result = a
+        carry = 0
+        overflow = 0
+
+        if opcode == 0x0:  # ADD
+            result, carry, overflow = self.alu.add(a, b)
+        elif opcode == 0x1:  # SUB
+            result, carry, overflow = self.alu.sub(a, b)
+        elif opcode == 0x2:  # AND
+            result = self.alu.bitwise_and(a, b)
+        elif opcode == 0x3:  # OR
+            result = self.alu.bitwise_or(a, b)
+        elif opcode == 0x4:  # XOR
+            result = self.alu.bitwise_xor(a, b)
+        elif opcode == 0x5:  # SHL
+            carry = 1 if (a & 0x80) else 0
+            result = (a << 1) & 0xFF
+        elif opcode == 0x6:  # SHR
+            carry = 1 if (a & 0x01) else 0
+            result = (a >> 1) & 0xFF
+        elif opcode == 0x7:  # MUL
+            full = a * b
+            result = full & 0xFF
+            carry = 1 if full > 0xFF else 0
+        elif opcode == 0x8:  # DIV
+            if b == 0:
+                result = 0
+                carry = 1
+                overflow = 1
+            else:
+                result = (a // b) & 0xFF
+        elif opcode == 0x9:  # CMP
+            result, carry, overflow = self.alu.sub(a, b)
+            write_result = False
+        elif opcode == 0xA:  # LOAD
+            result = self._memory_read(s.mem, addr16)
+        elif opcode == 0xB:  # STORE
+            s.mem = self._memory_write(s.mem, addr16, b & 0xFF)
+            write_result = False
+        elif opcode == 0xC:  # JMP
+            s.pc = addr16 & 0xFFFF
+            write_result = False
+        elif opcode == 0xD:  # JZ
+            hi_pc = self._conditional_jump_byte(
+                "control.jz",
+                (next_pc_ext >> 8) & 0xFF,
+                (addr16 >> 8) & 0xFF,
+                s.flags[0],
+            )
+            lo_pc = self._conditional_jump_byte(
+                "control.jz",
+                next_pc_ext & 0xFF,
+                addr16 & 0xFF,
+                s.flags[0],
+            )
+            s.pc = ((hi_pc & 0xFF) << 8) | (lo_pc & 0xFF)
+            write_result = False
+        elif opcode == 0xE:  # CALL
+            ret_addr = next_pc_ext & 0xFFFF
+            s.sp = (s.sp - 1) & 0xFFFF
+            s.mem = self._memory_write(s.mem, s.sp, (ret_addr >> 8) & 0xFF)
+            s.sp = (s.sp - 1) & 0xFFFF
+            s.mem = self._memory_write(s.mem, s.sp, ret_addr & 0xFF)
+            s.pc = addr16 & 0xFFFF
+            write_result = False
+        elif opcode == 0xF:  # HALT
+            s.ctrl[0] = 1
+            write_result = False
+
+        if opcode <= 0x9 or opcode == 0xA:
+            s.flags = self.flags_from_result(result, carry, overflow)
+
+        if write_result:
+            s.regs[rd] = result & 0xFF
+
+        if opcode not in (0xC, 0xD, 0xE):
+            s.pc = next_pc_ext
+
+        return s
+
+    def run_until_halt(self, state: CPUState, max_cycles: int = 256) -> Tuple[CPUState, int]:
+        s = state.copy()
+        for i in range(max_cycles):
+            if s.ctrl[0] == 1:
+                return s, i
+            s = self.step(s)
+        return s, max_cycles
+
+    def forward(self, state_bits: torch.Tensor, max_cycles: int = 256) -> torch.Tensor:
+        bits_list = [int(b) for b in state_bits.detach().cpu().flatten().tolist()]
+        state = unpack_state(bits_list)
+        final, _ = self.run_until_halt(state, max_cycles=max_cycles)
+        return torch.tensor(pack_state(final), dtype=torch.float32)
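The packed `memory.addr_decode` tensors above rely on a simple threshold identity: each decoder row uses weight +1 for address bits that should be 1, weight -1 for bits that should be 0, and bias equal to minus the popcount, so the pre-activation reaches 0 only on an exact match. A torch-free sketch of one row (`decode_row` and `fires` are illustrative names, not from the repo):

```python
def decode_row(addr: int, addr_bits: int = 16):
    # Weights +1/-1 per MSB-first address bit; bias = -popcount(addr).
    bits = [(addr >> (addr_bits - 1 - i)) & 1 for i in range(addr_bits)]
    weights = [1.0 if b else -1.0 for b in bits]
    bias = -float(sum(bits))
    return weights, bias

def fires(row, query: int, addr_bits: int = 16) -> bool:
    # Heaviside threshold: the row activates iff query == addr.
    weights, bias = row
    q = [(query >> (addr_bits - 1 - i)) & 1 for i in range(addr_bits)]
    return sum(w * x for w, x in zip(weights, q)) + bias >= 0

row = decode_row(0xBEEF)
print(fires(row, 0xBEEF), fires(row, 0xBEEE), fires(row, 0x0000))  # True False False
```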
eval/build_memory.py CHANGED
@@ -1,5 +1,5 @@
 """
-Generate memory and fetch/load/store buffers for the 8-bit threshold computer.
+Generate 64KB memory circuits and fetch/load/store buffers for the 8-bit threshold computer.
 Updates neural_computer.safetensors and tensors.txt in-place.
 """
 
@@ -16,6 +16,9 @@ from safetensors.torch import save_file
 MODEL_PATH = Path(__file__).resolve().parent.parent / "neural_computer.safetensors"
 MANIFEST_PATH = Path(__file__).resolve().parent.parent / "tensors.txt"
 
+ADDR_BITS = 16
+MEM_BYTES = 1 << ADDR_BITS
+
 
 def load_tensors(path: Path) -> Dict[str, torch.Tensor]:
     tensors: Dict[str, torch.Tensor] = {}
@@ -34,32 +37,59 @@ def add_gate(tensors: Dict[str, torch.Tensor], name: str, weight: Iterable[float
     tensors[b_key] = torch.tensor(list(bias), dtype=torch.float32)
 
 
-def add_decoder_8to256(tensors: Dict[str, torch.Tensor]) -> None:
-    for addr in range(256):
-        bits = [(addr >> (7 - i)) & 1 for i in range(8)]  # MSB-first
-        weights = [1.0 if bit == 1 else -1.0 for bit in bits]
-        bias = -float(sum(bits))
-        add_gate(tensors, f"memory.addr_decode.out{addr}", weights, [bias])
+def drop_prefixes(tensors: Dict[str, torch.Tensor], prefixes: List[str]) -> None:
+    for key in list(tensors.keys()):
+        if any(key.startswith(prefix) for prefix in prefixes):
+            del tensors[key]
+
+
+def add_decoder(tensors: Dict[str, torch.Tensor]) -> None:
+    weights = torch.empty((MEM_BYTES, ADDR_BITS), dtype=torch.float32)
+    bias = torch.empty((MEM_BYTES,), dtype=torch.float32)
+    for addr in range(MEM_BYTES):
+        bits = [(addr >> (ADDR_BITS - 1 - i)) & 1 for i in range(ADDR_BITS)]  # MSB-first
+        weights[addr] = torch.tensor([1.0 if bit == 1 else -1.0 for bit in bits], dtype=torch.float32)
+        bias[addr] = -float(sum(bits))
+    tensors["memory.addr_decode.weight"] = weights
+    tensors["memory.addr_decode.bias"] = bias
 
 
 def add_memory_read_mux(tensors: Dict[str, torch.Tensor]) -> None:
-    # AND each mem bit with its address select, then OR across all addresses.
-    for bit in range(8):
-        for addr in range(256):
-            add_gate(tensors, f"memory.read.bit{bit}.and{addr}", [1.0, 1.0], [-2.0])
-        add_gate(tensors, f"memory.read.bit{bit}.or", [1.0] * 256, [-1.0])
+    # Packed AND/OR weights for read mux.
+    and_weight = torch.ones((8, MEM_BYTES, 2), dtype=torch.float32)
+    and_bias = torch.full((8, MEM_BYTES), -2.0, dtype=torch.float32)
+    or_weight = torch.ones((8, MEM_BYTES), dtype=torch.float32)
+    or_bias = torch.full((8,), -1.0, dtype=torch.float32)
+    tensors["memory.read.and.weight"] = and_weight
+    tensors["memory.read.and.bias"] = and_bias
+    tensors["memory.read.or.weight"] = or_weight
+    tensors["memory.read.or.bias"] = or_bias
 
 
 def add_memory_write_cells(tensors: Dict[str, torch.Tensor]) -> None:
-    # write_sel = addr_select AND write_enable
-    # new_bit = (NOT write_sel AND old_bit) OR (write_sel AND write_data_bit)
-    for addr in range(256):
-        add_gate(tensors, f"memory.write.sel.addr{addr}", [1.0, 1.0], [-2.0])
-        add_gate(tensors, f"memory.write.nsel.addr{addr}", [-1.0], [0.0])
-        for bit in range(8):
-            add_gate(tensors, f"memory.write.addr{addr}.bit{bit}.and_old", [1.0, 1.0], [-2.0])
-            add_gate(tensors, f"memory.write.addr{addr}.bit{bit}.and_new", [1.0, 1.0], [-2.0])
-            add_gate(tensors, f"memory.write.addr{addr}.bit{bit}.or", [1.0, 1.0], [-1.0])
+    # Packed write gate weights.
+    sel_weight = torch.ones((MEM_BYTES, 2), dtype=torch.float32)
+    sel_bias = torch.full((MEM_BYTES,), -2.0, dtype=torch.float32)
+    nsel_weight = torch.full((MEM_BYTES, 1), -1.0, dtype=torch.float32)
+    nsel_bias = torch.zeros((MEM_BYTES,), dtype=torch.float32)
+
+    and_old_weight = torch.ones((MEM_BYTES, 8, 2), dtype=torch.float32)
+    and_old_bias = torch.full((MEM_BYTES, 8), -2.0, dtype=torch.float32)
+    and_new_weight = torch.ones((MEM_BYTES, 8, 2), dtype=torch.float32)
+    and_new_bias = torch.full((MEM_BYTES, 8), -2.0, dtype=torch.float32)
+    or_weight = torch.ones((MEM_BYTES, 8, 2), dtype=torch.float32)
+    or_bias = torch.full((MEM_BYTES, 8), -1.0, dtype=torch.float32)
+
+    tensors["memory.write.sel.weight"] = sel_weight
+    tensors["memory.write.sel.bias"] = sel_bias
+    tensors["memory.write.nsel.weight"] = nsel_weight
+    tensors["memory.write.nsel.bias"] = nsel_bias
+    tensors["memory.write.and_old.weight"] = and_old_weight
+    tensors["memory.write.and_old.bias"] = and_old_bias
+    tensors["memory.write.and_new.weight"] = and_new_weight
+    tensors["memory.write.and_new.bias"] = and_new_bias
+    tensors["memory.write.or.weight"] = or_weight
+    tensors["memory.write.or.bias"] = or_bias
 
 
 def add_fetch_load_store_buffers(tensors: Dict[str, torch.Tensor]) -> None:
@@ -69,16 +99,15 @@ def add_fetch_load_store_buffers(tensors: Dict[str, torch.Tensor]) -> None:
     for bit in range(8):
         add_gate(tensors, f"control.load.bit{bit}", [1.0], [-1.0])
         add_gate(tensors, f"control.store.bit{bit}", [1.0], [-1.0])
+    for bit in range(ADDR_BITS):
         add_gate(tensors, f"control.mem_addr.bit{bit}", [1.0], [-1.0])
 
 
 def update_manifest(tensors: Dict[str, torch.Tensor]) -> None:
-    # Bump manifest version to reflect memory integration.
-    key = "manifest.version"
-    if key not in tensors:
-        tensors[key] = torch.tensor([2.0], dtype=torch.float32)
-        return
-    tensors[key] = torch.tensor([2.0], dtype=torch.float32)
+    # Update manifest constants to reflect 16-bit address space.
+    tensors["manifest.memory_bytes"] = torch.tensor([float(MEM_BYTES)], dtype=torch.float32)
+    tensors["manifest.pc_width"] = torch.tensor([float(ADDR_BITS)], dtype=torch.float32)
+    tensors["manifest.version"] = torch.tensor([3.0], dtype=torch.float32)
 
 
 def write_manifest(path: Path, tensors: Dict[str, torch.Tensor]) -> None:
@@ -94,7 +123,19 @@ def write_manifest(path: Path, tensors: Dict[str, torch.Tensor]) -> None:
 
 def main() -> None:
     tensors = load_tensors(MODEL_PATH)
-    add_decoder_8to256(tensors)
     add_memory_read_mux(tensors)
     add_memory_write_cells(tensors)
     add_fetch_load_store_buffers(tensors)
125
  tensors = load_tensors(MODEL_PATH)
126
+ drop_prefixes(
127
+ tensors,
128
+ [
129
+ "memory.addr_decode.",
130
+ "memory.read.",
131
+ "memory.write.",
132
+ "control.fetch.ir.",
133
+ "control.load.",
134
+ "control.store.",
135
+ "control.mem_addr.",
136
+ ],
137
+ )
138
+ add_decoder(tensors)
139
  add_memory_read_mux(tensors)
140
  add_memory_write_cells(tensors)
141
  add_fetch_load_store_buffers(tensors)
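The `add_decoder` rows above follow the README's threshold rule (`output = 1 if (Σ wᵢxᵢ + b) ≥ 0 else 0`): a weight of +1 where the row's address bit is 1, −1 where it is 0, and a bias of minus the row's popcount. A minimal pure-Python sketch (no torch, scaled down to 4 address bits for brevity; the actual script uses `ADDR_BITS = 16`) shows why each row fires only for its own address:

```python
ADDR_BITS = 4              # scaled down from 16 for illustration
MEM_BYTES = 1 << ADDR_BITS

def decoder_row(addr):
    # Same rule as add_decoder: +1 weight where the row's address bit is 1,
    # -1 where it is 0, bias = -(number of 1 bits). MSB-first ordering.
    bits = [(addr >> (ADDR_BITS - 1 - i)) & 1 for i in range(ADDR_BITS)]
    weights = [1.0 if b else -1.0 for b in bits]
    bias = -float(sum(bits))
    return weights, bias

def fire(weights, bias, inputs):
    # Threshold neuron: 1 iff weighted sum plus bias is >= 0.
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias >= 0 else 0

def decode(addr_in):
    in_bits = [(addr_in >> (ADDR_BITS - 1 - i)) & 1 for i in range(ADDR_BITS)]
    return [fire(*decoder_row(row), in_bits) for row in range(MEM_BYTES)]

sel = decode(0b1011)       # exactly one select line high, at index 11
```

A matching input reaches the bias exactly (sum = popcount − popcount = 0), while any mismatched bit loses at least 1, so the sum goes negative: the output is one-hot.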
eval/comprehensive_eval.py CHANGED
@@ -1900,12 +1900,12 @@ class CircuitEvaluator:
              ('manifest.alu_operations', 16),
              ('manifest.flags', 4),
              ('manifest.instruction_width', 16),
-             ('manifest.memory_bytes', 256),
-             ('manifest.pc_width', 8),
+             ('manifest.memory_bytes', 65536),
+             ('manifest.pc_width', 16),
              ('manifest.register_width', 8),
              ('manifest.registers', 4),
              ('manifest.turing_complete', 1),
-             ('manifest.version', 2),
+             ('manifest.version', 3),
          ]
 
          failures = []
@@ -2200,61 +2200,79 @@ class CircuitEvaluator:
      # MEMORY CIRCUITS
      # =========================================================================
 
-     def test_memory_decoder_8to256(self) -> TestResult:
-         """Test 8-to-256 address decoder exhaustively."""
+     def test_memory_decoder_16to65536(self) -> TestResult:
+         """Test 16-to-65536 address decoder with full-address coverage."""
          failures = []
          passed = 0
-         total = 256 * 256
+         mem_size = 1 << 16
+         total = mem_size * 2
 
-         for addr in range(256):
-             addr_bits = torch.tensor([(addr >> (7 - i)) & 1 for i in range(8)],
+         w_all = self.reg.get('memory.addr_decode.weight')
+         b_all = self.reg.get('memory.addr_decode.bias')
+
+         for addr in range(mem_size):
+             addr_bits = torch.tensor([(addr >> (15 - i)) & 1 for i in range(16)],
                                       device=self.device, dtype=torch.float32)
 
-             for out_idx in range(256):
-                 w = self.reg.get(f'memory.addr_decode.out{out_idx}.weight')
-                 b = self.reg.get(f'memory.addr_decode.out{out_idx}.bias')
-                 output = heaviside((addr_bits * w).sum() + b).item()
-                 expected = 1.0 if out_idx == addr else 0.0
-
-                 if output == expected:
-                     passed += 1
-                 elif len(failures) < 20:
-                     failures.append(((addr, out_idx), expected, output))
+             out_idx = addr
+             w = w_all[out_idx]
+             b = b_all[out_idx]
+             output = heaviside((addr_bits * w).sum() + b).item()
+             expected = 1.0
+             if output == expected:
+                 passed += 1
+             elif len(failures) < 20:
+                 failures.append(((addr, out_idx), expected, output))
+
+             out_idx = (addr + 1) & 0xFFFF
+             w = w_all[out_idx]
+             b = b_all[out_idx]
+             output = heaviside((addr_bits * w).sum() + b).item()
+             expected = 0.0
+             if output == expected:
+                 passed += 1
+             elif len(failures) < 20:
+                 failures.append(((addr, out_idx), expected, output))
 
          return TestResult('memory.addr_decode', passed, total, failures)
 
      def test_memory_read_mux(self) -> TestResult:
-         """Test 256-byte memory read mux for a few representative addresses."""
+         """Test 64KB memory read mux for a few representative addresses."""
          failures = []
          passed = 0
          total = 0
 
-         mem = [(addr * 37) & 0xFF for addr in range(256)]
-         test_addrs = [0, 1, 2, 127, 255]
+         mem_size = 1 << 16
+         mem = [(addr * 37) & 0xFF for addr in range(mem_size)]
+         test_addrs = [0x0000, 0x1234, 0xFFFF]
+
+         dec_w = self.reg.get('memory.addr_decode.weight')
+         dec_b = self.reg.get('memory.addr_decode.bias')
+         and_w = self.reg.get('memory.read.and.weight')
+         and_b = self.reg.get('memory.read.and.bias')
+         or_w = self.reg.get('memory.read.or.weight')
+         or_b = self.reg.get('memory.read.or.bias')
 
          for addr in test_addrs:
-             addr_bits = torch.tensor([(addr >> (7 - i)) & 1 for i in range(8)],
+             addr_bits = torch.tensor([(addr >> (15 - i)) & 1 for i in range(16)],
                                       device=self.device, dtype=torch.float32)
 
              selects = []
-             for out_idx in range(256):
-                 w = self.reg.get(f'memory.addr_decode.out{out_idx}.weight')
-                 b = self.reg.get(f'memory.addr_decode.out{out_idx}.bias')
-                 selects.append(heaviside((addr_bits * w).sum() + b).item())
+             for out_idx in range(mem_size):
+                 output = heaviside((addr_bits * dec_w[out_idx]).sum() + dec_b[out_idx]).item()
+                 selects.append(output)
 
              for bit in range(8):
                  and_vals = []
-                 for out_idx in range(256):
+                 for out_idx in range(mem_size):
                      mem_bit = float((mem[out_idx] >> (7 - bit)) & 1)
                      inp = torch.tensor([mem_bit, selects[out_idx]], device=self.device)
-                     w = self.reg.get(f'memory.read.bit{bit}.and{out_idx}.weight')
-                     b = self.reg.get(f'memory.read.bit{bit}.and{out_idx}.bias')
+                     w = and_w[bit, out_idx]
+                     b = and_b[bit, out_idx]
                      and_vals.append(heaviside((inp * w).sum() + b).item())
 
                  or_inp = torch.tensor(and_vals, device=self.device)
-                 w_or = self.reg.get(f'memory.read.bit{bit}.or.weight')
-                 b_or = self.reg.get(f'memory.read.bit{bit}.or.bias')
-                 output = heaviside((or_inp * w_or).sum() + b_or).item()
+                 output = heaviside((or_inp * or_w[bit]).sum() + or_b[bit]).item()
                  expected = float((mem[addr] >> (7 - bit)) & 1)
 
                  total += 1
@@ -2271,49 +2289,58 @@ class CircuitEvaluator:
          passed = 0
          total = 0
 
-         mem = [(addr * 13 + 7) & 0xFF for addr in range(256)]
+         mem_size = 1 << 16
+         mem = [(addr * 13 + 7) & 0xFF for addr in range(mem_size)]
          test_cases = [
              (0xA5, 42, 1.0),
-             (0x3C, 200, 0.0),
+             (0x3C, 0xBEEF, 0.0),
          ]
 
+         dec_w = self.reg.get('memory.addr_decode.weight')
+         dec_b = self.reg.get('memory.addr_decode.bias')
+         sel_w = self.reg.get('memory.write.sel.weight')
+         sel_b = self.reg.get('memory.write.sel.bias')
+         nsel_w = self.reg.get('memory.write.nsel.weight')
+         nsel_b = self.reg.get('memory.write.nsel.bias')
+         and_old_w = self.reg.get('memory.write.and_old.weight')
+         and_old_b = self.reg.get('memory.write.and_old.bias')
+         and_new_w = self.reg.get('memory.write.and_new.weight')
+         and_new_b = self.reg.get('memory.write.and_new.bias')
+         or_w = self.reg.get('memory.write.or.weight')
+         or_b = self.reg.get('memory.write.or.bias')
+
          for write_data, write_addr, write_en in test_cases:
-             addr_bits = torch.tensor([(write_addr >> (7 - i)) & 1 for i in range(8)],
+             addr_bits = torch.tensor([(write_addr >> (15 - i)) & 1 for i in range(16)],
                                       device=self.device, dtype=torch.float32)
 
-             decodes = []
-             for out_idx in range(256):
-                 w = self.reg.get(f'memory.addr_decode.out{out_idx}.weight')
-                 b = self.reg.get(f'memory.addr_decode.out{out_idx}.bias')
-                 decodes.append(heaviside((addr_bits * w).sum() + b).item())
+             sample_addrs = [write_addr, (write_addr + 1) & 0xFFFF, 0x0000, 0xFFFF]
+             decodes = {}
+             for out_idx in sample_addrs:
+                 decodes[out_idx] = heaviside((addr_bits * dec_w[out_idx]).sum() + dec_b[out_idx]).item()
 
-             for out_idx in range(256):
+             for out_idx in sample_addrs:
                  sel_inp = torch.tensor([decodes[out_idx], write_en], device=self.device)
-                 w_sel = self.reg.get(f'memory.write.sel.addr{out_idx}.weight')
-                 b_sel = self.reg.get(f'memory.write.sel.addr{out_idx}.bias')
-                 sel = heaviside((sel_inp * w_sel).sum() + b_sel).item()
+                 sel = heaviside((sel_inp * sel_w[out_idx]).sum() + sel_b[out_idx]).item()
 
-                 w_nsel = self.reg.get(f'memory.write.nsel.addr{out_idx}.weight')
-                 b_nsel = self.reg.get(f'memory.write.nsel.addr{out_idx}.bias')
-                 nsel = heaviside(sel * w_nsel + b_nsel).item()
+                 nsel = heaviside(sel * nsel_w[out_idx] + nsel_b[out_idx]).item()
 
                  for bit in range(8):
                      old_bit = float((mem[out_idx] >> (7 - bit)) & 1)
                      data_bit = float((write_data >> (7 - bit)) & 1)
 
                      inp_old = torch.tensor([old_bit, nsel], device=self.device)
-                     w_old = self.reg.get(f'memory.write.addr{out_idx}.bit{bit}.and_old.weight')
-                     b_old = self.reg.get(f'memory.write.addr{out_idx}.bit{bit}.and_old.bias')
+                     w_old = and_old_w[out_idx, bit]
+                     b_old = and_old_b[out_idx, bit]
                      and_old = heaviside((inp_old * w_old).sum() + b_old).item()
 
                      inp_new = torch.tensor([data_bit, sel], device=self.device)
-                     w_new = self.reg.get(f'memory.write.addr{out_idx}.bit{bit}.and_new.weight')
-                     b_new = self.reg.get(f'memory.write.addr{out_idx}.bit{bit}.and_new.bias')
+                     w_new = and_new_w[out_idx, bit]
+                     b_new = and_new_b[out_idx, bit]
                      and_new = heaviside((inp_new * w_new).sum() + b_new).item()
 
                      inp_or = torch.tensor([and_old, and_new], device=self.device)
-                     w_or = self.reg.get(f'memory.write.addr{out_idx}.bit{bit}.or.weight')
-                     b_or = self.reg.get(f'memory.write.addr{out_idx}.bit{bit}.or.bias')
+                     w_or = or_w[out_idx, bit]
+                     b_or = or_b[out_idx, bit]
                      output = heaviside((inp_or * w_or).sum() + b_or).item()
 
                      expected = data_bit if (write_en == 1.0 and out_idx == write_addr) else old_bit
@@ -2339,15 +2366,88 @@ class CircuitEvaluator:
              passed += 2
 
          for bit in range(8):
-             for name in ['control.load', 'control.store', 'control.mem_addr']:
+             for name in ['control.load', 'control.store']:
                  total += 2
                  if self.reg.has(f'{name}.bit{bit}.weight'):
                      self.reg.get(f'{name}.bit{bit}.weight')
                      self.reg.get(f'{name}.bit{bit}.bias')
                      passed += 2
 
+         for bit in range(16):
+             total += 2
+             if self.reg.has(f'control.mem_addr.bit{bit}.weight'):
+                 self.reg.get(f'control.mem_addr.bit{bit}.weight')
+                 self.reg.get(f'control.mem_addr.bit{bit}.bias')
+                 passed += 2
+
          return TestResult('control.fetch_load_store', passed, total, [])
 
+     def test_packed_memory_routing(self) -> TestResult:
+         """Validate packed memory tensor routing and shapes."""
+         failures = []
+         passed = 0
+         total = 0
+
+         circuits = ["memory.addr_decode", "memory.read", "memory.write"]
+         routing = self.routing_eval.routing.get("circuits", {})
+         routing_keys = set()
+
+         for circuit in circuits:
+             total += 1
+             if circuit not in routing:
+                 failures.append((circuit, "routing", "missing"))
+                 continue
+             passed += 1
+             internal = routing[circuit].get("internal", {})
+             for value in internal.values():
+                 if isinstance(value, list):
+                     routing_keys.update(value)
+
+         total += 1
+         if routing_keys and all(key for key in routing_keys):
+             passed += 1
+         else:
+             failures.append(("packed_keys", "non-empty", "empty"))
+
+         mem_bytes = int(self.reg.get("manifest.memory_bytes").item()) if self.reg.has("manifest.memory_bytes") else 65536
+         pc_width = int(self.reg.get("manifest.pc_width").item()) if self.reg.has("manifest.pc_width") else 16
+         reg_width = int(self.reg.get("manifest.register_width").item()) if self.reg.has("manifest.register_width") else 8
+
+         expected_shapes = {
+             "memory.addr_decode.weight": (mem_bytes, pc_width),
+             "memory.addr_decode.bias": (mem_bytes,),
+             "memory.read.and.weight": (reg_width, mem_bytes, 2),
+             "memory.read.and.bias": (reg_width, mem_bytes),
+             "memory.read.or.weight": (reg_width, mem_bytes),
+             "memory.read.or.bias": (reg_width,),
+             "memory.write.sel.weight": (mem_bytes, 2),
+             "memory.write.sel.bias": (mem_bytes,),
+             "memory.write.nsel.weight": (mem_bytes, 1),
+             "memory.write.nsel.bias": (mem_bytes,),
+             "memory.write.and_old.weight": (mem_bytes, reg_width, 2),
+             "memory.write.and_old.bias": (mem_bytes, reg_width),
+             "memory.write.and_new.weight": (mem_bytes, reg_width, 2),
+             "memory.write.and_new.bias": (mem_bytes, reg_width),
+             "memory.write.or.weight": (mem_bytes, reg_width, 2),
+             "memory.write.or.bias": (mem_bytes, reg_width),
+         }
+
+         for key, expected in expected_shapes.items():
+             total += 1
+             if key not in routing_keys:
+                 failures.append((key, "routing_ref", "missing"))
+                 continue
+             if not self.reg.has(key):
+                 failures.append((key, "tensor_exists", "missing"))
+                 continue
+             actual = tuple(self.reg.get(key).shape)
+             if actual == expected:
+                 passed += 1
+             else:
+                 failures.append((key, expected, actual))
+
+         return TestResult('memory.packed_routing', passed, total, failures)
+
      # =========================================================================
      # ARITHMETIC - ADDITIONAL CIRCUITS
      # =========================================================================
@@ -3010,10 +3110,11 @@ class ComprehensiveEvaluator:
          # Memory
          if verbose:
              print("\n=== MEMORY ===")
-         self._run_test(self.evaluator.test_memory_decoder_8to256, verbose)
+         self._run_test(self.evaluator.test_memory_decoder_16to65536, verbose)
          self._run_test(self.evaluator.test_memory_read_mux, verbose)
          self._run_test(self.evaluator.test_memory_write_cells, verbose)
          self._run_test(self.evaluator.test_control_fetch_load_store, verbose)
+         self._run_test(self.evaluator.test_packed_memory_routing, verbose)
 
          # Error detection
          if verbose:
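The read-mux test above leans on the same two-gate pattern throughout: a two-input AND (weights `[1, 1]`, bias `-2.0`) gates each stored bit by its select line, and a wide OR (all weights `1`, bias `-1.0`) merges the column. A pure-Python sketch of one output bit over a hypothetical 4-byte memory:

```python
def and_gate(a, b):
    # Threshold AND: fires only when a + b - 2 >= 0, i.e. both inputs are 1.
    return 1 if a + b - 2 >= 0 else 0

def or_gate(xs):
    # Threshold OR: fires when at least one input is 1 (sum - 1 >= 0).
    return 1 if sum(xs) - 1 >= 0 else 0

def read_bit(mem_bits, selects):
    # One AND per cell, then a wide OR; with a one-hot select this
    # passes through exactly the addressed cell's bit.
    return or_gate([and_gate(m, s) for m, s in zip(mem_bits, selects)])

mem_bits = [0, 1, 1, 0]                         # one bit plane of a tiny memory
assert read_bit(mem_bits, [0, 1, 0, 0]) == 1    # address 1 stores a 1
assert read_bit(mem_bits, [1, 0, 0, 0]) == 0    # address 0 stores a 0
```

With a one-hot select every non-selected AND outputs 0, so the OR reduces to the selected cell's bit; this is the same mux the packed `memory.read.and` / `memory.read.or` tensors encode per address.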
eval/cpu_cycle_test.py CHANGED
@@ -7,8 +7,11 @@ from pathlib import Path
 
  sys.path.append(str(Path(__file__).resolve().parent.parent))
 
+ import torch
+
  from cpu.cycle import run_until_halt
- from cpu.state import CPUState
+ from cpu.state import CPUState, pack_state, unpack_state
+ from cpu.threshold_cpu import ThresholdCPU
 
 
  def encode(opcode: int, rd: int, rs: int, imm8: int) -> int:
@@ -16,28 +19,36 @@ def encode(opcode: int, rd: int, rs: int, imm8: int) -> int:
 
 
  def write_instr(mem, addr, instr):
-     mem[addr & 0xFF] = (instr >> 8) & 0xFF
-     mem[(addr + 1) & 0xFF] = instr & 0xFF
+     mem[addr & 0xFFFF] = (instr >> 8) & 0xFF
+     mem[(addr + 1) & 0xFFFF] = instr & 0xFF
+
+
+ def write_addr(mem, addr, value):
+     mem[addr & 0xFFFF] = (value >> 8) & 0xFF
+     mem[(addr + 1) & 0xFFFF] = value & 0xFF
 
 
  def main() -> None:
-     mem = [0] * 256
+     mem = [0] * 65536
 
-     write_instr(mem, 0x00, encode(0xA, 0, 0, 0x10))  # LOAD R0, [0x10]
-     write_instr(mem, 0x02, encode(0xA, 1, 0, 0x11))  # LOAD R1, [0x11]
-     write_instr(mem, 0x04, encode(0x0, 0, 1, 0x00))  # ADD R0, R1
-     write_instr(mem, 0x06, encode(0xB, 0, 0, 0x12))  # STORE R0 -> [0x12]
-     write_instr(mem, 0x08, encode(0xF, 0, 0, 0x00))  # HALT
+     write_instr(mem, 0x0000, encode(0xA, 0, 0, 0x00))  # LOAD R0, [addr]
+     write_addr(mem, 0x0002, 0x0100)
+     write_instr(mem, 0x0004, encode(0xA, 1, 0, 0x00))  # LOAD R1, [addr]
+     write_addr(mem, 0x0006, 0x0101)
+     write_instr(mem, 0x0008, encode(0x0, 0, 1, 0x00))  # ADD R0, R1
+     write_instr(mem, 0x000A, encode(0xB, 0, 0, 0x00))  # STORE R0 -> [addr]
+     write_addr(mem, 0x000C, 0x0102)
+     write_instr(mem, 0x000E, encode(0xF, 0, 0, 0x00))  # HALT
 
-     mem[0x10] = 5
-     mem[0x11] = 7
+     mem[0x0100] = 5
+     mem[0x0101] = 7
 
      state = CPUState(
          pc=0,
          ir=0,
          regs=[0, 0, 0, 0],
          flags=[0, 0, 0, 0],
-         sp=0xFF,
+         sp=0xFFFE,
          ctrl=[0, 0, 0, 0],
          mem=mem,
      )
@@ -46,9 +57,29 @@ def main() -> None:
 
      assert final.ctrl[0] == 1, "HALT flag not set"
      assert final.regs[0] == 12, f"R0 expected 12, got {final.regs[0]}"
-     assert final.mem[0x12] == 12, f"MEM[0x12] expected 12, got {final.mem[0x12]}"
+     assert final.mem[0x0102] == 12, f"MEM[0x0102] expected 12, got {final.mem[0x0102]}"
      assert cycles <= 10, f"Unexpected cycle count: {cycles}"
 
+     # Threshold-weight runtime should match reference behavior.
+     threshold_cpu = ThresholdCPU()
+     t_final, t_cycles = threshold_cpu.run_until_halt(state, max_cycles=20)
+
+     assert t_final.ctrl[0] == 1, "Threshold HALT flag not set"
+     assert t_final.regs[0] == final.regs[0], f"Threshold R0 mismatch: {t_final.regs[0]} != {final.regs[0]}"
+     assert t_final.mem[0x0102] == final.mem[0x0102], (
+         f"Threshold MEM[0x0102] mismatch: {t_final.mem[0x0102]} != {final.mem[0x0102]}"
+     )
+     assert t_cycles == cycles, f"Threshold cycle count mismatch: {t_cycles} != {cycles}"
+
+     # Validate forward() state I/O.
+     bits = torch.tensor(pack_state(state), dtype=torch.float32)
+     out_bits = threshold_cpu.forward(bits, max_cycles=20)
+     out_state = unpack_state([int(b) for b in out_bits.tolist()])
+     assert out_state.regs[0] == final.regs[0], f"Forward R0 mismatch: {out_state.regs[0]} != {final.regs[0]}"
+     assert out_state.mem[0x0102] == final.mem[0x0102], (
+         f"Forward MEM[0x0102] mismatch: {out_state.mem[0x0102]} != {final.mem[0x0102]}"
+     )
+
      print("cpu_cycle_test: ok")
 
 
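Both helpers in the updated test (`write_instr`, `write_addr`) split a 16-bit word big-endian across two consecutive bytes, wrapping at the 64 KB boundary. A standalone sketch of that packing plus a matching read-back (the `read_word` helper is mine, not from the repo):

```python
def write_word(mem, addr, value):
    # Big-endian split: high byte first, wrapping within the 16-bit space,
    # same masking as the test's write_instr/write_addr helpers.
    mem[addr & 0xFFFF] = (value >> 8) & 0xFF
    mem[(addr + 1) & 0xFFFF] = value & 0xFF

def read_word(mem, addr):
    # Hypothetical inverse helper: reassemble the 16-bit word.
    return (mem[addr & 0xFFFF] << 8) | mem[(addr + 1) & 0xFFFF]

mem = [0] * 65536
write_word(mem, 0xFFFF, 0xBEEF)   # wraps: high byte at 0xFFFF, low byte at 0x0000
```

Writing at `0xFFFF` demonstrates the wrap: the low byte lands at address `0x0000`, matching the `& 0xFFFF` masking in the diff.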
eval/iron_eval.py CHANGED
@@ -8,9 +8,11 @@ GPU-optimized for population-based evolution.
  Target: ~40GB VRAM on RTX 6000 Ada (4M population)
  """
 
- import torch
- from typing import Dict, Tuple
- from safetensors import safe_open
+ import json
+ import os
+ import torch
+ from typing import Dict, Tuple
+ from safetensors import safe_open
 
 
  def load_model(base_path: str = ".") -> Dict[str, torch.Tensor]:
@@ -32,10 +34,20 @@ class BatchedFitnessEvaluator:
      GPU-batched fitness evaluator. Tests ALL circuits comprehensively.
      """
 
-     def __init__(self, device='cuda'):
-         self.device = device
-         self._setup_tests()
-
+     def __init__(self, device='cuda'):
+         self.device = device
+         self.routing = self._load_routing()
+         self._setup_tests()
+
+     def _load_routing(self) -> Dict:
+         """Load routing.json for packed memory validation."""
+         root = os.path.dirname(os.path.dirname(__file__))
+         path = os.path.join(root, "routing.json")
+         if os.path.exists(path):
+             with open(path, "r", encoding="utf-8") as fh:
+                 return json.load(fh)
+         return {"circuits": {}}
+
      def _setup_tests(self):
          """Pre-compute all test vectors."""
          d = self.device
@@ -3146,10 +3158,10 @@ class BatchedFitnessEvaluator:
 
          return scores, total_tests
 
-     def _test_manifest(self, pop: Dict, debug: bool = False) -> Tuple[torch.Tensor, int]:
-         """
-         MANIFEST - Verify manifest values are preserved.
-         """
+     def _test_manifest(self, pop: Dict, debug: bool = False) -> Tuple[torch.Tensor, int]:
+         """
+         MANIFEST - Verify manifest values are preserved.
+         """
          pop_size = next(iter(pop.values())).shape[0]
          scores = torch.zeros(pop_size, device=self.device)
          total_tests = 0
@@ -3158,12 +3170,12 @@ class BatchedFitnessEvaluator:
              ('manifest.alu_operations', 16),
              ('manifest.flags', 4),
              ('manifest.instruction_width', 16),
-             ('manifest.memory_bytes', 256),
-             ('manifest.pc_width', 8),
+             ('manifest.memory_bytes', 65536),
+             ('manifest.pc_width', 16),
              ('manifest.register_width', 8),
              ('manifest.registers', 4),
              ('manifest.turing_complete', 1),
-             ('manifest.version', 2),
+             ('manifest.version', 3),
          ]
 
          for tensor_name, expected_value in manifest_tensors:
@@ -3175,7 +3187,79 @@ class BatchedFitnessEvaluator:
          if debug and pop_size == 1:
              print(f"  Manifest: {int(scores[0].item())}/{total_tests}")
 
-         return scores, total_tests
+         return scores, total_tests
+
+     def _test_packed_memory_routing(self, pop: Dict, debug: bool = False) -> Tuple[torch.Tensor, int]:
+         """
+         PACKED MEMORY ROUTING - Validate routing references and tensor shapes.
+         """
+         pop_size = next(iter(pop.values())).shape[0]
+         scores = torch.zeros(pop_size, device=self.device)
+         total_tests = 0
+
+         routing = self.routing.get("circuits", {})
+         circuits = ["memory.addr_decode", "memory.read", "memory.write"]
+         routing_keys = set()
+
+         for circuit in circuits:
+             total_tests += 1
+             if circuit not in routing:
+                 continue
+             scores += 1
+             internal = routing[circuit].get("internal", {})
+             for value in internal.values():
+                 if isinstance(value, list):
+                     routing_keys.update(value)
+
+         total_tests += 1
+         if routing_keys and all(key for key in routing_keys):
+             scores += 1
+
+         if "manifest.memory_bytes" in pop:
+             mem_bytes = int(pop["manifest.memory_bytes"][0].item())
+         else:
+             mem_bytes = 65536
+         if "manifest.pc_width" in pop:
+             pc_width = int(pop["manifest.pc_width"][0].item())
+         else:
+             pc_width = 16
+         if "manifest.register_width" in pop:
+             reg_width = int(pop["manifest.register_width"][0].item())
+         else:
+             reg_width = 8
+
+         expected_shapes = {
+             "memory.addr_decode.weight": (pop_size, mem_bytes, pc_width),
+             "memory.addr_decode.bias": (pop_size, mem_bytes),
+             "memory.read.and.weight": (pop_size, reg_width, mem_bytes, 2),
+             "memory.read.and.bias": (pop_size, reg_width, mem_bytes),
+             "memory.read.or.weight": (pop_size, reg_width, mem_bytes),
+             "memory.read.or.bias": (pop_size, reg_width),
+             "memory.write.sel.weight": (pop_size, mem_bytes, 2),
+             "memory.write.sel.bias": (pop_size, mem_bytes),
+             "memory.write.nsel.weight": (pop_size, mem_bytes, 1),
+             "memory.write.nsel.bias": (pop_size, mem_bytes),
+             "memory.write.and_old.weight": (pop_size, mem_bytes, reg_width, 2),
+             "memory.write.and_old.bias": (pop_size, mem_bytes, reg_width),
+             "memory.write.and_new.weight": (pop_size, mem_bytes, reg_width, 2),
+             "memory.write.and_new.bias": (pop_size, mem_bytes, reg_width),
+             "memory.write.or.weight": (pop_size, mem_bytes, reg_width, 2),
+             "memory.write.or.bias": (pop_size, mem_bytes, reg_width),
+         }
+
+         for key, expected in expected_shapes.items():
+             total_tests += 1
+             if key not in routing_keys:
+                 continue
+             if key not in pop:
+                 continue
+             if tuple(pop[key].shape) == expected:
+                 scores += 1
+
+         if debug and pop_size == 1:
+             print(f"  Packed Memory Routing: {int(scores[0].item())}/{total_tests}")
+
+         return scores, total_tests
 
      def _test_equality_circuit(self, pop: Dict, debug: bool = False) -> Tuple[torch.Tensor, int]:
          """
@@ -3328,13 +3412,17 @@ class BatchedFitnessEvaluator:
          total_scores += incdec_scores
          total_tests += incdec_tests
 
-         manifest_scores, manifest_tests = self._test_manifest(pop, debug)
-         total_scores += manifest_scores
-         total_tests += manifest_tests
-
-         eq_scores, eq_tests = self._test_equality_circuit(pop, debug)
-         total_scores += eq_scores
-         total_tests += eq_tests
+         manifest_scores, manifest_tests = self._test_manifest(pop, debug)
+         total_scores += manifest_scores
+         total_tests += manifest_tests
+
+         packed_scores, packed_tests = self._test_packed_memory_routing(pop, debug)
+         total_scores += packed_scores
+         total_tests += packed_tests
+
+         eq_scores, eq_tests = self._test_equality_circuit(pop, debug)
+         total_scores += eq_scores
+         total_tests += eq_tests
 
          minmax_scores, minmax_tests = self._test_minmax_circuits(pop, debug)
          total_scores += minmax_scores
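The write-cell datapath validated above composes five threshold gates per cell and bit: `sel = AND(decode, write_enable)`, `nsel = NOT(sel)` (weight −1, bias 0), `and_old = AND(old_bit, nsel)`, `and_new = AND(data_bit, sel)`, and an OR that merges the two branches. A pure-Python sketch of one cell, using the same weights and biases the packed `memory.write.*` tensors carry:

```python
def step(weighted_sum):
    # Shared threshold rule: 1 iff weighted sum (bias included) >= 0.
    return 1 if weighted_sum >= 0 else 0

def write_cell(old_bit, data_bit, decode, write_en):
    sel = step(decode + write_en - 2.0)   # sel: AND(decode, we), weights [1,1], bias -2
    nsel = step(-1.0 * sel + 0.0)         # nsel: NOT(sel), weight -1, bias 0
    keep = step(old_bit + nsel - 2.0)     # and_old: keep old bit when not selected
    load = step(data_bit + sel - 2.0)     # and_new: take new bit when selected
    return step(keep + load - 1.0)        # or: exactly one branch can be active

# Only the decoded cell with write enable high takes the data bit;
# every other cell keeps its old value.
for old in (0, 1):
    for data in (0, 1):
        assert write_cell(old, data, decode=1, write_en=1) == data
        assert write_cell(old, data, decode=0, write_en=1) == old
        assert write_cell(old, data, decode=1, write_en=0) == old
```

Since `sel` and `nsel` are complementary, at most one AND branch can fire, so the final OR is a clean 2-to-1 select between the old and new bit.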
neural_computer.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9c40d35edb9ed7d37c0454b3aacde3d8effc68dfbc707b68ae1feb585836581f
- size 2525316
+ oid sha256:ba0c0e7e6286bc5a55d66ecbda8a1d43084a72e6a960d898b268fb6558c473a4
+ size 33725820
routing.json CHANGED
The diff for this file is too large to render. See raw diff
 
routing/generate_routing.py CHANGED
@@ -5,7 +5,10 @@ Maps each gate to its input sources.
 
 import json
 from safetensors import safe_open
-from collections import defaultdict
+from collections import defaultdict
+
+ADDR_BITS = 16
+MEM_BYTES = 1 << ADDR_BITS
 
 def get_all_gates(tensors_path):
     """Extract all unique gate paths from tensors file."""
@@ -423,12 +426,12 @@ def generate_manifest_routing():
         'manifest.alu_operations': {'type': 'constant', 'value': 16},
         'manifest.flags': {'type': 'constant', 'value': 4},
         'manifest.instruction_width': {'type': 'constant', 'value': 16},
-        'manifest.memory_bytes': {'type': 'constant', 'value': 256},
-        'manifest.pc_width': {'type': 'constant', 'value': 8},
+        'manifest.memory_bytes': {'type': 'constant', 'value': 65536},
+        'manifest.pc_width': {'type': 'constant', 'value': 16},
         'manifest.register_width': {'type': 'constant', 'value': 8},
         'manifest.registers': {'type': 'constant', 'value': 4},
         'manifest.turing_complete': {'type': 'constant', 'value': 1},
-        'manifest.version': {'type': 'constant', 'value': 2}
+        'manifest.version': {'type': 'constant', 'value': 3}
     }
@@ -1032,9 +1035,9 @@ def generate_control_routing():
         'internal': internal_store
     }
 
-    internal_mem_addr = {f'bit{bit}': [f'$addr[{bit}]'] for bit in range(8)}
+    internal_mem_addr = {f'bit{bit}': [f'$addr[{bit}]'] for bit in range(ADDR_BITS)}
     routing['control.mem_addr'] = {
-        'inputs': ['$addr[0:7]'],
+        'inputs': [f'$addr[0:{ADDR_BITS - 1}]'],
         'type': 'buffer',
         'internal': internal_mem_addr
     }
@@ -1043,52 +1046,38 @@
 
 
 def generate_memory_routing():
-    """Generate routing for memory decoder, read mux, and write cell update."""
+    """Generate routing for packed memory decoder, read mux, and write cell update."""
     routing = {}
 
-    addr_bits = [f'$addr[{i}]' for i in range(8)]
-    internal_dec = {f'out{addr}': addr_bits for addr in range(256)}
     routing['memory.addr_decode'] = {
-        'inputs': ['$addr[0:7]'],
-        'type': 'decoder',
-        'internal': internal_dec
+        'inputs': [f'$addr[0:{ADDR_BITS - 1}]'],
+        'type': 'decoder_packed',
+        'internal': {
+            'weight': ['memory.addr_decode.weight'],
+            'bias': ['memory.addr_decode.bias'],
+        }
     }
 
-    internal_read = {}
-    for bit in range(8):
-        for addr in range(256):
-            internal_read[f'bit{bit}.and{addr}'] = [f'$mem[{addr}][{bit}]', f'$sel[{addr}]']
-        internal_read[f'bit{bit}.or'] = [f'bit{bit}.and{i}' for i in range(256)]
-
     routing['memory.read'] = {
-        'inputs': ['$mem[0:255][0:7]', '$sel[0:255]'],
-        'type': 'read_mux',
-        'internal': internal_read,
-        'outputs': {f'bit{bit}': f'bit{bit}.or' for bit in range(8)}
-    }
-
-    internal_write = {}
-    for addr in range(256):
-        internal_write[f'sel.addr{addr}'] = [f'$sel[{addr}]', '$we']
-        internal_write[f'nsel.addr{addr}'] = [f'sel.addr{addr}']
-        for bit in range(8):
-            internal_write[f'addr{addr}.bit{bit}.and_old'] = [f'$mem[{addr}][{bit}]', f'nsel.addr{addr}']
-            internal_write[f'addr{addr}.bit{bit}.and_new'] = [f'$write_data[{bit}]', f'sel.addr{addr}']
-            internal_write[f'addr{addr}.bit{bit}.or'] = [
-                f'addr{addr}.bit{bit}.and_old',
-                f'addr{addr}.bit{bit}.and_new'
-            ]
-
-    outputs = {
-        f'mem[{addr}][{bit}]': f'addr{addr}.bit{bit}.or'
-        for addr in range(256) for bit in range(8)
+        'inputs': [f'$mem[0:{MEM_BYTES - 1}][0:7]', f'$sel[0:{MEM_BYTES - 1}]'],
+        'type': 'read_mux_packed',
+        'internal': {
+            'and': ['memory.read.and.weight', 'memory.read.and.bias'],
+            'or': ['memory.read.or.weight', 'memory.read.or.bias'],
+        },
+        'outputs': {f'bit{bit}': f'bit{bit}' for bit in range(8)}
     }
 
     routing['memory.write'] = {
-        'inputs': ['$mem[0:255][0:7]', '$write_data[0:7]', '$sel[0:255]', '$we'],
-        'type': 'write_mux',
-        'internal': internal_write,
-        'outputs': outputs
+        'inputs': [f'$mem[0:{MEM_BYTES - 1}][0:7]', '$write_data[0:7]', f'$sel[0:{MEM_BYTES - 1}]', '$we'],
+        'type': 'write_mux_packed',
+        'internal': {
+            'sel': ['memory.write.sel.weight', 'memory.write.sel.bias'],
+            'nsel': ['memory.write.nsel.weight', 'memory.write.nsel.bias'],
+            'and_old': ['memory.write.and_old.weight', 'memory.write.and_old.bias'],
+            'and_new': ['memory.write.and_new.weight', 'memory.write.and_new.bias'],
+            'or': ['memory.write.or.weight', 'memory.write.or.bias'],
+        }
     }
 
     return routing
routing/routing.json CHANGED
The diff for this file is too large to render. See raw diff
 
routing/routing_schema.md CHANGED
@@ -37,8 +37,11 @@ The routing file (`routing.json`) defines how gates are interconnected. Each ent
 
 6. **Memory indexing**: `"$mem[addr][bit]"` or `"$sel[addr]"` - Addressed memory bit or one-hot select line
    - Example: `"$mem[42][3]"` (addr 42, bit 3), `"$sel[42]"`
-
-## Circuit Types
+
+7. **Packed memory tensors**: For 64KB memory, routing uses packed tensor blocks instead of per-gate entries.
+   - Example: `memory.addr_decode.weight`, `memory.read.and.weight`, `memory.write.and_old.weight`
+
+## Circuit Types
 
 ### Single-Layer Gates
 Gates with just `.weight` and `.bias`:
@@ -77,30 +80,92 @@ Complex circuits with sub-components:
 }
 ```
 
-### Bit-Indexed Circuits
-Circuits operating on multi-bit values:
-```json
-"arithmetic.ripplecarry8bit": {
-  "external_inputs": ["$a[0:7]", "$b[0:7]"],
-  "gates": {
-    "fa0": {"inputs": ["$a[0]", "$b[0]", "#0"], "type": "fulladder"},
-    "fa1": {"inputs": ["$a[1]", "$b[1]", "fa0.cout"], "type": "fulladder"},
-    ...
-  }
-}
-```
-
-## Naming Conventions
+### Bit-Indexed Circuits
+Circuits operating on multi-bit values:
+```json
+"arithmetic.ripplecarry8bit": {
+  "external_inputs": ["$a[0:7]", "$b[0:7]"],
+  "gates": {
+    "fa0": {"inputs": ["$a[0]", "$b[0]", "#0"], "type": "fulladder"},
+    "fa1": {"inputs": ["$a[1]", "$b[1]", "fa0.cout"], "type": "fulladder"},
+    ...
+  }
+}
+```
+
+### Packed Memory Circuits
+64KB memory routing uses packed tensors to avoid exploding the header size. The routing entry
+declares a packed type and lists the tensor blocks used for the operation.
+
+```json
+"memory.addr_decode": {
+  "inputs": ["$addr[0:15]"],
+  "type": "decoder_packed",
+  "internal": {
+    "weight": ["memory.addr_decode.weight"],
+    "bias": ["memory.addr_decode.bias"]
+  }
+}
+
+"memory.read": {
+  "inputs": ["$mem[0:65535][0:7]", "$sel[0:65535]"],
+  "type": "read_mux_packed",
+  "internal": {
+    "and": ["memory.read.and.weight", "memory.read.and.bias"],
+    "or": ["memory.read.or.weight", "memory.read.or.bias"]
+  },
+  "outputs": { "bit0": "bit0", "bit1": "bit1", "bit2": "bit2", "bit3": "bit3",
+               "bit4": "bit4", "bit5": "bit5", "bit6": "bit6", "bit7": "bit7" }
+}
+
+"memory.write": {
+  "inputs": ["$mem[0:65535][0:7]", "$write_data[0:7]", "$sel[0:65535]", "$we"],
+  "type": "write_mux_packed",
+  "internal": {
+    "sel": ["memory.write.sel.weight", "memory.write.sel.bias"],
+    "nsel": ["memory.write.nsel.weight", "memory.write.nsel.bias"],
+    "and_old": ["memory.write.and_old.weight", "memory.write.and_old.bias"],
+    "and_new": ["memory.write.and_new.weight", "memory.write.and_new.bias"],
+    "or": ["memory.write.or.weight", "memory.write.or.bias"]
+  }
+}
+```
+
+Packed tensor mapping (shapes assume 16-bit address, 8-bit data):
+- `memory.addr_decode.weight`: [65536, 16]
+- `memory.addr_decode.bias`: [65536]
+- `memory.read.and.weight`: [8, 65536, 2]
+- `memory.read.and.bias`: [8, 65536]
+- `memory.read.or.weight`: [8, 65536]
+- `memory.read.or.bias`: [8]
+- `memory.write.sel.weight`: [65536, 2]
+- `memory.write.sel.bias`: [65536]
+- `memory.write.nsel.weight`: [65536, 1]
+- `memory.write.nsel.bias`: [65536]
+- `memory.write.and_old.weight`: [65536, 8, 2]
+- `memory.write.and_old.bias`: [65536, 8]
+- `memory.write.and_new.weight`: [65536, 8, 2]
+- `memory.write.and_new.bias`: [65536, 8]
+- `memory.write.or.weight`: [65536, 8, 2]
+- `memory.write.or.bias`: [65536, 8]
+
+Semantics are the same as the unrolled circuits, but computed in bulk:
+- decode: `sel[i] = H(sum(addr_bits * weight[i]) + bias[i])`
+- read: `bit[b] = H(sum(H([mem_bit, sel] * and_w[b,i] + and_b[b,i]) * or_w[b]) + or_b[b])`
+- write: `new_bit = H(H([old_bit, nsel] * and_old_w + and_old_b) + H([data_bit, sel] * and_new_w + and_new_b) - 1)`
+
+## Naming Conventions
 
 - External inputs: `$name` or `$name[bit]`
 - Constants: `#0`, `#1`
 - Internal gates: relative path from circuit root
 - Outputs: named in `outputs` section
 
-## Validation Rules
-
-1. Every gate in routing must exist in tensors file
-2. Every tensor must have routing entry
-3. Input count must match weight dimensions
-4. No circular dependencies (DAG only)
-5. All referenced sources must exist
+## Validation Rules
+
+1. Every gate in routing must exist in tensors file
+2. Every tensor must have routing entry
+3. Input count must match weight dimensions
+4. No circular dependencies (DAG only)
+5. All referenced sources must exist
+6. Packed memory circuits are valid when the packed tensor blocks exist and match the expected shapes
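
The decode and read semantics listed above can be exercised end-to-end at toy scale. The sketch below uses hand-built weights with `ADDR_BITS = 2` (the repo uses 16, and ships its own packed tensors); it only demonstrates that the threshold-gate formulation yields a one-hot select line and recovers the addressed bit:

```python
# Scaled-down sketch of decoder_packed + read_mux_packed semantics.
# ADDR_BITS = 2 here for readability; the repo uses ADDR_BITS = 16.
ADDR_BITS = 2
MEM = 1 << ADDR_BITS  # 4 addresses

def H(x):
    """Heaviside threshold: the neuron fires when the pre-activation is >= 0."""
    return 1 if x >= 0 else 0

# decode: sel[i] = H(sum(addr_bits * weight[i]) + bias[i])
# Weight is +1 where address i has a 1-bit, -1 where it has a 0-bit;
# bias = -popcount(i), so only the exact bit pattern reaches threshold.
rows = [[(i >> (ADDR_BITS - 1 - b)) & 1 for b in range(ADDR_BITS)] for i in range(MEM)]
weight = [[1 if bit else -1 for bit in row] for row in rows]
bias = [-sum(row) for row in rows]

addr = [1, 0]  # MSB-first: 0b10 = address 2
sel = [H(sum(w * a for w, a in zip(weight[i], addr)) + bias[i]) for i in range(MEM)]
assert sel == [0, 0, 1, 0]  # one-hot select line

# read: bit = OR over addresses of AND(mem[a], sel[a])
mem = [0, 1, 1, 0]  # toy memory: one data bit per address
and_out = [H(mem[a] + sel[a] - 2) for a in range(MEM)]  # AND fires only when both inputs are 1
bit = H(sum(and_out) - 1)                               # OR fires when any AND fired
assert bit == 1  # reads mem[2]
```

The packed tensors store exactly these AND/OR threshold weights, just batched over all 65,536 addresses instead of being unrolled into per-gate routing entries.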
routing/validate_packed_memory.py ADDED
@@ -0,0 +1,117 @@
+"""
+Validate packed memory tensor references in routing.json against safetensors.
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+from typing import Dict, Iterable, List, Tuple
+
+from safetensors import safe_open
+
+
+def _load_json(path: Path) -> Dict:
+    with path.open("r", encoding="utf-8") as fh:
+        return json.load(fh)
+
+
+def _get_scalar_tensor(f, name: str, default: int) -> int:
+    if name not in f.keys():
+        return default
+    tensor = f.get_tensor(name)
+    return int(tensor.item())
+
+
+def _gather_internal_keys(routing: Dict, circuit_name: str) -> List[str]:
+    circuit = routing.get("circuits", {}).get(circuit_name)
+    if circuit is None:
+        return []
+    internal = circuit.get("internal", {})
+    keys: List[str] = []
+    for value in internal.values():
+        if isinstance(value, list):
+            keys.extend(value)
+    return keys
+
+
+def _shape_matches(actual: Iterable[int], expected: Iterable[int]) -> bool:
+    return tuple(actual) == tuple(expected)
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(description="Validate packed memory routing tensors.")
+    parser.add_argument(
+        "--routing",
+        type=Path,
+        default=Path(__file__).resolve().parent / "routing.json",
+        help="Path to routing.json",
+    )
+    parser.add_argument(
+        "--model",
+        type=Path,
+        default=Path(__file__).resolve().parent.parent / "neural_computer.safetensors",
+        help="Path to neural_computer.safetensors",
+    )
+    args = parser.parse_args()
+
+    routing = _load_json(args.routing)
+    routing_keys = set()
+    for name in ("memory.addr_decode", "memory.read", "memory.write"):
+        routing_keys.update(_gather_internal_keys(routing, name))
+
+    missing_routing = [k for k in routing_keys if not k]
+    if missing_routing:
+        print("routing.json contains empty packed tensor entries.", file=sys.stderr)
+        return 1
+
+    with safe_open(str(args.model), framework="pt") as f:
+        mem_bytes = _get_scalar_tensor(f, "manifest.memory_bytes", 65536)
+        pc_width = _get_scalar_tensor(f, "manifest.pc_width", 16)
+        reg_width = _get_scalar_tensor(f, "manifest.register_width", 8)
+
+        expected_shapes: Dict[str, Tuple[int, ...]] = {
+            "memory.addr_decode.weight": (mem_bytes, pc_width),
+            "memory.addr_decode.bias": (mem_bytes,),
+            "memory.read.and.weight": (reg_width, mem_bytes, 2),
+            "memory.read.and.bias": (reg_width, mem_bytes),
+            "memory.read.or.weight": (reg_width, mem_bytes),
+            "memory.read.or.bias": (reg_width,),
+            "memory.write.sel.weight": (mem_bytes, 2),
+            "memory.write.sel.bias": (mem_bytes,),
+            "memory.write.nsel.weight": (mem_bytes, 1),
+            "memory.write.nsel.bias": (mem_bytes,),
+            "memory.write.and_old.weight": (mem_bytes, reg_width, 2),
+            "memory.write.and_old.bias": (mem_bytes, reg_width),
+            "memory.write.and_new.weight": (mem_bytes, reg_width, 2),
+            "memory.write.and_new.bias": (mem_bytes, reg_width),
+            "memory.write.or.weight": (mem_bytes, reg_width, 2),
+            "memory.write.or.bias": (mem_bytes, reg_width),
+        }
+
+        errors = []
+        for key, expected in expected_shapes.items():
+            if key not in routing_keys:
+                errors.append(f"routing.json missing key: {key}")
+                continue
+            if key not in f.keys():
+                errors.append(f"safetensors missing key: {key}")
+                continue
+            actual = f.get_tensor(key).shape
+            if not _shape_matches(actual, expected):
+                errors.append(f"{key} shape {tuple(actual)} != {expected}")
+
+    if errors:
+        print("Packed memory validation failed:", file=sys.stderr)
+        for err in errors:
+            print(f"  - {err}", file=sys.stderr)
+        return 1
+
+    print("Packed memory routing validation: ok")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
tensors.txt CHANGED
The diff for this file is too large to render. See raw diff
 
todo.md CHANGED
@@ -56,54 +56,55 @@ The machine runs. Callers just provide initial state and collect results.
 ### State Tensor Layout
 ```
 ┌────────┬──────────┬───────┬────────┬─────────────────────┐
-│ PC [8] │ Regs[32] │Flags[4│Ctrl[4] │  Memory [N × 8]     │
+│ PC [16]│ Regs[32] │Flags[4│Ctrl[4] │  Memory [N × 8]     │
 └────────┴──────────┴───────┴────────┴─────────────────────┘
-  8    +    32    +    4   +   4   +   N × 8 bits
+  16   +    32    +    4   +   4   +   N × 8 bits
 ```
 
 ### Memory Hierarchy
 | Level | Size | Tensors | Access |
 |-------|------|---------|--------|
 | Registers | 4 × 8-bit | Direct wiring | Immediate |
-| Hot cache | 256 bytes | ~6,400 | 8-bit addressed |
-| Cold bank | 64KB | ~1.6M | 16-bit addressed |
+| Main memory | 64KB | ~1.6M | 16-bit addressed |
 
 ### Full 64KB Configuration
 - Address space: 0x0000 - 0xFFFF
 - Routing circuits: ~1.64M tensors
-- State tensor: 48 + 524,288 = 524,336 bits per instance
+- State tensor: 88 + 524,288 = 524,376 bits per instance
 
 ## Phase 1: Memory Infrastructure
 
+64KB memory circuits are implemented and pass comprehensive eval.
+
 | Component | Description | Tensors | Status |
 |-----------|-------------|---------|--------|
-| Address Decoder 8-bit | 8-bit → 256 one-hot | ~520 | Pending |
-| Address Decoder 16-bit | 16-bit → 65536 one-hot | ~65,600 | Pending |
-| Memory Read MUX 256 | 256-to-1 × 8 bits | ~2,048 | Pending |
-| Memory Read MUX 64K | 65536-to-1 × 8 bits | ~524,288 | Pending |
-| Memory Write Demux | Route write to address | ~524,288 | Pending |
-| Memory Cell Logic | Conditional update | ~524,288 | Pending |
+| Address Decoder 16-bit | 16-bit → 65536 one-hot | 2 (packed) | Done |
+| Memory Read MUX 64K | 65536-to-1 × 8 bits | 4 (packed) | Done |
+| Memory Write Demux | Route write to address | 4 (packed) | Done |
+| Memory Cell Logic | Conditional update | 6 (packed) | Done |
 
 ## Phase 2: Execution Engine
 
 | Component | Description | Status |
 |-----------|-------------|--------|
-| Instruction Fetch | PC → Memory → IR | Pending |
-| Operand Fetch | Decode → Register/Memory Read | Pending |
-| ALU Dispatch | Opcode → Operation Select | Pending |
-| Result Writeback | Route to destination | Pending |
-| Flag Update | Compute Z/N/C/V | Partial |
+| Instruction Fetch | PC → Memory → IR | Done |
+| Operand Fetch | Decode → Register/Memory Read | Done |
+| ALU Dispatch | Opcode → Operation Select | Done |
+| Result Writeback | Route to destination | Done |
+| Flag Update | Compute Z/N/C/V | Done |
 | PC Advance | Increment or Jump | Done |
 | Halt Detection | HALT opcode → stop | Done |
 
 ## Phase 3: ACT Integration
 
+Threshold runtime available in cpu/threshold_cpu.py (cycle + ACT loop + state I/O).
+
 | Component | Description | Status |
 |-----------|-------------|--------|
-| Cycle Block | All Phase 2 as single layer | Pending |
-| Recurrence Wrapper | Loop until halt signal | Pending |
-| Max Cycles Guard | Prevent infinite loops | Pending |
-| State I/O | Pack/unpack state tensor | Pending |
+| Cycle Block | All Phase 2 as single layer | Done |
+| Recurrence Wrapper | Loop until halt signal | Done |
+| Max Cycles Guard | Prevent infinite loops | Done |
+| State I/O | Pack/unpack state tensor | Done |
 
 ## Instruction Set
@@ -119,11 +120,11 @@ The machine runs. Callers just provide initial state and collect results.
 | 0x7 | MUL | R[d] = R[a] * R[b] | Done |
 | 0x8 | DIV | R[d] = R[a] / R[b] | Done |
 | 0x9 | CMP | flags = R[a] - R[b] | Done |
-| 0xA | LOAD | R[d] = M[addr] | Pending |
-| 0xB | STORE | M[addr] = R[s] | Pending |
-| 0xC | JMP | PC = addr | Partial |
+| 0xA | LOAD | R[d] = M[addr] | Done |
+| 0xB | STORE | M[addr] = R[s] | Done |
+| 0xC | JMP | PC = addr | Done |
 | 0xD | JZ/JNZ | PC = addr if flag | Done |
-| 0xE | CALL | push PC; PC = addr | Partial |
+| 0xE | CALL | push PC; PC = addr | Done |
 | 0xF | HALT | stop execution | Done |
@@ -151,8 +152,8 @@ The machine runs. Callers just provide initial state and collect results.
 - Comparators, threshold gates
 - Conditional jumps
 
-**Current: 24,200 tensors**
-**Projected: ~1.65M tensors (with 64KB memory)**
+**Current: 6,296 tensors (packed memory)**
+**Parameters: 8,267,667**
 
 ## Applications
159