Sync packed memory + 16-bit addressing

Files changed:
- .gitattributes +1 -0
- README.md +17 -15
- cpu/cycle.py +22 -11
- cpu/state.py +3 -3
- cpu/threshold_cpu.py +435 -0
- eval/build_memory.py +69 -28
- eval/comprehensive_eval.py +155 -54
- eval/cpu_cycle_test.py +44 -13
- eval/iron_eval.py +110 -22
- neural_computer.safetensors +2 -2
- routing.json +0 -0
- routing/generate_routing.py +32 -43
- routing/routing.json +0 -0
- routing/routing_schema.md +88 -23
- routing/validate_packed_memory.py +117 -0
- tensors.txt +0 -0
- todo.md +27 -26
.gitattributes CHANGED

@@ -36,3 +36,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 __pycache__/iron_eval.cpython-312.pyc filter=lfs diff=lfs merge=lfs -text
 __pycache__/iron_eval.cpython-311.pyc filter=lfs diff=lfs merge=lfs -text
 eval/__pycache__/comprehensive_eval.cpython-312.pyc filter=lfs diff=lfs merge=lfs -text
+tensors.txt filter=lfs diff=lfs merge=lfs -text
README.md CHANGED

@@ -17,8 +17,8 @@ tags:
 Every logic gate is a threshold neuron: `output = 1 if (Σ wᵢxᵢ + b) ≥ 0 else 0`

 ```
-Tensors:
-Parameters:
+Tensors: 6,296
+Parameters: 8,267,667
 ```

 ---

@@ -30,7 +30,7 @@ A complete 8-bit processor where every operation—from Boolean logic to arithme
 | Component | Specification |
 |-----------|---------------|
 | Registers | 4 × 8-bit general purpose |
-| Memory |
+| Memory | 64KB addressable |
 | ALU | 16 operations (ADD, SUB, AND, OR, XOR, NOT, SHL, SHR, INC, DEC, CMP, NEG, PASS, ZERO, ONES, NOP) |
 | Flags | Zero, Negative, Carry, Overflow |
 | Control | JMP, JZ, JNZ, JC, JNC, JN, JP, JV, JNV, CALL, RET, PUSH, POP |

@@ -90,7 +90,7 @@ The weights in this repository implement a complete 8-bit computer: registers, A
 | Modular | 11 | Divisibility by 2-12 (multi-layer for non-powers-of-2) |
 | Threshold | 13 | k-of-n gates, majority, minority, exactly-k |
 | Pattern | 10 | Popcount, leading/trailing ones, symmetry |
-| Memory | 3 |
+| Memory | 3 | 16-bit addr decoder, 65536x8 read mux, write cell update (packed) |

 ---

@@ -122,14 +122,14 @@ for a, b_in in [(0,0), (0,1), (1,0), (1,1)]:
 All multi-bit fields are **MSB-first** (index 0 is the most-significant bit).

 ```
-[ PC[
+[ PC[16] | IR[16] | R0[8] R1[8] R2[8] R3[8] | FLAGS[4] | SP[16] | CTRL[4] | MEM[65536][8] ]
 ```

 Flags are ordered as: `Z, N, C, V`.

 Control bits are ordered as: `HALT, MEM_WE, MEM_RE, RESERVED`.

-Total state size: `
+Total state size: `524376` bits.

 ---

@@ -145,8 +145,7 @@ opcode rd rs imm8
 Interpretation:
 - **R-type**: `rd = rd op rs` (imm8 ignored).
 - **I-type**: `rd = op rd, imm8` (rs ignored).
-- **
-- **LOAD/STORE**: `imm8` is the absolute memory address.
+- **Address-extended**: `LOAD`, `STORE`, `JMP`, `JZ`, `CALL` consume the next word as a 16-bit address (big-endian). `imm8` is reserved, and the PC skips 4 bytes when the jump is not taken.

 ---

@@ -185,12 +184,15 @@ All circuits pass exhaustive testing over their full input domains.
 ```
 {category}.{circuit}[.{layer}][.{component}].{weight|bias}

 Examples:
 boolean.and.weight
 boolean.xor.layer1.neuron1.weight
 arithmetic.ripplecarry8bit.fa7.ha2.sum.layer1.or.weight
 modular.mod5.layer2.eq3.weight
 error_detection.paritychecker8bit.stage2.xor1.layer1.nand.bias
+
+Memory circuits are stored as packed tensors to keep the safetensors header size manageable
+(e.g., `memory.addr_decode.weight`, `memory.read.and.weight`, `memory.write.and_old.weight`).
 ```

 ---

@@ -209,7 +211,7 @@ All weights are integers. All activations are Heaviside step. Designed for:

 | File | Description |
 |------|-------------|
-| `neural_computer.safetensors` |
+| `neural_computer.safetensors` | 6,296 tensors, 8,267,667 parameters |
 | `iron_eval.py` | Comprehensive test suite |
 | `prune_weights.py` | Weight optimization tool |
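The gate equation in the README can be sanity-checked with plain integers; a minimal sketch (the weights below are illustrative choices, not the repository's stored tensors):

```python
def gate(weights, bias):
    # output = 1 if (sum w_i * x_i + b) >= 0 else 0  -- Heaviside threshold neuron
    return lambda *x: 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0

AND = gate([1, 1], -2)
OR = gate([1, 1], -1)
NAND = gate([-1, -1], 1)

def XOR(a, b):
    # Two-layer threshold XOR: AND(OR(a, b), NAND(a, b))
    return AND(OR(a, b), NAND(a, b))

for a in (0, 1):
    for b in (0, 1):
        assert AND(a, b) == (a & b)
        assert OR(a, b) == (a | b)
        assert XOR(a, b) == (a ^ b)
```

The two-layer XOR mirrors the `layer1.or` / `layer1.nand` / `layer2` decomposition visible in the tensor names above.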
cpu/cycle.py CHANGED

@@ -50,14 +50,22 @@ def step(state: CPUState) -> CPUState:

     # Fetch: two bytes, big-endian
     hi = s.mem[s.pc]
-    lo = s.mem[(s.pc + 1) &
+    lo = s.mem[(s.pc + 1) & 0xFFFF]
     s.ir = ((hi & 0xFF) << 8) | (lo & 0xFF)
-    next_pc = (s.pc + 2) &
+    next_pc = (s.pc + 2) & 0xFFFF

     opcode, rd, rs, imm8 = decode_ir(s.ir)
     a = s.regs[rd]
     b = s.regs[rs]

+    addr16 = None
+    next_pc_ext = next_pc
+    if opcode in (0xA, 0xB, 0xC, 0xD, 0xE):
+        addr_hi = s.mem[next_pc]
+        addr_lo = s.mem[(next_pc + 1) & 0xFFFF]
+        addr16 = ((addr_hi & 0xFF) << 8) | (addr_lo & 0xFF)
+        next_pc_ext = (next_pc + 2) & 0xFFFF
+
     write_result = True
     result = a
     carry = 0

@@ -94,23 +102,26 @@ def step(state: CPUState) -> CPUState:
         result, carry, overflow = _alu_sub(a, b)
         write_result = False
     elif opcode == 0xA:  # LOAD
-        result = s.mem[
+        result = s.mem[addr16]
     elif opcode == 0xB:  # STORE
-        s.mem[
+        s.mem[addr16] = b & 0xFF
         write_result = False
     elif opcode == 0xC:  # JMP
-        s.pc =
+        s.pc = addr16 & 0xFFFF
         write_result = False
     elif opcode == 0xD:  # JZ
         if s.flags[0] == 1:
-            s.pc =
+            s.pc = addr16 & 0xFFFF
         else:
-            s.pc =
+            s.pc = next_pc_ext
         write_result = False
     elif opcode == 0xE:  # CALL
-
-        s.
-        s.
+        ret_addr = next_pc_ext & 0xFFFF
+        s.sp = (s.sp - 1) & 0xFFFF
+        s.mem[s.sp] = (ret_addr >> 8) & 0xFF
+        s.sp = (s.sp - 1) & 0xFFFF
+        s.mem[s.sp] = ret_addr & 0xFF
+        s.pc = addr16 & 0xFFFF
         write_result = False
     elif opcode == 0xF:  # HALT
         s.ctrl[0] = 1

@@ -123,7 +134,7 @@ def step(state: CPUState) -> CPUState:
         s.regs[rd] = result & 0xFF

     if opcode not in (0xC, 0xD, 0xE):
-        s.pc =
+        s.pc = next_pc_ext

     return s
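The fetch path and address-extension rule in this diff can be exercised standalone; a hypothetical sketch operating on a plain `bytearray` rather than the repository's `CPUState` (the `fetch` helper is named here for illustration):

```python
def decode_ir(ir):
    # opcode[4] | rd[2] | rs[2] | imm8[8], packed in one 16-bit word
    return (ir >> 12) & 0xF, (ir >> 10) & 0x3, (ir >> 8) & 0x3, ir & 0xFF

def fetch(mem, pc):
    # Two-byte big-endian instruction fetch with 16-bit PC wraparound
    ir = ((mem[pc] & 0xFF) << 8) | (mem[(pc + 1) & 0xFFFF] & 0xFF)
    next_pc = (pc + 2) & 0xFFFF
    opcode = (ir >> 12) & 0xF
    addr16 = None
    if opcode in (0xA, 0xB, 0xC, 0xD, 0xE):  # address-extended opcodes
        addr16 = ((mem[next_pc] & 0xFF) << 8) | (mem[(next_pc + 1) & 0xFFFF] & 0xFF)
        next_pc = (next_pc + 2) & 0xFFFF  # PC advances 4 bytes in total
    return ir, opcode, addr16, next_pc

mem = bytearray(65536)
mem[0:4] = bytes([0xA0, 0x00, 0x12, 0x34])  # LOAD r0, followed by address word 0x1234
```

With this program image, `fetch(mem, 0)` yields IR `0xA000`, opcode `0xA`, the extension address `0x1234`, and a next PC of 4.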
cpu/state.py CHANGED

@@ -11,14 +11,14 @@ from typing import List

 FLAG_NAMES = ["Z", "N", "C", "V"]
 CTRL_NAMES = ["HALT", "MEM_WE", "MEM_RE", "RESERVED"]

-PC_BITS =
+PC_BITS = 16
 IR_BITS = 16
 REG_BITS = 8
 REG_COUNT = 4
 FLAG_BITS = 4
-SP_BITS =
+SP_BITS = 16
 CTRL_BITS = 4
-MEM_BYTES =
+MEM_BYTES = 65536
 MEM_BITS = MEM_BYTES * 8

 STATE_BITS = PC_BITS + IR_BITS + (REG_BITS * REG_COUNT) + FLAG_BITS + SP_BITS + CTRL_BITS + MEM_BITS
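As a cross-check, the constants above reproduce the 524,376-bit total state size quoted in the README:

```python
PC_BITS = 16
IR_BITS = 16
REG_BITS = 8
REG_COUNT = 4
FLAG_BITS = 4
SP_BITS = 16
CTRL_BITS = 4
MEM_BYTES = 65536
MEM_BITS = MEM_BYTES * 8  # 524288 bits of memory

# 16 + 16 + 32 + 4 + 16 + 4 + 524288
STATE_BITS = PC_BITS + IR_BITS + (REG_BITS * REG_COUNT) + FLAG_BITS + SP_BITS + CTRL_BITS + MEM_BITS
print(STATE_BITS)  # → 524376
```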
cpu/threshold_cpu.py ADDED

@@ -0,0 +1,435 @@
+"""
+Threshold-weight runtime for the 8-bit CPU.
+
+Implements a reference cycle using the frozen circuit weights for core ALU ops.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+from typing import List, Tuple
+
+import torch
+from safetensors.torch import load_file
+
+from .state import CPUState, pack_state, unpack_state, REG_BITS, PC_BITS, MEM_BYTES
+
+
+def heaviside(x: torch.Tensor) -> torch.Tensor:
+    return (x >= 0).float()
+
+
+def int_to_bits_msb(value: int, width: int) -> List[int]:
+    return [(value >> (width - 1 - i)) & 1 for i in range(width)]
+
+
+def bits_to_int_msb(bits: List[int]) -> int:
+    value = 0
+    for bit in bits:
+        value = (value << 1) | int(bit)
+    return value
+
+
+def bits_msb_to_lsb(bits: List[int]) -> List[int]:
+    return list(reversed(bits))
+
+
+DEFAULT_MODEL_PATH = Path(__file__).resolve().parent.parent / "neural_computer.safetensors"
+
+
+class ThresholdALU:
+    def __init__(self, model_path: str, device: str = "cpu") -> None:
+        self.device = device
+        self.tensors = {k: v.float().to(device) for k, v in load_file(model_path).items()}
+
+    def _get(self, name: str) -> torch.Tensor:
+        return self.tensors[name]
+
+    def _eval_gate(self, weight_key: str, bias_key: str, inputs: List[float]) -> float:
+        w = self._get(weight_key)
+        b = self._get(bias_key)
+        inp = torch.tensor(inputs, device=self.device)
+        return heaviside((inp * w).sum() + b).item()
+
+    def _eval_xor(self, prefix: str, inputs: List[float]) -> float:
+        inp = torch.tensor(inputs, device=self.device)
+        w_or = self._get(f"{prefix}.layer1.or.weight")
+        b_or = self._get(f"{prefix}.layer1.or.bias")
+        w_nand = self._get(f"{prefix}.layer1.nand.weight")
+        b_nand = self._get(f"{prefix}.layer1.nand.bias")
+        w2 = self._get(f"{prefix}.layer2.weight")
+        b2 = self._get(f"{prefix}.layer2.bias")
+
+        h_or = heaviside((inp * w_or).sum() + b_or).item()
+        h_nand = heaviside((inp * w_nand).sum() + b_nand).item()
+        hidden = torch.tensor([h_or, h_nand], device=self.device)
+        return heaviside((hidden * w2).sum() + b2).item()
+
+    def _eval_full_adder(self, prefix: str, a: float, b: float, cin: float) -> Tuple[float, float]:
+        ha1_sum = self._eval_xor(f"{prefix}.ha1.sum", [a, b])
+        ha1_carry = self._eval_gate(f"{prefix}.ha1.carry.weight", f"{prefix}.ha1.carry.bias", [a, b])
+
+        ha2_sum = self._eval_xor(f"{prefix}.ha2.sum", [ha1_sum, cin])
+        ha2_carry = self._eval_gate(
+            f"{prefix}.ha2.carry.weight", f"{prefix}.ha2.carry.bias", [ha1_sum, cin]
+        )
+
+        cout = self._eval_gate(f"{prefix}.carry_or.weight", f"{prefix}.carry_or.bias", [ha1_carry, ha2_carry])
+        return ha2_sum, cout
+
+    def add(self, a: int, b: int) -> Tuple[int, int, int]:
+        a_bits = bits_msb_to_lsb(int_to_bits_msb(a, REG_BITS))
+        b_bits = bits_msb_to_lsb(int_to_bits_msb(b, REG_BITS))
+
+        carry = 0.0
+        sum_bits: List[int] = []
+        for bit in range(REG_BITS):
+            sum_bit, carry = self._eval_full_adder(
+                f"arithmetic.ripplecarry8bit.fa{bit}", float(a_bits[bit]), float(b_bits[bit]), carry
+            )
+            sum_bits.append(int(sum_bit))
+
+        result = bits_to_int_msb(list(reversed(sum_bits)))
+        carry_out = int(carry)
+        overflow = 1 if (((a ^ result) & (b ^ result)) & 0x80) else 0
+        return result, carry_out, overflow
+
+    def sub(self, a: int, b: int) -> Tuple[int, int, int]:
+        a_bits = bits_msb_to_lsb(int_to_bits_msb(a, REG_BITS))
+        b_bits = bits_msb_to_lsb(int_to_bits_msb(b, REG_BITS))
+
+        carry = 1.0  # two's complement carry-in
+        sum_bits: List[int] = []
+        for bit in range(REG_BITS):
+            notb = self._eval_gate(
+                f"arithmetic.sub8bit.notb{bit}.weight",
+                f"arithmetic.sub8bit.notb{bit}.bias",
+                [float(b_bits[bit])],
+            )
+
+            xor1 = self._eval_xor(f"arithmetic.sub8bit.fa{bit}.xor1", [float(a_bits[bit]), notb])
+            xor2 = self._eval_xor(f"arithmetic.sub8bit.fa{bit}.xor2", [xor1, carry])
+
+            and1 = self._eval_gate(
+                f"arithmetic.sub8bit.fa{bit}.and1.weight",
+                f"arithmetic.sub8bit.fa{bit}.and1.bias",
+                [float(a_bits[bit]), notb],
+            )
+            and2 = self._eval_gate(
+                f"arithmetic.sub8bit.fa{bit}.and2.weight",
+                f"arithmetic.sub8bit.fa{bit}.and2.bias",
+                [xor1, carry],
+            )
+            carry = self._eval_gate(
+                f"arithmetic.sub8bit.fa{bit}.or_carry.weight",
+                f"arithmetic.sub8bit.fa{bit}.or_carry.bias",
+                [and1, and2],
+            )
+
+            sum_bits.append(int(xor2))
+
+        result = bits_to_int_msb(list(reversed(sum_bits)))
+        carry_out = int(carry)
+        overflow = 1 if (((a ^ b) & (a ^ result)) & 0x80) else 0
+        return result, carry_out, overflow
+
+    def bitwise_and(self, a: int, b: int) -> int:
+        a_bits = int_to_bits_msb(a, REG_BITS)
+        b_bits = int_to_bits_msb(b, REG_BITS)
+        w = self._get("alu.alu8bit.and.weight")
+        bias = self._get("alu.alu8bit.and.bias")
+
+        out_bits = []
+        for bit in range(REG_BITS):
+            inp = torch.tensor([float(a_bits[bit]), float(b_bits[bit])], device=self.device)
+            out = heaviside((inp * w[bit * 2:bit * 2 + 2]).sum() + bias[bit]).item()
+            out_bits.append(int(out))
+
+        return bits_to_int_msb(out_bits)
+
+    def bitwise_or(self, a: int, b: int) -> int:
+        a_bits = int_to_bits_msb(a, REG_BITS)
+        b_bits = int_to_bits_msb(b, REG_BITS)
+        w = self._get("alu.alu8bit.or.weight")
+        bias = self._get("alu.alu8bit.or.bias")
+
+        out_bits = []
+        for bit in range(REG_BITS):
+            inp = torch.tensor([float(a_bits[bit]), float(b_bits[bit])], device=self.device)
+            out = heaviside((inp * w[bit * 2:bit * 2 + 2]).sum() + bias[bit]).item()
+            out_bits.append(int(out))
+
+        return bits_to_int_msb(out_bits)
+
+    def bitwise_not(self, a: int) -> int:
+        a_bits = int_to_bits_msb(a, REG_BITS)
+        w = self._get("alu.alu8bit.not.weight")
+        bias = self._get("alu.alu8bit.not.bias")
+
+        out_bits = []
+        for bit in range(REG_BITS):
+            inp = torch.tensor([float(a_bits[bit])], device=self.device)
+            out = heaviside((inp * w[bit]).sum() + bias[bit]).item()
+            out_bits.append(int(out))
+
+        return bits_to_int_msb(out_bits)
+
+    def bitwise_xor(self, a: int, b: int) -> int:
+        a_bits = int_to_bits_msb(a, REG_BITS)
+        b_bits = int_to_bits_msb(b, REG_BITS)
+
+        w_or = self._get("alu.alu8bit.xor.layer1.or.weight")
+        b_or = self._get("alu.alu8bit.xor.layer1.or.bias")
+        w_nand = self._get("alu.alu8bit.xor.layer1.nand.weight")
+        b_nand = self._get("alu.alu8bit.xor.layer1.nand.bias")
+        w2 = self._get("alu.alu8bit.xor.layer2.weight")
+        b2 = self._get("alu.alu8bit.xor.layer2.bias")
+
+        out_bits = []
+        for bit in range(REG_BITS):
+            inp = torch.tensor([float(a_bits[bit]), float(b_bits[bit])], device=self.device)
+            h_or = heaviside((inp * w_or[bit * 2:bit * 2 + 2]).sum() + b_or[bit])
+            h_nand = heaviside((inp * w_nand[bit * 2:bit * 2 + 2]).sum() + b_nand[bit])
+            hidden = torch.stack([h_or, h_nand])
+            out = heaviside((hidden * w2[bit * 2:bit * 2 + 2]).sum() + b2[bit]).item()
+            out_bits.append(int(out))
+
+        return bits_to_int_msb(out_bits)
+
+
+class ThresholdCPU:
+    def __init__(self, model_path: str | Path = DEFAULT_MODEL_PATH, device: str = "cpu") -> None:
+        self.device = device
+        self.alu = ThresholdALU(str(model_path), device=device)
+
+    @staticmethod
+    def decode_ir(ir: int) -> Tuple[int, int, int, int]:
+        opcode = (ir >> 12) & 0xF
+        rd = (ir >> 10) & 0x3
+        rs = (ir >> 8) & 0x3
+        imm8 = ir & 0xFF
+        return opcode, rd, rs, imm8
+
+    @staticmethod
+    def flags_from_result(result: int, carry: int, overflow: int) -> List[int]:
+        z = 1 if result == 0 else 0
+        n = 1 if (result & 0x80) else 0
+        c = 1 if carry else 0
+        v = 1 if overflow else 0
+        return [z, n, c, v]
+
+    def _addr_decode(self, addr: int) -> torch.Tensor:
+        bits = torch.tensor(int_to_bits_msb(addr, PC_BITS), device=self.device, dtype=torch.float32)
+        w = self.alu._get("memory.addr_decode.weight")
+        b = self.alu._get("memory.addr_decode.bias")
+        return heaviside((w * bits).sum(dim=1) + b)
+
+    def _memory_read(self, mem: List[int], addr: int) -> int:
+        sel = self._addr_decode(addr)
+        mem_bits = torch.tensor(
+            [int_to_bits_msb(byte, REG_BITS) for byte in mem],
+            device=self.device,
+            dtype=torch.float32,
+        )
+        and_w = self.alu._get("memory.read.and.weight")
+        and_b = self.alu._get("memory.read.and.bias")
+        or_w = self.alu._get("memory.read.or.weight")
+        or_b = self.alu._get("memory.read.or.bias")
+
+        out_bits: List[int] = []
+        for bit in range(REG_BITS):
+            inp = torch.stack([mem_bits[:, bit], sel], dim=1)
+            and_out = heaviside((inp * and_w[bit]).sum(dim=1) + and_b[bit])
+            out_bit = heaviside((and_out * or_w[bit]).sum() + or_b[bit]).item()
+            out_bits.append(int(out_bit))
+
+        return bits_to_int_msb(out_bits)
+
+    def _memory_write(self, mem: List[int], addr: int, value: int) -> List[int]:
+        sel = self._addr_decode(addr)
+        data_bits = torch.tensor(int_to_bits_msb(value, REG_BITS), device=self.device, dtype=torch.float32)
+        mem_bits = torch.tensor(
+            [int_to_bits_msb(byte, REG_BITS) for byte in mem],
+            device=self.device,
+            dtype=torch.float32,
+        )
+
+        sel_w = self.alu._get("memory.write.sel.weight")
+        sel_b = self.alu._get("memory.write.sel.bias")
+        nsel_w = self.alu._get("memory.write.nsel.weight").squeeze(1)
+        nsel_b = self.alu._get("memory.write.nsel.bias")
+        and_old_w = self.alu._get("memory.write.and_old.weight")
+        and_old_b = self.alu._get("memory.write.and_old.bias")
+        and_new_w = self.alu._get("memory.write.and_new.weight")
+        and_new_b = self.alu._get("memory.write.and_new.bias")
+        or_w = self.alu._get("memory.write.or.weight")
+        or_b = self.alu._get("memory.write.or.bias")
+
+        we = torch.ones_like(sel)
+        sel_inp = torch.stack([sel, we], dim=1)
+        write_sel = heaviside((sel_inp * sel_w).sum(dim=1) + sel_b)
+        nsel = heaviside((write_sel * nsel_w) + nsel_b)
+
+        new_mem_bits = torch.zeros((MEM_BYTES, REG_BITS), device=self.device)
+        for bit in range(REG_BITS):
+            old_bit = mem_bits[:, bit]
+            data_bit = data_bits[bit].expand(MEM_BYTES)
+            inp_old = torch.stack([old_bit, nsel], dim=1)
+            inp_new = torch.stack([data_bit, write_sel], dim=1)
+
+            and_old = heaviside((inp_old * and_old_w[:, bit]).sum(dim=1) + and_old_b[:, bit])
+            and_new = heaviside((inp_new * and_new_w[:, bit]).sum(dim=1) + and_new_b[:, bit])
+            or_inp = torch.stack([and_old, and_new], dim=1)
+            out_bit = heaviside((or_inp * or_w[:, bit]).sum(dim=1) + or_b[:, bit])
+            new_mem_bits[:, bit] = out_bit
+
+        return [bits_to_int_msb([int(b) for b in new_mem_bits[i].tolist()]) for i in range(MEM_BYTES)]
+
+    def _conditional_jump_byte(self, prefix: str, pc_byte: int, target_byte: int, flag: int) -> int:
+        pc_bits = int_to_bits_msb(pc_byte, REG_BITS)
+        target_bits = int_to_bits_msb(target_byte, REG_BITS)
+
+        out_bits: List[int] = []
+        for bit in range(REG_BITS):
+            not_sel = self.alu._eval_gate(
+                f"{prefix}.bit{bit}.not_sel.weight",
+                f"{prefix}.bit{bit}.not_sel.bias",
+                [float(flag)],
+            )
+            and_a = self.alu._eval_gate(
+                f"{prefix}.bit{bit}.and_a.weight",
+                f"{prefix}.bit{bit}.and_a.bias",
+                [float(pc_bits[bit]), not_sel],
+            )
+            and_b = self.alu._eval_gate(
+                f"{prefix}.bit{bit}.and_b.weight",
+                f"{prefix}.bit{bit}.and_b.bias",
+                [float(target_bits[bit]), float(flag)],
+            )
+            out_bit = self.alu._eval_gate(
+                f"{prefix}.bit{bit}.or.weight",
+                f"{prefix}.bit{bit}.or.bias",
+                [and_a, and_b],
+            )
+            out_bits.append(int(out_bit))
+
+        return bits_to_int_msb(out_bits)
+
+    def step(self, state: CPUState) -> CPUState:
+        if state.ctrl[0] == 1:  # HALT
+            return state.copy()
+
+        s = state.copy()
+
+        # Fetch: two bytes, big-endian
+        hi = self._memory_read(s.mem, s.pc)
+        lo = self._memory_read(s.mem, (s.pc + 1) & 0xFFFF)
+        s.ir = ((hi & 0xFF) << 8) | (lo & 0xFF)
+        next_pc = (s.pc + 2) & 0xFFFF
+
+        opcode, rd, rs, imm8 = self.decode_ir(s.ir)
+        a = s.regs[rd]
+        b = s.regs[rs]
+
+        addr16 = None
+        next_pc_ext = next_pc
+        if opcode in (0xA, 0xB, 0xC, 0xD, 0xE):
+            addr_hi = self._memory_read(s.mem, next_pc)
+            addr_lo = self._memory_read(s.mem, (next_pc + 1) & 0xFFFF)
+            addr16 = ((addr_hi & 0xFF) << 8) | (addr_lo & 0xFF)
+            next_pc_ext = (next_pc + 2) & 0xFFFF
+
+        write_result = True
+        result = a
+        carry = 0
+        overflow = 0
+
+        if opcode == 0x0:  # ADD
+            result, carry, overflow = self.alu.add(a, b)
+        elif opcode == 0x1:  # SUB
+            result, carry, overflow = self.alu.sub(a, b)
+        elif opcode == 0x2:  # AND
+            result = self.alu.bitwise_and(a, b)
+        elif opcode == 0x3:  # OR
+            result = self.alu.bitwise_or(a, b)
+        elif opcode == 0x4:  # XOR
+            result = self.alu.bitwise_xor(a, b)
+        elif opcode == 0x5:  # SHL
+            carry = 1 if (a & 0x80) else 0
+            result = (a << 1) & 0xFF
+        elif opcode == 0x6:  # SHR
+            carry = 1 if (a & 0x01) else 0
+            result = (a >> 1) & 0xFF
+        elif opcode == 0x7:  # MUL
+            full = a * b
+            result = full & 0xFF
+            carry = 1 if full > 0xFF else 0
+        elif opcode == 0x8:  # DIV
+            if b == 0:
+                result = 0
+                carry = 1
+                overflow = 1
+            else:
+                result = (a // b) & 0xFF
+        elif opcode == 0x9:  # CMP
+            result, carry, overflow = self.alu.sub(a, b)
+            write_result = False
+        elif opcode == 0xA:  # LOAD
+            result = self._memory_read(s.mem, addr16)
+        elif opcode == 0xB:  # STORE
+            s.mem = self._memory_write(s.mem, addr16, b & 0xFF)
+            write_result = False
+        elif opcode == 0xC:  # JMP
+            s.pc = addr16 & 0xFFFF
+            write_result = False
+        elif opcode == 0xD:  # JZ
+            hi_pc = self._conditional_jump_byte(
+                "control.jz",
+                (next_pc_ext >> 8) & 0xFF,
+                (addr16 >> 8) & 0xFF,
+                s.flags[0],
+            )
+            lo_pc = self._conditional_jump_byte(
+                "control.jz",
+                next_pc_ext & 0xFF,
+                addr16 & 0xFF,
+                s.flags[0],
+            )
+            s.pc = ((hi_pc & 0xFF) << 8) | (lo_pc & 0xFF)
+            write_result = False
+        elif opcode == 0xE:  # CALL
+            ret_addr = next_pc_ext & 0xFFFF
+            s.sp = (s.sp - 1) & 0xFFFF
+            s.mem = self._memory_write(s.mem, s.sp, (ret_addr >> 8) & 0xFF)
+            s.sp = (s.sp - 1) & 0xFFFF
+            s.mem = self._memory_write(s.mem, s.sp, ret_addr & 0xFF)
+            s.pc = addr16 & 0xFFFF
+            write_result = False
+        elif opcode == 0xF:  # HALT
+            s.ctrl[0] = 1
+            write_result = False
+
+        if opcode <= 0x9 or opcode == 0xA:
+            s.flags = self.flags_from_result(result, carry, overflow)
+
+        if write_result:
+            s.regs[rd] = result & 0xFF
+
+        if opcode not in (0xC, 0xD, 0xE):
+            s.pc = next_pc_ext
+
+        return s
+
+    def run_until_halt(self, state: CPUState, max_cycles: int = 256) -> Tuple[CPUState, int]:
+        s = state.copy()
+        for i in range(max_cycles):
+            if s.ctrl[0] == 1:
+                return s, i
+            s = self.step(s)
+        return s, max_cycles
+
+    def forward(self, state_bits: torch.Tensor, max_cycles: int = 256) -> torch.Tensor:
+        bits_list = [int(b) for b in state_bits.detach().cpu().flatten().tolist()]
+        state = unpack_state(bits_list)
+        final, _ = self.run_until_halt(state, max_cycles=max_cycles)
+        return torch.tensor(pack_state(final), dtype=torch.float32)
eval/build_memory.py
CHANGED
@@ -1,5 +1,5 @@
 """
-Generate memory and fetch/load/store buffers for the 8-bit threshold computer.
+Generate 64KB memory circuits and fetch/load/store buffers for the 8-bit threshold computer.
 Updates neural_computer.safetensors and tensors.txt in-place.
 """
@@ -16,6 +16,9 @@ from safetensors.torch import save_file
 MODEL_PATH = Path(__file__).resolve().parent.parent / "neural_computer.safetensors"
 MANIFEST_PATH = Path(__file__).resolve().parent.parent / "tensors.txt"
 
+ADDR_BITS = 16
+MEM_BYTES = 1 << ADDR_BITS
+
 
 def load_tensors(path: Path) -> Dict[str, torch.Tensor]:
     tensors: Dict[str, torch.Tensor] = {}
@@ -34,32 +37,59 @@ def add_gate(tensors: Dict[str, torch.Tensor], name: str, weight: Iterable[float
     tensors[b_key] = torch.tensor(list(bias), dtype=torch.float32)
 
 
-def
-    for
+def drop_prefixes(tensors: Dict[str, torch.Tensor], prefixes: List[str]) -> None:
+    for key in list(tensors.keys()):
+        if any(key.startswith(prefix) for prefix in prefixes):
+            del tensors[key]
+
+
+def add_decoder(tensors: Dict[str, torch.Tensor]) -> None:
+    weights = torch.empty((MEM_BYTES, ADDR_BITS), dtype=torch.float32)
+    bias = torch.empty((MEM_BYTES,), dtype=torch.float32)
+    for addr in range(MEM_BYTES):
+        bits = [(addr >> (ADDR_BITS - 1 - i)) & 1 for i in range(ADDR_BITS)]  # MSB-first
+        weights[addr] = torch.tensor([1.0 if bit == 1 else -1.0 for bit in bits], dtype=torch.float32)
+        bias[addr] = -float(sum(bits))
+    tensors["memory.addr_decode.weight"] = weights
+    tensors["memory.addr_decode.bias"] = bias
 
 
 def add_memory_read_mux(tensors: Dict[str, torch.Tensor]) -> None:
-    # AND
+    # Packed AND/OR weights for read mux.
+    and_weight = torch.ones((8, MEM_BYTES, 2), dtype=torch.float32)
+    and_bias = torch.full((8, MEM_BYTES), -2.0, dtype=torch.float32)
+    or_weight = torch.ones((8, MEM_BYTES), dtype=torch.float32)
+    or_bias = torch.full((8,), -1.0, dtype=torch.float32)
+    tensors["memory.read.and.weight"] = and_weight
+    tensors["memory.read.and.bias"] = and_bias
+    tensors["memory.read.or.weight"] = or_weight
+    tensors["memory.read.or.bias"] = or_bias
 
 
 def add_memory_write_cells(tensors: Dict[str, torch.Tensor]) -> None:
-    #
+    # Packed write gate weights.
+    sel_weight = torch.ones((MEM_BYTES, 2), dtype=torch.float32)
+    sel_bias = torch.full((MEM_BYTES,), -2.0, dtype=torch.float32)
+    nsel_weight = torch.full((MEM_BYTES, 1), -1.0, dtype=torch.float32)
+    nsel_bias = torch.zeros((MEM_BYTES,), dtype=torch.float32)
+
+    and_old_weight = torch.ones((MEM_BYTES, 8, 2), dtype=torch.float32)
+    and_old_bias = torch.full((MEM_BYTES, 8), -2.0, dtype=torch.float32)
+    and_new_weight = torch.ones((MEM_BYTES, 8, 2), dtype=torch.float32)
+    and_new_bias = torch.full((MEM_BYTES, 8), -2.0, dtype=torch.float32)
+    or_weight = torch.ones((MEM_BYTES, 8, 2), dtype=torch.float32)
+    or_bias = torch.full((MEM_BYTES, 8), -1.0, dtype=torch.float32)
+
+    tensors["memory.write.sel.weight"] = sel_weight
+    tensors["memory.write.sel.bias"] = sel_bias
+    tensors["memory.write.nsel.weight"] = nsel_weight
+    tensors["memory.write.nsel.bias"] = nsel_bias
+    tensors["memory.write.and_old.weight"] = and_old_weight
+    tensors["memory.write.and_old.bias"] = and_old_bias
+    tensors["memory.write.and_new.weight"] = and_new_weight
+    tensors["memory.write.and_new.bias"] = and_new_bias
+    tensors["memory.write.or.weight"] = or_weight
+    tensors["memory.write.or.bias"] = or_bias
 
 
 def add_fetch_load_store_buffers(tensors: Dict[str, torch.Tensor]) -> None:
@@ -69,16 +99,15 @@ def add_fetch_load_store_buffers(tensors: Dict[str, torch.Tensor]) -> None:
     for bit in range(8):
         add_gate(tensors, f"control.load.bit{bit}", [1.0], [-1.0])
         add_gate(tensors, f"control.store.bit{bit}", [1.0], [-1.0])
+    for bit in range(ADDR_BITS):
         add_gate(tensors, f"control.mem_addr.bit{bit}", [1.0], [-1.0])
 
 
 def update_manifest(tensors: Dict[str, torch.Tensor]) -> None:
-    #
-        return
-    tensors[key] = torch.tensor([2.0], dtype=torch.float32)
+    # Update manifest constants to reflect 16-bit address space.
+    tensors["manifest.memory_bytes"] = torch.tensor([float(MEM_BYTES)], dtype=torch.float32)
+    tensors["manifest.pc_width"] = torch.tensor([float(ADDR_BITS)], dtype=torch.float32)
+    tensors["manifest.version"] = torch.tensor([3.0], dtype=torch.float32)
 
 
 def write_manifest(path: Path, tensors: Dict[str, torch.Tensor]) -> None:
@@ -94,7 +123,19 @@ def write_manifest(path: Path, tensors: Dict[str, torch.Tensor]) -> None:
 
 def main() -> None:
     tensors = load_tensors(MODEL_PATH)
+    drop_prefixes(
+        tensors,
+        [
+            "memory.addr_decode.",
+            "memory.read.",
+            "memory.write.",
+            "control.fetch.ir.",
+            "control.load.",
+            "control.store.",
+            "control.mem_addr.",
+        ],
+    )
+    add_decoder(tensors)
     add_memory_read_mux(tensors)
     add_memory_write_cells(tensors)
     add_fetch_load_store_buffers(tensors)
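The `add_decoder` routine above builds one threshold neuron per address: +1 weights on one-bits, -1 on zero-bits, and a bias of minus the popcount, so a row fires only on an exact address match. A scaled-down, dependency-free sketch of the same construction (4 address bits instead of 16; the names here are illustrative stand-ins, not the repo's tensors):

```python
ADDR_BITS = 4                      # scaled-down stand-in for the 16-bit decoder
MEM_BYTES = 1 << ADDR_BITS

def addr_to_bits(addr):
    # MSB-first bit expansion, mirroring the decoder construction.
    return [(addr >> (ADDR_BITS - 1 - i)) & 1 for i in range(ADDR_BITS)]

# One threshold neuron per address: +1 weight where the address bit is 1,
# -1 where it is 0, bias = -(number of one bits).
rows = []
for addr in range(MEM_BYTES):
    bits = addr_to_bits(addr)
    weights = [1.0 if b else -1.0 for b in bits]
    bias = -float(sum(bits))
    rows.append((weights, bias))

def decode(addr):
    # Fire rule: 1 if sum(w * x) + b >= 0 else 0
    x = addr_to_bits(addr)
    return [1 if sum(w * xi for w, xi in zip(ws, x)) + b >= 0 else 0
            for ws, b in rows]

sel = decode(0b1010)               # one-hot select line for address 10
```

Any mismatched bit drops the weighted sum at least one below the popcount, so exactly one row crosses its threshold for every input address.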
eval/comprehensive_eval.py
CHANGED
@@ -1900,12 +1900,12 @@ class CircuitEvaluator:
             ('manifest.alu_operations', 16),
             ('manifest.flags', 4),
             ('manifest.instruction_width', 16),
-            ('manifest.memory_bytes',
-            ('manifest.pc_width',
+            ('manifest.memory_bytes', 65536),
+            ('manifest.pc_width', 16),
             ('manifest.register_width', 8),
             ('manifest.registers', 4),
             ('manifest.turing_complete', 1),
-            ('manifest.version',
+            ('manifest.version', 3),
         ]
 
         failures = []
@@ -2200,61 +2200,79 @@ class CircuitEvaluator:
     # MEMORY CIRCUITS
     # =========================================================================
 
-    def
-        """Test
+    def test_memory_decoder_16to65536(self) -> TestResult:
+        """Test 16-to-65536 address decoder with full-address coverage."""
         failures = []
         passed = 0
+        mem_size = 1 << 16
+        total = mem_size * 2
 
+        w_all = self.reg.get('memory.addr_decode.weight')
+        b_all = self.reg.get('memory.addr_decode.bias')
+
+        for addr in range(mem_size):
+            addr_bits = torch.tensor([(addr >> (15 - i)) & 1 for i in range(16)],
                                      device=self.device, dtype=torch.float32)
 
+            out_idx = addr
+            w = w_all[out_idx]
+            b = b_all[out_idx]
+            output = heaviside((addr_bits * w).sum() + b).item()
+            expected = 1.0
+            if output == expected:
+                passed += 1
+            elif len(failures) < 20:
+                failures.append(((addr, out_idx), expected, output))
+
+            out_idx = (addr + 1) & 0xFFFF
+            w = w_all[out_idx]
+            b = b_all[out_idx]
+            output = heaviside((addr_bits * w).sum() + b).item()
+            expected = 0.0
+            if output == expected:
+                passed += 1
+            elif len(failures) < 20:
+                failures.append(((addr, out_idx), expected, output))
 
         return TestResult('memory.addr_decode', passed, total, failures)
 
     def test_memory_read_mux(self) -> TestResult:
-        """Test
+        """Test 64KB memory read mux for a few representative addresses."""
         failures = []
         passed = 0
         total = 0
 
+        mem_size = 1 << 16
+        mem = [(addr * 37) & 0xFF for addr in range(mem_size)]
+        test_addrs = [0x0000, 0x1234, 0xFFFF]
+
+        dec_w = self.reg.get('memory.addr_decode.weight')
+        dec_b = self.reg.get('memory.addr_decode.bias')
+        and_w = self.reg.get('memory.read.and.weight')
+        and_b = self.reg.get('memory.read.and.bias')
+        or_w = self.reg.get('memory.read.or.weight')
+        or_b = self.reg.get('memory.read.or.bias')
 
         for addr in test_addrs:
-            addr_bits = torch.tensor([(addr >> (
+            addr_bits = torch.tensor([(addr >> (15 - i)) & 1 for i in range(16)],
                                      device=self.device, dtype=torch.float32)
 
             selects = []
-            for out_idx in range(
-                selects.append(heaviside((addr_bits * w).sum() + b).item())
+            for out_idx in range(mem_size):
+                output = heaviside((addr_bits * dec_w[out_idx]).sum() + dec_b[out_idx]).item()
+                selects.append(output)
 
             for bit in range(8):
                 and_vals = []
-                for out_idx in range(
+                for out_idx in range(mem_size):
                     mem_bit = float((mem[out_idx] >> (7 - bit)) & 1)
                     inp = torch.tensor([mem_bit, selects[out_idx]], device=self.device)
-                    w =
-                    b =
+                    w = and_w[bit, out_idx]
+                    b = and_b[bit, out_idx]
                     and_vals.append(heaviside((inp * w).sum() + b).item())
 
                 or_inp = torch.tensor(and_vals, device=self.device)
-                b_or = self.reg.get(f'memory.read.bit{bit}.or.bias')
-                output = heaviside((or_inp * w_or).sum() + b_or).item()
+                output = heaviside((or_inp * or_w[bit]).sum() + or_b[bit]).item()
                 expected = float((mem[addr] >> (7 - bit)) & 1)
 
                 total += 1
@@ -2271,49 +2289,58 @@ class CircuitEvaluator:
         passed = 0
         total = 0
 
+        mem_size = 1 << 16
+        mem = [(addr * 13 + 7) & 0xFF for addr in range(mem_size)]
         test_cases = [
             (0xA5, 42, 1.0),
-            (0x3C,
+            (0x3C, 0xBEEF, 0.0),
         ]
 
+        dec_w = self.reg.get('memory.addr_decode.weight')
+        dec_b = self.reg.get('memory.addr_decode.bias')
+        sel_w = self.reg.get('memory.write.sel.weight')
+        sel_b = self.reg.get('memory.write.sel.bias')
+        nsel_w = self.reg.get('memory.write.nsel.weight')
+        nsel_b = self.reg.get('memory.write.nsel.bias')
+        and_old_w = self.reg.get('memory.write.and_old.weight')
+        and_old_b = self.reg.get('memory.write.and_old.bias')
+        and_new_w = self.reg.get('memory.write.and_new.weight')
+        and_new_b = self.reg.get('memory.write.and_new.bias')
+        or_w = self.reg.get('memory.write.or.weight')
+        or_b = self.reg.get('memory.write.or.bias')
+
         for write_data, write_addr, write_en in test_cases:
-            addr_bits = torch.tensor([(write_addr >> (
+            addr_bits = torch.tensor([(write_addr >> (15 - i)) & 1 for i in range(16)],
                                      device=self.device, dtype=torch.float32)
 
-                decodes.append(heaviside((addr_bits * w).sum() + b).item())
+            sample_addrs = [write_addr, (write_addr + 1) & 0xFFFF, 0x0000, 0xFFFF]
+            decodes = {}
+            for out_idx in sample_addrs:
+                decodes[out_idx] = heaviside((addr_bits * dec_w[out_idx]).sum() + dec_b[out_idx]).item()
 
-            for out_idx in
+            for out_idx in sample_addrs:
                 sel_inp = torch.tensor([decodes[out_idx], write_en], device=self.device)
-                b_sel = self.reg.get(f'memory.write.sel.addr{out_idx}.bias')
-                sel = heaviside((sel_inp * w_sel).sum() + b_sel).item()
+                sel = heaviside((sel_inp * sel_w[out_idx]).sum() + sel_b[out_idx]).item()
 
-                b_nsel = self.reg.get(f'memory.write.nsel.addr{out_idx}.bias')
-                nsel = heaviside(sel * w_nsel + b_nsel).item()
+                nsel = heaviside(sel * nsel_w[out_idx] + nsel_b[out_idx]).item()
 
                 for bit in range(8):
                     old_bit = float((mem[out_idx] >> (7 - bit)) & 1)
                     data_bit = float((write_data >> (7 - bit)) & 1)
 
                     inp_old = torch.tensor([old_bit, nsel], device=self.device)
-                    w_old =
-                    b_old =
+                    w_old = and_old_w[out_idx, bit]
+                    b_old = and_old_b[out_idx, bit]
                    and_old = heaviside((inp_old * w_old).sum() + b_old).item()
 
                     inp_new = torch.tensor([data_bit, sel], device=self.device)
-                    w_new =
-                    b_new =
+                    w_new = and_new_w[out_idx, bit]
+                    b_new = and_new_b[out_idx, bit]
                     and_new = heaviside((inp_new * w_new).sum() + b_new).item()
 
                     inp_or = torch.tensor([and_old, and_new], device=self.device)
-                    w_or =
-                    b_or =
+                    w_or = or_w[out_idx, bit]
+                    b_or = or_b[out_idx, bit]
                     output = heaviside((inp_or * w_or).sum() + b_or).item()
 
                     expected = data_bit if (write_en == 1.0 and out_idx == write_addr) else old_bit
@@ -2339,15 +2366,88 @@ class CircuitEvaluator:
             passed += 2
 
         for bit in range(8):
-            for name in ['control.load', 'control.store'
+            for name in ['control.load', 'control.store']:
                 total += 2
                 if self.reg.has(f'{name}.bit{bit}.weight'):
                     self.reg.get(f'{name}.bit{bit}.weight')
                     self.reg.get(f'{name}.bit{bit}.bias')
                     passed += 2
 
+        for bit in range(16):
+            total += 2
+            if self.reg.has(f'control.mem_addr.bit{bit}.weight'):
+                self.reg.get(f'control.mem_addr.bit{bit}.weight')
+                self.reg.get(f'control.mem_addr.bit{bit}.bias')
+                passed += 2
+
         return TestResult('control.fetch_load_store', passed, total, [])
 
+    def test_packed_memory_routing(self) -> TestResult:
+        """Validate packed memory tensor routing and shapes."""
+        failures = []
+        passed = 0
+        total = 0
+
+        circuits = ["memory.addr_decode", "memory.read", "memory.write"]
+        routing = self.routing_eval.routing.get("circuits", {})
+        routing_keys = set()
+
+        for circuit in circuits:
+            total += 1
+            if circuit not in routing:
+                failures.append((circuit, "routing", "missing"))
+                continue
+            passed += 1
+            internal = routing[circuit].get("internal", {})
+            for value in internal.values():
+                if isinstance(value, list):
+                    routing_keys.update(value)
+
+        total += 1
+        if routing_keys and all(key for key in routing_keys):
+            passed += 1
+        else:
+            failures.append(("packed_keys", "non-empty", "empty"))
+
+        mem_bytes = int(self.reg.get("manifest.memory_bytes").item()) if self.reg.has("manifest.memory_bytes") else 65536
+        pc_width = int(self.reg.get("manifest.pc_width").item()) if self.reg.has("manifest.pc_width") else 16
+        reg_width = int(self.reg.get("manifest.register_width").item()) if self.reg.has("manifest.register_width") else 8
+
+        expected_shapes = {
+            "memory.addr_decode.weight": (mem_bytes, pc_width),
+            "memory.addr_decode.bias": (mem_bytes,),
+            "memory.read.and.weight": (reg_width, mem_bytes, 2),
+            "memory.read.and.bias": (reg_width, mem_bytes),
+            "memory.read.or.weight": (reg_width, mem_bytes),
+            "memory.read.or.bias": (reg_width,),
+            "memory.write.sel.weight": (mem_bytes, 2),
+            "memory.write.sel.bias": (mem_bytes,),
+            "memory.write.nsel.weight": (mem_bytes, 1),
+            "memory.write.nsel.bias": (mem_bytes,),
+            "memory.write.and_old.weight": (mem_bytes, reg_width, 2),
+            "memory.write.and_old.bias": (mem_bytes, reg_width),
+            "memory.write.and_new.weight": (mem_bytes, reg_width, 2),
+            "memory.write.and_new.bias": (mem_bytes, reg_width),
+            "memory.write.or.weight": (mem_bytes, reg_width, 2),
+            "memory.write.or.bias": (mem_bytes, reg_width),
+        }
+
+        for key, expected in expected_shapes.items():
+            total += 1
+            if key not in routing_keys:
+                failures.append((key, "routing_ref", "missing"))
+                continue
+            if not self.reg.has(key):
+                failures.append((key, "tensor_exists", "missing"))
+                continue
+            actual = tuple(self.reg.get(key).shape)
+            if actual == expected:
+                passed += 1
+            else:
+                failures.append((key, expected, actual))
+
+        return TestResult('memory.packed_routing', passed, total, failures)
+
     # =========================================================================
     # ARITHMETIC - ADDITIONAL CIRCUITS
     # =========================================================================
@@ -3010,10 +3110,11 @@ class ComprehensiveEvaluator:
         # Memory
         if verbose:
            print("\n=== MEMORY ===")
-        self._run_test(self.evaluator.
+        self._run_test(self.evaluator.test_memory_decoder_16to65536, verbose)
         self._run_test(self.evaluator.test_memory_read_mux, verbose)
         self._run_test(self.evaluator.test_memory_write_cells, verbose)
         self._run_test(self.evaluator.test_control_fetch_load_store, verbose)
+        self._run_test(self.evaluator.test_packed_memory_routing, verbose)
 
         # Error detection
         if verbose:
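The read-mux test above exercises a two-layer threshold circuit: per-address AND gates (`step(mem_bit + select - 2)`) followed by an OR reduction (`step(sum - 1)`). A toy stand-in with 4 addresses shows the same logic without the 64K tensors (all names here are illustrative):

```python
def step(x):
    # Threshold activation: 1 if x >= 0 else 0
    return 1 if x >= 0 else 0

def read_bit(mem_bits, selects):
    # Layer 1: AND each stored bit with its decoder select line.
    and_vals = [step(m + s - 2.0) for m, s in zip(mem_bits, selects)]
    # Layer 2: OR-reduce across all addresses.
    return step(sum(and_vals) - 1.0)

mem_bits = [0, 1, 1, 0]                              # one bit plane of a 4-address toy memory
one_hot = lambda i: [1 if j == i else 0 for j in range(4)]
```

Because the select vector is one-hot, at most one AND gate fires, and the OR layer passes that gate's value through as the read-out bit.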
eval/cpu_cycle_test.py
CHANGED
@@ -7,8 +7,11 @@ from pathlib import Path
 
 sys.path.append(str(Path(__file__).resolve().parent.parent))
 
+import torch
+
 from cpu.cycle import run_until_halt
-from cpu.state import CPUState
+from cpu.state import CPUState, pack_state, unpack_state
+from cpu.threshold_cpu import ThresholdCPU
 
 
 def encode(opcode: int, rd: int, rs: int, imm8: int) -> int:
@@ -16,28 +19,36 @@ def encode(opcode: int, rd: int, rs: int, imm8: int) -> int:
 
 
 def write_instr(mem, addr, instr):
-    mem[addr &
-    mem[(addr + 1) &
+    mem[addr & 0xFFFF] = (instr >> 8) & 0xFF
+    mem[(addr + 1) & 0xFFFF] = instr & 0xFF
+
+
+def write_addr(mem, addr, value):
+    mem[addr & 0xFFFF] = (value >> 8) & 0xFF
+    mem[(addr + 1) & 0xFFFF] = value & 0xFF
 
 
 def main() -> None:
-    mem = [0] *
+    mem = [0] * 65536
 
-    write_instr(mem,
-    write_instr(mem,
-    write_instr(mem,
+    write_instr(mem, 0x0000, encode(0xA, 0, 0, 0x00))  # LOAD R0, [addr]
+    write_addr(mem, 0x0002, 0x0100)
+    write_instr(mem, 0x0004, encode(0xA, 1, 0, 0x00))  # LOAD R1, [addr]
+    write_addr(mem, 0x0006, 0x0101)
+    write_instr(mem, 0x0008, encode(0x0, 0, 1, 0x00))  # ADD R0, R1
+    write_instr(mem, 0x000A, encode(0xB, 0, 0, 0x00))  # STORE R0 -> [addr]
+    write_addr(mem, 0x000C, 0x0102)
+    write_instr(mem, 0x000E, encode(0xF, 0, 0, 0x00))  # HALT
 
-    mem[
-    mem[
+    mem[0x0100] = 5
+    mem[0x0101] = 7
 
     state = CPUState(
         pc=0,
         ir=0,
         regs=[0, 0, 0, 0],
         flags=[0, 0, 0, 0],
-        sp=
+        sp=0xFFFE,
         ctrl=[0, 0, 0, 0],
         mem=mem,
     )
@@ -46,9 +57,29 @@ def main() -> None:
 
     assert final.ctrl[0] == 1, "HALT flag not set"
     assert final.regs[0] == 12, f"R0 expected 12, got {final.regs[0]}"
-    assert final.mem[
+    assert final.mem[0x0102] == 12, f"MEM[0x0102] expected 12, got {final.mem[0x0102]}"
     assert cycles <= 10, f"Unexpected cycle count: {cycles}"
 
+    # Threshold-weight runtime should match reference behavior.
+    threshold_cpu = ThresholdCPU()
+    t_final, t_cycles = threshold_cpu.run_until_halt(state, max_cycles=20)
+
+    assert t_final.ctrl[0] == 1, "Threshold HALT flag not set"
+    assert t_final.regs[0] == final.regs[0], f"Threshold R0 mismatch: {t_final.regs[0]} != {final.regs[0]}"
+    assert t_final.mem[0x0102] == final.mem[0x0102], (
+        f"Threshold MEM[0x0102] mismatch: {t_final.mem[0x0102]} != {final.mem[0x0102]}"
+    )
+    assert t_cycles == cycles, f"Threshold cycle count mismatch: {t_cycles} != {cycles}"
+
+    # Validate forward() state I/O.
+    bits = torch.tensor(pack_state(state), dtype=torch.float32)
+    out_bits = threshold_cpu.forward(bits, max_cycles=20)
+    out_state = unpack_state([int(b) for b in out_bits.tolist()])
+    assert out_state.regs[0] == final.regs[0], f"Forward R0 mismatch: {out_state.regs[0]} != {final.regs[0]}"
+    assert out_state.mem[0x0102] == final.mem[0x0102], (
+        f"Forward MEM[0x0102] mismatch: {out_state.mem[0x0102]} != {final.mem[0x0102]}"
+    )
+
     print("cpu_cycle_test: ok")
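The test program writes each 16-bit instruction big-endian into byte memory. The body of `encode` is not shown in this diff; the sketch below uses a hypothetical field layout (4-bit opcode, 2-bit rd, 2-bit rs, 8-bit immediate) that is consistent with the manifest's 16-bit instruction width and four registers, purely as an illustration:

```python
def encode(opcode: int, rd: int, rs: int, imm8: int) -> int:
    # Hypothetical packing: 4-bit opcode | 2-bit rd | 2-bit rs | 8-bit immediate.
    return ((opcode & 0xF) << 12) | ((rd & 0x3) << 10) | ((rs & 0x3) << 8) | (imm8 & 0xFF)

def write_instr(mem, addr, instr):
    # Big-endian split into two consecutive memory bytes, as in the test above.
    mem[addr & 0xFFFF] = (instr >> 8) & 0xFF
    mem[(addr + 1) & 0xFFFF] = instr & 0xFF

mem = [0] * 65536
write_instr(mem, 0x0000, encode(0xF, 0, 0, 0x00))  # HALT at address 0
```

Under this layout, `encode(0x0, 0, 1, 0x00)` (ADD R0, R1) yields 0x0100, matching the two-byte fetch the 16-bit PC steps over.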
eval/iron_eval.py
CHANGED
@@ -8,9 +8,11 @@ GPU-optimized for population-based evolution.
 Target: ~40GB VRAM on RTX 6000 Ada (4M population)
 """
 
-import
+import json
+import os
+import torch
+from typing import Dict, Tuple
+from safetensors import safe_open
 
 
 def load_model(base_path: str = ".") -> Dict[str, torch.Tensor]:
@@ -32,10 +34,20 @@ class BatchedFitnessEvaluator:
     GPU-batched fitness evaluator. Tests ALL circuits comprehensively.
     """
 
-    def __init__(self, device='cuda'):
-        self.device = device
-        self.
+    def __init__(self, device='cuda'):
+        self.device = device
+        self.routing = self._load_routing()
+        self._setup_tests()
+
+    def _load_routing(self) -> Dict:
+        """Load routing.json for packed memory validation."""
+        root = os.path.dirname(os.path.dirname(__file__))
+        path = os.path.join(root, "routing.json")
+        if os.path.exists(path):
+            with open(path, "r", encoding="utf-8") as fh:
+                return json.load(fh)
+        return {"circuits": {}}
+
     def _setup_tests(self):
         """Pre-compute all test vectors."""
         d = self.device
@@ -3146,10 +3158,10 @@ class BatchedFitnessEvaluator:
 
         return scores, total_tests
 
-    def _test_manifest(self, pop: Dict, debug: bool = False) -> Tuple[torch.Tensor, int]:
-        """
-        MANIFEST - Verify manifest values are preserved.
-        """
+    def _test_manifest(self, pop: Dict, debug: bool = False) -> Tuple[torch.Tensor, int]:
         pop_size = next(iter(pop.values())).shape[0]
         scores = torch.zeros(pop_size, device=self.device)
         total_tests = 0
@@ -3158,12 +3170,12 @@ class BatchedFitnessEvaluator:
             ('manifest.alu_operations', 16),
             ('manifest.flags', 4),
             ('manifest.instruction_width', 16),
-            ('manifest.memory_bytes',
-            ('manifest.pc_width',
+            ('manifest.memory_bytes', 65536),
+            ('manifest.pc_width', 16),
             ('manifest.register_width', 8),
             ('manifest.registers', 4),
             ('manifest.turing_complete', 1),
-            ('manifest.version',
+            ('manifest.version', 3),
         ]
 
         for tensor_name, expected_value in manifest_tensors:
@@ -3175,7 +3187,79 @@ class BatchedFitnessEvaluator:
         if debug and pop_size == 1:
             print(f"  Manifest: {int(scores[0].item())}/{total_tests}")
 
-        return scores, total_tests
 
     def _test_equality_circuit(self, pop: Dict, debug: bool = False) -> Tuple[torch.Tensor, int]:
         """
@@ -3328,13 +3412,17 @@ class BatchedFitnessEvaluator:
         total_scores += incdec_scores
         total_tests += incdec_tests
 
-        manifest_scores, manifest_tests = self._test_manifest(pop, debug)
-        total_scores += manifest_scores
-        total_tests += manifest_tests
-
-        total_scores +=
-        total_tests +=
 
         minmax_scores, minmax_tests = self._test_minmax_circuits(pop, debug)
         total_scores += minmax_scores
         total_tests += minmax_tests
def _test_manifest(self, pop: Dict, debug: bool = False) -> Tuple[torch.Tensor, int]:
|
| 3162 |
+
"""
|
| 3163 |
+
MANIFEST - Verify manifest values are preserved.
|
| 3164 |
+
"""
|
| 3165 |
pop_size = next(iter(pop.values())).shape[0]
|
| 3166 |
scores = torch.zeros(pop_size, device=self.device)
|
| 3167 |
total_tests = 0
|
|
|
|
| 3170 |
('manifest.alu_operations', 16),
|
| 3171 |
('manifest.flags', 4),
|
| 3172 |
('manifest.instruction_width', 16),
|
| 3173 |
+
('manifest.memory_bytes', 65536),
|
| 3174 |
+
('manifest.pc_width', 16),
|
| 3175 |
('manifest.register_width', 8),
|
| 3176 |
('manifest.registers', 4),
|
| 3177 |
('manifest.turing_complete', 1),
|
| 3178 |
+
('manifest.version', 3),
|
| 3179 |
]
|
| 3180 |
|
| 3181 |
for tensor_name, expected_value in manifest_tensors:
|
|
|
|
| 3187 |
if debug and pop_size == 1:
|
| 3188 |
print(f" Manifest: {int(scores[0].item())}/{total_tests}")
|
| 3189 |
|
| 3190 |
+
return scores, total_tests
|
| 3191 |
+
|
| 3192 |
+
def _test_packed_memory_routing(self, pop: Dict, debug: bool = False) -> Tuple[torch.Tensor, int]:
|
| 3193 |
+
"""
|
| 3194 |
+
PACKED MEMORY ROUTING - Validate routing references and tensor shapes.
|
| 3195 |
+
"""
|
| 3196 |
+
pop_size = next(iter(pop.values())).shape[0]
|
| 3197 |
+
scores = torch.zeros(pop_size, device=self.device)
|
| 3198 |
+
total_tests = 0
|
| 3199 |
+
|
| 3200 |
+
routing = self.routing.get("circuits", {})
|
| 3201 |
+
circuits = ["memory.addr_decode", "memory.read", "memory.write"]
|
| 3202 |
+
routing_keys = set()
|
| 3203 |
+
|
| 3204 |
+
for circuit in circuits:
|
| 3205 |
+
total_tests += 1
|
| 3206 |
+
if circuit not in routing:
|
| 3207 |
+
continue
|
| 3208 |
+
scores += 1
|
| 3209 |
+
internal = routing[circuit].get("internal", {})
|
| 3210 |
+
for value in internal.values():
|
| 3211 |
+
if isinstance(value, list):
|
| 3212 |
+
routing_keys.update(value)
|
| 3213 |
+
|
| 3214 |
+
total_tests += 1
|
| 3215 |
+
if routing_keys and all(key for key in routing_keys):
|
| 3216 |
+
scores += 1
|
| 3217 |
+
|
| 3218 |
+
if "manifest.memory_bytes" in pop:
|
| 3219 |
+
mem_bytes = int(pop["manifest.memory_bytes"][0].item())
|
| 3220 |
+
else:
|
| 3221 |
+
mem_bytes = 65536
|
| 3222 |
+
if "manifest.pc_width" in pop:
|
| 3223 |
+
pc_width = int(pop["manifest.pc_width"][0].item())
|
| 3224 |
+
else:
|
| 3225 |
+
pc_width = 16
|
| 3226 |
+
if "manifest.register_width" in pop:
|
| 3227 |
+
reg_width = int(pop["manifest.register_width"][0].item())
|
| 3228 |
+
else:
|
| 3229 |
+
reg_width = 8
|
| 3230 |
+
|
| 3231 |
+
expected_shapes = {
|
| 3232 |
+
"memory.addr_decode.weight": (pop_size, mem_bytes, pc_width),
|
| 3233 |
+
"memory.addr_decode.bias": (pop_size, mem_bytes),
|
| 3234 |
+
"memory.read.and.weight": (pop_size, reg_width, mem_bytes, 2),
|
| 3235 |
+
"memory.read.and.bias": (pop_size, reg_width, mem_bytes),
|
| 3236 |
+
"memory.read.or.weight": (pop_size, reg_width, mem_bytes),
|
| 3237 |
+
"memory.read.or.bias": (pop_size, reg_width),
|
| 3238 |
+
"memory.write.sel.weight": (pop_size, mem_bytes, 2),
|
| 3239 |
+
"memory.write.sel.bias": (pop_size, mem_bytes),
|
| 3240 |
+
"memory.write.nsel.weight": (pop_size, mem_bytes, 1),
|
| 3241 |
+
"memory.write.nsel.bias": (pop_size, mem_bytes),
|
| 3242 |
+
"memory.write.and_old.weight": (pop_size, mem_bytes, reg_width, 2),
|
| 3243 |
+
"memory.write.and_old.bias": (pop_size, mem_bytes, reg_width),
|
| 3244 |
+
"memory.write.and_new.weight": (pop_size, mem_bytes, reg_width, 2),
|
| 3245 |
+
"memory.write.and_new.bias": (pop_size, mem_bytes, reg_width),
|
| 3246 |
+
"memory.write.or.weight": (pop_size, mem_bytes, reg_width, 2),
|
| 3247 |
+
"memory.write.or.bias": (pop_size, mem_bytes, reg_width),
|
| 3248 |
+
}
|
| 3249 |
+
|
| 3250 |
+
for key, expected in expected_shapes.items():
|
| 3251 |
+
total_tests += 1
|
| 3252 |
+
if key not in routing_keys:
|
| 3253 |
+
continue
|
| 3254 |
+
if key not in pop:
|
| 3255 |
+
continue
|
| 3256 |
+
if tuple(pop[key].shape) == expected:
|
| 3257 |
+
scores += 1
|
| 3258 |
+
|
| 3259 |
+
if debug and pop_size == 1:
|
| 3260 |
+
print(f" Packed Memory Routing: {int(scores[0].item())}/{total_tests}")
|
| 3261 |
+
|
| 3262 |
+
return scores, total_tests
|
| 3263 |
|
| 3264 |
def _test_equality_circuit(self, pop: Dict, debug: bool = False) -> Tuple[torch.Tensor, int]:
|
| 3265 |
"""
|
|
|
|
| 3412 |
total_scores += incdec_scores
|
| 3413 |
total_tests += incdec_tests
|
| 3414 |
|
| 3415 |
+
manifest_scores, manifest_tests = self._test_manifest(pop, debug)
|
| 3416 |
+
total_scores += manifest_scores
|
| 3417 |
+
total_tests += manifest_tests
|
| 3418 |
+
|
| 3419 |
+
packed_scores, packed_tests = self._test_packed_memory_routing(pop, debug)
|
| 3420 |
+
total_scores += packed_scores
|
| 3421 |
+
total_tests += packed_tests
|
| 3422 |
+
|
| 3423 |
+
eq_scores, eq_tests = self._test_equality_circuit(pop, debug)
|
| 3424 |
+
total_scores += eq_scores
|
| 3425 |
+
total_tests += eq_tests
|
| 3426 |
|
| 3427 |
minmax_scores, minmax_tests = self._test_minmax_circuits(pop, debug)
|
| 3428 |
total_scores += minmax_scores
|
neural_computer.safetensors
CHANGED

version https://git-lfs.github.com/spec/v1
oid sha256:ba0c0e7e6286bc5a55d66ecbda8a1d43084a72e6a960d898b268fb6558c473a4
size 33725820
routing.json
CHANGED

The diff for this file is too large to render. See raw diff.
routing/generate_routing.py
CHANGED

@@ -5,7 +5,10 @@ Maps each gate to its input sources.

    import json
    from safetensors import safe_open
    from collections import defaultdict

    ADDR_BITS = 16
    MEM_BYTES = 1 << ADDR_BITS

    def get_all_gates(tensors_path):
        """Extract all unique gate paths from tensors file."""

@@ -423,12 +426,12 @@ def generate_manifest_routing():

        'manifest.alu_operations': {'type': 'constant', 'value': 16},
        'manifest.flags': {'type': 'constant', 'value': 4},
        'manifest.instruction_width': {'type': 'constant', 'value': 16},
        'manifest.memory_bytes': {'type': 'constant', 'value': 65536},
        'manifest.pc_width': {'type': 'constant', 'value': 16},
        'manifest.register_width': {'type': 'constant', 'value': 8},
        'manifest.registers': {'type': 'constant', 'value': 4},
        'manifest.turing_complete': {'type': 'constant', 'value': 1},
        'manifest.version': {'type': 'constant', 'value': 3}
    }

@@ -1032,9 +1035,9 @@ def generate_control_routing():

        'internal': internal_store
    }

    internal_mem_addr = {f'bit{bit}': [f'$addr[{bit}]'] for bit in range(ADDR_BITS)}
    routing['control.mem_addr'] = {
        'inputs': [f'$addr[0:{ADDR_BITS - 1}]'],
        'type': 'buffer',
        'internal': internal_mem_addr
    }

@@ -1043,52 +1046,38 @@ def generate_control_routing():

    def generate_memory_routing():
        """Generate routing for packed memory decoder, read mux, and write cell update."""
        routing = {}

        routing['memory.addr_decode'] = {
            'inputs': [f'$addr[0:{ADDR_BITS - 1}]'],
            'type': 'decoder_packed',
            'internal': {
                'weight': ['memory.addr_decode.weight'],
                'bias': ['memory.addr_decode.bias'],
            }
        }

        routing['memory.read'] = {
            'inputs': [f'$mem[0:{MEM_BYTES - 1}][0:7]', f'$sel[0:{MEM_BYTES - 1}]'],
            'type': 'read_mux_packed',
            'internal': {
                'and': ['memory.read.and.weight', 'memory.read.and.bias'],
                'or': ['memory.read.or.weight', 'memory.read.or.bias'],
            },
            'outputs': {f'bit{bit}': f'bit{bit}' for bit in range(8)}
        }

        routing['memory.write'] = {
            'inputs': [f'$mem[0:{MEM_BYTES - 1}][0:7]', '$write_data[0:7]', f'$sel[0:{MEM_BYTES - 1}]', '$we'],
            'type': 'write_mux_packed',
            'internal': {
                'sel': ['memory.write.sel.weight', 'memory.write.sel.bias'],
                'nsel': ['memory.write.nsel.weight', 'memory.write.nsel.bias'],
                'and_old': ['memory.write.and_old.weight', 'memory.write.and_old.bias'],
                'and_new': ['memory.write.and_new.weight', 'memory.write.and_new.bias'],
                'or': ['memory.write.or.weight', 'memory.write.or.bias'],
            }
        }

        return routing
routing/routing.json
CHANGED

The diff for this file is too large to render. See raw diff.
routing/routing_schema.md
CHANGED

@@ -37,8 +37,11 @@ The routing file (`routing.json`) defines how gates are interconnected.

6. **Memory indexing**: `"$mem[addr][bit]"` or `"$sel[addr]"` - Addressed memory bit or one-hot select line
   - Example: `"$mem[42][3]"` (addr 42, bit 3), `"$sel[42]"`

7. **Packed memory tensors**: For 64KB memory, routing uses packed tensor blocks instead of per-gate entries.
   - Example: `memory.addr_decode.weight`, `memory.read.and.weight`, `memory.write.and_old.weight`

## Circuit Types

### Single-Layer Gates
Gates with just `.weight` and `.bias`:

@@ -77,30 +80,92 @@ Complex circuits with sub-components:

}
```

### Bit-Indexed Circuits
Circuits operating on multi-bit values:
```json
"arithmetic.ripplecarry8bit": {
  "external_inputs": ["$a[0:7]", "$b[0:7]"],
  "gates": {
    "fa0": {"inputs": ["$a[0]", "$b[0]", "#0"], "type": "fulladder"},
    "fa1": {"inputs": ["$a[1]", "$b[1]", "fa0.cout"], "type": "fulladder"},
    ...
  }
}
```

### Packed Memory Circuits
64KB memory routing uses packed tensors to avoid exploding the header size. The routing entry
declares a packed type and lists the tensor blocks used for the operation.

```json
"memory.addr_decode": {
  "inputs": ["$addr[0:15]"],
  "type": "decoder_packed",
  "internal": {
    "weight": ["memory.addr_decode.weight"],
    "bias": ["memory.addr_decode.bias"]
  }
}

"memory.read": {
  "inputs": ["$mem[0:65535][0:7]", "$sel[0:65535]"],
  "type": "read_mux_packed",
  "internal": {
    "and": ["memory.read.and.weight", "memory.read.and.bias"],
    "or": ["memory.read.or.weight", "memory.read.or.bias"]
  },
  "outputs": { "bit0": "bit0", "bit1": "bit1", "bit2": "bit2", "bit3": "bit3",
               "bit4": "bit4", "bit5": "bit5", "bit6": "bit6", "bit7": "bit7" }
}

"memory.write": {
  "inputs": ["$mem[0:65535][0:7]", "$write_data[0:7]", "$sel[0:65535]", "$we"],
  "type": "write_mux_packed",
  "internal": {
    "sel": ["memory.write.sel.weight", "memory.write.sel.bias"],
    "nsel": ["memory.write.nsel.weight", "memory.write.nsel.bias"],
    "and_old": ["memory.write.and_old.weight", "memory.write.and_old.bias"],
    "and_new": ["memory.write.and_new.weight", "memory.write.and_new.bias"],
    "or": ["memory.write.or.weight", "memory.write.or.bias"]
  }
}
```

Packed tensor mapping (shapes assume 16-bit address, 8-bit data):
- `memory.addr_decode.weight`: [65536, 16]
- `memory.addr_decode.bias`: [65536]
- `memory.read.and.weight`: [8, 65536, 2]
- `memory.read.and.bias`: [8, 65536]
- `memory.read.or.weight`: [8, 65536]
- `memory.read.or.bias`: [8]
- `memory.write.sel.weight`: [65536, 2]
- `memory.write.sel.bias`: [65536]
- `memory.write.nsel.weight`: [65536, 1]
- `memory.write.nsel.bias`: [65536]
- `memory.write.and_old.weight`: [65536, 8, 2]
- `memory.write.and_old.bias`: [65536, 8]
- `memory.write.and_new.weight`: [65536, 8, 2]
- `memory.write.and_new.bias`: [65536, 8]
- `memory.write.or.weight`: [65536, 8, 2]
- `memory.write.or.bias`: [65536, 8]

Semantics are the same as the unrolled circuits, but computed in bulk:
- decode: `sel[i] = H(sum(addr_bits * weight[i]) + bias[i])`
- read: `bit[b] = H(sum(H([mem_bit, sel] * and_w[b,i] + and_b[b,i]) * or_w[b]) + or_b[b])`
- write: `new_bit = H(H([old_bit, nsel] * and_old_w + and_old_b) + H([data_bit, sel] * and_new_w + and_new_b) - 1)`

## Naming Conventions

- External inputs: `$name` or `$name[bit]`
- Constants: `#0`, `#1`
- Internal gates: relative path from circuit root
- Outputs: named in `outputs` section

## Validation Rules

1. Every gate in routing must exist in tensors file
2. Every tensor must have routing entry
3. Input count must match weight dimensions
4. No circular dependencies (DAG only)
5. All referenced sources must exist
6. Packed memory circuits are valid when the packed tensor blocks exist and match the expected shapes
routing/validate_packed_memory.py
ADDED

    """
    Validate packed memory tensor references in routing.json against safetensors.
    """

    from __future__ import annotations

    import argparse
    import json
    import sys
    from pathlib import Path
    from typing import Dict, Iterable, List, Tuple

    from safetensors import safe_open


    def _load_json(path: Path) -> Dict:
        with path.open("r", encoding="utf-8") as fh:
            return json.load(fh)


    def _get_scalar_tensor(f, name: str, default: int) -> int:
        if name not in f.keys():
            return default
        tensor = f.get_tensor(name)
        return int(tensor.item())


    def _gather_internal_keys(routing: Dict, circuit_name: str) -> List[str]:
        circuit = routing.get("circuits", {}).get(circuit_name)
        if circuit is None:
            return []
        internal = circuit.get("internal", {})
        keys: List[str] = []
        for value in internal.values():
            if isinstance(value, list):
                keys.extend(value)
        return keys


    def _shape_matches(actual: Iterable[int], expected: Iterable[int]) -> bool:
        return tuple(actual) == tuple(expected)


    def main() -> int:
        parser = argparse.ArgumentParser(description="Validate packed memory routing tensors.")
        parser.add_argument(
            "--routing",
            type=Path,
            default=Path(__file__).resolve().parent / "routing.json",
            help="Path to routing.json",
        )
        parser.add_argument(
            "--model",
            type=Path,
            default=Path(__file__).resolve().parent.parent / "neural_computer.safetensors",
            help="Path to neural_computer.safetensors",
        )
        args = parser.parse_args()

        routing = _load_json(args.routing)
        routing_keys = set()
        for name in ("memory.addr_decode", "memory.read", "memory.write"):
            routing_keys.update(_gather_internal_keys(routing, name))

        missing_routing = [k for k in routing_keys if not k]
        if missing_routing:
            print("routing.json contains empty packed tensor entries.", file=sys.stderr)
            return 1

        with safe_open(str(args.model), framework="pt") as f:
            mem_bytes = _get_scalar_tensor(f, "manifest.memory_bytes", 65536)
            pc_width = _get_scalar_tensor(f, "manifest.pc_width", 16)
            reg_width = _get_scalar_tensor(f, "manifest.register_width", 8)

            expected_shapes: Dict[str, Tuple[int, ...]] = {
                "memory.addr_decode.weight": (mem_bytes, pc_width),
                "memory.addr_decode.bias": (mem_bytes,),
                "memory.read.and.weight": (reg_width, mem_bytes, 2),
                "memory.read.and.bias": (reg_width, mem_bytes),
                "memory.read.or.weight": (reg_width, mem_bytes),
                "memory.read.or.bias": (reg_width,),
                "memory.write.sel.weight": (mem_bytes, 2),
                "memory.write.sel.bias": (mem_bytes,),
                "memory.write.nsel.weight": (mem_bytes, 1),
                "memory.write.nsel.bias": (mem_bytes,),
                "memory.write.and_old.weight": (mem_bytes, reg_width, 2),
                "memory.write.and_old.bias": (mem_bytes, reg_width),
                "memory.write.and_new.weight": (mem_bytes, reg_width, 2),
                "memory.write.and_new.bias": (mem_bytes, reg_width),
                "memory.write.or.weight": (mem_bytes, reg_width, 2),
                "memory.write.or.bias": (mem_bytes, reg_width),
            }

            errors = []
            for key, expected in expected_shapes.items():
                if key not in routing_keys:
                    errors.append(f"routing.json missing key: {key}")
                    continue
                if key not in f.keys():
                    errors.append(f"safetensors missing key: {key}")
                    continue
                actual = f.get_tensor(key).shape
                if not _shape_matches(actual, expected):
                    errors.append(f"{key} shape {tuple(actual)} != {expected}")

        if errors:
            print("Packed memory validation failed:", file=sys.stderr)
            for err in errors:
                print(f"  - {err}", file=sys.stderr)
            return 1

        print("Packed memory routing validation: ok")
        return 0


    if __name__ == "__main__":
        raise SystemExit(main())
tensors.txt
CHANGED

The diff for this file is too large to render. See raw diff.
todo.md
CHANGED

@@ -56,54 +56,55 @@ The machine runs. Callers just provide initial state and collect results.

### State Tensor Layout
```
┌─────────┬──────────┬────────┬────────┬────────────────┐
│ PC [16] │ Regs[32] │Flags[4]│Ctrl[4] │ Memory [N × 8] │
└─────────┴──────────┴────────┴────────┴────────────────┘
16 + 32 + 4 + 4 + N × 8 bits
```

### Memory Hierarchy
| Level | Size | Tensors | Access |
|-------|------|---------|--------|
| Registers | 4 × 8-bit | Direct wiring | Immediate |
| Main memory | 64KB | ~1.6M | 16-bit addressed |

### Full 64KB Configuration
- Address space: 0x0000 - 0xFFFF
- Routing circuits: ~1.64M tensors
- State tensor: 88 + 524,288 = 524,376 bits per instance

## Phase 1: Memory Infrastructure

64KB memory circuits are implemented and pass comprehensive eval.

| Component | Description | Tensors | Status |
|-----------|-------------|---------|--------|
| Address Decoder 16-bit | 16-bit → 65536 one-hot | 2 (packed) | Done |
| Memory Read MUX 64K | 65536-to-1 × 8 bits | 4 (packed) | Done |
| Memory Write Demux | Route write to address | 4 (packed) | Done |
| Memory Cell Logic | Conditional update | 6 (packed) | Done |

## Phase 2: Execution Engine

| Component | Description | Status |
|-----------|-------------|--------|
| Instruction Fetch | PC → Memory → IR | Done |
| Operand Fetch | Decode → Register/Memory Read | Done |
| ALU Dispatch | Opcode → Operation Select | Done |
| Result Writeback | Route to destination | Done |
| Flag Update | Compute Z/N/C/V | Done |
| PC Advance | Increment or Jump | Done |
| Halt Detection | HALT opcode → stop | Done |

## Phase 3: ACT Integration

Threshold runtime available in cpu/threshold_cpu.py (cycle + ACT loop + state I/O).

| Component | Description | Status |
|-----------|-------------|--------|
| Cycle Block | All Phase 2 as single layer | Done |
| Recurrence Wrapper | Loop until halt signal | Done |
| Max Cycles Guard | Prevent infinite loops | Done |
| State I/O | Pack/unpack state tensor | Done |

## Instruction Set

@@ -119,11 +120,11 @@ The machine runs. Callers just provide initial state and collect results.

| 0x7 | MUL | R[d] = R[a] * R[b] | Done |
| 0x8 | DIV | R[d] = R[a] / R[b] | Done |
| 0x9 | CMP | flags = R[a] - R[b] | Done |
| 0xA | LOAD | R[d] = M[addr] | Done |
| 0xB | STORE | M[addr] = R[s] | Done |
| 0xC | JMP | PC = addr | Done |
| 0xD | JZ/JNZ | PC = addr if flag | Done |
| 0xE | CALL | push PC; PC = addr | Done |
| 0xF | HALT | stop execution | Done |

## Completed Circuits

@@ -151,8 +152,8 @@ The machine runs. Callers just provide initial state and collect results.

- Comparators, threshold gates
- Conditional jumps

**Current: 6,296 tensors (packed memory)**
**Parameters: 8,267,667**

## Applications
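The todo's state tensor layout (PC, registers, flags, control, memory as one flat bit vector) can be packed and unpacked with ordinary bit arithmetic. A minimal sketch, assuming only the fields shown in the diagram and an LSB-first bit order (the bit order and a 56-bit header are assumptions of this sketch, not confirmed by the repository):

```python
# Sketch of packing/unpacking the flat state vector:
# PC[16] | Regs[4 x 8] | Flags[4] | Ctrl[4] | Memory[N x 8], LSB first.
def pack_state(pc, regs, flags, ctrl, mem):
    bits = [(pc >> i) & 1 for i in range(16)]
    for r in regs:                        # 4 registers x 8 bits
        bits += [(r >> i) & 1 for i in range(8)]
    bits += flags + ctrl                  # 4 + 4 raw bits
    for byte in mem:                      # N memory bytes x 8 bits
        bits += [(byte >> i) & 1 for i in range(8)]
    return bits

def unpack_state(bits, n_bytes):
    pc = sum(b << i for i, b in enumerate(bits[:16]))
    regs = [sum(bits[16 + 8 * r + i] << i for i in range(8)) for r in range(4)]
    flags, ctrl = bits[48:52], bits[52:56]
    mem = [sum(bits[56 + 8 * a + i] << i for i in range(8)) for a in range(n_bytes)]
    return pc, regs, flags, ctrl, mem

state = pack_state(0x1234, [1, 2, 3, 4], [0, 1, 0, 0], [1, 0, 0, 0], [0xAB] * 4)
assert unpack_state(state, 4) == (0x1234, [1, 2, 3, 4], [0, 1, 0, 0], [1, 0, 0, 0], [0xAB] * 4)
```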