# ISA reference card — 8-bit threshold-logic CPU

This is the architecture exposed by the safetensors files. Every instruction below is *implemented entirely as threshold neurons*; the same gate-level circuits run whether you simulate in Python (`eval.py` / `play.py` / `test_cpu.py`) or compile the CPU's threshold network through `safetensors2verilog` to FPGA-synthesizable Verilog.

## Architectural state

| Field | Width | Notes |
|---|---|---|
| PC | N bits | program counter; N = address width (0–16) |
| IR | 16 bits | instruction register |
| R0–R3 | 8 bits each | general-purpose registers |
| FLAGS | 4 bits | Z, N, C, V |
| SP | N bits | stack pointer (CALL/RET) |
| CTRL | 4 bits | HALT, MEM_WE, MEM_RE, RESERVED |
| MEM | 2^N × 8 bits | byte-addressable memory |

State tensor layout (MSB-first within each multi-bit field):

```
[ PC[N] | IR[16] | R0[8] R1[8] R2[8] R3[8] | FLAGS[4] | SP[N] | CTRL[4] | MEM[2^N][8] ]
```
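The layout above fixes every field's bit offset once N is chosen. The following is a minimal sketch that derives those offsets from the field order and widths listed in the table. The helper name `state_layout` is hypothetical and not part of the repository; the actual builder may organise the tensor differently in code.

```python
def state_layout(n_addr_bits: int) -> dict:
    """Return {field: (start_bit, width)} for the state tensor layout above.

    Hypothetical helper for illustration; field order and widths are taken
    from the architectural-state table.
    """
    fields = [
        ("PC", n_addr_bits),
        ("IR", 16),
        ("R0", 8), ("R1", 8), ("R2", 8), ("R3", 8),
        ("FLAGS", 4),
        ("SP", n_addr_bits),
        ("CTRL", 4),
        ("MEM", (2 ** n_addr_bits) * 8),
    ]
    layout = {}
    offset = 0
    for name, width in fields:
        layout[name] = (offset, width)
        offset += width
    layout["TOTAL"] = (0, offset)  # total number of state bits
    return layout


layout = state_layout(6)   # N = 6 gives 64 bytes of memory
print(layout["MEM"])       # (68, 512)
print(layout["TOTAL"])     # (0, 580)
```

With N = 6 the register and control fields occupy the first 68 bits and memory takes the remaining 512.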

## Instruction encoding

```
15..12   11..10   9..8   7..0
opcode   rd       rs     imm8
```

| Class | Use of fields |
|---|---|
| **R-type** | `rd = rd op rs`; `imm8` ignored |
| **I-type** | `rd = op rd, imm8`; `rs` ignored |
| **Address-extended** | next 16-bit word is the absolute address (big-endian); `imm8` reserved. Applies to `LOAD`, `STORE`, `JMP`, `Jcc`, `CALL`. |

Address-extended instructions consume **4 bytes** (instruction word + address word). Untaken conditional jumps still skip the address word, so the PC always advances by 4.
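As a rough illustration of the encoding, the sketch below packs and unpacks the four fields and serialises an instruction to bytes. The function names are hypothetical, and it assumes the instruction word itself is stored high byte first; the text above only states big-endian order for the address word.

```python
from typing import Optional, Tuple


def encode(opcode: int, rd: int = 0, rs: int = 0, imm8: int = 0) -> int:
    """Pack opcode[15:12], rd[11:10], rs[9:8], imm8[7:0] into a 16-bit word."""
    if not (0 <= opcode < 16 and 0 <= rd < 4 and 0 <= rs < 4 and 0 <= imm8 < 256):
        raise ValueError("field out of range")
    return (opcode << 12) | (rd << 10) | (rs << 8) | imm8


def decode(word: int) -> Tuple[int, int, int, int]:
    """Unpack a 16-bit word into (opcode, rd, rs, imm8)."""
    return (word >> 12) & 0xF, (word >> 10) & 0x3, (word >> 8) & 0x3, word & 0xFF


def to_bytes(word: int, addr: Optional[int] = None) -> bytes:
    """Serialise an instruction; address-extended forms append a 16-bit address.

    Assumption: the instruction word is stored high byte first, like the
    big-endian address word.
    """
    out = word.to_bytes(2, "big")
    if addr is not None:
        out += addr.to_bytes(2, "big")
    return out


add_word = encode(0x0, rd=1, rs=2)           # ADD R1, R2
print(hex(add_word))                         # 0x600
print(decode(add_word))                      # (0, 1, 2, 0)
store = to_bytes(encode(0xB, rs=0), 0x10)    # STORE R0 -> M[0x10]
print(store.hex(), len(store))               # b0000010 4
```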

## Opcode table

| Opcode | Mnemonic | Class | Operation |
|---|---|---|---|
| 0x0 | ADD     | R | R[rd] = R[rd] + R[rs] |
| 0x1 | SUB     | R | R[rd] = R[rd] - R[rs] |
| 0x2 | AND     | R | R[rd] = R[rd] & R[rs] |
| 0x3 | OR      | R | R[rd] = R[rd] \| R[rs] |
| 0x4 | XOR     | R | R[rd] = R[rd] ^ R[rs] |
| 0x5 | SHL     | R | R[rd] = R[rd] << 1 |
| 0x6 | SHR     | R | R[rd] = R[rd] >> 1 |
| 0x7 | MUL     | R | R[rd] = R[rd] * R[rs]   (low 8 bits) |
| 0x8 | DIV     | R | R[rd] = R[rd] / R[rs] |
| 0x9 | CMP     | R | flags = R[rd] - R[rs]   (no writeback) |
| 0xA | LOAD    | A | R[rd] = M[addr] |
| 0xB | STORE   | A | M[addr] = R[rs] |
| 0xC | JMP     | A | PC = addr |
| 0xD | Jcc     | A | PC = addr if cond.  imm8[2:0] selects condition |
| 0xE | CALL    | A | push PC; PC = addr |
| 0xF | HALT    | – | stop execution |
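The opcode table together with the 4-byte rule for address-extended instructions is enough to walk a program image instruction by instruction. The sketch below does that in plain Python. `MNEMONICS`, `instruction_size`, and `walk` are hypothetical names, and it assumes non-extended instructions occupy 2 bytes with the opcode in the high nibble of the first byte.

```python
MNEMONICS = [
    "ADD", "SUB", "AND", "OR", "XOR", "SHL", "SHR", "MUL",
    "DIV", "CMP", "LOAD", "STORE", "JMP", "Jcc", "CALL", "HALT",
]
ADDRESS_EXTENDED = {0xA, 0xB, 0xC, 0xD, 0xE}  # LOAD, STORE, JMP, Jcc, CALL


def instruction_size(opcode: int) -> int:
    """4 bytes for address-extended instructions, otherwise 2 (assumed)."""
    return 4 if opcode in ADDRESS_EXTENDED else 2


def walk(program: bytes) -> list:
    """List (offset, mnemonic) pairs in straight-line order, stopping at HALT."""
    listing = []
    pc = 0
    while pc + 1 < len(program):
        opcode = program[pc] >> 4          # high nibble of the first byte
        listing.append((pc, MNEMONICS[opcode]))
        if opcode == 0xF:
            break
        pc += instruction_size(opcode)
    return listing


# XOR R0,R0 ; LOAD R0,[0x0020] ; STORE R0,[0x0010] ; HALT
program = bytes.fromhex("4000a0000020b0000010f000")
print(walk(program))
# [(0, 'XOR'), (2, 'LOAD'), (6, 'STORE'), (10, 'HALT')]
```

This walker ignores control flow; it only shows how instruction boundaries fall, which is what matters when choosing branch targets.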

### Conditional-jump conditions (encoded in imm8[2:0] of the Jcc opcode)

| imm8[2:0] | Mnemonic | Fires when |
|---|---|---|
| 0 | JZ | Z flag set (last result was zero) |
| 1 | JNZ | Z flag clear |
| 2 | JC | carry-out set (last add overflowed unsigned) |
| 3 | JNC | carry-out clear |
| 4 | JN | result was negative (sign bit set) |
| 5 | JP | result was non-negative (sign bit clear) |
| 6 | JV | signed-overflow flag set |
| 7 | JNV | signed-overflow flag clear |
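The condition table is regular: even selectors fire when a flag is set, odd selectors when it is clear, and the flag order is Z, C, N, V. A small reference model of that selection (plain Python rather than the threshold circuit, with the hypothetical name `condition_holds`) might look like this:

```python
COND_NAMES = ["JZ", "JNZ", "JC", "JNC", "JN", "JP", "JV", "JNV"]


def condition_holds(imm8: int, z: bool, n: bool, c: bool, v: bool) -> bool:
    """Evaluate the Jcc condition selected by imm8[2:0] against the flags."""
    sel = imm8 & 0b111
    flag = (z, c, n, v)[sel >> 1]      # selector pairs map to Z, C, N, V
    want_set = (sel & 1) == 0          # even selector: fire when flag is set
    return bool(flag) == want_set


# After a CMP of equal values, Z is set, so JZ fires and JNZ does not.
print(condition_holds(0, z=True, n=False, c=False, v=False))   # True
print(condition_holds(1, z=True, n=False, c=False, v=False))   # False
```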

## Worked example: write your own program

The Python assembler in `cpu_programs.py` exposes one-method-per-mnemonic helpers on a tiny `Asm` class. Here's "store the value 7 to address 0x10, then halt":

```python
from cpu_programs import Asm

a = Asm(size=64)        # 64 bytes of memory
a.org(0)
# Set R0 to 7. There is no LDI, so the constant lives in memory and is
# brought into the register with LOAD.
a.xor_(0, 0)              # R0 = 0 (optional; LOAD overwrites R0 anyway)
a.load(0, "seven")        # R0 = M[seven] = 7
a.store(0, "dest")        # M[dest] = R0
a.halt()

# Data cells. Each label is defined after org() so it binds to that address.
a.org(0x10)
a.label("dest"); a.db(0)  # destination cell at address 0x10
a.org(32)
a.label("seven"); a.db(7) # memory byte at addr 32 holds the constant 7

bytes_ = a.assemble()
```

Then drop the assembled bytes into the CPU's initial memory and let the threshold-network forward pass run.

## Using the CPU as a threshold-network forward pass

The CPU is a single tensor program. State in, state out. The driver:

1. Builds an initial state tensor with the program loaded at `MEM[0..]`.
2. Calls the safetensors-derived threshold network, which internally loops one fetch–decode–execute cycle and re-feeds the state.
3. After at most `max_cycles` cycles (or earlier if the HALT control bit fires), reads the final memory contents.

Concretely, this is what `test_cpu.py` and `play.py` already do; both serve as runnable tutorials. The minimal driver loop is:

```python
from build import ThresholdComputer
from safetensors.torch import load_file

tensors = load_file("variants/neural_computer8_small.safetensors")
cpu = ThresholdComputer(tensors, data_bits=8)
state = cpu.initial_state(memory=bytes_)
state = cpu.run(state, max_cycles=200)
result = cpu.read_memory(state, addr=0x10)
print(result)   # 7
```

## Common pitfalls

- **No load-immediate.** `LOAD` reads from memory; there is no LDI / MOV-imm instruction. To put a constant in a register, place it in memory and `LOAD` it.
- **Address-extended instructions are 4 bytes wide.** Branch targets must point at the start of an instruction word, not into the middle of one.
- **`MUL` keeps only the low 8 bits.** The high byte of the product is discarded; to detect overflow, `CMP` the truncated result against the value you expect.
- **`CMP` writes only flags**, never the destination register. It is normally followed by a `Jcc` that tests those flags.
- **`SHL` and `SHR` shift by 1.** No variable-amount shifter; chain them or compose with bit operations.
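Two of these pitfalls are easy to check with ordinary arithmetic. The sketch below is a plain-Python reference model, not the threshold network, with hypothetical helper names. It shows what the low-8-bit `MUL` result looks like and how a multi-bit shift decomposes into repeated single-bit `SHL` steps.

```python
def mul8(a: int, b: int) -> tuple:
    """Model of MUL: return (low 8 bits of the product, whether bits were lost)."""
    full = (a & 0xFF) * (b & 0xFF)
    return full & 0xFF, full > 0xFF


def shl_n(x: int, n: int) -> int:
    """Model of n chained SHL instructions, each shifting an 8-bit value by 1."""
    for _ in range(n):
        x = (x << 1) & 0xFF
    return x


print(mul8(20, 13))        # (4, True)   since 260 = 0x104
print(mul8(12, 10))        # (120, False)
print(shl_n(3, 4))         # 48
print(hex(shl_n(0x81, 1))) # 0x2, the top bit is shifted out
```

The "bits were lost" value here is computed by the model for illustration; on the CPU the program has to infer it from the truncated result.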

## Threshold-network artefacts you'll want next

- `python eval_all.py variants/<file>.safetensors` — gate-level fitness suite (5,900–7,800 tests per variant covering Boolean, arithmetic, ALU, control, modular, error-detection, threshold, and IEEE 754 float circuits).
- `python eval_all.py --cpu-program variants/<file>.safetensors` — assembled program through the threshold-gated CPU.
- `python -m safetensors2verilog <file>.safetensors --frontend threshold_logic --circuit arithmetic.ripplecarry8bit -o rc8.v` — extract one circuit, dependency-closed, into synthesizable Verilog.
- `python -m safetensors2verilog ... --inspect` — print the port contract for any extracted circuit (which pins exist, what widths).
- `python -m safetensors2verilog ... --equiv-check` — automatically build a Python-vs-iverilog cross-check testbench for the extracted circuit.