# ISA reference card — 8-bit threshold-logic CPU
This is the architecture exposed by the safetensors files. Every instruction below is *implemented entirely as threshold neurons*; the same gate-level circuits run whether you simulate in Python (`eval.py` / `play.py` / `test_cpu.py`) or compile the CPU's threshold network through `safetensors2verilog` to FPGA-synthesizable Verilog.
## Architectural state
| Field | Width | Notes |
|---|---|---|
| PC | N bits | program counter; N = address width (0–16) |
| IR | 16 bits | instruction register |
| R0–R3 | 8 bits each | general-purpose registers |
| FLAGS | 4 bits | Z, N, C, V |
| SP | N bits | stack pointer (CALL/RET) |
| CTRL | 4 bits | HALT, MEM_WE, MEM_RE, RESERVED |
| MEM | 2^N × 8 bits | byte-addressable memory |
State tensor layout (MSB-first within each multi-bit field):
```
[ PC[N] | IR[16] | R0[8] R1[8] R2[8] R3[8] | FLAGS[4] | SP[N] | CTRL[4] | MEM[2^N][8] ]
```
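The fields are concatenated in a fixed order, so each field's bit offset depends only on N. The sketch below assumes one tensor element per bit and no padding between fields. `state_layout`, `state_length`, and `bits_msb_first` are hypothetical helpers, not functions from the repo.

```python
def state_layout(n: int) -> dict:
    """Map each field to (start_bit, width) in the flat state vector for address width n."""
    fields = [
        ("PC", n), ("IR", 16),
        ("R0", 8), ("R1", 8), ("R2", 8), ("R3", 8),
        ("FLAGS", 4), ("SP", n), ("CTRL", 4),
        ("MEM", (2 ** n) * 8),          # 2^N bytes, 8 bits each
    ]
    layout, offset = {}, 0
    for name, width in fields:
        layout[name] = (offset, width)
        offset += width
    return layout


def state_length(n: int) -> int:
    start, width = state_layout(n)["MEM"]   # MEM is the last field
    return start + width


def bits_msb_first(value: int, width: int) -> list:
    """Bits of value, most significant first, as stored within a field."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


# 64 bytes of memory means N = 6 address bits.
print(state_layout(6)["FLAGS"])   # (54, 4)
print(state_length(6))            # 580
print(bits_msb_first(7, 8))       # [0, 0, 0, 0, 0, 1, 1, 1]
```

With 64 bytes of memory, the MEM field (512 bits) makes up most of the 580-bit state vector.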
## Instruction encoding
```
15..12   11..10   9..8   7..0
opcode   rd       rs     imm8
```
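The field boundaries above map directly onto shifts and masks. Here is a minimal Python sketch. `decode` and `encode` are hypothetical helpers for illustration, not the repo's assembler API.

```python
def decode(word: int) -> dict:
    """Split a 16-bit instruction word into its four fields."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction word must fit in 16 bits")
    return {
        "opcode": (word >> 12) & 0xF,   # bits 15..12
        "rd": (word >> 10) & 0x3,       # bits 11..10
        "rs": (word >> 8) & 0x3,        # bits 9..8
        "imm8": word & 0xFF,            # bits 7..0
    }


def encode(opcode: int, rd: int, rs: int, imm8: int = 0) -> int:
    """Inverse of decode(): pack the four fields into one 16-bit word."""
    return (opcode & 0xF) << 12 | (rd & 0x3) << 10 | (rs & 0x3) << 8 | (imm8 & 0xFF)


print(hex(encode(0x0, 1, 2)))   # ADD R1, R2 -> 0x600
print(decode(0x4000))           # XOR R0, R0
```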
| Class | Use of fields |
|---|---|
| **R-type** | `rd = rd op rs` — `imm8` ignored |
| **I-type** | `rd = op rd, imm8` — `rs` ignored |
| **Address-extended** | next 16-bit word is the absolute address (big-endian); `imm8` reserved, except in `Jcc`, which uses `imm8[2:0]` as the condition code. Applies to `LOAD`, `STORE`, `JMP`, `Jcc`, `CALL`. |
Address-extended instructions consume **4 bytes** (instruction word + address word); every other instruction is a single 2-byte word. An untaken conditional jump still skips the address word, so execution falls through to PC + 4.
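The following sketch serialises one instruction and computes its length. One point is an assumption: the card fixes the byte order of the address word only, and here the instruction word is also stored high byte first. `instr_length`, `emit`, and `fallthrough_pc` are hypothetical names.

```python
ADDR_EXTENDED = {0xA, 0xB, 0xC, 0xD, 0xE}   # LOAD, STORE, JMP, Jcc, CALL


def instr_length(opcode: int) -> int:
    """Bytes consumed by an instruction: 4 if address-extended, else 2."""
    return 4 if opcode in ADDR_EXTENDED else 2


def emit(opcode: int, rd: int = 0, rs: int = 0, imm8: int = 0, addr=None) -> bytes:
    """Serialise one instruction, appending the big-endian address word if needed.

    ASSUMPTION: the instruction word itself is also stored high byte first.
    """
    word = (opcode & 0xF) << 12 | (rd & 0x3) << 10 | (rs & 0x3) << 8 | (imm8 & 0xFF)
    out = word.to_bytes(2, "big")
    if opcode in ADDR_EXTENDED:
        if addr is None:
            raise ValueError("address-extended instruction needs addr")
        out += (addr & 0xFFFF).to_bytes(2, "big")
    return out


def fallthrough_pc(pc: int, opcode: int) -> int:
    """PC after an instruction that does not transfer control (incl. an untaken Jcc)."""
    return pc + instr_length(opcode)


print(emit(0xC, addr=0x0123).hex())   # JMP 0x0123 -> c0000123
print(emit(0xF).hex())                # HALT       -> f000
```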
## Opcode table
| Opcode | Mnemonic | Class | Operation |
|---|---|---|---|
| 0x0 | ADD | R | R[rd] = R[rd] + R[rs] |
| 0x1 | SUB | R | R[rd] = R[rd] - R[rs] |
| 0x2 | AND | R | R[rd] = R[rd] & R[rs] |
| 0x3 | OR | R | R[rd] = R[rd] \| R[rs] |
| 0x4 | XOR | R | R[rd] = R[rd] ^ R[rs] |
| 0x5 | SHL | R | R[rd] = R[rd] << 1 |
| 0x6 | SHR | R | R[rd] = R[rd] >> 1 |
| 0x7 | MUL | R | R[rd] = R[rd] * R[rs] (low 8 bits) |
| 0x8 | DIV | R | R[rd] = R[rd] / R[rs] |
| 0x9 | CMP | R | flags = R[rd] - R[rs] (no writeback) |
| 0xA | LOAD | A | R[rd] = M[addr] |
| 0xB | STORE | A | M[addr] = R[rs] |
| 0xC | JMP | A | PC = addr |
| 0xD | Jcc | A | PC = addr if the condition holds; imm8[2:0] selects the condition |
| 0xE | CALL | A | push PC; PC = addr |
| 0xF | HALT | – | stop execution |
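You can cross-check the threshold circuits against the R-type semantics written as a plain 8-bit reference model. The sketch rests on several assumptions:

- Operands are unsigned bytes.
- `SHR` is a logical shift.
- `DIV` returns the unsigned integer quotient.
- Divide-by-zero, which the card leaves unspecified, raises an error instead of guessing a value.

`alu` is a hypothetical name. `CMP` is omitted because it only writes flags.

```python
MASK = 0xFF


def alu(opcode: int, a: int, b: int) -> int:
    """Reference 8-bit result of the R-type opcodes 0x0-0x8 (rd = a, rs = b)."""
    ops = {
        0x0: lambda: a + b,    # ADD, wraps mod 256
        0x1: lambda: a - b,    # SUB, wraps (3 - 5 -> 254)
        0x2: lambda: a & b,    # AND
        0x3: lambda: a | b,    # OR
        0x4: lambda: a ^ b,    # XOR
        0x5: lambda: a << 1,   # SHL by 1, rs ignored
        0x6: lambda: a >> 1,   # SHR by 1, rs ignored (logical shift assumed)
        0x7: lambda: a * b,    # MUL, low 8 bits kept by the mask below
        0x8: lambda: a // b,   # DIV, unsigned quotient assumed
    }
    if opcode not in ops:
        raise ValueError(f"opcode {opcode:#x} is not an R-type ALU op")
    if opcode == 0x8 and b == 0:
        raise ZeroDivisionError("DIV by zero is unspecified in the ISA card")
    return ops[opcode]() & MASK


print(alu(0x7, 20, 13))   # 260 truncated to 4
print(alu(0x1, 3, 5))     # 254
```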
### Conditional-jump conditions (encoded in imm8[2:0] of the Jcc opcode)
| imm8[2:0] | Mnemonic | Fires when |
|---|---|---|
| 0 | JZ | Z flag set (last result was zero) |
| 1 | JNZ | Z flag clear |
| 2 | JC | carry-out set (last add overflowed unsigned) |
| 3 | JNC | carry-out clear |
| 4 | JN | result was negative (sign bit set) |
| 5 | JP | result was non-negative (sign bit clear; zero counts) |
| 6 | JV | signed-overflow flag set |
| 7 | JNV | signed-overflow flag clear |
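The conditions read the four FLAGS bits. The sketch below derives those bits for an 8-bit `ADD` using the conventional definitions: C is the unsigned carry-out, and V is set when both operands share a sign that the result lacks. It then evaluates `imm8[2:0]`.

Whether the hardware computes flags exactly this way is an assumption. `SUB`/`CMP` are left out because the card does not say whether C means carry or borrow after a subtraction. `add_flags` and `jcc_taken` are hypothetical names.

```python
def add_flags(a: int, b: int) -> dict:
    """Z, N, C, V after an 8-bit ADD, using the conventional definitions."""
    full = a + b
    r = full & 0xFF
    return {
        "Z": int(r == 0),
        "N": (r >> 7) & 1,                      # sign bit of the result
        "C": int(full > 0xFF),                  # unsigned carry-out
        "V": ((a ^ r) & (b ^ r) & 0x80) >> 7,   # signed overflow
    }


CONDITIONS = {
    0: lambda f: f["Z"] == 1,   # JZ
    1: lambda f: f["Z"] == 0,   # JNZ
    2: lambda f: f["C"] == 1,   # JC
    3: lambda f: f["C"] == 0,   # JNC
    4: lambda f: f["N"] == 1,   # JN
    5: lambda f: f["N"] == 0,   # JP (sign bit clear, so zero also fires)
    6: lambda f: f["V"] == 1,   # JV
    7: lambda f: f["V"] == 0,   # JNV
}


def jcc_taken(imm8: int, flags: dict) -> bool:
    """Only imm8[2:0] selects the condition."""
    return CONDITIONS[imm8 & 0b111](flags)


print(add_flags(0x7F, 0x01))   # 127 + 1: N and V set
print(add_flags(0xFF, 0x01))   # 255 + 1: Z and C set
```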
## Worked example: write your own program
The Python assembler in `cpu_programs.py` exposes one-method-per-mnemonic helpers on a tiny `Asm` class. Here's "store the value 7 to address 0x10, then halt":
```python
from cpu_programs import Asm

a = Asm(size=64)              # 64 bytes of memory

# Code at address 0. There is no LDI, so the constant 7 is placed in
# memory and LOADed into R0.
a.org(0)
a.load(0, "seven")            # R0 = M[seven] = 7   (4 bytes)
a.store(0, "dest")            # M[dest] = R0        (4 bytes)
a.halt()                      # 2 bytes; code ends at byte 10

# Destination cell at 0x10, the address the driver reads back.
a.org(0x10)
a.label("dest"); a.db(0)

# Constant pool: memory byte at addr 32 holds the constant 7.
a.org(32)
a.label("seven"); a.db(7)

bytes_ = a.assemble()
```
Then drop the assembled bytes into the CPU's initial memory and let the threshold-network forward pass run.
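For reference, here is the "store 7 to 0x10" program assembled by hand into a 64-byte memory image, with the constant at address 32. This sketch assumes the instruction word is stored high byte first; the card only fixes the byte order of the address word. `word` and `addr16` are hypothetical helpers. The check confirms that the 10 bytes of code end before the result cell.

```python
def word(opcode: int, rd: int = 0, rs: int = 0, imm8: int = 0) -> bytes:
    """16-bit instruction word. ASSUMPTION: high byte first."""
    return ((opcode << 12) | (rd << 10) | (rs << 8) | imm8).to_bytes(2, "big")


def addr16(addr: int) -> bytes:
    """Extension word: 16-bit absolute address, big-endian per the ISA card."""
    return addr.to_bytes(2, "big")


SEVEN_AT, DEST_AT = 32, 0x10

program = (
    word(0xA, rd=0) + addr16(SEVEN_AT)    # LOAD  R0, [32]
    + word(0xB, rs=0) + addr16(DEST_AT)   # STORE [0x10], R0
    + word(0xF)                           # HALT
)
assert len(program) <= DEST_AT, "code would overwrite the result cell"

mem = bytearray(64)                       # 64-byte memory image
mem[:len(program)] = program
mem[SEVEN_AT] = 7
print(program.hex())                      # a0000020b0000010f000
```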
## Using the CPU as a threshold-network forward pass
The CPU is a single tensor program. State in, state out. The driver:
1. Builds an initial state tensor with the program loaded at `MEM[0..]`.
2. Calls the safetensors-derived threshold network, which executes one fetch–decode–execute cycle per pass; each output state is fed back in as the next input.
3. Stops after at most `max_cycles` cycles (or earlier, once the HALT control bit fires) and reads the final memory contents.
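The three driver steps can be mirrored by a plain-Python reference model. This is useful for checking what a program should leave in memory before running it through the threshold network.

This is a minimal sketch, not the repo's implementation. `State`, `step`, `run`, and `w` are hypothetical names. The model has these limits:

- It covers only ADD, SUB, LOAD, STORE, JMP, JZ/JNZ, and HALT.
- It tracks only the Z flag.
- It assumes the instruction word is stored high byte first.

```python
from dataclasses import dataclass, field

ADDR_EXT = {0xA, 0xB, 0xC, 0xD, 0xE}        # LOAD, STORE, JMP, Jcc, CALL


@dataclass
class State:
    mem: bytearray
    pc: int = 0
    regs: list = field(default_factory=lambda: [0, 0, 0, 0])
    z: int = 0                               # only the Z flag is modelled
    halted: bool = False


def step(s: State) -> State:
    """One fetch-decode-execute cycle over a subset of the ISA."""
    word = int.from_bytes(s.mem[s.pc:s.pc + 2], "big")    # ASSUMPTION: high byte first
    op, rd, rs, imm8 = word >> 12, (word >> 10) & 3, (word >> 8) & 3, word & 0xFF
    addr = int.from_bytes(s.mem[s.pc + 2:s.pc + 4], "big") if op in ADDR_EXT else None
    s.pc += 4 if op in ADDR_EXT else 2       # untaken Jcc still skips the address word
    if op in (0x0, 0x1):                     # ADD / SUB, wrapping mod 256
        r = s.regs[rd] + s.regs[rs] if op == 0x0 else s.regs[rd] - s.regs[rs]
        s.regs[rd] = r & 0xFF
        s.z = int(s.regs[rd] == 0)
    elif op == 0xA:                          # LOAD  R[rd] = M[addr]
        s.regs[rd] = s.mem[addr]
    elif op == 0xB:                          # STORE M[addr] = R[rs]
        s.mem[addr] = s.regs[rs]
    elif op == 0xC:                          # JMP
        s.pc = addr
    elif op == 0xD:                          # Jcc: only JZ (0) and JNZ (1) here
        cond = imm8 & 0b111
        if cond not in (0, 1):
            raise NotImplementedError(f"condition {cond} not modelled")
        if (cond == 0) == bool(s.z):
            s.pc = addr
    elif op == 0xF:                          # HALT
        s.halted = True
    else:
        raise NotImplementedError(f"opcode {op:#x} not modelled")
    return s


def run(s: State, max_cycles: int) -> State:
    """Feed the state back through step() until HALT or the cycle budget is spent."""
    for _ in range(max_cycles):
        if s.halted:
            break
        s = step(s)
    return s


def w(op, rd=0, rs=0, imm8=0, addr=None):
    """Assemble one instruction, plus its address word when given."""
    b = ((op << 12) | (rd << 10) | (rs << 8) | imm8).to_bytes(2, "big")
    return b if addr is None else b + addr.to_bytes(2, "big")


# R0 = 5 + 5 + 5 with R2 as a down-counter; result stored at 0x30.
code = (
    w(0xA, rd=1, addr=0x20)      # 0:  LOAD R1, [0x20]  (5)
    + w(0xA, rd=2, addr=0x21)    # 4:  LOAD R2, [0x21]  (3, loop count)
    + w(0xA, rd=3, addr=0x22)    # 8:  LOAD R3, [0x22]  (1)
    + w(0x0, rd=0, rs=1)         # 12: loop: ADD R0, R1
    + w(0x1, rd=2, rs=3)         # 14: SUB R2, R3
    + w(0xD, imm8=1, addr=12)    # 16: JNZ loop
    + w(0xB, rs=0, addr=0x30)    # 20: STORE [0x30], R0
    + w(0xF)                     # 24: HALT
)
mem = bytearray(64)
mem[:len(code)] = code
mem[0x20:0x23] = bytes([5, 3, 1])
final = run(State(mem), max_cycles=200)
print(final.mem[0x30], final.halted)     # 15 True
```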
Concretely, this is what `test_cpu.py` and `play.py` already do; both serve as runnable tutorials. The minimal driver loop is:
```python
from build import ThresholdComputer
from safetensors.torch import load_file

# Threshold-network parameters for the whole CPU, as a dict of named tensors.
tensors = load_file("variants/neural_computer8_small.safetensors")
cpu = ThresholdComputer(tensors, data_bits=8)

state = cpu.initial_state(memory=bytes_)     # bytes_ from the worked example above
state = cpu.run(state, max_cycles=200)       # stops early if HALT fires
result = cpu.read_memory(state, addr=0x10)
print(result)  # 7
```
## Common pitfalls
- **No load-immediate.** `LOAD` reads from memory; there is no LDI / MOV-imm instruction. To put a constant in a register, place it in memory and `LOAD` it.
- **Address-extended instructions are 4 bytes wide.** Branch targets must point at the start of an instruction word, not into the middle of one.
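- **`MUL` keeps only the low 8 bits.** To detect unsigned overflow, one option is to `DIV` the product by the first (nonzero) operand and `CMP` the quotient against the second; a mismatch means high bits were lost.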
- **`CMP` writes only flags**, never the destination register. Pair it with a `Jcc` to act on the result.
- **`SHL` and `SHR` shift by 1.** No variable-amount shifter; chain them or compose with bit operations.
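You can prototype the `MUL` and `SHL` workarounds above in plain Python before writing them in assembly. This sketch assumes unsigned operands and assumes that `DIV` returns the integer quotient. `mul_overflowed` and `shl_by` are hypothetical helpers. The overflow test mirrors an ISA sequence: `MUL`, then `DIV` by the first operand, then `CMP` against the second.

```python
def mul_overflowed(a: int, b: int) -> bool:
    """True if the unsigned product a*b did not fit in 8 bits."""
    p = (a * b) & 0xFF      # what MUL leaves in rd
    if a == 0:
        return False        # 0 * b never overflows (and avoids DIV by zero)
    return p // a != b      # DIV p by a, then CMP with b


def shl_by(x: int, k: int) -> int:
    """Shift left by k using k chained single-bit SHLs."""
    for _ in range(k):
        x = (x << 1) & 0xFF
    return x


print(mul_overflowed(16, 16))   # True: 256 wraps to 0
print(mul_overflowed(15, 17))   # False: 255 fits
print(shl_by(0x81, 3))          # 8
```

If the product fits, dividing it by `a` recovers `b` exactly. If it wrapped, the truncated product is smaller than `a * b`, so the quotient falls short of `b`.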
## Threshold-network artefacts you'll want next
- `python eval_all.py variants/<file>.safetensors` — gate-level fitness suite (5,900–7,800 tests per variant covering Boolean, arithmetic, ALU, control, modular, error-detection, threshold, and IEEE 754 float circuits).
- `python eval_all.py --cpu-program variants/<file>.safetensors` — assembled program through the threshold-gated CPU.
- `python -m safetensors2verilog <file>.safetensors --frontend threshold_logic --circuit arithmetic.ripplecarry8bit -o rc8.v` — extract one circuit, dependency-closed, into synthesizable Verilog.
- `python -m safetensors2verilog ... --inspect` — print the port contract for any extracted circuit (which pins exist, what widths).
- `python -m safetensors2verilog ... --equiv-check` — automatically build a Python-vs-iverilog cross-check testbench for the extracted circuit.