ISA reference card — 8-bit threshold-logic CPU
This is the architecture exposed by the safetensors files. Every instruction below is implemented entirely as threshold neurons; the same gate-level circuits run whether you simulate in Python (eval.py / play.py / test_cpu.py) or compile the CPU's threshold network through safetensors2verilog to FPGA-synthesizable Verilog.
Architectural state
| Field | Width | Notes |
|---|---|---|
| PC | N bits | program counter; N = address width (0–16) |
| IR | 16 bits | instruction register |
| R0–R3 | 8 bits each | general-purpose registers |
| FLAGS | 4 bits | Z, N, C, V |
| SP | N bits | stack pointer (CALL/RET) |
| CTRL | 4 bits | HALT, MEM_WE, MEM_RE, RESERVED |
| MEM | 2^N × 8 bits | byte-addressable memory |
State tensor layout (MSB-first within each multi-bit field):
[ PC[N] | IR[16] | R0[8] R1[8] R2[8] R3[8] | FLAGS[4] | SP[N] | CTRL[4] | MEM[2^N][8] ]
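The layout above pins every field to a fixed bit offset once N is chosen. Here is a minimal sketch that computes those offsets and reads a field back MSB-first. The helper names `state_layout` and `read_field` are hypothetical and not part of the repository; the sketch assumes the state is a flat sequence of 0/1 values in exactly the order shown.

```python
def state_layout(n):
    """Return {field: (bit_offset, bit_width)} for address width n."""
    fields = [
        ("PC", n),
        ("IR", 16),
        ("R0", 8), ("R1", 8), ("R2", 8), ("R3", 8),
        ("FLAGS", 4),
        ("SP", n),
        ("CTRL", 4),
        ("MEM", (2 ** n) * 8),
    ]
    layout, offset = {}, 0
    for name, width in fields:
        layout[name] = (offset, width)
        offset += width
    layout["TOTAL"] = (0, offset)  # width of the whole state vector
    return layout


def read_field(bits, layout, name):
    """Decode one field from a flat bit sequence, MSB first."""
    offset, width = layout[name]
    value = 0
    for bit in bits[offset:offset + width]:
        value = (value << 1) | int(bit)
    return value


layout = state_layout(6)          # N = 6 gives 64 bytes of memory
print(layout["MEM"])              # (68, 512)
print(layout["TOTAL"][1])         # 580
```

With N = 6 the registers and control fields take 68 bits and memory takes the remaining 512, so nearly all of the state vector is memory.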
Instruction encoding
| Bits | 15..12 | 11..10 | 9..8 | 7..0 |
|---|---|---|---|---|
| Field | opcode | rd | rs | imm8 |
| Class | Use of fields |
|---|---|
| R-type | rd = rd op rs — imm8 ignored |
| I-type | rd = op rd, imm8 — rs ignored |
| Address-extended | next 16-bit word is the absolute address (big-endian); imm8 reserved, except for Jcc, which uses imm8[2:0] to select the condition. Applies to LOAD, STORE, JMP, Jcc, CALL. |
Address-extended instructions consume 4 bytes (instruction word + address word). Untaken conditional jumps still skip the address word, so the PC advances by 4 whenever an address-extended instruction does not transfer control.
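The field packing can be sketched in a few lines of Python. The names `encode`, `decode` and `length_in_bytes` are hypothetical helpers for illustration, not functions from cpu_programs.py, and the sketch works on the 16-bit word as an integer, so it makes no claim about how the word is laid out in memory.

```python
# Opcodes that carry a second 16-bit address word: LOAD, STORE, JMP, Jcc, CALL.
ADDRESS_EXTENDED = {0xA, 0xB, 0xC, 0xD, 0xE}


def encode(opcode, rd=0, rs=0, imm8=0):
    """Pack the four fields into one 16-bit instruction word."""
    if not (0 <= opcode < 16 and 0 <= rd < 4 and 0 <= rs < 4 and 0 <= imm8 < 256):
        raise ValueError("field out of range")
    return (opcode << 12) | (rd << 10) | (rs << 8) | imm8


def decode(word):
    """Split a 16-bit instruction word back into its fields."""
    return {
        "opcode": (word >> 12) & 0xF,
        "rd": (word >> 10) & 0x3,
        "rs": (word >> 8) & 0x3,
        "imm8": word & 0xFF,
    }


def length_in_bytes(opcode):
    """4 bytes for address-extended instructions, 2 bytes otherwise."""
    return 4 if opcode in ADDRESS_EXTENDED else 2


word = encode(0x0, rd=1, rs=2)    # ADD R1, R2
print(hex(word))                  # 0x600
print(decode(0xD001))             # Jcc with imm8 = 1 (JNZ)
```

The 2-byte length for the remaining instructions is inferred from the 16-bit IR rather than stated outright in this card.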
Opcode table
| Opcode | Mnemonic | Class | Operation |
|---|---|---|---|
| 0x0 | ADD | R | R[rd] = R[rd] + R[rs] |
| 0x1 | SUB | R | R[rd] = R[rd] - R[rs] |
| 0x2 | AND | R | R[rd] = R[rd] & R[rs] |
| 0x3 | OR | R | R[rd] = R[rd] \| R[rs] |
| 0x4 | XOR | R | R[rd] = R[rd] ^ R[rs] |
| 0x5 | SHL | R | R[rd] = R[rd] << 1 |
| 0x6 | SHR | R | R[rd] = R[rd] >> 1 |
| 0x7 | MUL | R | R[rd] = R[rd] * R[rs] (low 8 bits) |
| 0x8 | DIV | R | R[rd] = R[rd] / R[rs] |
| 0x9 | CMP | R | flags = R[rd] - R[rs] (no writeback) |
| 0xA | LOAD | A | R[rd] = M[addr] |
| 0xB | STORE | A | M[addr] = R[rs] |
| 0xC | JMP | A | PC = addr |
| 0xD | Jcc | A | PC = addr if cond. imm8[2:0] selects condition |
| 0xE | CALL | A | push PC; PC = addr |
| 0xF | HALT | – | stop execution |
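To make the arithmetic rows concrete, here is a plain-Python model of an 8-bit ADD together with the four flags. This card only names the flags, so the definitions below follow the usual two's-complement conventions and are an assumption about the hardware; `add8` is a hypothetical helper, not repository code.

```python
def add8(a, b):
    """Model R[rd] = R[rd] + R[rs] on 8-bit values, returning (result, flags)."""
    total = a + b
    result = total & 0xFF                       # writeback keeps 8 bits
    flags = {
        "Z": result == 0,                       # result is zero
        "N": bool(result & 0x80),               # sign bit of the result
        "C": total > 0xFF,                      # unsigned carry-out
        # Signed overflow: both operands differ in sign from the result.
        "V": bool((a ^ result) & (b ^ result) & 0x80),
    }
    return result, flags


print(add8(0x7F, 0x01))   # 0x80 with N and V set: signed overflow
print(add8(0xFF, 0x01))   # 0x00 with Z and C set: unsigned wrap-around
```

The two calls show why C and V are separate flags: 0x7F + 1 overflows as a signed sum but not as an unsigned one, and 0xFF + 1 does the opposite.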
Conditional-jump conditions (encoded in imm8[2:0] of the Jcc opcode)
| imm8[2:0] | Mnemonic | Fires when |
|---|---|---|
| 0 | JZ | Z flag set (last result was zero) |
| 1 | JNZ | Z flag clear |
| 2 | JC | carry-out set (last add overflowed unsigned) |
| 3 | JNC | carry-out clear |
| 4 | JN | result was negative (sign bit set) |
| 5 | JP | result was non-negative (sign bit clear) |
| 6 | JV | signed-overflow flag set |
| 7 | JNV | signed-overflow flag clear |
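The condition table has a regular shape: bits 2..1 of the selector pick a flag (Z, C, N, V) and bit 0 inverts it. A minimal sketch of that rule follows. `jcc_taken` is a hypothetical helper, and ignoring the upper five bits of imm8 is an assumption.

```python
COND_NAMES = ["JZ", "JNZ", "JC", "JNC", "JN", "JP", "JV", "JNV"]


def jcc_taken(imm8, z, n, c, v):
    """Return True if the conditional jump selected by imm8[2:0] fires."""
    cond = imm8 & 0b111                 # only the low three bits are used
    flag = (z, c, n, v)[cond >> 1]      # selector pairs: Z, C, N, V
    return bool(flag) if cond % 2 == 0 else not flag


# With only the Z flag set, JZ, JNC, JP and JNV fire.
for cond, name in enumerate(COND_NAMES):
    print(name, jcc_taken(cond, z=True, n=False, c=False, v=False))
```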
Worked example: write your own program
The Python assembler in cpu_programs.py exposes one-method-per-mnemonic helpers on a tiny Asm class. Here's "store the value 7 to address 0x10, then halt":
from cpu_programs import Asm

a = Asm(size=64) # 64 bytes of memory

# There is no LDI, so the constant 7 lives in memory and is LOADed.
a.org(32)
a.label("seven")
a.db(7) # memory byte at addr 32 holds the constant 7

# Reserve the destination cell at address 0x10.
a.org(0x10)
a.label("dest")
a.db(0)

# Program code starts at address 0.
a.org(0)
a.xor_(0, 0) # R0 = 0 (optional: LOAD overwrites R0 anyway)
a.load(0, "seven") # R0 = M[seven] = 7
a.store(0, "dest") # M[dest] = R0
a.halt()

bytes_ = a.assemble()
Then drop the assembled bytes into the CPU's initial memory and let the threshold-network forward pass run.
Using the CPU as a threshold-network forward pass
The CPU is a single tensor program. State in, state out. The driver:
- Builds an initial state tensor with the program loaded at MEM[0..].
- Calls the safetensors-derived threshold network, which internally loops one fetch–decode–execute cycle and re-feeds the state.
- After at most max_cycles cycles (or earlier if the HALT control bit fires), reads the final memory contents.
Concretely, this is what test_cpu.py and play.py already do; both serve as runnable tutorials. The minimal driver loop is:
from build import ThresholdComputer
from safetensors.torch import load_file
tensors = load_file("variants/neural_computer8_small.safetensors")
cpu = ThresholdComputer(tensors, data_bits=8)
state = cpu.initial_state(memory=bytes_)
state = cpu.run(state, max_cycles=200)
result = cpu.read_memory(state, addr=0x10)
print(result) # 7
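When a program misbehaves on the threshold network, it can help to cross-check it against an ordinary interpreter. The sketch below is a plain-Python model of a subset of the ISA (the ALU opcodes 0x0–0x7 without flags, plus LOAD, STORE, JMP and HALT). It is not how the repository runs programs. It assumes instruction words are stored big-endian like the address word and that addresses wrap at the memory size. `run_reference` is a hypothetical name.

```python
def run_reference(memory, max_cycles=200):
    """Execute a program with ordinary Python arithmetic; return (regs, mem)."""
    mem = bytearray(memory)
    size = len(mem)
    regs = [0, 0, 0, 0]
    pc = 0
    alu = {
        0x0: lambda a, b: a + b,    # ADD
        0x1: lambda a, b: a - b,    # SUB
        0x2: lambda a, b: a & b,    # AND
        0x3: lambda a, b: a | b,    # OR
        0x4: lambda a, b: a ^ b,    # XOR
        0x5: lambda a, b: a << 1,   # SHL
        0x6: lambda a, b: a >> 1,   # SHR
        0x7: lambda a, b: a * b,    # MUL (low 8 bits kept below)
    }

    def word_at(addr):
        # Assumption: 16-bit words are stored big-endian.
        return (mem[addr % size] << 8) | mem[(addr + 1) % size]

    for _ in range(max_cycles):
        word = word_at(pc)                                   # fetch
        opcode, rd, rs = word >> 12, (word >> 10) & 3, (word >> 8) & 3
        if opcode == 0xF:                                    # HALT
            break
        if opcode in alu:                                    # R-type, 2 bytes
            regs[rd] = alu[opcode](regs[rd], regs[rs]) & 0xFF
            pc += 2
            continue
        addr = word_at(pc + 2) % size                        # address word
        if opcode == 0xA:                                    # LOAD
            regs[rd] = mem[addr]
            pc += 4
        elif opcode == 0xB:                                  # STORE
            mem[addr] = regs[rs]
            pc += 4
        elif opcode == 0xC:                                  # JMP
            pc = addr
        else:
            raise NotImplementedError(f"opcode {opcode:#x} not modelled")
    return regs, mem


# Hand-assembled: XOR R0,R0; LOAD R0,[32]; STORE R0,[0x10]; HALT
program = bytearray(64)
program[0:12] = bytes([0x40, 0x00,  0xA0, 0x00, 0x00, 0x20,
                       0xB0, 0x00, 0x00, 0x10,  0xF0, 0x00])
program[32] = 7
regs, mem = run_reference(program)
print(mem[0x10])   # 7
```

Flags, DIV, CMP, Jcc and CALL are left out to keep the sketch short; the flag conventions would need checking against the gate-level circuits before modelling them.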
Common pitfalls
- No load-immediate. LOAD reads from memory; there is no LDI / MOV-imm instruction. To put a constant in a register, place it in memory and LOAD it.
- Address-extended instructions are 4 bytes wide. Branch targets must point at the start of an instruction word, not into the middle of one.
- MUL keeps only the low 8 bits. Detect overflow via CMP against expected truncation.
- CMP writes only flags, never the destination register. Always followed by a Jcc.
- SHL and SHR shift by 1. No variable-amount shifter; chain them or compose with bit operations.
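Two of these pitfalls are easy to model in plain Python before committing them to assembly. The sketch below chains single-bit shifts to get a multi-bit shift, and shows one possible overflow check for MUL: divide the truncated product by one operand and compare with the other. That divide-and-compare check is a technique swapped in for illustration, not necessarily what the MUL bullet intends, and it assumes DIV is unsigned integer division. All function names are hypothetical.

```python
def shl_by(value, amount):
    """Shift left by `amount` using one single-bit SHL per step."""
    for _ in range(amount):
        value = (value << 1) & 0xFF     # each step models one SHL
    return value


def mul8(a, b):
    """MUL as described in the opcode table: only the low 8 bits survive."""
    return (a * b) & 0xFF


def mul_overflowed(a, b):
    """Detect a truncated product using only MUL, DIV and a compare."""
    if b == 0:
        return False                    # product is 0; nothing was lost
    return mul8(a, b) // b != a         # DIV then CMP against the operand


print(shl_by(0x03, 4))                  # 48
print(mul8(16, 16), mul_overflowed(16, 16))   # 0 True
print(mul8(15, 17), mul_overflowed(15, 17))   # 255 False
```

The check works because a truncated product is strictly smaller than the true product, so dividing it by b can never give back a.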
Threshold-network artefacts you'll want next
- python eval_all.py variants/<file>.safetensors: gate-level fitness suite (5,900–7,800 tests per variant covering Boolean, arithmetic, ALU, control, modular, error-detection, threshold, and IEEE 754 float circuits).
- python eval_all.py --cpu-program variants/<file>.safetensors: assembled program through the threshold-gated CPU.
- python -m safetensors2verilog <file>.safetensors --frontend threshold_logic --circuit arithmetic.ripplecarry8bit -o rc8.v: extract one circuit, dependency-closed, into synthesizable Verilog.
- python -m safetensors2verilog ... --inspect: print the port contract for any extracted circuit (which pins exist, what widths).
- python -m safetensors2verilog ... --equiv-check: automatically build a Python-vs-iverilog cross-check testbench for the extracted circuit.