CharlesCNorton
eval_all: hash-keyed result cache (--cache-dir, --no-cache); README: bit-ordering scope rules; docs/ISA.md: opcode reference and end-to-end tutorial; docs/float-pipeline.md: composition gap notes
597e7c2 | # ISA reference card — 8-bit threshold-logic CPU | |
| This is the architecture exposed by the safetensors files. Every instruction below is *implemented entirely as threshold neurons*; the same gate-level circuits run whether you simulate in Python (`eval.py` / `play.py` / `test_cpu.py`) or compile the CPU's threshold network through `safetensors2verilog` to FPGA-synthesizable Verilog. | |
| ## Architectural state | |
| | Field | Width | Notes | | |
| |---|---|---| | |
| | PC | N bits | program counter; N = address width (0–16) | | |
| | IR | 16 bits | instruction register | | |
| | R0–R3 | 8 bits each | general-purpose registers | | |
| | FLAGS | 4 bits | Z, N, C, V | | |
| | SP | N bits | stack pointer (CALL/RET) | | |
| | CTRL | 4 bits | HALT, MEM_WE, MEM_RE, RESERVED | | |
| | MEM | 2^N × 8 bits | byte-addressable memory | | |
| State tensor layout (MSB-first within each multi-bit field): | |
| ``` | |
| [ PC[N] | IR[16] | R0[8] R1[8] R2[8] R3[8] | FLAGS[4] | SP[N] | CTRL[4] | MEM[2^N][8] ] | |
| ``` | |
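The layout above can be turned into bit offsets mechanically. The following is a minimal sketch, not code from the repository: `state_layout` is a hypothetical helper, and it assumes the fields are packed back to back in exactly the order shown, with no padding.

```python
def state_layout(addr_bits: int) -> dict:
    """Return {field: (start_bit, width)} for the layout shown above.

    Hypothetical helper: assumes fields are packed contiguously, in order.
    """
    fields = [
        ("PC", addr_bits),
        ("IR", 16),
        ("R0", 8), ("R1", 8), ("R2", 8), ("R3", 8),
        ("FLAGS", 4),
        ("SP", addr_bits),
        ("CTRL", 4),
        ("MEM", (1 << addr_bits) * 8),  # 2^N bytes, 8 bits each
    ]
    layout, offset = {}, 0
    for name, width in fields:
        layout[name] = (offset, width)
        offset += width
    layout["TOTAL"] = (0, offset)  # width of the whole state tensor in bits
    return layout


layout = state_layout(addr_bits=6)  # N = 6, i.e. 64 bytes of memory
print(layout["MEM"])       # (68, 512)
print(layout["TOTAL"][1])  # 580
```

With N = 6 the non-memory state takes 68 bits, so memory dominates the tensor width even for small address widths.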
| ## Instruction encoding | |
| ``` | |
| 15..12 11..10 9..8 7..0 | |
| opcode rd rs imm8 | |
| ``` | |
| | Class | Use of fields | | |
| |---|---| | |
| | **R-type** | `rd = rd op rs` — `imm8` ignored | | |
| | **I-type** | `rd = op rd, imm8` — `rs` ignored | | |
| | **Address-extended** | next 16-bit word is the absolute address (big-endian); `imm8` reserved. Applies to `LOAD`, `STORE`, `JMP`, `Jcc`, `CALL`. | | |
| Address-extended instructions consume **4 bytes** (instruction word + address word); all other instructions consume 2. Untaken conditional jumps still skip the address word, so a `Jcc` that falls through advances the PC by 4 rather than 2. | |
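As an illustration of the field layout, here is a small sketch that packs and unpacks an instruction word and appends the address word where one is required. The helper names (`encode`, `decode`, `emit`) are hypothetical and are not the `cpu_programs.Asm` API. The byte order of the instruction word itself is an assumption: the text only states that the address word is big-endian.

```python
# Opcodes that take a trailing 16-bit address word: LOAD, STORE, JMP, Jcc, CALL.
ADDRESS_EXTENDED = {0xA, 0xB, 0xC, 0xD, 0xE}


def encode(opcode: int, rd: int = 0, rs: int = 0, imm8: int = 0) -> int:
    """Pack the four fields into a 16-bit instruction word."""
    if not (0 <= opcode < 16 and 0 <= rd < 4 and 0 <= rs < 4 and 0 <= imm8 < 256):
        raise ValueError("field out of range")
    return (opcode << 12) | (rd << 10) | (rs << 8) | imm8


def decode(word: int) -> tuple:
    """Unpack a 16-bit word into (opcode, rd, rs, imm8)."""
    return (word >> 12) & 0xF, (word >> 10) & 0x3, (word >> 8) & 0x3, word & 0xFF


def emit(word: int, addr=None) -> bytes:
    """Serialize one instruction: 2 bytes, or 4 if address-extended."""
    opcode = word >> 12
    if (opcode in ADDRESS_EXTENDED) != (addr is not None):
        raise ValueError("address word required exactly for address-extended opcodes")
    out = word.to_bytes(2, "big")  # ASSUMPTION: instruction word is big-endian too
    if addr is not None:
        out += addr.to_bytes(2, "big")  # stated in the text: big-endian address
    return out


load_r0 = emit(encode(0xA, rd=0), addr=0x0020)  # LOAD R0, [0x0020]
print(load_r0.hex())    # a0000020
print(decode(0xD001))   # (13, 0, 0, 1)  -> Jcc with condition 1 (JNZ)
```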
| ## Opcode table | |
| | Opcode | Mnemonic | Class | Operation | | |
| |---|---|---|---| | |
| | 0x0 | ADD | R | R[rd] = R[rd] + R[rs] | | |
| | 0x1 | SUB | R | R[rd] = R[rd] - R[rs] | | |
| | 0x2 | AND | R | R[rd] = R[rd] & R[rs] | | |
| | 0x3 | OR | R | R[rd] = R[rd] \| R[rs] | | |
| | 0x4 | XOR | R | R[rd] = R[rd] ^ R[rs] | | |
| | 0x5 | SHL | R | R[rd] = R[rd] << 1 | | |
| | 0x6 | SHR | R | R[rd] = R[rd] >> 1 | | |
| | 0x7 | MUL | R | R[rd] = R[rd] * R[rs] (low 8 bits) | | |
| | 0x8 | DIV | R | R[rd] = R[rd] / R[rs] | | |
| | 0x9 | CMP | R | flags = R[rd] - R[rs] (no writeback) | | |
| | 0xA | LOAD | A | R[rd] = M[addr] | | |
| | 0xB | STORE | A | M[addr] = R[rs] | | |
| | 0xC | JMP | A | PC = addr | | |
| | 0xD | Jcc | A | PC = addr if cond. imm8[2:0] selects condition | | |
| | 0xE | CALL | A | push PC; PC = addr | | |
| | 0xF | HALT | – | stop execution | | |
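For checking programs by hand, the R-type rows can be mirrored in a few lines of plain Python. This is a reference sketch, not the threshold-network implementation. It assumes all values are unsigned, that `SHR` is a logical shift, and that `DIV` is floor division. Flags and division by zero are left out because the table does not specify them.

```python
# Plain-integer mirror of the R-type rows; results are truncated to 8 bits.
ALU_OPS = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "AND": lambda a, b: a & b,
    "OR":  lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "SHL": lambda a, b: a << 1,   # rs is unused
    "SHR": lambda a, b: a >> 1,   # ASSUMPTION: logical (zero-fill) shift
    "MUL": lambda a, b: a * b,    # only the low 8 bits survive the mask below
    "DIV": lambda a, b: a // b,   # ASSUMPTION: unsigned floor division, b != 0
}


def alu(op: str, a: int, b: int) -> int:
    """Apply one R-type operation to two 8-bit values and wrap to 8 bits."""
    return ALU_OPS[op](a & 0xFF, b & 0xFF) & 0xFF


print(alu("ADD", 200, 100))  # 44   (300 wraps modulo 256)
print(alu("SUB", 3, 5))      # 254  (two's-complement -2)
print(alu("MUL", 16, 17))    # 16   (272 truncated to the low 8 bits)
print(alu("SHL", 0x81, 0))   # 2    (top bit shifted out)
```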
| ### Conditional-jump conditions (encoded in imm8[2:0] of the Jcc instruction) | |
| | imm8[2:0] | Mnemonic | Fires when | | |
| |---|---|---| | |
| | 0 | JZ | Z flag set (last result was zero) | | |
| | 1 | JNZ | Z flag clear | | |
| | 2 | JC | carry-out set (last add overflowed unsigned) | | |
| | 3 | JNC | carry-out clear | | |
| | 4 | JN | result was negative (sign bit set) | | |
| | 5 | JP | result was non-negative (sign bit clear) | |
| | 6 | JV | signed-overflow flag set | | |
| | 7 | JNV | signed-overflow flag clear | | |
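The table has a regular structure: bits 2..1 of the condition pick a flag (Z, C, N, V) and bit 0 inverts the test. The sketch below expresses that directly. `Flags` and `jcc_taken` are hypothetical names for illustration and say nothing about how the threshold network wires the decision.

```python
from typing import NamedTuple


class Flags(NamedTuple):
    z: bool  # zero
    n: bool  # negative (sign bit)
    c: bool  # carry-out
    v: bool  # signed overflow


def jcc_taken(cond: int, f: Flags) -> bool:
    """Decide whether a Jcc with condition imm8[2:0] == cond is taken."""
    cond &= 0b111
    flag = (f.z, f.c, f.n, f.v)[cond >> 1]  # table order: Z, C, N, V
    inverted = bool(cond & 1)               # odd codes are the negated forms
    return flag != inverted


after_cmp_equal = Flags(z=True, n=False, c=False, v=False)
print(jcc_taken(0, after_cmp_equal))  # True   (JZ fires)
print(jcc_taken(1, after_cmp_equal))  # False  (JNZ does not)
print(jcc_taken(5, after_cmp_equal))  # True   (JP: sign bit clear)
```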
| ## Worked example: write your own program | |
| The Python assembler in `cpu_programs.py` exposes one-method-per-mnemonic helpers on a tiny `Asm` class. Here's "store the value 7 to address 0x10, then halt": | |
| ```python | |
| from cpu_programs import Asm | |
| a = Asm(size=64) # 64 bytes of memory | |
| # Set R0 to 7. There is no LDI, so keep the constant in memory and LOAD it. | |
| a.org(32) | |
| a.label("seven"); a.db(7) # label and byte both at addr 32: the constant 7 | |
| a.org(0x10) | |
| a.label("dest"); a.db(0) # destination cell at addr 0x10 | |
| a.org(0) # code occupies bytes 0..11, clear of both data cells | |
| a.xor_(0, 0) # R0 = 0 (optional: LOAD overwrites R0 anyway) | |
| a.load(0, "seven") # R0 = M[seven] = 7 | |
| a.store(0, "dest") # M[dest] = R0 | |
| a.halt() | |
| bytes_ = a.assemble() | |
| ``` | |
| Then drop the assembled bytes into the CPU's initial memory and let the threshold-network forward pass run. | |
| ## Using the CPU as a threshold-network forward pass | |
| The CPU is a single tensor program. State in, state out. The driver: | |
| 1. Builds an initial state tensor with the program loaded at `MEM[0..]`. | |
| 2. Calls the safetensors-derived threshold network, which internally loops one fetch–decode–execute cycle and re-feeds the state. | |
| 3. After at most `max_cycles` cycles (or earlier if the HALT control bit fires), reads the final memory contents. | |
| Concretely, this is what `test_cpu.py` and `play.py` already do; both serve as runnable tutorials. The minimal driver loop is: | |
| ```python | |
| from build import ThresholdComputer | |
| from safetensors.torch import load_file | |
| tensors = load_file("variants/neural_computer8_small.safetensors") | |
| cpu = ThresholdComputer(tensors, data_bits=8) | |
| state = cpu.initial_state(memory=bytes_) | |
| state = cpu.run(state, max_cycles=200) | |
| result = cpu.read_memory(state, addr=0x10) | |
| print(result) # 7 | |
| ``` | |
| ## Common pitfalls | |
| - **No load-immediate.** `LOAD` reads from memory; there is no LDI / MOV-imm instruction. To put a constant in a register, place it in memory and `LOAD` it. | |
| - **Address-extended instructions are 4 bytes wide.** Branch targets must point at the start of an instruction word, not into the middle of one. | |
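| - **`MUL` keeps only the low 8 bits.** To detect overflow, compare against what an untruncated product would give: for example, `DIV` the result by one non-zero operand and `CMP` the quotient with the other operand (this assumes `DIV` is unsigned floor division). A mismatch means the product did not fit in 8 bits. | |
| - **`CMP` writes only flags**, never the destination register. Follow it with a `Jcc` to act on the result. | |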
| - **`SHL` and `SHR` shift by 1.** No variable-amount shifter; chain them or compose with bit operations. | |
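One way to check for `MUL` overflow using only the listed opcodes is to divide the truncated product by one operand and compare the quotient with the other. The sketch below models that check in plain Python and verifies it exhaustively over all 8-bit operand pairs. It assumes `DIV` is unsigned floor division, and it treats a zero divisor as "no overflow" without dividing, since the opcode table does not say what `DIV` by zero does.

```python
def mul8(a: int, b: int) -> int:
    """What MUL leaves in R[rd]: the low 8 bits of the product."""
    return (a * b) & 0xFF


def mul_overflowed(a: int, b: int) -> bool:
    """DIV-and-CMP overflow check, modelled with plain integers."""
    if b == 0:
        return False               # a * 0 == 0 always fits; avoid dividing by zero
    return mul8(a, b) // b != a    # quotient mismatch means bits were lost


# Exhaustive comparison against the true (untruncated) product.
mismatches = [
    (a, b)
    for a in range(256)
    for b in range(256)
    if mul_overflowed(a, b) != (a * b > 0xFF)
]
print(len(mismatches))         # 0
print(mul_overflowed(16, 17))  # True   (272 does not fit)
print(mul_overflowed(15, 17))  # False  (255 fits)
```

On the CPU this costs one extra `DIV`, one `CMP`, and a scratch register to preserve the truncated product, so it is only worth doing where overflow is actually possible.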
| ## Threshold-network artefacts you'll want next | |
| - `python eval_all.py variants/<file>.safetensors` — gate-level fitness suite (5,900–7,800 tests per variant covering Boolean, arithmetic, ALU, control, modular, error-detection, threshold, and IEEE 754 float circuits). | |
| - `python eval_all.py --cpu-program variants/<file>.safetensors` — assembled program through the threshold-gated CPU. | |
| - `python -m safetensors2verilog <file>.safetensors --frontend threshold_logic --circuit arithmetic.ripplecarry8bit -o rc8.v` — extract one circuit, dependency-closed, into synthesizable Verilog. | |
| - `python -m safetensors2verilog ... --inspect` — print the port contract for any extracted circuit (which pins exist, what widths). | |
| - `python -m safetensors2verilog ... --equiv-check` — automatically build a Python-vs-iverilog cross-check testbench for the extracted circuit. | |