CharlesCNorton

eval_all: hash-keyed result cache (--cache-dir, --no-cache); README: bit-ordering scope rules; docs/ISA.md: opcode reference and end-to-end tutorial; docs/float-pipeline.md: composition gap notes

597e7c2 6 days ago

	# ISA reference card — 8-bit threshold-logic CPU

	This is the architecture exposed by the safetensors files. Every instruction below is implemented entirely as threshold neurons; the same gate-level circuits run whether you simulate in Python (`eval.py` / `play.py` / `test_cpu.py`) or compile the CPU's threshold network through `safetensors2verilog` to FPGA-synthesizable Verilog.

	## Architectural state

	| Field | Width | Notes |
	|---|---|---|
	| PC | N bits | program counter; N = address width (0–16) |
	| IR | 16 bits | instruction register |
	| R0–R3 | 8 bits each | general-purpose registers |
	| FLAGS | 4 bits | Z, N, C, V |
	| SP | N bits | stack pointer (CALL/RET) |
	| CTRL | 4 bits | HALT, MEM_WE, MEM_RE, RESERVED |
	| MEM | 2^N × 8 bits | byte-addressable memory |

	State tensor layout (MSB-first within each multi-bit field):

	```
	[ PC[N] | IR[16] | R0[8] R1[8] R2[8] R3[8] | FLAGS[4] | SP[N] | CTRL[4] | MEM[2^N][8] ]
	```
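The sketch below computes the bit offset of each field in the flattened state. It assumes the fields are packed in exactly the order shown, with no padding. `state_layout` and `state_width` are hypothetical helpers, not functions from the repository.

```python
from __future__ import annotations


def state_layout(n: int) -> dict[str, tuple[int, int]]:
    """Return {field: (start_bit, width)} for an N-bit address width.

    Hypothetical helper: assumes the packed field order shown above.
    """
    fields = [
        ("PC", n),
        ("IR", 16),
        ("R0", 8), ("R1", 8), ("R2", 8), ("R3", 8),
        ("FLAGS", 4),
        ("SP", n),
        ("CTRL", 4),
        ("MEM", (2 ** n) * 8),  # 2^N bytes, 8 bits each
    ]
    layout, offset = {}, 0
    for name, width in fields:
        layout[name] = (offset, width)
        offset += width
    return layout


def state_width(n: int) -> int:
    """Total bits in the state vector: (2N + 56) + 8 * 2^N."""
    start, width = state_layout(n)["MEM"]
    return start + width
```

Take N = 6, which gives the 64-byte memory used in the assembler example. Memory then starts at bit 68 and the whole state is 580 bits. In general, MEM begins at bit 2N + 56.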

	## Instruction encoding

	```
	15..12   11..10   9..8   7..0
	opcode   rd       rs     imm8
	```
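Here is a minimal sketch of packing and unpacking that layout. `encode`, `decode` and `to_bytes` are hypothetical names. The big-endian byte order for the instruction word is an assumption borrowed from the address word; the encoding section does not state it.

```python
from __future__ import annotations


def encode(opcode: int, rd: int = 0, rs: int = 0, imm8: int = 0) -> int:
    """Pack fields into bits 15..12, 11..10, 9..8 and 7..0."""
    if not (0 <= opcode < 16 and 0 <= rd < 4 and 0 <= rs < 4 and 0 <= imm8 < 256):
        raise ValueError("field out of range")
    return (opcode << 12) | (rd << 10) | (rs << 8) | imm8


def decode(word: int) -> tuple[int, int, int, int]:
    """Unpack a 16-bit word into (opcode, rd, rs, imm8)."""
    return (word >> 12) & 0xF, (word >> 10) & 0x3, (word >> 8) & 0x3, word & 0xFF


def to_bytes(word: int) -> bytes:
    """Serialize one 16-bit word big-endian (assumed byte order)."""
    return word.to_bytes(2, "big")
```

For example, `encode(0x0, rd=1, rs=2)` (ADD R1, R2) yields `0x0600`, and `decode` recovers `(0, 1, 2, 0)` from it.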

	| Class | Use of fields |
	|---|---|
	| R-type | `rd = rd op rs`; `imm8` ignored |
	| I-type | `rd = rd op imm8`; `rs` ignored |
	| Address-extended | The next 16-bit word is the absolute address (big-endian). `imm8` is reserved, except in `Jcc`, where `imm8[2:0]` selects the condition. Applies to `LOAD`, `STORE`, `JMP`, `Jcc`, `CALL`. |

	Address-extended instructions occupy 4 bytes (instruction word plus address word); all other instructions occupy 2. A conditional jump that is not taken still skips its address word, so the PC advances by 4, not 2.
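The PC rule can be written as a small function. This sketch makes three assumptions: 2-byte instruction words, addresses that wrap modulo 2^N, and a hypothetical `next_pc` signature. It is not taken from the repository.

```python
# Opcodes followed by a 16-bit address word: LOAD, STORE, JMP, Jcc, CALL.
ADDRESS_EXTENDED = {0xA, 0xB, 0xC, 0xD, 0xE}
JMP, JCC, CALL = 0xC, 0xD, 0xE


def next_pc(pc: int, opcode: int, addr: int, taken: bool, n: int) -> int:
    """Return the PC after executing `opcode` at `pc` (hypothetical helper).

    `addr` is the decoded address word (ignored by 2-byte instructions) and
    `taken` is the Jcc condition result (ignored by other opcodes).
    Assumes addresses wrap modulo 2**n.
    """
    mask = (1 << n) - 1
    if opcode in (JMP, CALL) or (opcode == JCC and taken):
        return addr & mask
    # An untaken Jcc still skips its address word, so it also advances by 4.
    size = 4 if opcode in ADDRESS_EXTENDED else 2
    return (pc + size) & mask
```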

	## Opcode table

	| Opcode | Mnemonic | Class | Operation |
	|---|---|---|---|
	| 0x0 | ADD | R | R[rd] = R[rd] + R[rs] |
	| 0x1 | SUB | R | R[rd] = R[rd] - R[rs] |
	| 0x2 | AND | R | R[rd] = R[rd] & R[rs] |
	| 0x3 | OR | R | R[rd] = R[rd] \| R[rs] |
	| 0x4 | XOR | R | R[rd] = R[rd] ^ R[rs] |
	| 0x5 | SHL | R | R[rd] = R[rd] << 1 |
	| 0x6 | SHR | R | R[rd] = R[rd] >> 1 |
	| 0x7 | MUL | R | R[rd] = R[rd] * R[rs] (low 8 bits) |
	| 0x8 | DIV | R | R[rd] = R[rd] / R[rs] |
	| 0x9 | CMP | R | set flags from R[rd] - R[rs] (no writeback) |
	| 0xA | LOAD | A | R[rd] = M[addr] |
	| 0xB | STORE | A | M[addr] = R[rs] |
	| 0xC | JMP | A | PC = addr |
	| 0xD | Jcc | A | PC = addr if the condition selected by imm8[2:0] holds |
	| 0xE | CALL | A | push PC; PC = addr |
	| 0xF | HALT | – | stop execution |
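A plain-Python reference for the R-class operations can help when cross-checking results from the threshold network. The sketch assumes unsigned 8-bit operands, a logical (zero-filling) `SHR` and a truncating unsigned `DIV`. The table does not specify division by zero, so the sketch refuses it. `alu` is a hypothetical name.

```python
def alu(opcode: int, a: int, b: int) -> int:
    """Reference result of an R-class opcode on 8-bit values (hypothetical)."""
    ops = {
        0x0: lambda: a + b,   # ADD
        0x1: lambda: a - b,   # SUB, wraps modulo 256 after masking
        0x2: lambda: a & b,   # AND
        0x3: lambda: a | b,   # OR
        0x4: lambda: a ^ b,   # XOR
        0x5: lambda: a << 1,  # SHL, rs unused
        0x6: lambda: a >> 1,  # SHR, rs unused (assumed logical)
        0x7: lambda: a * b,   # MUL, only the low 8 bits survive the mask
        0x8: lambda: a // b,  # DIV, assumed unsigned floor quotient
        0x9: lambda: a - b,   # CMP: value only feeds the flags, no writeback
    }
    if opcode not in ops:
        raise ValueError(f"opcode {opcode:#x} is not an R-class ALU operation")
    if opcode == 0x8 and b == 0:
        raise ZeroDivisionError("DIV by zero is not specified")
    return ops[opcode]() & 0xFF
```

For instance, `alu(0x0, 200, 100)` returns 44 because the sum wraps modulo 256, and `alu(0x7, 16, 17)` returns 16 (272 truncated to its low byte).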

	### Conditional-jump conditions (encoded in imm8[2:0] of the Jcc opcode)

	| imm8[2:0] | Mnemonic | Fires when |
	|---|---|---|
	| 0 | JZ | Z flag set (last result was zero) |
	| 1 | JNZ | Z flag clear |
	| 2 | JC | carry-out set (last add overflowed unsigned) |
	| 3 | JNC | carry-out clear |
	| 4 | JN | result was negative (sign bit set) |
	| 5 | JP | result was non-negative (sign bit clear) |
	| 6 | JV | signed-overflow flag set |
	| 7 | JNV | signed-overflow flag clear |
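The flags and conditions can be modeled as below. The sketch assumes the conventional definitions: `C` is the carry out of bit 7 and `V` is two's-complement overflow. Only `ADD` is shown, because the borrow convention for `SUB`/`CMP` is not documented here. `add_flags` and `jcc_taken` are hypothetical helpers.

```python
from __future__ import annotations


def add_flags(a: int, b: int) -> dict[str, bool]:
    """Z/N/C/V after an 8-bit ADD, using the usual definitions (assumed)."""
    total = a + b
    r = total & 0xFF
    return {
        "Z": r == 0,
        "N": bool(r & 0x80),
        "C": total > 0xFF,                    # unsigned carry out of bit 7
        "V": bool((a ^ r) & (b ^ r) & 0x80),  # operands agree in sign, result differs
    }


CONDITIONS = {
    0: ("JZ", lambda f: f["Z"]),
    1: ("JNZ", lambda f: not f["Z"]),
    2: ("JC", lambda f: f["C"]),
    3: ("JNC", lambda f: not f["C"]),
    4: ("JN", lambda f: f["N"]),
    5: ("JP", lambda f: not f["N"]),
    6: ("JV", lambda f: f["V"]),
    7: ("JNV", lambda f: not f["V"]),
}


def jcc_taken(imm8: int, flags: dict[str, bool]) -> bool:
    _, test = CONDITIONS[imm8 & 0b111]  # only imm8[2:0] is decoded
    return bool(test(flags))
```

For example, 0x7F + 0x01 sets `N` and `V` (127 + 1 overflows to -128), so `JV` fires and `JP` does not.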

	## Worked example: write your own program

	The Python assembler in `cpu_programs.py` exposes one-method-per-mnemonic helpers on a tiny `Asm` class. Here's "store the value 7 to address 0x10, then halt":

	```python
	from cpu_programs import Asm

	a = Asm(size=64)  # 64 bytes of memory

	# Code at address 0. There is no LDI, so the constant 7 is LOADed from memory.
	a.org(0)
	a.load(0, "seven")  # R0 = M[seven] = 7
	a.store(0, "dest")  # M[dest] = R0
	a.halt()

	# Data. Set the origin before each label so the label names that byte.
	a.org(0x10); a.label("dest"); a.db(0)  # destination cell at 0x10
	a.org(32); a.label("seven"); a.db(7)   # memory byte at addr 32 holds the constant 7

	bytes_ = a.assemble()
	```

	Then drop the assembled bytes into the CPU's initial memory and let the threshold-network forward pass run.
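The sketch below shows what "dropping the bytes into memory" amounts to under the state layout described earlier. The program is zero-padded to 2^N bytes, and each byte is expanded MSB-first. `memory_bits` is a hypothetical helper; in practice the repository's driver builds the state for you.

```python
from __future__ import annotations


def memory_bits(program: bytes, n: int) -> list[int]:
    """MEM slice of the state: 2^N bytes, each MSB-first (hypothetical helper)."""
    size = 2 ** n
    if len(program) > size:
        raise ValueError(f"program is {len(program)} bytes; memory holds {size}")
    image = program + bytes(size - len(program))  # zero-fill unused memory
    return [(byte >> (7 - i)) & 1 for byte in image for i in range(8)]
```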

	## Using the CPU as a threshold-network forward pass

	The CPU is a single tensor program. State in, state out. The driver:

	1. Builds an initial state tensor with the program loaded at `MEM[0..]`.
	2. Calls the safetensors-derived threshold network, which executes one fetch–decode–execute cycle per pass and feeds the resulting state back in as the next input.
	3. After at most `max_cycles` cycles (or earlier, once the HALT control bit fires), reads the final memory contents.
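Steps 2 and 3 amount to a bounded loop. In this generic sketch, `step` stands in for one forward pass of the threshold network and `halted` for a check of the CTRL HALT bit. Both callables and the `run` signature are assumptions, not the repository's API.

```python
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def run(state: S, step: Callable[[S], S], halted: Callable[[S], bool],
        max_cycles: int) -> Tuple[S, int]:
    """Apply `step` until `halted(state)` is true or `max_cycles` is reached.

    Returns the final state and the number of cycles actually executed.
    """
    for cycle in range(max_cycles):
        if halted(state):
            return state, cycle
        state = step(state)
    return state, max_cycles
```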

	Concretely, this is what `test_cpu.py` and `play.py` already do; both serve as runnable tutorials. The minimal driver loop is:

	```python
	from build import ThresholdComputer
	from safetensors.torch import load_file

	tensors = load_file("variants/neural_computer8_small.safetensors")
	cpu = ThresholdComputer(tensors, data_bits=8)
	state = cpu.initial_state(memory=bytes_)  # bytes_ from the assembler example above
	state = cpu.run(state, max_cycles=200)
	result = cpu.read_memory(state, addr=0x10)
	print(result) # 7
	```
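You may want to inspect raw state vectors without the driver's helpers. In that case, a byte or register can be read directly from the flat layout. `read_byte` and `read_register` are hypothetical. They assume the packed, MSB-first layout from the Architectural state section, where MEM begins at bit 2N + 56.

```python
from __future__ import annotations


def _bits_to_int(bits: list[int]) -> int:
    value = 0
    for bit in bits:  # MSB first
        value = (value << 1) | bit
    return value


def read_byte(state_bits: list[int], n: int, addr: int) -> int:
    """Read M[addr] from a flat state vector (hypothetical helper)."""
    if not 0 <= addr < 2 ** n:
        raise IndexError(f"address {addr} outside 2^{n}-byte memory")
    mem_start = 2 * n + 56  # N + 16 + 32 + 4 + N + 4
    start = mem_start + 8 * addr
    return _bits_to_int(state_bits[start:start + 8])


def read_register(state_bits: list[int], n: int, r: int) -> int:
    """Read R0..R3, which follow PC[N] and IR[16] (hypothetical helper)."""
    start = n + 16 + 8 * r
    return _bits_to_int(state_bits[start:start + 8])
```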

	## Common pitfalls

	- No load-immediate. `LOAD` reads from memory; there is no LDI / MOV-imm instruction. To put a constant in a register, place it in memory and `LOAD` it.
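	- Address-extended instructions are 4 bytes wide. Branch targets must point at the first byte of an instruction, never at the address word of a 4-byte instruction.
	- `MUL` keeps only the low 8 bits of the product. To detect overflow, keep a copy of the multiplicand, divide the truncated product back by the multiplier with `DIV`, and `CMP` the quotient against the copy. A mismatch means bits were lost (this assumes unsigned operands and a nonzero multiplier).
	- `CMP` writes only flags, never the destination register, so it is normally followed by a `Jcc`.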
	- `SHL` and `SHR` shift by 1. No variable-amount shifter; chain them or compose with bit operations.
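Two of these pitfalls have simple arithmetic workarounds, modeled here in plain Python rather than as assembled code. The first builds a k-bit left shift from k chained `SHL`s. The second checks `MUL` for overflow by dividing the truncated product back and comparing. The model assumes unsigned 8-bit values and a floor-quotient `DIV`, and the helper names are hypothetical.

```python
def shl1(x: int) -> int:
    """One SHL: shift left by 1 and keep 8 bits."""
    return (x << 1) & 0xFF


def shl_by(x: int, k: int) -> int:
    """Variable shift emulated as k SHL instructions in a row."""
    for _ in range(k):
        x = shl1(x)
    return x


def mul_overflowed(a: int, b: int) -> bool:
    """True if a * b does not fit in 8 bits, detected via DIV-back."""
    p = (a * b) & 0xFF  # MUL keeps only the low 8 bits
    if b == 0:
        return False    # a * 0 never overflows
    return p // b != a  # DIV, then CMP against the saved multiplicand
```

The check works because truncation only ever loses value. If a·b ≥ 256, the kept low byte p is smaller than a·b, so ⌊p / b⌋ < a.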

	## Threshold-network artefacts you'll want next

	- `python eval_all.py variants/<file>.safetensors` — gate-level fitness suite (5,900–7,800 tests per variant covering Boolean, arithmetic, ALU, control, modular, error-detection, threshold, and IEEE 754 float circuits).
	- `python eval_all.py --cpu-program variants/<file>.safetensors` — assembled program through the threshold-gated CPU.
	- `python -m safetensors2verilog <file>.safetensors --frontend threshold_logic --circuit arithmetic.ripplecarry8bit -o rc8.v` — extract one circuit, dependency-closed, into synthesizable Verilog.
	- `python -m safetensors2verilog ... --inspect` — print the port contract for any extracted circuit (which pins exist, what widths).
	- `python -m safetensors2verilog ... --equiv-check` — automatically build a Python-vs-iverilog cross-check testbench for the extracted circuit.