CharlesCNorton committed · f90261b
Parent(s): 7ec35ca
Add configurable memory partitioning for LLM integration
build.py:
- Add --memory-profile flag (full/reduced/scratchpad/registers/none)
- Add --addr-bits flag for custom address width (0-16)
- Memory functions now accept addr_bits/mem_bytes parameters
- Pure ALU mode (addr_bits=0) skips memory generation entirely
- Reports memory vs ALU param counts
README.md:
- Document configurable memory profiles
- Update ALU operations list (MUL, DIV, ROL, ROR)
- Add state tensor layout table for each profile
- Add build tool usage examples
- Update citation year to 2026
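The new flags reduce to one power-of-two rule: a profile names an address width, and the memory size is 2^addr_bits bytes (or zero for the pure-ALU mode). A minimal sketch of that rule, with profile names and bit widths taken from the `MEMORY_PROFILES` table in the build.py diff below; `memory_bytes` is an illustrative helper, not a function in the repo:

```python
# Profile names and address widths as defined in this commit's build.py diff.
MEMORY_PROFILES = {"full": 16, "reduced": 12, "scratchpad": 8, "registers": 4, "none": 0}

def memory_bytes(profile):
    """Addressable bytes for a profile; 0 means pure ALU (no memory circuits built)."""
    addr_bits = MEMORY_PROFILES[profile]
    return (1 << addr_bits) if addr_bits > 0 else 0
```

So `full` gives the original 64KB machine, while `none` skips memory generation entirely, which is what shrinks the model from ~8.29M to ~32K parameters.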
README.md CHANGED

@@ -18,7 +18,7 @@ Every logic gate is a threshold neuron: `output = 1 if (Σ wᵢxᵢ + b) ≥ 0 e
 
 ```
 Tensors: 11,581
-Parameters: 8,290,134
+Parameters: 8,290,134 (full CPU) / 32,397 (pure ALU for LLM)
 ```
 
 ---
@@ -30,8 +30,8 @@ A complete 8-bit processor where every operation—from Boolean logic to arithme
 | Component | Specification |
 |-----------|---------------|
 | Registers | 4 × 8-bit general purpose |
-| Memory | 64KB
-| ALU | 16 operations (ADD, SUB, AND, OR, XOR, NOT, SHL, SHR,
+| Memory | Configurable: 0B (pure ALU) to 64KB (full CPU) |
+| ALU | 16 operations (ADD, SUB, AND, OR, XOR, NOT, SHL, SHR, MUL, DIV, INC, DEC, NEG, ROL, ROR, CMP) |
 | Flags | Zero, Negative, Carry, Overflow |
 | Control | JMP, JZ, JNZ, JC, JNC, JN, JP, JV, JNV, CALL, RET, PUSH, POP |
 
@@ -167,7 +167,7 @@ The weights in this repository implement a complete 8-bit computer: registers, A
 | Modular | 11 | Divisibility by 2-12 (multi-layer for non-powers-of-2) |
 | Threshold | 13 | k-of-n gates, majority, minority, exactly-k |
 | Pattern | 10 | Popcount, leading/trailing ones, symmetry |
-| Memory | 3 |
+| Memory | 3 | N-bit addr decoder, 2^N×8 read mux, write cells (configurable, packed) |
 
 ---
@@ -199,14 +199,22 @@ for a, b_in in [(0,0), (0,1), (1,0), (1,1)]:
 All multi-bit fields are **MSB-first** (index 0 is the most-significant bit).
 
 ```
-[ PC[
+[ PC[N] | IR[16] | R0[8] R1[8] R2[8] R3[8] | FLAGS[4] | SP[N] | CTRL[4] | MEM[2^N][8] ]
 ```
 
+Where N = address bits (configurable: 0-16).
+
 Flags are ordered as: `Z, N, C, V`.
 
 Control bits are ordered as: `HALT, MEM_WE, MEM_RE, RESERVED`.
 
-
+| Memory Profile | Addr Bits | Memory Size | State Bits |
+|----------------|-----------|-------------|------------|
+| Full CPU | 16 | 64KB | 524,376 |
+| Reduced | 12 | 4KB | 32,856 |
+| Scratchpad | 8 | 256B | 2,104 |
+| Registers | 4 | 16B | 184 |
+| Pure ALU | 0 | 0B | 56 |
 
 ---
@@ -228,11 +236,9 @@ Interpretation:
 
 ## Verification
 
-The model includes `eval.py` which exhaustively tests all circuits:
-
 ```bash
 python eval.py
-
+python threshold_cpu.py
 ```
 
 ### Verification Status
@@ -288,6 +294,8 @@ All weights are integers. All activations are Heaviside step. Designed for:
 
 The threshold circuits can be embedded into transformer MLP layers to give LLMs exact arithmetic capability.
 
+**For LLM integration, use `--memory-profile none` to generate a pure ALU model (~32K params) without memory circuits.**
+
 ### Core Thesis
 
 Standard LLMs fail at arithmetic because they're interpolators—they approximate functions over training distributions rather than compute exact results. A 360M parameter model trained on internet text has seen "127 + 128 = 255" zero or few times, so it guesses based on pattern matching.
@@ -477,12 +485,42 @@ The interface generalizes to **all** 65,536 8-bit additions once trained—no me
 
 | File | Description |
 |------|-------------|
-| `neural_computer.safetensors` | 11,581 tensors, 8,290,134 parameters |
+| `neural_computer.safetensors` | 11,581 tensors, 8,290,134 parameters (full CPU) |
 | `threshold_cpu.py` | CPU state, reference cycle, threshold runtime |
 | `eval.py` | Unified evaluation suite (6,441 tests, GPU-batched) |
-| `build.py` | Build tools
+| `build.py` | Build tools with configurable memory partitioning |
 | `prune_weights.py` | Weight magnitude pruning |
 
+### Build Tool Usage
+
+```bash
+# Full CPU (64KB memory, default)
+python build.py memory --apply
+
+# LLM integration profiles
+python build.py --memory-profile none memory --apply        # Pure ALU (32K params)
+python build.py --memory-profile registers memory --apply   # 16-byte register file
+python build.py --memory-profile scratchpad memory --apply  # 256-byte scratchpad
+
+# Custom memory size
+python build.py --addr-bits 6 memory --apply                # 64 bytes (2^6)
+
+# Regenerate ALU and input metadata
+python build.py alu --apply
+python build.py inputs --apply
+python build.py all --apply      # memory + alu + inputs
+```
+
+Memory profiles:
+
+| Profile | Addr Bits | Memory | Memory Params | Total Params |
+|---------|-----------|--------|---------------|--------------|
+| `none` | 0 | 0B | 0 | ~32K |
+| `registers` | 4 | 16B | ~2K | ~34K |
+| `scratchpad` | 8 | 256B | ~30K | ~63K |
+| `reduced` | 12 | 4KB | ~516K | ~549K |
+| `full` | 16 | 64KB | ~8.26M | ~8.29M |
+
 ---
 
 ## Citation
@@ -491,7 +529,7 @@ The interface generalizes to **all** 65,536 8-bit additions once trained—no me
 @misc{8bit-threshold-computer,
   title={8bit-threshold-computer: A Turing-Complete Threshold Logic CPU},
   author={Norton, Charles},
-  year={
+  year={2026},
   howpublished={Hugging Face},
   url={https://huggingface.co/phanerozoic/8bit-threshold-computer}
 }
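The "N-bit addr decoder" in the README's Memory row is the construction that `add_decoder` in the build.py diff below builds as tensors: one threshold row per address, with weight +1 where that address has a 1-bit, -1 where it has a 0-bit, and bias equal to minus the popcount. A self-contained sketch of the same rule using plain Python lists instead of torch tensors, so the one-hot property can be checked directly:

```python
def decoder_rows(addr_bits):
    """One threshold row (weights, bias) per address, mirroring add_decoder's rule."""
    rows = []
    for addr in range(1 << addr_bits):
        bits = [(addr >> (addr_bits - 1 - i)) & 1 for i in range(addr_bits)]  # MSB-first
        weights = [1.0 if b else -1.0 for b in bits]
        bias = -float(sum(bits))  # minus the popcount of the address
        rows.append((weights, bias))
    return rows

def decode(rows, addr_bits, x):
    """Heaviside threshold: each row outputs 1 iff sum(w*x) + b >= 0."""
    xbits = [(x >> (addr_bits - 1 - i)) & 1 for i in range(addr_bits)]
    return [1 if sum(w * b for w, b in zip(weights, xbits)) + bias >= 0 else 0
            for weights, bias in rows]
```

For every input exactly one row fires (the matching address reaches sum + bias = 0; any mismatch drives it negative), which is what lets the 2^N×8 read mux sit on top as a simple AND/OR tree.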
build.py CHANGED

@@ -115,8 +115,16 @@ from safetensors.torch import save_file
 MODEL_PATH = Path(__file__).resolve().parent / "neural_computer.safetensors"
 MANIFEST_PATH = Path(__file__).resolve().parent / "tensors.txt"
 
-
-
+DEFAULT_ADDR_BITS = 16
+DEFAULT_MEM_BYTES = 1 << DEFAULT_ADDR_BITS
+
+MEMORY_PROFILES = {
+    "full": 16,        # 64KB - full CPU mode
+    "reduced": 12,     # 4KB - reduced CPU
+    "scratchpad": 8,   # 256 bytes - LLM scratchpad
+    "registers": 4,    # 16 bytes - LLM register file
+    "none": 0,         # Pure ALU, no memory
+}
 
 
 def load_tensors(path: Path) -> Dict[str, torch.Tensor]:
@@ -172,21 +180,21 @@ def drop_prefixes(tensors: Dict[str, torch.Tensor], prefixes: List[str]) -> None
         del tensors[key]
 
 
-def add_decoder(tensors: Dict[str, torch.Tensor]) -> None:
-    weights = torch.empty((
-    bias = torch.empty((
-    for addr in range(
-        bits = [(addr >> (
+def add_decoder(tensors: Dict[str, torch.Tensor], addr_bits: int, mem_bytes: int) -> None:
+    weights = torch.empty((mem_bytes, addr_bits), dtype=torch.float32)
+    bias = torch.empty((mem_bytes,), dtype=torch.float32)
+    for addr in range(mem_bytes):
+        bits = [(addr >> (addr_bits - 1 - i)) & 1 for i in range(addr_bits)]
         weights[addr] = torch.tensor([1.0 if bit == 1 else -1.0 for bit in bits], dtype=torch.float32)
         bias[addr] = -float(sum(bits))
     tensors["memory.addr_decode.weight"] = weights
     tensors["memory.addr_decode.bias"] = bias
 
 
-def add_memory_read_mux(tensors: Dict[str, torch.Tensor]) -> None:
-    and_weight = torch.ones((8,
-    and_bias = torch.full((8,
-    or_weight = torch.ones((8,
+def add_memory_read_mux(tensors: Dict[str, torch.Tensor], mem_bytes: int) -> None:
+    and_weight = torch.ones((8, mem_bytes, 2), dtype=torch.float32)
+    and_bias = torch.full((8, mem_bytes), -2.0, dtype=torch.float32)
+    or_weight = torch.ones((8, mem_bytes), dtype=torch.float32)
     or_bias = torch.full((8,), -1.0, dtype=torch.float32)
     tensors["memory.read.and.weight"] = and_weight
     tensors["memory.read.and.bias"] = and_bias
@@ -194,17 +202,17 @@ def add_memory_read_mux(tensors: Dict[str, torch.Tensor]) -> None:
     tensors["memory.read.or.bias"] = or_bias
 
 
-def add_memory_write_cells(tensors: Dict[str, torch.Tensor]) -> None:
-    sel_weight = torch.ones((
-    sel_bias = torch.full((
-    nsel_weight = torch.full((
-    nsel_bias = torch.zeros((
-    and_old_weight = torch.ones((
-    and_old_bias = torch.full((
-    and_new_weight = torch.ones((
-    and_new_bias = torch.full((
-    or_weight = torch.ones((
-    or_bias = torch.full((
+def add_memory_write_cells(tensors: Dict[str, torch.Tensor], mem_bytes: int) -> None:
+    sel_weight = torch.ones((mem_bytes, 2), dtype=torch.float32)
+    sel_bias = torch.full((mem_bytes,), -2.0, dtype=torch.float32)
+    nsel_weight = torch.full((mem_bytes, 1), -1.0, dtype=torch.float32)
+    nsel_bias = torch.zeros((mem_bytes,), dtype=torch.float32)
+    and_old_weight = torch.ones((mem_bytes, 8, 2), dtype=torch.float32)
+    and_old_bias = torch.full((mem_bytes, 8), -2.0, dtype=torch.float32)
+    and_new_weight = torch.ones((mem_bytes, 8, 2), dtype=torch.float32)
+    and_new_bias = torch.full((mem_bytes, 8), -2.0, dtype=torch.float32)
+    or_weight = torch.ones((mem_bytes, 8, 2), dtype=torch.float32)
+    or_bias = torch.full((mem_bytes, 8), -1.0, dtype=torch.float32)
     tensors["memory.write.sel.weight"] = sel_weight
     tensors["memory.write.sel.bias"] = sel_bias
     tensors["memory.write.nsel.weight"] = nsel_weight
@@ -217,13 +225,13 @@ def add_memory_write_cells(tensors: Dict[str, torch.Tensor]) -> None:
     tensors["memory.write.or.bias"] = or_bias
 
 
-def add_fetch_load_store_buffers(tensors: Dict[str, torch.Tensor]) -> None:
+def add_fetch_load_store_buffers(tensors: Dict[str, torch.Tensor], addr_bits: int) -> None:
     for bit in range(16):
         add_gate(tensors, f"control.fetch.ir.bit{bit}", [1.0], [-1.0])
     for bit in range(8):
         add_gate(tensors, f"control.load.bit{bit}", [1.0], [-1.0])
         add_gate(tensors, f"control.store.bit{bit}", [1.0], [-1.0])
-    for bit in range(
+    for bit in range(addr_bits):
         add_gate(tensors, f"control.mem_addr.bit{bit}", [1.0], [-1.0])
 
 
@@ -502,9 +510,9 @@ def add_comparators(tensors: Dict[str, torch.Tensor]) -> None:
     add_gate(tensors, "arithmetic.equality8bit.layer2", [1.0, 1.0], [-2.0])
 
 
-def update_manifest(tensors: Dict[str, torch.Tensor]) -> None:
-    tensors["manifest.memory_bytes"] = torch.tensor([float(
-    tensors["manifest.pc_width"] = torch.tensor([float(
+def update_manifest(tensors: Dict[str, torch.Tensor], addr_bits: int, mem_bytes: int) -> None:
+    tensors["manifest.memory_bytes"] = torch.tensor([float(mem_bytes)], dtype=torch.float32)
+    tensors["manifest.pc_width"] = torch.tensor([float(addr_bits)], dtype=torch.float32)
     tensors["manifest.version"] = torch.tensor([3.0], dtype=torch.float32)
 
 
@@ -1174,33 +1182,69 @@ def build_inputs(tensors: Dict[str, torch.Tensor]) -> tuple[Dict[str, torch.Tens
     return tensors, reg, stats
 
 
+def resolve_memory_config(args) -> tuple:
+    """Resolve memory configuration from args, returns (addr_bits, mem_bytes)."""
+    if hasattr(args, 'memory_profile') and args.memory_profile:
+        addr_bits = MEMORY_PROFILES[args.memory_profile]
+    elif hasattr(args, 'addr_bits') and args.addr_bits is not None:
+        addr_bits = args.addr_bits
+    else:
+        addr_bits = DEFAULT_ADDR_BITS
+    mem_bytes = (1 << addr_bits) if addr_bits > 0 else 0
+    return addr_bits, mem_bytes
+
+
 def cmd_memory(args) -> None:
+    addr_bits, mem_bytes = resolve_memory_config(args)
+
     print("=" * 60)
     print(" BUILD MEMORY CIRCUITS")
     print("=" * 60)
+    print(f"\nMemory configuration:")
+    print(f"  Address bits: {addr_bits}")
+    print(f"  Memory bytes: {mem_bytes:,}")
+    if addr_bits == 0:
+        print(f"  Mode: PURE ALU (no memory)")
+    elif addr_bits <= 4:
+        print(f"  Mode: LLM registers")
+    elif addr_bits <= 8:
+        print(f"  Mode: LLM scratchpad")
+    elif addr_bits <= 12:
+        print(f"  Mode: Reduced CPU")
+    else:
+        print(f"  Mode: Full CPU")
+
     print(f"\nLoading: {args.model}")
     tensors = load_tensors(args.model)
     print(f"  Loaded {len(tensors)} tensors")
+
     print("\nDropping existing memory/control tensors...")
     drop_prefixes(tensors, [
         "memory.addr_decode.", "memory.read.", "memory.write.",
         "control.fetch.ir.", "control.load.", "control.store.", "control.mem_addr.",
     ])
     print(f"  Now {len(tensors)} tensors")
-
-
-
-
-
-
-
-
-    print("
-
-
+
+    if addr_bits > 0:
+        print("\nGenerating memory circuits...")
+        add_decoder(tensors, addr_bits, mem_bytes)
+        add_memory_read_mux(tensors, mem_bytes)
+        add_memory_write_cells(tensors, mem_bytes)
+        print("  Added decoder, read mux, write cells")
+
+        print("\nGenerating buffer gates...")
+        try:
+            add_fetch_load_store_buffers(tensors, addr_bits)
+            print("  Added fetch/load/store/mem_addr buffers")
+        except ValueError as e:
+            print(f"  Buffers already exist: {e}")
+    else:
+        print("\nSkipping memory circuits (addr_bits=0, pure ALU mode)")
+
     print("\nUpdating manifest...")
-    update_manifest(tensors)
-    print(f"  memory_bytes={
+    update_manifest(tensors, addr_bits, mem_bytes)
+    print(f"  memory_bytes={mem_bytes:,}, pc_width={addr_bits}")
+
     if args.apply:
         print(f"\nSaving: {args.model}")
         save_file(tensors, str(args.model))
@@ -1210,7 +1254,12 @@ def cmd_memory(args) -> None:
         print("  Done.")
     else:
         print("\n[DRY-RUN] Use --apply to save.")
+
     print(f"\nTotal: {len(tensors)} tensors")
+    mem_params = sum(t.numel() for k, t in tensors.items() if k.startswith("memory."))
+    alu_params = sum(t.numel() for k, t in tensors.items() if not k.startswith("memory.") and not k.startswith("manifest."))
+    print(f"  Memory params: {mem_params:,}")
+    print(f"  ALU/Logic params: {alu_params:,}")
     print("=" * 60)
 
 
@@ -1341,16 +1390,50 @@ def cmd_all(args) -> None:
 
 
 def main() -> None:
-    parser = argparse.ArgumentParser(
+    parser = argparse.ArgumentParser(
+        description="Build tools for threshold computer safetensors",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Memory Profiles:
+  full        64KB (16-bit addr)  - Full CPU mode
+  reduced     4KB  (12-bit addr)  - Reduced CPU
+  scratchpad  256B (8-bit addr)   - LLM scratchpad
+  registers   16B  (4-bit addr)   - LLM register file
+  none        0B   (no memory)    - Pure ALU for LLM
+
+Examples:
+  python build.py memory --memory-profile none --apply   # LLM-only (no RAM)
+  python build.py memory --memory-profile scratchpad     # 256-byte scratchpad
+  python build.py memory --addr-bits 6                   # Custom: 64 bytes
+  python build.py memory                                 # Default: 64KB
+"""
+    )
     parser.add_argument("--model", type=Path, default=MODEL_PATH, help="Model path")
     parser.add_argument("--apply", action="store_true", help="Apply changes (default: dry-run)")
     parser.add_argument("--manifest", action="store_true", help="Write tensors.txt manifest (memory only)")
+
+    mem_group = parser.add_mutually_exclusive_group()
+    mem_group.add_argument(
+        "--memory-profile", "-m",
+        choices=list(MEMORY_PROFILES.keys()),
+        help="Memory size profile (full/reduced/scratchpad/registers/none)"
+    )
+    mem_group.add_argument(
+        "--addr-bits", "-a",
+        type=int,
+        choices=range(0, 17),
+        metavar="N",
+        help="Address bus width in bits (0-16). 0=no memory, 16=64KB"
+    )
+
     subparsers = parser.add_subparsers(dest="command", help="Subcommands")
-    subparsers.add_parser("memory", help="Generate
+    subparsers.add_parser("memory", help="Generate memory circuits (size controlled by --memory-profile or --addr-bits)")
    subparsers.add_parser("alu", help="Generate ALU extension circuits (SHL, SHR, comparators)")
     subparsers.add_parser("inputs", help="Add .inputs metadata tensors")
     subparsers.add_parser("all", help="Run memory, alu, then inputs")
+
     args = parser.parse_args()
+
     if args.command == "memory":
         cmd_memory(args)
     elif args.command == "alu":