CharlesCNorton committed on
Commit f90261b · 1 Parent(s): 7ec35ca

Add configurable memory partitioning for LLM integration


build.py:
- Add --memory-profile flag (full/reduced/scratchpad/registers/none)
- Add --addr-bits flag for custom address width (0-16)
- Memory functions now accept addr_bits/mem_bytes parameters
- Pure ALU mode (addr_bits=0) skips memory generation entirely
- Reports memory vs ALU param counts

README.md:
- Document configurable memory profiles
- Update ALU operations list (MUL, DIV, ROL, ROR)
- Add state tensor layout table for each profile
- Add build tool usage examples
- Update citation year to 2026
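Since the addr_bits choice fixes the memory tensor shapes, the memory parameter budget can be estimated before building anything. A rough sketch (shapes taken from the memory builders in this commit; totals are approximate because the shipped model also carries manifest and metadata tensors):

```python
# Estimate memory-circuit parameter counts from the tensor shapes used by
# add_decoder, add_memory_read_mux, and add_memory_write_cells in build.py.
def memory_param_count(addr_bits: int) -> int:
    if addr_bits == 0:
        return 0                                  # pure ALU mode: no memory tensors
    m = 1 << addr_bits                            # mem_bytes
    decoder = m * addr_bits + m                   # addr_decode: weight (m, addr_bits) + bias (m,)
    read = (8 * m * 2) + (8 * m) + (8 * m) + 8    # read mux: and weight/bias, or weight/bias
    write = (m * 2 + m) + (m + m) + 3 * (m * 8 * 2 + m * 8)  # sel, nsel, and_old/and_new/or
    return decoder + read + write

for profile, bits in [("full", 16), ("reduced", 12), ("scratchpad", 8), ("registers", 4), ("none", 0)]:
    print(f"{profile:10s} addr_bits={bits:2d}  memory params = {memory_param_count(bits):>9,}")
```

For the full profile this comes to roughly 8.26M memory parameters, which matches the reported split between memory and ALU parameter counts.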

Files changed (2)
  1. README.md +50 -12
  2. build.py +125 -42
README.md CHANGED
@@ -18,7 +18,7 @@ Every logic gate is a threshold neuron: `output = 1 if (Σ wᵢxᵢ + b) ≥ 0 e
 
 ```
 Tensors: 11,581
-Parameters: 8,290,134
+Parameters: 8,290,134 (full CPU) / 32,397 (pure ALU for LLM)
 ```
 
 ---
@@ -30,8 +30,8 @@ A complete 8-bit processor where every operation—from Boolean logic to arithme
 | Component | Specification |
 |-----------|---------------|
 | Registers | 4 × 8-bit general purpose |
-| Memory | 64KB addressable |
-| ALU | 16 operations (ADD, SUB, AND, OR, XOR, NOT, SHL, SHR, INC, DEC, CMP, NEG, PASS, ZERO, ONES, NOP) |
+| Memory | Configurable: 0B (pure ALU) to 64KB (full CPU) |
+| ALU | 16 operations (ADD, SUB, AND, OR, XOR, NOT, SHL, SHR, MUL, DIV, INC, DEC, NEG, ROL, ROR, CMP) |
 | Flags | Zero, Negative, Carry, Overflow |
 | Control | JMP, JZ, JNZ, JC, JNC, JN, JP, JV, JNV, CALL, RET, PUSH, POP |
 
@@ -167,7 +167,7 @@ The weights in this repository implement a complete 8-bit computer: registers, A
 | Modular | 11 | Divisibility by 2-12 (multi-layer for non-powers-of-2) |
 | Threshold | 13 | k-of-n gates, majority, minority, exactly-k |
 | Pattern | 10 | Popcount, leading/trailing ones, symmetry |
-| Memory | 3 | 16-bit addr decoder, 65536x8 read mux, write cell update (packed) |
+| Memory | 3 | N-bit addr decoder, 2^N×8 read mux, write cells (configurable, packed) |
 
 ---
 
@@ -199,14 +199,22 @@ for a, b_in in [(0,0), (0,1), (1,0), (1,1)]:
 All multi-bit fields are **MSB-first** (index 0 is the most-significant bit).
 
 ```
-[ PC[16] | IR[16] | R0[8] R1[8] R2[8] R3[8] | FLAGS[4] | SP[16] | CTRL[4] | MEM[65536][8] ]
+[ PC[N] | IR[16] | R0[8] R1[8] R2[8] R3[8] | FLAGS[4] | SP[N] | CTRL[4] | MEM[2^N][8] ]
 ```
 
+Where N = address bits (configurable: 0-16).
+
 Flags are ordered as: `Z, N, C, V`.
 
 Control bits are ordered as: `HALT, MEM_WE, MEM_RE, RESERVED`.
 
-Total state size: `524376` bits.
+| Memory Profile | Addr Bits | Memory Size | State Bits |
+|----------------|-----------|-------------|------------|
+| Full CPU | 16 | 64KB | 524,376 |
+| Reduced | 12 | 4KB | 32,856 |
+| Scratchpad | 8 | 256B | 2,104 |
+| Registers | 4 | 16B | 184 |
+| Pure ALU | 0 | 0B | 56 |
 
 ---
 
@@ -228,11 +236,9 @@ Interpretation:
 
 ## Verification
 
-The model includes `eval.py` which exhaustively tests all circuits:
-
 ```bash
 python eval.py
-# Output: Fitness: 1.000000
+python threshold_cpu.py
 ```
 
 ### Verification Status
@@ -288,6 +294,8 @@ All weights are integers. All activations are Heaviside step. Designed for:
 
 The threshold circuits can be embedded into transformer MLP layers to give LLMs exact arithmetic capability.
 
+**For LLM integration, use `--memory-profile none` to generate a pure ALU model (~32K params) without memory circuits.**
+
 ### Core Thesis
 
 Standard LLMs fail at arithmetic because they're interpolators—they approximate functions over training distributions rather than compute exact results. A 360M parameter model trained on internet text has seen "127 + 128 = 255" zero or few times, so it guesses based on pattern matching.
@@ -477,12 +485,42 @@ The interface generalizes to **all** 65,536 8-bit additions once trained—no me
 
 | File | Description |
 |------|-------------|
-| `neural_computer.safetensors` | 11,581 tensors, 8,290,134 parameters |
+| `neural_computer.safetensors` | 11,581 tensors, 8,290,134 parameters (full CPU) |
 | `threshold_cpu.py` | CPU state, reference cycle, threshold runtime |
 | `eval.py` | Unified evaluation suite (6,441 tests, GPU-batched) |
-| `build.py` | Build tools for memory, ALU, and .inputs tensors |
+| `build.py` | Build tools with configurable memory partitioning |
 | `prune_weights.py` | Weight magnitude pruning |
 
+### Build Tool Usage
+
+```bash
+# Full CPU (64KB memory, default). Global flags go before the subcommand.
+python build.py --apply memory
+
+# LLM integration profiles
+python build.py --memory-profile none --apply memory        # Pure ALU (~32K params)
+python build.py --memory-profile registers --apply memory   # 16-byte register file
+python build.py --memory-profile scratchpad --apply memory  # 256-byte scratchpad
+
+# Custom memory size
+python build.py --addr-bits 6 --apply memory                # 64 bytes (2^6)
+
+# Regenerate ALU and input metadata
+python build.py --apply alu
+python build.py --apply inputs
+python build.py --apply all    # memory + alu + inputs
+```
+
+Memory profiles:
+
+| Profile | Addr Bits | Memory | Memory Params | Total Params |
+|---------|-----------|--------|---------------|--------------|
+| `none` | 0 | 0B | 0 | ~32K |
+| `registers` | 4 | 16B | ~2K | ~34K |
+| `scratchpad` | 8 | 256B | ~30K | ~63K |
+| `reduced` | 12 | 4KB | ~516K | ~549K |
+| `full` | 16 | 64KB | ~8.26M | ~8.29M |
+
 ---
 
 ## Citation
@@ -491,7 +529,7 @@ The interface generalizes to **all** 65,536 8-bit additions once trained—no me
 @misc{8bit-threshold-computer,
   title={8bit-threshold-computer: A Turing-Complete Threshold Logic CPU},
   author={Norton, Charles},
-  year={2025},
+  year={2026},
   howpublished={Hugging Face},
   url={https://huggingface.co/phanerozoic/8bit-threshold-computer}
 }
build.py CHANGED
@@ -115,8 +115,16 @@ from safetensors.torch import save_file
 MODEL_PATH = Path(__file__).resolve().parent / "neural_computer.safetensors"
 MANIFEST_PATH = Path(__file__).resolve().parent / "tensors.txt"
 
-ADDR_BITS = 16
-MEM_BYTES = 1 << ADDR_BITS
+DEFAULT_ADDR_BITS = 16
+DEFAULT_MEM_BYTES = 1 << DEFAULT_ADDR_BITS
+
+MEMORY_PROFILES = {
+    "full": 16,        # 64KB - full CPU mode
+    "reduced": 12,     # 4KB - reduced CPU
+    "scratchpad": 8,   # 256 bytes - LLM scratchpad
+    "registers": 4,    # 16 bytes - LLM register file
+    "none": 0,         # Pure ALU, no memory
+}
 
 
 def load_tensors(path: Path) -> Dict[str, torch.Tensor]:
@@ -172,21 +180,21 @@ def drop_prefixes(tensors: Dict[str, torch.Tensor], prefixes: List[str]) -> None
         del tensors[key]
 
 
-def add_decoder(tensors: Dict[str, torch.Tensor]) -> None:
-    weights = torch.empty((MEM_BYTES, ADDR_BITS), dtype=torch.float32)
-    bias = torch.empty((MEM_BYTES,), dtype=torch.float32)
-    for addr in range(MEM_BYTES):
-        bits = [(addr >> (ADDR_BITS - 1 - i)) & 1 for i in range(ADDR_BITS)]
+def add_decoder(tensors: Dict[str, torch.Tensor], addr_bits: int, mem_bytes: int) -> None:
+    weights = torch.empty((mem_bytes, addr_bits), dtype=torch.float32)
+    bias = torch.empty((mem_bytes,), dtype=torch.float32)
+    for addr in range(mem_bytes):
+        bits = [(addr >> (addr_bits - 1 - i)) & 1 for i in range(addr_bits)]
         weights[addr] = torch.tensor([1.0 if bit == 1 else -1.0 for bit in bits], dtype=torch.float32)
         bias[addr] = -float(sum(bits))
     tensors["memory.addr_decode.weight"] = weights
     tensors["memory.addr_decode.bias"] = bias
 
 
-def add_memory_read_mux(tensors: Dict[str, torch.Tensor]) -> None:
-    and_weight = torch.ones((8, MEM_BYTES, 2), dtype=torch.float32)
-    and_bias = torch.full((8, MEM_BYTES), -2.0, dtype=torch.float32)
-    or_weight = torch.ones((8, MEM_BYTES), dtype=torch.float32)
+def add_memory_read_mux(tensors: Dict[str, torch.Tensor], mem_bytes: int) -> None:
+    and_weight = torch.ones((8, mem_bytes, 2), dtype=torch.float32)
+    and_bias = torch.full((8, mem_bytes), -2.0, dtype=torch.float32)
+    or_weight = torch.ones((8, mem_bytes), dtype=torch.float32)
     or_bias = torch.full((8,), -1.0, dtype=torch.float32)
     tensors["memory.read.and.weight"] = and_weight
     tensors["memory.read.and.bias"] = and_bias
@@ -194,17 +202,17 @@ def add_memory_read_mux(tensors: Dict[str, torch.Tensor]) -> None:
     tensors["memory.read.or.bias"] = or_bias
 
 
-def add_memory_write_cells(tensors: Dict[str, torch.Tensor]) -> None:
-    sel_weight = torch.ones((MEM_BYTES, 2), dtype=torch.float32)
-    sel_bias = torch.full((MEM_BYTES,), -2.0, dtype=torch.float32)
-    nsel_weight = torch.full((MEM_BYTES, 1), -1.0, dtype=torch.float32)
-    nsel_bias = torch.zeros((MEM_BYTES,), dtype=torch.float32)
-    and_old_weight = torch.ones((MEM_BYTES, 8, 2), dtype=torch.float32)
-    and_old_bias = torch.full((MEM_BYTES, 8), -2.0, dtype=torch.float32)
-    and_new_weight = torch.ones((MEM_BYTES, 8, 2), dtype=torch.float32)
-    and_new_bias = torch.full((MEM_BYTES, 8), -2.0, dtype=torch.float32)
-    or_weight = torch.ones((MEM_BYTES, 8, 2), dtype=torch.float32)
-    or_bias = torch.full((MEM_BYTES, 8), -1.0, dtype=torch.float32)
+def add_memory_write_cells(tensors: Dict[str, torch.Tensor], mem_bytes: int) -> None:
+    sel_weight = torch.ones((mem_bytes, 2), dtype=torch.float32)
+    sel_bias = torch.full((mem_bytes,), -2.0, dtype=torch.float32)
+    nsel_weight = torch.full((mem_bytes, 1), -1.0, dtype=torch.float32)
+    nsel_bias = torch.zeros((mem_bytes,), dtype=torch.float32)
+    and_old_weight = torch.ones((mem_bytes, 8, 2), dtype=torch.float32)
+    and_old_bias = torch.full((mem_bytes, 8), -2.0, dtype=torch.float32)
+    and_new_weight = torch.ones((mem_bytes, 8, 2), dtype=torch.float32)
+    and_new_bias = torch.full((mem_bytes, 8), -2.0, dtype=torch.float32)
+    or_weight = torch.ones((mem_bytes, 8, 2), dtype=torch.float32)
+    or_bias = torch.full((mem_bytes, 8), -1.0, dtype=torch.float32)
     tensors["memory.write.sel.weight"] = sel_weight
     tensors["memory.write.sel.bias"] = sel_bias
     tensors["memory.write.nsel.weight"] = nsel_weight
@@ -217,13 +225,13 @@ def add_memory_write_cells(tensors: Dict[str, torch.Tensor]) -> None:
     tensors["memory.write.or.bias"] = or_bias
 
 
-def add_fetch_load_store_buffers(tensors: Dict[str, torch.Tensor]) -> None:
+def add_fetch_load_store_buffers(tensors: Dict[str, torch.Tensor], addr_bits: int) -> None:
     for bit in range(16):
         add_gate(tensors, f"control.fetch.ir.bit{bit}", [1.0], [-1.0])
     for bit in range(8):
         add_gate(tensors, f"control.load.bit{bit}", [1.0], [-1.0])
         add_gate(tensors, f"control.store.bit{bit}", [1.0], [-1.0])
-    for bit in range(ADDR_BITS):
+    for bit in range(addr_bits):
        add_gate(tensors, f"control.mem_addr.bit{bit}", [1.0], [-1.0])
 
 
@@ -502,9 +510,9 @@ def add_comparators(tensors: Dict[str, torch.Tensor]) -> None:
     add_gate(tensors, "arithmetic.equality8bit.layer2", [1.0, 1.0], [-2.0])
 
 
-def update_manifest(tensors: Dict[str, torch.Tensor]) -> None:
-    tensors["manifest.memory_bytes"] = torch.tensor([float(MEM_BYTES)], dtype=torch.float32)
-    tensors["manifest.pc_width"] = torch.tensor([float(ADDR_BITS)], dtype=torch.float32)
+def update_manifest(tensors: Dict[str, torch.Tensor], addr_bits: int, mem_bytes: int) -> None:
+    tensors["manifest.memory_bytes"] = torch.tensor([float(mem_bytes)], dtype=torch.float32)
+    tensors["manifest.pc_width"] = torch.tensor([float(addr_bits)], dtype=torch.float32)
     tensors["manifest.version"] = torch.tensor([3.0], dtype=torch.float32)
 
 
@@ -1174,33 +1182,69 @@ def build_inputs(tensors: Dict[str, torch.Tensor]) -> tuple[Dict[str, torch.Tens
     return tensors, reg, stats
 
 
+def resolve_memory_config(args) -> tuple:
+    """Resolve memory configuration from args; returns (addr_bits, mem_bytes)."""
+    if getattr(args, "memory_profile", None):
+        addr_bits = MEMORY_PROFILES[args.memory_profile]
+    elif getattr(args, "addr_bits", None) is not None:
+        addr_bits = args.addr_bits
+    else:
+        addr_bits = DEFAULT_ADDR_BITS
+    mem_bytes = (1 << addr_bits) if addr_bits > 0 else 0
+    return addr_bits, mem_bytes
+
+
 def cmd_memory(args) -> None:
+    addr_bits, mem_bytes = resolve_memory_config(args)
+
     print("=" * 60)
     print(" BUILD MEMORY CIRCUITS")
     print("=" * 60)
+    print("\nMemory configuration:")
+    print(f"  Address bits: {addr_bits}")
+    print(f"  Memory bytes: {mem_bytes:,}")
+    if addr_bits == 0:
+        print("  Mode: PURE ALU (no memory)")
+    elif addr_bits <= 4:
+        print("  Mode: LLM registers")
+    elif addr_bits <= 8:
+        print("  Mode: LLM scratchpad")
+    elif addr_bits <= 12:
+        print("  Mode: Reduced CPU")
+    else:
+        print("  Mode: Full CPU")
+
     print(f"\nLoading: {args.model}")
     tensors = load_tensors(args.model)
     print(f"  Loaded {len(tensors)} tensors")
+
     print("\nDropping existing memory/control tensors...")
     drop_prefixes(tensors, [
         "memory.addr_decode.", "memory.read.", "memory.write.",
         "control.fetch.ir.", "control.load.", "control.store.", "control.mem_addr.",
     ])
     print(f"  Now {len(tensors)} tensors")
-    print("\nGenerating memory circuits...")
-    add_decoder(tensors)
-    add_memory_read_mux(tensors)
-    add_memory_write_cells(tensors)
-    print("  Added decoder, read mux, write cells")
-    print("\nGenerating buffer gates...")
-    try:
-        add_fetch_load_store_buffers(tensors)
-        print("  Added fetch/load/store/mem_addr buffers")
-    except ValueError as e:
-        print(f"  Buffers already exist: {e}")
+
+    if addr_bits > 0:
+        print("\nGenerating memory circuits...")
+        add_decoder(tensors, addr_bits, mem_bytes)
+        add_memory_read_mux(tensors, mem_bytes)
+        add_memory_write_cells(tensors, mem_bytes)
+        print("  Added decoder, read mux, write cells")
+
+        print("\nGenerating buffer gates...")
+        try:
+            add_fetch_load_store_buffers(tensors, addr_bits)
+            print("  Added fetch/load/store/mem_addr buffers")
+        except ValueError as e:
+            print(f"  Buffers already exist: {e}")
+    else:
+        print("\nSkipping memory circuits (addr_bits=0, pure ALU mode)")
+
     print("\nUpdating manifest...")
-    update_manifest(tensors)
-    print(f"  memory_bytes={MEM_BYTES}, pc_width={ADDR_BITS}")
+    update_manifest(tensors, addr_bits, mem_bytes)
+    print(f"  memory_bytes={mem_bytes:,}, pc_width={addr_bits}")
+
     if args.apply:
         print(f"\nSaving: {args.model}")
         save_file(tensors, str(args.model))
@@ -1210,7 +1254,12 @@ def cmd_memory(args) -> None:
         print("  Done.")
     else:
         print("\n[DRY-RUN] Use --apply to save.")
+
     print(f"\nTotal: {len(tensors)} tensors")
+    mem_params = sum(t.numel() for k, t in tensors.items() if k.startswith("memory."))
+    alu_params = sum(t.numel() for k, t in tensors.items() if not k.startswith("memory.") and not k.startswith("manifest."))
+    print(f"  Memory params: {mem_params:,}")
+    print(f"  ALU/Logic params: {alu_params:,}")
     print("=" * 60)
 
 
@@ -1341,16 +1390,50 @@ def cmd_all(args) -> None:
 
 
 def main() -> None:
-    parser = argparse.ArgumentParser(description="Build tools for threshold computer safetensors")
+    parser = argparse.ArgumentParser(
+        description="Build tools for threshold computer safetensors",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Memory Profiles:
+  full        64KB (16-bit addr) - Full CPU mode
+  reduced     4KB  (12-bit addr) - Reduced CPU
+  scratchpad  256B (8-bit addr)  - LLM scratchpad
+  registers   16B  (4-bit addr)  - LLM register file
+  none        0B   (no memory)   - Pure ALU for LLM
+
+Examples (global flags go before the subcommand):
+  python build.py --memory-profile none --apply memory    # LLM-only (no RAM)
+  python build.py --memory-profile scratchpad memory      # 256-byte scratchpad
+  python build.py --addr-bits 6 memory                    # Custom: 64 bytes
+  python build.py memory                                  # Default: 64KB
+""",
+    )
     parser.add_argument("--model", type=Path, default=MODEL_PATH, help="Model path")
     parser.add_argument("--apply", action="store_true", help="Apply changes (default: dry-run)")
     parser.add_argument("--manifest", action="store_true", help="Write tensors.txt manifest (memory only)")
+
+    mem_group = parser.add_mutually_exclusive_group()
+    mem_group.add_argument(
+        "--memory-profile", "-m",
+        choices=list(MEMORY_PROFILES.keys()),
+        help="Memory size profile (full/reduced/scratchpad/registers/none)",
+    )
+    mem_group.add_argument(
+        "--addr-bits", "-a",
+        type=int,
+        choices=range(0, 17),
+        metavar="N",
+        help="Address bus width in bits (0-16). 0=no memory, 16=64KB",
+    )
+
     subparsers = parser.add_subparsers(dest="command", help="Subcommands")
-    subparsers.add_parser("memory", help="Generate 64KB memory circuits")
+    subparsers.add_parser("memory", help="Generate memory circuits (size controlled by --memory-profile or --addr-bits)")
     subparsers.add_parser("alu", help="Generate ALU extension circuits (SHL, SHR, comparators)")
     subparsers.add_parser("inputs", help="Add .inputs metadata tensors")
     subparsers.add_parser("all", help="Run memory, alu, then inputs")
+
    args = parser.parse_args()
+
     if args.command == "memory":
         cmd_memory(args)
     elif args.command == "alu":
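
The flag-resolution logic in this commit can be exercised without loading any tensors. A minimal standalone sketch (constants copied from the diff; `argparse.Namespace` stands in for parsed CLI args):

```python
import argparse

# Constants as defined in build.py in this commit.
MEMORY_PROFILES = {"full": 16, "reduced": 12, "scratchpad": 8, "registers": 4, "none": 0}
DEFAULT_ADDR_BITS = 16

def resolve_memory_config(args) -> tuple:
    """Profile takes priority; then --addr-bits; then the 16-bit default."""
    if getattr(args, "memory_profile", None):
        addr_bits = MEMORY_PROFILES[args.memory_profile]
    elif getattr(args, "addr_bits", None) is not None:
        addr_bits = args.addr_bits
    else:
        addr_bits = DEFAULT_ADDR_BITS
    mem_bytes = (1 << addr_bits) if addr_bits > 0 else 0
    return addr_bits, mem_bytes

# Pure ALU profile: no address bits, no memory bytes.
print(resolve_memory_config(argparse.Namespace(memory_profile="none", addr_bits=None)))  # (0, 0)
# Custom width when no profile is given.
print(resolve_memory_config(argparse.Namespace(memory_profile=None, addr_bits=6)))       # (6, 64)
# Neither flag: full 64KB default.
print(resolve_memory_config(argparse.Namespace(memory_profile=None, addr_bits=None)))    # (16, 65536)
```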