CharlesCNorton committed · f90261b
Parent(s): 7ec35ca
Add configurable memory partitioning for LLM integration
build.py:
- Add --memory-profile flag (full/reduced/scratchpad/registers/none)
- Add --addr-bits flag for custom address width (0-16)
- Memory functions now accept addr_bits/mem_bytes parameters
- Pure ALU mode (addr_bits=0) skips memory generation entirely
- Reports memory vs ALU param counts
README.md:
- Document configurable memory profiles
- Update ALU operations list (MUL, DIV, ROL, ROR)
- Add state tensor layout table for each profile
- Add build tool usage examples
- Update citation year to 2026
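The new flags reduce to one power-of-two rule: a profile names an address width, and the memory size is 2^addr_bits bytes (or zero for the pure-ALU mode). A minimal sketch of that rule, with profile names and bit widths taken from the `MEMORY_PROFILES` table in the build.py diff below; `memory_bytes` is an illustrative helper, not a function in the repo:

```python
# Profile names and address widths as defined in this commit's build.py diff.
MEMORY_PROFILES = {"full": 16, "reduced": 12, "scratchpad": 8, "registers": 4, "none": 0}

def memory_bytes(profile):
    """Addressable bytes for a profile; 0 means pure ALU (no memory circuits built)."""
    addr_bits = MEMORY_PROFILES[profile]
    return (1 << addr_bits) if addr_bits > 0 else 0
```

So `full` gives the original 64KB machine, while `none` skips memory generation entirely, which is what shrinks the model from ~8.29M to ~32K parameters.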
README.md CHANGED

@@ -18,7 +18,7 @@ Every logic gate is a threshold neuron: `output = 1 if (Σ wᵢxᵢ + b) ≥ 0 e
 
 ```
 Tensors: 11,581
-Parameters: 8,290,134
+Parameters: 8,290,134 (full CPU) / 32,397 (pure ALU for LLM)
 ```
 
 ---
@@ -30,8 +30,8 @@ A complete 8-bit processor where every operation—from Boolean logic to arithme
 | Component | Specification |
 |-----------|---------------|
 | Registers | 4 × 8-bit general purpose |
-| Memory | 64KB
-| ALU | 16 operations (ADD, SUB, AND, OR, XOR, NOT, SHL, SHR,
+| Memory | Configurable: 0B (pure ALU) to 64KB (full CPU) |
+| ALU | 16 operations (ADD, SUB, AND, OR, XOR, NOT, SHL, SHR, MUL, DIV, INC, DEC, NEG, ROL, ROR, CMP) |
 | Flags | Zero, Negative, Carry, Overflow |
 | Control | JMP, JZ, JNZ, JC, JNC, JN, JP, JV, JNV, CALL, RET, PUSH, POP |
 
@@ -167,7 +167,7 @@ The weights in this repository implement a complete 8-bit computer: registers, A
 | Modular | 11 | Divisibility by 2-12 (multi-layer for non-powers-of-2) |
 | Threshold | 13 | k-of-n gates, majority, minority, exactly-k |
 | Pattern | 10 | Popcount, leading/trailing ones, symmetry |
-| Memory | 3 |
+| Memory | 3 | N-bit addr decoder, 2^N×8 read mux, write cells (configurable, packed) |
 
 ---
@@ -199,14 +199,22 @@ for a, b_in in [(0,0), (0,1), (1,0), (1,1)]:
 All multi-bit fields are **MSB-first** (index 0 is the most-significant bit).
 
 ```
-[ PC[
+[ PC[N] | IR[16] | R0[8] R1[8] R2[8] R3[8] | FLAGS[4] | SP[N] | CTRL[4] | MEM[2^N][8] ]
 ```
 
+Where N = address bits (configurable: 0-16).
+
 Flags are ordered as: `Z, N, C, V`.
 
 Control bits are ordered as: `HALT, MEM_WE, MEM_RE, RESERVED`.
 
-
+| Memory Profile | Addr Bits | Memory Size | State Bits |
+|----------------|-----------|-------------|------------|
+| Full CPU | 16 | 64KB | 524,376 |
+| Reduced | 12 | 4KB | 32,856 |
+| Scratchpad | 8 | 256B | 2,104 |
+| Registers | 4 | 16B | 184 |
+| Pure ALU | 0 | 0B | 56 |
 
 ---
@@ -228,11 +236,9 @@ Interpretation:
 
 ## Verification
 
-The model includes `eval.py` which exhaustively tests all circuits:
-
 ```bash
 python eval.py
-
+python threshold_cpu.py
 ```
 
 ### Verification Status
@@ -288,6 +294,8 @@ All weights are integers. All activations are Heaviside step. Designed for:
 
 The threshold circuits can be embedded into transformer MLP layers to give LLMs exact arithmetic capability.
 
+**For LLM integration, use `--memory-profile none` to generate a pure ALU model (~32K params) without memory circuits.**
+
 ### Core Thesis
 
 Standard LLMs fail at arithmetic because they're interpolators—they approximate functions over training distributions rather than compute exact results. A 360M parameter model trained on internet text has seen "127 + 128 = 255" zero or few times, so it guesses based on pattern matching.
@@ -477,12 +485,42 @@ The interface generalizes to **all** 65,536 8-bit additions once trained—no me
 
 | File | Description |
 |------|-------------|
-| `neural_computer.safetensors` | 11,581 tensors, 8,290,134 parameters |
+| `neural_computer.safetensors` | 11,581 tensors, 8,290,134 parameters (full CPU) |
 | `threshold_cpu.py` | CPU state, reference cycle, threshold runtime |
 | `eval.py` | Unified evaluation suite (6,441 tests, GPU-batched) |
-| `build.py` | Build tools
+| `build.py` | Build tools with configurable memory partitioning |
 | `prune_weights.py` | Weight magnitude pruning |
 
+### Build Tool Usage
+
+```bash
+# Full CPU (64KB memory, default)
+python build.py memory --apply
+
+# LLM integration profiles
+python build.py --memory-profile none memory --apply        # Pure ALU (32K params)
+python build.py --memory-profile registers memory --apply   # 16-byte register file
+python build.py --memory-profile scratchpad memory --apply  # 256-byte scratchpad
+
+# Custom memory size
+python build.py --addr-bits 6 memory --apply                # 64 bytes (2^6)
+
+# Regenerate ALU and input metadata
+python build.py alu --apply
+python build.py inputs --apply
+python build.py all --apply      # memory + alu + inputs
+```
+
+Memory profiles:
+
+| Profile | Addr Bits | Memory | Memory Params | Total Params |
+|---------|-----------|--------|---------------|--------------|
+| `none` | 0 | 0B | 0 | ~32K |
+| `registers` | 4 | 16B | ~2K | ~34K |
+| `scratchpad` | 8 | 256B | ~30K | ~63K |
+| `reduced` | 12 | 4KB | ~516K | ~549K |
+| `full` | 16 | 64KB | ~8.26M | ~8.29M |
+
 ---
 
 ## Citation
@@ -491,7 +529,7 @@ The interface generalizes to **all** 65,536 8-bit additions once trained—no me
 @misc{8bit-threshold-computer,
   title={8bit-threshold-computer: A Turing-Complete Threshold Logic CPU},
   author={Norton, Charles},
-  year={
+  year={2026},
   howpublished={Hugging Face},
   url={https://huggingface.co/phanerozoic/8bit-threshold-computer}
 }
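The "N-bit addr decoder" in the README's Memory row is the construction that `add_decoder` in the build.py diff below builds as tensors: one threshold row per address, with weight +1 where that address has a 1-bit, -1 where it has a 0-bit, and bias equal to minus the popcount. A self-contained sketch of the same rule using plain Python lists instead of torch tensors, so the one-hot property can be checked directly:

```python
def decoder_rows(addr_bits):
    """One threshold row (weights, bias) per address, mirroring add_decoder's rule."""
    rows = []
    for addr in range(1 << addr_bits):
        bits = [(addr >> (addr_bits - 1 - i)) & 1 for i in range(addr_bits)]  # MSB-first
        weights = [1.0 if b else -1.0 for b in bits]
        bias = -float(sum(bits))  # minus the popcount of the address
        rows.append((weights, bias))
    return rows

def decode(rows, addr_bits, x):
    """Heaviside threshold: each row outputs 1 iff sum(w*x) + b >= 0."""
    xbits = [(x >> (addr_bits - 1 - i)) & 1 for i in range(addr_bits)]
    return [1 if sum(w * b for w, b in zip(weights, xbits)) + bias >= 0 else 0
            for weights, bias in rows]
```

For every input exactly one row fires (the matching address reaches sum + bias = 0; any mismatch drives it negative), which is what lets the 2^N×8 read mux sit on top as a simple AND/OR tree.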
build.py CHANGED

@@ -115,8 +115,16 @@ from safetensors.torch import save_file
 MODEL_PATH = Path(__file__).resolve().parent / "neural_computer.safetensors"
 MANIFEST_PATH = Path(__file__).resolve().parent / "tensors.txt"
 
-
-
+DEFAULT_ADDR_BITS = 16
+DEFAULT_MEM_BYTES = 1 << DEFAULT_ADDR_BITS
+
+MEMORY_PROFILES = {
+    "full": 16,        # 64KB - full CPU mode
+    "reduced": 12,     # 4KB - reduced CPU
+    "scratchpad": 8,   # 256 bytes - LLM scratchpad
+    "registers": 4,    # 16 bytes - LLM register file
+    "none": 0,         # Pure ALU, no memory
+}
 
 
 def load_tensors(path: Path) -> Dict[str, torch.Tensor]:
@@ -172,21 +180,21 @@ def drop_prefixes(tensors: Dict[str, torch.Tensor], prefixes: List[str]) -> None
         del tensors[key]
 
 
-def add_decoder(tensors: Dict[str, torch.Tensor]) -> None:
-    weights = torch.empty((
-    bias = torch.empty((
-    for addr in range(
-        bits = [(addr >> (
+def add_decoder(tensors: Dict[str, torch.Tensor], addr_bits: int, mem_bytes: int) -> None:
+    weights = torch.empty((mem_bytes, addr_bits), dtype=torch.float32)
+    bias = torch.empty((mem_bytes,), dtype=torch.float32)
+    for addr in range(mem_bytes):
+        bits = [(addr >> (addr_bits - 1 - i)) & 1 for i in range(addr_bits)]
         weights[addr] = torch.tensor([1.0 if bit == 1 else -1.0 for bit in bits], dtype=torch.float32)
         bias[addr] = -float(sum(bits))
     tensors["memory.addr_decode.weight"] = weights
     tensors["memory.addr_decode.bias"] = bias
 
 
-def add_memory_read_mux(tensors: Dict[str, torch.Tensor]) -> None:
-    and_weight = torch.ones((8,
-    and_bias = torch.full((8,
-    or_weight = torch.ones((8,
+def add_memory_read_mux(tensors: Dict[str, torch.Tensor], mem_bytes: int) -> None:
+    and_weight = torch.ones((8, mem_bytes, 2), dtype=torch.float32)
+    and_bias = torch.full((8, mem_bytes), -2.0, dtype=torch.float32)
+    or_weight = torch.ones((8, mem_bytes), dtype=torch.float32)
     or_bias = torch.full((8,), -1.0, dtype=torch.float32)
     tensors["memory.read.and.weight"] = and_weight
     tensors["memory.read.and.bias"] = and_bias
@@ -194,17 +202,17 @@ def add_memory_read_mux(tensors: Dict[str, torch.Tensor]) -> None:
     tensors["memory.read.or.bias"] = or_bias
 
 
-def add_memory_write_cells(tensors: Dict[str, torch.Tensor]) -> None:
-    sel_weight = torch.ones((
-    sel_bias = torch.full((
-    nsel_weight = torch.full((
-    nsel_bias = torch.zeros((
-    and_old_weight = torch.ones((
-    and_old_bias = torch.full((
-    and_new_weight = torch.ones((
-    and_new_bias = torch.full((
-    or_weight = torch.ones((
-    or_bias = torch.full((
+def add_memory_write_cells(tensors: Dict[str, torch.Tensor], mem_bytes: int) -> None:
+    sel_weight = torch.ones((mem_bytes, 2), dtype=torch.float32)
+    sel_bias = torch.full((mem_bytes,), -2.0, dtype=torch.float32)
+    nsel_weight = torch.full((mem_bytes, 1), -1.0, dtype=torch.float32)
+    nsel_bias = torch.zeros((mem_bytes,), dtype=torch.float32)
+    and_old_weight = torch.ones((mem_bytes, 8, 2), dtype=torch.float32)
+    and_old_bias = torch.full((mem_bytes, 8), -2.0, dtype=torch.float32)
+    and_new_weight = torch.ones((mem_bytes, 8, 2), dtype=torch.float32)
+    and_new_bias = torch.full((mem_bytes, 8), -2.0, dtype=torch.float32)
+    or_weight = torch.ones((mem_bytes, 8, 2), dtype=torch.float32)
+    or_bias = torch.full((mem_bytes, 8), -1.0, dtype=torch.float32)
     tensors["memory.write.sel.weight"] = sel_weight
     tensors["memory.write.sel.bias"] = sel_bias
     tensors["memory.write.nsel.weight"] = nsel_weight
@@ -217,13 +225,13 @@ def add_memory_write_cells(tensors: Dict[str, torch.Tensor]) -> None:
     tensors["memory.write.or.bias"] = or_bias
 
 
-def add_fetch_load_store_buffers(tensors: Dict[str, torch.Tensor]) -> None:
+def add_fetch_load_store_buffers(tensors: Dict[str, torch.Tensor], addr_bits: int) -> None:
     for bit in range(16):
         add_gate(tensors, f"control.fetch.ir.bit{bit}", [1.0], [-1.0])
     for bit in range(8):
         add_gate(tensors, f"control.load.bit{bit}", [1.0], [-1.0])
         add_gate(tensors, f"control.store.bit{bit}", [1.0], [-1.0])
-    for bit in range(
+    for bit in range(addr_bits):
         add_gate(tensors, f"control.mem_addr.bit{bit}", [1.0], [-1.0])
 
 
@@ -502,9 +510,9 @@ def add_comparators(tensors: Dict[str, torch.Tensor]) -> None:
     add_gate(tensors, "arithmetic.equality8bit.layer2", [1.0, 1.0], [-2.0])
 
 
-def update_manifest(tensors: Dict[str, torch.Tensor]) -> None:
-    tensors["manifest.memory_bytes"] = torch.tensor([float(
-    tensors["manifest.pc_width"] = torch.tensor([float(
+def update_manifest(tensors: Dict[str, torch.Tensor], addr_bits: int, mem_bytes: int) -> None:
+    tensors["manifest.memory_bytes"] = torch.tensor([float(mem_bytes)], dtype=torch.float32)
+    tensors["manifest.pc_width"] = torch.tensor([float(addr_bits)], dtype=torch.float32)
     tensors["manifest.version"] = torch.tensor([3.0], dtype=torch.float32)
 
 
@@ -1174,33 +1182,69 @@ def build_inputs(tensors: Dict[str, torch.Tensor]) -> tuple[Dict[str, torch.Tens
     return tensors, reg, stats
 
 
+def resolve_memory_config(args) -> tuple:
+    """Resolve memory configuration from args, returns (addr_bits, mem_bytes)."""
+    if hasattr(args, 'memory_profile') and args.memory_profile:
+        addr_bits = MEMORY_PROFILES[args.memory_profile]
+    elif hasattr(args, 'addr_bits') and args.addr_bits is not None:
+        addr_bits = args.addr_bits
+    else:
+        addr_bits = DEFAULT_ADDR_BITS
+    mem_bytes = (1 << addr_bits) if addr_bits > 0 else 0
+    return addr_bits, mem_bytes
+
+
 def cmd_memory(args) -> None:
+    addr_bits, mem_bytes = resolve_memory_config(args)
+
     print("=" * 60)
     print(" BUILD MEMORY CIRCUITS")
     print("=" * 60)
+    print(f"\nMemory configuration:")
+    print(f"  Address bits: {addr_bits}")
+    print(f"  Memory bytes: {mem_bytes:,}")
+    if addr_bits == 0:
+        print(f"  Mode: PURE ALU (no memory)")
+    elif addr_bits <= 4:
+        print(f"  Mode: LLM registers")
+    elif addr_bits <= 8:
+        print(f"  Mode: LLM scratchpad")
+    elif addr_bits <= 12:
+        print(f"  Mode: Reduced CPU")
+    else:
+        print(f"  Mode: Full CPU")
+
     print(f"\nLoading: {args.model}")
     tensors = load_tensors(args.model)
     print(f"  Loaded {len(tensors)} tensors")
+
     print("\nDropping existing memory/control tensors...")
     drop_prefixes(tensors, [
         "memory.addr_decode.", "memory.read.", "memory.write.",
         "control.fetch.ir.", "control.load.", "control.store.", "control.mem_addr.",
     ])
     print(f"  Now {len(tensors)} tensors")
-
-
-
-
-
-
-
-
-    print("
-
-
+
+    if addr_bits > 0:
+        print("\nGenerating memory circuits...")
+        add_decoder(tensors, addr_bits, mem_bytes)
+        add_memory_read_mux(tensors, mem_bytes)
+        add_memory_write_cells(tensors, mem_bytes)
+        print("  Added decoder, read mux, write cells")
+
+        print("\nGenerating buffer gates...")
+        try:
+            add_fetch_load_store_buffers(tensors, addr_bits)
+            print("  Added fetch/load/store/mem_addr buffers")
+        except ValueError as e:
+            print(f"  Buffers already exist: {e}")
+    else:
+        print("\nSkipping memory circuits (addr_bits=0, pure ALU mode)")
+
     print("\nUpdating manifest...")
-    update_manifest(tensors)
-    print(f"  memory_bytes={
+    update_manifest(tensors, addr_bits, mem_bytes)
+    print(f"  memory_bytes={mem_bytes:,}, pc_width={addr_bits}")
+
     if args.apply:
         print(f"\nSaving: {args.model}")
         save_file(tensors, str(args.model))
@@ -1210,7 +1254,12 @@ def cmd_memory(args) -> None:
         print("  Done.")
     else:
         print("\n[DRY-RUN] Use --apply to save.")
+
     print(f"\nTotal: {len(tensors)} tensors")
+    mem_params = sum(t.numel() for k, t in tensors.items() if k.startswith("memory."))
+    alu_params = sum(t.numel() for k, t in tensors.items() if not k.startswith("memory.") and not k.startswith("manifest."))
+    print(f"  Memory params: {mem_params:,}")
+    print(f"  ALU/Logic params: {alu_params:,}")
     print("=" * 60)
 
 
@@ -1341,16 +1390,50 @@ def cmd_all(args) -> None:
 
 
 def main() -> None:
-    parser = argparse.ArgumentParser(
+    parser = argparse.ArgumentParser(
+        description="Build tools for threshold computer safetensors",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Memory Profiles:
+  full        64KB (16-bit addr)  - Full CPU mode
+  reduced     4KB  (12-bit addr)  - Reduced CPU
+  scratchpad  256B (8-bit addr)   - LLM scratchpad
+  registers   16B  (4-bit addr)   - LLM register file
+  none        0B   (no memory)    - Pure ALU for LLM
+
+Examples:
+  python build.py memory --memory-profile none --apply   # LLM-only (no RAM)
+  python build.py memory --memory-profile scratchpad     # 256-byte scratchpad
+  python build.py memory --addr-bits 6                   # Custom: 64 bytes
+  python build.py memory                                 # Default: 64KB
+"""
+    )
     parser.add_argument("--model", type=Path, default=MODEL_PATH, help="Model path")
     parser.add_argument("--apply", action="store_true", help="Apply changes (default: dry-run)")
     parser.add_argument("--manifest", action="store_true", help="Write tensors.txt manifest (memory only)")
+
+    mem_group = parser.add_mutually_exclusive_group()
+    mem_group.add_argument(
+        "--memory-profile", "-m",
+        choices=list(MEMORY_PROFILES.keys()),
+        help="Memory size profile (full/reduced/scratchpad/registers/none)"
+    )
+    mem_group.add_argument(
+        "--addr-bits", "-a",
+        type=int,
+        choices=range(0, 17),
+        metavar="N",
+        help="Address bus width in bits (0-16). 0=no memory, 16=64KB"
+    )
+
     subparsers = parser.add_subparsers(dest="command", help="Subcommands")
-    subparsers.add_parser("memory", help="Generate
+    subparsers.add_parser("memory", help="Generate memory circuits (size controlled by --memory-profile or --addr-bits)")
    subparsers.add_parser("alu", help="Generate ALU extension circuits (SHL, SHR, comparators)")
     subparsers.add_parser("inputs", help="Add .inputs metadata tensors")
     subparsers.add_parser("all", help="Run memory, alu, then inputs")
+
     args = parser.parse_args()
+
     if args.command == "memory":
         cmd_memory(args)
     elif args.command == "alu":