phanerozoic committed on
Commit 272cd6a · verified · 1 Parent(s): dc52122

Roadmap: Threshold Logic Neural Turing Machine with 64KB memory + LLM integration

Files changed (1)
  1. todo.md +129 -82
todo.md CHANGED
@@ -1,101 +1,148 @@
- # Self-Contained Tensor CPU Roadmap

  ## Vision
- A fully self-contained CPU where:
- - All computation is threshold circuits (frozen weights)
- - Memory is a tensor partition (data flows through)
- - Stepper logic is encoded as circuits (no external orchestration)
- - One forward pass = one clock tick

- ## Architecture

  ```
- Input State Tensor:
- ┌────────┬───────────┬───────┬─────────────────┐
- │ PC [8] │ Regs [32] │ Flags │ Memory [N×8]    │
- └────────┴───────────┴───────┴─────────────────┘
-                         ↓
-                 Threshold Circuits
-               (fetch/decode/execute)
-                         ↓
- Output State Tensor:
- ┌─────────┬───────────┬────────┬─────────────────┐
- │ PC' [8] │ Regs' [32]│ Flags' │ Memory' [N×8]   │
- └─────────┴───────────┴────────┴─────────────────┘
  ```

- ## Phase 1: Memory Infrastructure

- | Component | Description | Status |
- |-----------|-------------|--------|
- | Memory Address Decoder | 8-bit address → 256 one-hot select | Pending |
- | Memory Read MUX | 256-to-1 mux, select byte by address | Pending |
- | Memory Write Demux | Route write data to addressed location | Pending |
- | Memory Cell Logic | Conditional update: new or keep old | Pending |

- ## Phase 2: Instruction Fetch

- | Component | Description | Status |
- |-----------|-------------|--------|
- | PC → Memory Read | Fetch instruction at PC address | Pending |
- | Instruction Split | Separate opcode from operands | Pending |
- | Operand Decode | Extract src/dst register indices | Pending |

- ## Phase 3: Execute Cycle

- | Component | Description | Status |
- |-----------|-------------|--------|
- | Register Read MUX | Select source register(s) | Done (regmux4to1) |
- | ALU Dispatch | Route to correct operation circuit | Pending |
- | Result MUX | Select ALU output | Pending |
- | Writeback Logic | Route result to register or memory | Pending |
- | PC Update | Increment or load jump target | Done (pc_inc, pc_load) |

- ## Phase 4: Full Integration

  | Component | Description | Status |
  |-----------|-------------|--------|
- | State Packer | Combine all outputs into state tensor | Pending |
- | State Unpacker | Split input state into components | Pending |
- | Single-Pass Execute | One forward pass = one instruction | Pending |

  ## Completed Building Blocks

- These circuits are ready to use:
-
- ### Arithmetic
- - NEG (76 tensors)
- - SUB (162 tensors)
- - ADC (144 tensors)
- - SBC (160 tensors)
- - DIV (1984 tensors)
- - ADD, MUL (from original model)
-
- ### Comparison & Logic
- - CMP (168 tensors)
- - ASR, ROL, ROR (62 tensors total)
- - All boolean gates (from original model)
-
- ### Control
- - NOP (24 tensors)
- - HALT (42 tensors)
- - PC Incrementer (62 tensors)
- - PC Load MUX (50 tensors)
- - Instruction Decoder (44 tensors)
- - Register File MUX (84 tensors)
- - Conditional jumps (from original model)
-
- ## Memory Size Options
-
- | Size | Bytes | Bit-Tensors | Use Case |
- |------|-------|-------------|----------|
- | Tiny | 256 | ~2K | Proof of concept |
- | Small | 4KB | ~32K | Simple programs |
- | Medium | 64KB | ~512K | Full 16-bit address space |
-
- ## Notes
-
- - Memory is DATA flowing through, not stored in weights
- - Weights remain frozen - only input/output tensors change
- - "Stepper" = calling forward() repeatedly
- - No Python logic in the loop - just tensor→forward→tensor

+ # Threshold Logic Neural Turing Machine

  ## Vision

+ A verified computational coprocessor embedded in a transformer architecture:
+ - **Frozen circuits**: Exhaustively tested threshold logic (correct on every possible input)
+ - **ACT execution**: Runs until HALT within a single forward pass
+ - **Dual memory**: Hidden-state integration + a dedicated 64KB address space
+ - **LLM integration**: Router, Extract, and Inject are learned; computation is exact
+
+ ## Architecture Overview

  ```
+ ┌────────────────────────────────────────────────────────────┐
+ │                     Transformer Layer                      │
+ │ ┌──────────────┐  ┌──────────────┐  ┌────────────────────┐ │
+ │ │ Attention    │  │ MLP          │  │ ThresholdCPU       │ │
+ │ │              │  │              │  │ (ACT-style)        │ │
+ │ │              │  │              │  │                    │ │
+ │ │              │  │              │  │ ┌────────────────┐ │ │
+ │ │              │  │              │  │ │ Router (learn) │ │ │
+ │ │              │  │              │  │ ├────────────────┤ │ │
+ │ │              │  │              │  │ │ Extract (learn)│ │ │
+ │ │              │  │              │  │ ├────────────────┤ │ │
+ │ │              │  │              │  │ │ CPU (frozen)   │ │ │
+ │ │              │  │              │  │ │ ↻ until HALT   │ │ │
+ │ │              │  │              │  │ ├────────────────┤ │ │
+ │ │              │  │              │  │ │ Inject (learn) │ │ │
+ │ │              │  │              │  │ └────────────────┘ │ │
+ │ └──────┬───────┘  └──────┬───────┘  └─────────┬──────────┘ │
+ │        └─────────────────┴────────────────────┘            │
+ │                          ↓                                 │
+ │                 Residual + CPU State                       │
+ └────────────────────────────────────────────────────────────┘
  ```

+ ## Memory Architecture

+ ### Hidden State Integration (Hot Memory)
+ Reserve dimensions of the residual stream for CPU state:
+ ```
+ dims 0-511: CPU memory (512 bits = 64-byte hot cache)
+ dims 512-543: Registers (32 bits = 4 × 8-bit)
+ dims 544-551: PC (8 bits)
+ dims 552-555: Flags (4 bits: Z, N, C, V)
+ dims 556-559: Control (halt, interrupt, etc.)
+ dims 560-959: Normal embeddings (400 dims)
+ ```
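The dimension map above can be exercised with a small round-trip check. This is a minimal sketch, not the project's code: it assumes a plain 960-element list stands in for one token's hidden state, that each CPU bit occupies one dimension holding 0.0 or 1.0, and that bits are stored LSB-first. `LAYOUT`, `pack_state`, and `unpack_state` are hypothetical names.

```python
# Hypothetical layout of the reserved residual-stream dims (half-open ranges).
LAYOUT = {
    "memory": (0, 512),       # 64-byte hot cache, 8 dims per byte
    "registers": (512, 544),  # 4 x 8-bit registers
    "pc": (544, 552),         # 8-bit program counter
    "flags": (552, 556),      # Z, N, C, V
    "control": (556, 560),    # halt, interrupt, ...
    "embedding": (560, 960),  # normal embedding dims
}
HIDDEN_DIM = 960


def byte_to_bits(value: int) -> list[float]:
    """LSB-first bits of an 8-bit value (bit order is an assumption)."""
    return [float((value >> i) & 1) for i in range(8)]


def bits_to_byte(bits: list[float]) -> int:
    """Inverse of byte_to_bits; any dim above 0.5 counts as a 1 bit."""
    return sum(1 << i for i, b in enumerate(bits) if b > 0.5)


def pack_state(memory: list[int], registers: list[int], pc: int, flags: list[int]) -> list[float]:
    """Write CPU state into the reserved dims of a zeroed hidden vector."""
    h = [0.0] * HIDDEN_DIM
    for name, values in (("memory", memory), ("registers", registers)):
        start = LAYOUT[name][0]
        for i, byte in enumerate(values):
            h[start + 8 * i : start + 8 * i + 8] = byte_to_bits(byte)
    lo, hi = LAYOUT["pc"]
    h[lo:hi] = byte_to_bits(pc)
    lo, hi = LAYOUT["flags"]
    h[lo:hi] = [float(f) for f in flags]
    return h


def unpack_state(h: list[float]) -> dict:
    """Read the CPU state back out of a hidden vector."""
    mem_lo, reg_lo = LAYOUT["memory"][0], LAYOUT["registers"][0]
    pc_lo, pc_hi = LAYOUT["pc"]
    fl_lo, fl_hi = LAYOUT["flags"]
    return {
        "memory": [bits_to_byte(h[mem_lo + 8 * i : mem_lo + 8 * i + 8]) for i in range(64)],
        "registers": [bits_to_byte(h[reg_lo + 8 * i : reg_lo + 8 * i + 8]) for i in range(4)],
        "pc": bits_to_byte(h[pc_lo:pc_hi]),
        "flags": [int(b > 0.5) for b in h[fl_lo:fl_hi]],
    }


h = pack_state(list(range(64)), [1, 2, 3, 255], pc=0x42, flags=[1, 0, 1, 0])
state = unpack_state(h)
print(len(h), state["registers"], state["pc"])  # 960 [1, 2, 3, 255] 66
```

The ranges in `LAYOUT` tile dims 0-959 without gaps, which is a quick way to confirm that 512 + 32 + 8 + 4 + 4 + 400 = 960.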
+
+ ### Dedicated Memory Bank (Cold Storage)
+ Full 64KB addressable memory via routing circuits:
+ ```
+ Address space: 0x0000 - 0xFFFF (65,536 bytes)
+ Tensors: ~1.6M (routing overhead)
+ Access: Via 16-bit address decoder + mux/demux
+ ```
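The access path listed above (16-bit address decoder + mux/demux) can be illustrated with Heaviside threshold units. A minimal sketch under stated assumptions: each decoder unit k gets weight +1 on the address bits that are 1 in k and -1 on the bits that are 0, with bias -popcount(k), so its weighted sum reaches zero only when the input address equals k; the read MUX is a per-bit AND-OR over the one-hot select lines. All function names are hypothetical, and the demo uses a 4-bit address so it runs instantly; the same construction at `n_bits=16` yields 65,536 decoder units.

```python
def to_bits(value: int, width: int) -> list[int]:
    """LSB-first bit list (bit order is an assumption of this sketch)."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    return sum(b << i for i, b in enumerate(bits))


def threshold_unit(weights: list[int], bias: int, inputs: list[int]) -> int:
    """Heaviside neuron: outputs 1 when the weighted sum plus bias is >= 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) + bias >= 0)


def decoder_units(n_bits: int) -> list[tuple[list[int], int]]:
    """One unit per address k: +1 where k has a 1 bit, -1 where it has a 0,
    bias = -popcount(k). The sum reaches 0 only when the input equals k."""
    units = []
    for k in range(2 ** n_bits):
        bits = to_bits(k, n_bits)
        units.append(([1 if b else -1 for b in bits], -sum(bits)))
    return units


def decode(address: int, units: list[tuple[list[int], int]], n_bits: int) -> list[int]:
    """Address -> one-hot select vector."""
    x = to_bits(address, n_bits)
    return [threshold_unit(w, b, x) for w, b in units]


def read_byte(memory: list[list[int]], one_hot: list[int]) -> int:
    """Read MUX: per output bit, AND each stored bit with its select line, then OR."""
    out = []
    for bit in range(8):
        terms = [threshold_unit([1, 1], -2, [sel, memory[addr][bit]])  # AND
                 for addr, sel in enumerate(one_hot)]
        out.append(threshold_unit([1] * len(terms), -1, terms))  # OR
    return from_bits(out)


units = decoder_units(4)  # 16-byte toy memory; the roadmap's case is n_bits=16
memory = [to_bits((7 * a + 3) % 256, 8) for a in range(16)]
print(read_byte(memory, decode(5, units, 4)))  # 38
```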
 
+ ### Memory Hierarchy
+ | Level | Size | Access | Use Case |
+ |-------|------|--------|----------|
+ | Registers | 4 × 8-bit | Direct | Operands, accumulators |
+ | Hot cache | 64 bytes | Embedded in hidden state | Stack, scratch |
+ | Cold bank | 64KB | Circuit-routed | Programs, data, heap |

+ ## Phase 1: Memory Infrastructure

+ | Component | Description | Tensors | Status |
+ |-----------|-------------|---------|--------|
+ | Address Decoder 16-bit | 16-bit → 65536 one-hot | ~65,600 | Pending |
+ | Memory Read MUX | 65536-to-1 × 8 bits | ~524,288 | Pending |
+ | Memory Write Demux | Route to addressed byte | ~524,288 | Pending |
+ | Memory Cell Logic | Conditional update per byte | ~524,288 | Pending |
+ | Bank Controller | Page/bank switching | ~1,000 | Pending |

+ **Estimated Phase 1 total: ~1.64M tensors**
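The Phase 1 figure follows directly from the table, since each 65,536-way structure costs one tensor per bit; adding the current 6,184-tensor total reproduces the ~1.65M projection listed under Completed Building Blocks:

$$
\begin{aligned}
65{,}536 \times 8 &= 524{,}288 \\
65{,}600 + 3 \times 524{,}288 + 1{,}000 &= 1{,}639{,}464 \approx 1.64\text{M} \\
6{,}184 + 1{,}639{,}464 &= 1{,}645{,}648 \approx 1.65\text{M}
\end{aligned}
$$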
 
+ ## Phase 2: ACT Execution Engine

  | Component | Description | Status |
  |-----------|-------------|--------|
+ | Cycle Block | One fetch/decode/execute iteration | Pending |
+ | Halt Detector | HALT instruction → stop signal | Pending |
+ | Cycle Counter | Track pondering steps | Pending |
+ | State Checkpointing | Save state for gradient flow | Pending |
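
The Phase 2 components map naturally onto a pure step function that is iterated until a halt bit is set. The sketch below is illustrative only: it assumes a made-up two-byte instruction format (opcode, operand), a single accumulator, and a hypothetical `MAX_STEPS` cap so that a program that never halts still returns. A Python `while` loop stands in for the ACT mechanism; in the roadmap, each iteration would be one pass through the frozen circuits.

```python
from dataclasses import dataclass, replace

# Hypothetical opcodes for this sketch only; not the project's encoding.
NOP, LDI, ADDI, JNZ, HALT = 0, 1, 2, 3, 4
MAX_STEPS = 1000  # assumed cap so a non-halting program still returns


@dataclass(frozen=True)
class CPUState:
    memory: tuple          # 256 bytes (8-bit PC), each an int 0-255
    pc: int = 0
    acc: int = 0
    halted: bool = False
    steps: int = 0         # cycle counter ("pondering steps")


def cycle(s: CPUState) -> CPUState:
    """One fetch/decode/execute iteration; returns a new state, never mutates."""
    op, arg = s.memory[s.pc], s.memory[(s.pc + 1) % 256]
    nxt = replace(s, pc=(s.pc + 2) % 256, steps=s.steps + 1)
    if op == HALT:
        return replace(s, halted=True, steps=s.steps + 1)  # halt detector
    if op == LDI:
        return replace(nxt, acc=arg)
    if op == ADDI:
        return replace(nxt, acc=(s.acc + arg) % 256)
    if op == JNZ and s.acc != 0:
        return replace(nxt, pc=arg)
    return nxt  # NOP or untaken JNZ


def run_until_halt(s: CPUState) -> CPUState:
    """ACT-style loop: repeat the cycle until HALT or the step cap."""
    while not s.halted and s.steps < MAX_STEPS:
        s = cycle(s)
    return s


program = [LDI, 3, ADDI, 255, JNZ, 2, HALT, 0]  # count 3 down to 0, then halt
final = run_until_halt(CPUState(memory=tuple(program + [0] * (256 - len(program)))))
print(final.acc, final.steps, final.halted)  # 0 8 True
```

Keeping `cycle` free of side effects mirrors the "functional state" idea: every step maps one complete state to the next, which is also what a checkpointing scheme for gradient flow would need to save.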
+
+ ## Phase 3: LLM Integration Layers
+
+ | Component | Description | Trainable | Status |
+ |-----------|-------------|-----------|--------|
+ | Router | Detect when computation is needed | Yes | Pending |
+ | State Extractor | Embeddings → CPU state | Yes | Pending |
+ | State Injector | CPU state → embedding delta | Yes | Pending |
+ | KV Cache Binding | CPU state persists alongside the KV cache | No | Pending |
+
+ ## Phase 4: Instruction Set
+
+ | Category | Instructions | Status |
+ |----------|--------------|--------|
+ | Arithmetic | ADD, SUB, MUL, DIV, NEG, ADC, SBC | Done |
+ | Logic | AND, OR, XOR, NOT, shifts, rotates | Done |
+ | Compare | CMP (sets flags) | Done |
+ | Control | JMP, Jcc (conditional), CALL, RET | Partial |
+ | Memory | LOAD, STORE (8/16-bit addressing) | Pending |
+ | Stack | PUSH, POP | Partial |
+ | System | NOP, HALT | Done |
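
CMP is listed as setting flags, and the hidden-state map reserves four of them (Z, N, C, V). A reference model of those flags for an 8-bit compare is sketched below as the kind of golden function an exhaustive 65,536-case test could compare a circuit against. It assumes x86-style carry semantics, where C = 1 signals an unsigned borrow; some ISAs (6502, ARM) set C to the inverse, and the roadmap does not say which convention the CMP circuit uses.

```python
def cmp_flags(a: int, b: int) -> dict:
    """Flags of the 8-bit subtraction a - b (result discarded, as in CMP).

    Assumes x86-style carry: C = 1 when an unsigned borrow occurs.
    """
    result = (a - b) & 0xFF
    return {
        "Z": int(result == 0),   # operands equal
        "N": (result >> 7) & 1,  # sign bit of the 8-bit result
        "C": int(a < b),         # unsigned borrow
        # Signed overflow: operands differ in sign and the result's sign differs from a's.
        "V": (((a ^ b) & (a ^ result)) >> 7) & 1,
    }


print(cmp_flags(0x10, 0x20))  # {'Z': 0, 'N': 1, 'C': 1, 'V': 0}
```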
 
  ## Completed Building Blocks

+ ### Arithmetic Core (2,756 tensors)
+ - NEG: 76 tensors, 256/256 tests
+ - SUB: 162 tensors, 65536/65536 tests
+ - ADC: 144 tensors, 131072/131072 tests
+ - SBC: 160 tensors, 131072/131072 tests
+ - DIV: 1984 tensors, 65280/65280 tests
+ - CMP: 168 tensors, 65536/65536 tests
+ - ASR/ROL/ROR: 62 tensors total
+
+ ### Control Core (306 tensors)
+ - NOP: 24 tensors, 4096/4096 tests
+ - HALT: 42 tensors, 24576/24576 tests
+ - PC Incrementer: 62 tensors, 256/256 tests
+ - PC Load MUX: 50 tensors, 1536/1536 tests
+ - Register MUX: 84 tensors, 1036/1036 tests
+ - Instruction Decoder: 44 tensors, 16/16 tests
+
+ ### Original Model
+ - Boolean gates, adders, multiplier, comparators
+ - Threshold gates, pattern recognition
+ - Modular arithmetic, error detection
+ - ~3,100 tensors
+
+ **Current total: 6,184 tensors**
+ **Projected with 64KB memory: ~1.65M tensors**
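
The test counts above are complete input enumerations: 65,536 = 256 × 256 operand pairs for SUB and CMP, and 131,072 once a carry-in bit doubles the space for ADC and SBC. The sketch below shows that verification style on a stand-in circuit, an 8-bit ripple-carry adder built from two-layer threshold full adders (carry fires when a + b + c ≥ 2; sum fires when a + b + c - 2·carry ≥ 1). It is an assumption-level illustration, not the project's ADC circuit or test harness.

```python
def step(x: int) -> int:
    """Heaviside threshold: 1 if x >= 0 else 0."""
    return 1 if x >= 0 else 0


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Two-layer threshold full adder: returns (sum_bit, carry_out)."""
    carry = step(a + b + c - 2)          # majority gate
    s = step(a + b + c - 2 * carry - 1)  # fires for input totals 1 and 3
    return s, carry


def adc8(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """8-bit ripple-carry add with carry: returns (result, carry_out)."""
    carry, result = carry_in, 0
    for i in range(8):
        bit, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= bit << i
    return result, carry


def verify_exhaustively() -> tuple[int, int]:
    """Compare the circuit with integer arithmetic on all 2**17 inputs."""
    passed = total = 0
    for a in range(256):
        for b in range(256):
            for c in (0, 1):
                total += 1
                expected = a + b + c
                if adc8(a, b, c) == (expected & 0xFF, expected >> 8):
                    passed += 1
    return passed, total


print(verify_exhaustively())  # (131072, 131072)
```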
+
+ ## Design Principles
+
+ 1. **Frozen correctness**: Circuit weights never change and are exhaustively verified
+ 2. **Learned interface**: Router/Extract/Inject are trainable; the CPU is not
+ 3. **Functional state**: Memory flows through as data, not as mutated weights
+ 4. **Halting semantics**: The HALT instruction terminates the ACT loop
+ 5. **Composable**: Each circuit is tested in isolation and composed at runtime
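
Principle 3, read together with the Memory Write Demux and Memory Cell Logic rows, implies that a store produces a new memory image instead of editing weights. A minimal sketch of that reading: every bit passes through a threshold 2:1 multiplexer that keeps the old value unless its enable line (write strobe AND address select) is 1. The gate decomposition and the names `unit`, `mux2`, and `store` are assumptions, and the address select is computed with `==` where the real design would use the decoder output.

```python
def unit(weights: list[int], bias: int, inputs: list[int]) -> int:
    """Heaviside threshold unit: 1 when the weighted sum plus bias is >= 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) + bias >= 0)


def mux2(sel: int, new: int, old: int) -> int:
    """Threshold 2:1 mux: returns new when sel == 1, else old."""
    take_new = unit([1, 1], -2, [sel, new])        # sel AND new
    keep_old = unit([-1, 1], -1, [sel, old])       # (NOT sel) AND old
    return unit([1, 1], -1, [take_new, keep_old])  # OR


def store(memory: tuple, address: int, value: int, write: int) -> tuple:
    """Return a new memory image; the input tuple is left unchanged."""
    new_memory = []
    for addr, byte in enumerate(memory):
        # One-hot select computed with == here, standing in for the address decoder.
        hit = int(addr == address)
        enable = unit([1, 1], -2, [write, hit])  # write strobe AND select
        bits = [mux2(enable, (value >> i) & 1, (byte >> i) & 1) for i in range(8)]
        new_memory.append(sum(b << i for i, b in enumerate(bits)))
    return tuple(new_memory)


before = tuple(range(16))
after = store(before, 5, 0xAB, write=1)
print(after[5], before[5])  # 171 5
```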
+
+ ## Key Insight
+
+ The LLM learns **when** to compute and **how** to format input/output.
+ The CPU defines **what** computation means - exactly, verifiably, always.
+
+ This is not a learned calculator. This is a proven calculator with a learned interface.