grapheneaffiliates committed
Commit 18393bc · verified · 1 Parent(s): 59aa889

Upload README.md with huggingface_hub

Files changed (1): README.md (+66, -1350)
1
- ---
2
- license: mit
3
- language:
4
- - en
5
- library_name: pytorch
6
- tags:
7
- - geometric-attention
8
- - h4-polytope
9
- - ternary-quantization
10
- - bitnet
11
- - cpu-inference
12
- - rag
13
- - coxeter-chambers
14
- - e8-lattice
15
- - lila-e8
16
- - project-olympus
17
- - smollm3
18
- - continuous-learning
19
- - percepta
20
- datasets:
21
- - roneneldan/TinyStories
22
- metrics:
23
- - perplexity
24
- pipeline_tag: text-generation
25
- ---
26
-
27
- # H4 Polytopic Attention
28
-
29
- **4D geometric attention with O(log t) queries via Coxeter chamber navigation and E8 lattice-indexed RAM**
30
-
31
- A transformer system that replaces standard softmax attention with 4D attention heads built on the H4 polytope (600-cell), using the E8 lattice as a memory backend. Includes a deterministic executor, a trainable hybrid architecture (frozen H4 geometry + learnable adapters), ternary quantization (BitNet b1.58), and a unified geometric RAG system where the same E8->H4 projection handles both document retrieval and attention. The golden ratio appears at every level of the architecture.
32
-
33
- **Author:** Timothy McGirl
34
- **Repository:** [grapheneaffiliate/h4-polytopic-attention](https://github.com/grapheneaffiliate/h4-polytopic-attention)
35
-
36
- ---
37
-
38
- ## Table of Contents
39
-
40
- 1. [Overview](#overview)
41
- 2. [Why H4?](#why-h4)
42
- 3. [Architecture](#architecture)
43
- 4. [The Seven Phases](#the-seven-phases)
44
- 5. [Mathematical Foundation](#mathematical-foundation)
45
- 6. [Directory Structure](#directory-structure)
46
- 7. [Installation](#installation)
47
- 8. [Usage](#usage)
48
- 9. [Instruction Set Architecture](#instruction-set-architecture)
49
- 10. [Benchmarks](#benchmarks)
50
- 11. [MCP Server Integration](#mcp-server-integration)
51
- 12. [API Reference](#api-reference)
52
- 13. [Theory Deep Dive](#theory-deep-dive)
53
- 14. [Autoresearch Results](#autoresearch-results)
54
- 15. [Citation](#citation)
55
-
56
- ---
57
-
58
- ## Overview
59
-
60
- Standard transformers use softmax attention with O(t) cost per query over t cached tokens. H4 Polytopic Attention replaces this with a geometric data structure --- the Coxeter chamber tree of the H4 reflection group --- that achieves O(log t) max-dot-product queries in 4D.
61
-
62
- The system functions as a complete virtual machine:
63
-
64
- - **Attention heads = RAM lookup** (4D, O(log t) via ChamberTree)
65
- - **FFN layers = ALU** (instruction decode + execute)
66
- - **Execution trace = token sequence** (each step is one token)
67
- - **E8 lattice = hierarchical memory** (8D Voronoi cell addressing)
68
-
69
- The key architectural insight is that memory addressing (8D, E8 lattice) and attention queries (4D, H4 chambers) are unified through the E8 -> H4 projection, which uses the Coxeter element eigenvalues cos(pi/5) = phi/2. They are not two separate systems bolted together --- they share the same golden-ratio geometry.
70
-
71
- ### What this IS
72
-
73
A complete, working system --- not a research prototype. Phases 1-4 proved geometric attention works as a deterministic computer. Phase 5 made it trainable. Phase 6 made it ternary (1.58-bit weights, 17x compression). Phase 7 unified retrieval and generation into a single geometric RAG pipeline. An 8-hour overnight training run on CPU produced a 24M-parameter ternary model that generates coherent English at perplexity 10.0, beating the published TinyStories-33M baseline with fewer parameters.
74
-
75
- ### What this is NOT
76
-
77
- This is not llama.cpp. Projects like llama.cpp and ollama take GPU-designed models and run them slowly on CPU. This architecture is **designed for CPU from the ground up**. The ChamberTree replaces the operation GPUs are best at (parallel matmul) with the operation CPUs are best at (branching tree traversal). Ternary weights replace float multiply-accumulate with integer add/subtract. The frozen geometric backbone means most of the model is static lookup tables, not learned weights. The model would actually run *slower* on a GPU because GPUs are bad at tree traversal. That's the inversion nobody else has.
78
-
79
- ---
80
-
81
- ## Why H4?
82
-
83
- The H4 reflection group is the largest finite reflection group in 4D, with |W(H4)| = 14,400 elements. Its associated polytope, the 600-cell, has 120 vertices on S3. This is the sweet spot for attention heads:
84
-
85
- | Dimension | Structure | Symmetries | Bits/query | Hull vertices |
86
- |-----------|-----------|------------|------------|---------------|
87
- | 2D (Percepta) | S1 circle | SO(2), continuous | ~1 | O(sqrt(t)) |
88
- | **4D (H4)** | **S3 3-sphere** | **W(H4) = 14,400** | **~2** | **14,400 chambers** |
89
- | 8D | E8 lattice | W(E8) = 696,729,600 | - | Memory backend |
90
-
91
- The 4D choice is optimal because:
92
-
93
- 1. **S3 has the Hopf fibration** (pi_3(S3) = Z), enabling hierarchical selection that S1 cannot express
94
- 2. **A single [f64; 4] fits in one 256-bit AVX2 register** --- zero wasted SIMD lanes
95
- 3. **H4 connects to E8** via the projection E8 -> H4, unifying attention with memory
96
- 4. **The golden ratio phi = (1+sqrt(5))/2** appears in every vertex coordinate, connecting the geometry to Fibonacci-spaced checkpoints and phi-recursive state encoding
97
-
98
- ---
99
-
100
- ## Architecture
101
-
102
- ```
103
- Program (ISA instructions)
104
- |
105
- v
106
- +-------------------+
107
- | State Encoder | Encode (IP, registers, opcode, step)
108
- | (d_model = 32) | as a d_model-dimensional vector
109
- +-------------------+
110
- |
111
- d_model vector
112
- |
113
- +---------------+---------------+
114
- | |
115
- v v
116
- +-------------------+ +--------------------+
117
- | H4 Attention Heads | | E8 Lattice Memory |
118
- | (4D per head) | | (8D Voronoi cells) |
119
- | ChamberTree lookup | | 240 kissing nbrs |
120
- | O(log t) queries | | O(1) bucket decode |
121
- +-------------------+ +--------------------+
122
- | cos(pi/5) |
123
- | = phi/2 projection |
124
- +---------------+---------------+
125
- |
126
- v
127
- +-------------------+
128
- | FFN Layers | Instruction decode + ALU
129
- | (opcode -> op) | (ADD, SUB, MUL, STORE, ...)
130
- +-------------------+
131
- |
132
- v
133
- Next execution state
134
- ```
135
-
136
- ### Data flow for a single execution step
137
-
138
- 1. **Encode** the current state (instruction pointer, register file, opcode, operands, step counter) as a d_model-dimensional vector using golden-angle spirals on S3
139
- 2. **Attention** queries look back at the execution trace via H4 ChamberTree (O(log t) per head)
140
- 3. **E8 memory** operations (STORE_MEM / LOAD_MEM) use Voronoi cell bucketing with 240-neighbor shell traversal
141
- 4. **FFN** decodes the opcode and executes the instruction (ADD, SUB, MUL, etc.)
142
- 5. **Append** the state vector to the execution trace and repeat
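
The loop above can be illustrated with a toy, self-contained NumPy sketch (not the repo's `H4Executor`): each step is encoded on a golden-angle spiral, the trace is queried by maximum dot product, and the new state is appended. The real system replaces the brute-force scan in step 2 with the ChamberTree.

```python
import numpy as np

PHI = (1 + 5 ** 0.5) / 2

def encode_step(step: int, d_model: int = 32) -> np.ndarray:
    """Toy golden-angle spiral encoding of a step index (illustrative only)."""
    angles = step * (2 * np.pi / PHI) * np.arange(1, d_model // 2 + 1)
    v = np.concatenate([np.cos(angles), np.sin(angles)])
    return v / np.linalg.norm(v)

trace = []
for step in range(16):
    q = encode_step(step)
    if trace:
        # Attention-as-lookup: max dot product over the cached trace.
        # This linear scan is O(t); the ChamberTree reduces it to O(log t).
        best = max(range(len(trace)), key=lambda i: float(q @ trace[i]))
    trace.append(q)

print(f"{len(trace)} states in the execution trace")
```
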
143
-
144
- ---
145
-
146
- ## The Seven Phases
147
-
148
- ### Phase 1: H4 Geometry and Attention (Python PoC + Rust)
149
-
150
- The foundation: 600-cell vertex generation, Coxeter chamber navigation, O(log t) max-dot-product queries via hierarchical bucketing.
151
-
152
- **Key files:** `h4_polytopic_attention.py`, `h4.rs`, `vec4.rs`, `chamber_tree.rs`, `attention.rs`
153
-
154
- **What was proven:** 4D attention heads are quadratically more expressive than 2D heads, and the ChamberTree gives O(log t) queries with (5/16)^3 ~ 3% scan ratio at 3 levels.
155
-
156
- ### Phase 2: Weight Compiler
157
-
158
- Analytical construction of transformer weights that execute programs. No training --- weights are computed directly from the H4 geometry and the instruction set.
159
-
160
- **Key file:** `weight_compiler.py`
161
-
162
- **ISA:** LOAD, ADD, SUB, MUL, STORE, JMP, JNZ, HALT
163
-
164
- **Head allocation (8 heads):**
165
- - Heads 0-1: Instruction pointer lookup (find matching IP in history)
166
- - Heads 2-3: Register file access (find register state)
167
- - Heads 4-5: Operand fetch (fetch operand values from trace)
168
- - Heads 6-7: Control flow (branch prediction via Coxeter chamber)
169
-
170
- ### Phase 3: MCP Server + Hybrid LLM
171
-
172
- Exposes the executor as an MCP server for Claude Code. Claude handles reasoning; the H4 executor handles exact computation. Uses Max plan OAuth --- zero extra cost.
173
-
174
- **Key files:** `h4_mcp_server.py`, `hybrid_llm.py`
175
-
176
- **Tools exposed:** `h4_fibonacci`, `h4_compile_and_run`, `h4_geometry_info`, `h4_benchmark`, `h4_lattice_memory`
177
-
178
- ### Phase 4: E8 Lattice Memory
179
-
180
- The full implementation of lattice-indexed RAM. Memory operations use E8 Voronoi cell decoding for O(1) bucket addressing, with the E8 -> H4 projection unifying memory access and attention geometry.
181
-
182
- **Key files (Rust):** `vec8.rs`, `e8_lattice.rs`, `lattice_memory.rs`
183
- **Key files (Python):** Upgraded `E8LatticeIndex` in `h4_polytopic_attention.py`, `STORE_MEM`/`LOAD_MEM` in `weight_compiler.py`
184
-
185
- **New ISA opcodes:** STORE_MEM (R[a] -> E8 cell at addr R[b]), LOAD_MEM (E8 cell at addr R[a] -> R[dest])
186
-
187
- ### Phase 5: Trainable Hybrid Attention (PyTorch)
188
-
189
- Extends the frozen-backbone architecture to trainable token sequence modeling. The frozen H4 geometry provides spatial partitioning; learned adapters handle projection and weighting.
190
-
191
- **Key files:**
192
- - `h4_hybrid_attention.py` --- H4AttentionLayer (drop-in attention replacement) + H4TransformerBlock
193
- - `h4_language_model.py` --- Full LM: token embedding + golden-angle positional encoding + N x H4TransformerBlock + LM head
194
- - `train_cpu.py` --- Autoresearch training script (2-min CPU budget per experiment)
195
- - `benchmark_h4_vs_softmax.py` --- Speed/quality comparison at various context lengths
196
- - `utils/phi_positional.py` --- Golden-angle positional encoding using phi-inverse spacing
197
- - `utils/chamber_index.py` --- PyTorch-compatible ChamberTree bridge for top-k candidate filtering
198
-
199
- **Architecture (per H4AttentionLayer):**
200
-
201
- | Component | Type | Description |
202
- |-----------|------|-------------|
203
- | 600-cell vertices (120 x 4) | Frozen buffer | H4 polytope geometry |
204
- | H4 simple roots (4 x 4) | Frozen buffer | Coxeter reflection hyperplanes |
205
- | E8->H4 projection (4 x 8) | Frozen buffer | Golden-ratio eigenvalue projection |
206
- | W_q_proj, W_k_proj | Trainable | Project d_model -> H4 query/key space (R^4 per head) |
207
- | W_v_proj | Trainable | Project d_model -> value space |
208
- | W_nudge (n_heads x 4 x 4) | Trainable | Per-head query rotation in H4 space |
209
- | chamber_bonus (n_heads x 16) | Trainable | Per-head, per-chamber attention bias on keys |
210
- | W_out | Trainable | Output projection back to d_model |
211
-
212
- **What was proven:**
213
- - All gradients flow through trainable components (nudge, projections, chamber bonus)
214
- - W_nudge dominant directions align 96.5% with 600-cell vertices after training --- geometry attracts learning
215
- - Chamber entropy stays high (2.33/2.77 max) --- model uses the full geometric partition, not collapsing
216
- - ChamberTree scan ratio scales logarithmically: 43.6% at T=128, 3.1% at T=2048 (halves per doubling)
217
- - Python ChamberTree has high constant factors; Rust implementation needed for wall-clock advantage
218
-
219
- **Autoresearch loop** (`h4_program.md`): Autonomous experiment protocol where Claude Code iterates on the trainable adapters while the frozen geometry remains fixed. 2-minute CPU budget per experiment, ~24 experiments per overnight run.
220
-
221
- ### Phase 6: BitNet b1.58 Integration (Ternary Weights)
222
-
223
- Quantizes all trainable projections to ternary {-1, 0, +1} via BitNet b1.58's absmean method. The frozen geometry stays float32 (static lookup tables). Forward pass uses straight-through estimator (STE) for gradient flow.
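
A minimal PyTorch sketch of the absmean + STE scheme described above (simplified relative to the repo's `bitlinear.py`; activation quantization is omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinearSketch(nn.Module):
    """Illustrative BitNet b1.58 linear layer: absmean ternary weights + STE."""

    def __init__(self, in_features: int, out_features: int, bias: bool = False):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-8)             # absmean scale
        w_q = torch.clamp(torch.round(w / scale), -1, 1)   # ternary {-1, 0, +1}
        # Straight-through estimator: ternary forward, full-precision backward.
        w_ste = w + (w_q * scale - w).detach()
        return F.linear(x, w_ste, self.bias)

layer = BitLinearSketch(64, 32)
out = layer(torch.randn(8, 64))
out.sum().backward()   # gradients reach the float weights through the STE
```
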
224
-
225
- **Key files:**
226
- - `bitlinear.py` --- BitLinear drop-in replacement for nn.Linear with STE training
227
- - `ternary_diagnostics.py` --- Chamber preservation tests, weight structure analysis, size comparison
228
- - `export_ternary.py` --- Export trained model with frozen ternary weights for deployment
229
-
230
- **What stays float32:** 600-cell vertices, simple roots, E8 projection (frozen buffers), chamber_bonus (too small to quantize), embeddings, layer norms, LM head.
231
-
232
- **What becomes ternary:** W_q_proj, W_k_proj, W_v_proj, W_out (attention projections), FFN layers (the bulk of parameters).
233
-
234
- **What was proven (initial verification):**
235
- - **Chamber preservation 97.9% at initialization** --- ternary barely perturbs near-identity nudge weights
236
- - **Geo alignment 96.7%** --- unchanged from float (0.965 vs 0.967), geometry survives ternary
237
- - **STE gradients verified** --- all trainable parameters receive healthy gradients through the quantization barrier
238
-
239
- **What was proven (autoresearch, 30 experiments):**
240
- - **0.003 bpb gap** --- val_bpb 0.065 (ternary, d_model=256) vs 0.062 (float, d_model=128) after autonomous hyperparameter search
241
- - **~17x compression** --- trainable weights: ~310 KB ternary vs ~1.4 MB float32
242
- - **BitNet 2x-width scaling law confirmed** --- doubling d_model from 128 to 256 closed the gap from 0.025 to 0.003
243
- - **Chamber preservation 76.2% after training** --- ternary model finds its own geometric routing, different from but equally effective as float
244
- - **LR cliff at 70% chamber preservation** --- below this threshold, routing becomes too noisy and quality degrades
245
- - **Dropout=0 is optimal** --- frozen geometric backbone acts as the regularizer
246
-
247
- **Inference path after ternary:** Token embeddings -> ternary projections (add/sub only) -> S3 normalize -> ternary nudge -> ChamberTree (sign comparisons, 3.1% scan) -> softmax over candidates -> ternary FFN (add/sub only) -> next token. Only float multiplies: root dot products (4x4) and softmax.
248
-
249
- ### Phase 7: Unified Geometric RAG (Retrieval + Ranking)
250
-
251
- The same E8->H4 projection that routes attention also indexes and retrieves documents. The H4 bi-encoder handles geometric retrieval (R@5=100%); a pre-trained cross-encoder handles precision reranking (R@1=98.5%). Combined: **98.5% accuracy on document search, no GPU, no API, $0/month.**
252
-
253
- **Key files:**
254
- - `rag/encoder.py` --- Encode documents into E8 lattice memory via golden-angle spiral embeddings
255
- - `rag/pipeline.py` --- End-to-end QA pipeline (retrieve + generate), CPU only
256
- - `rag/ranking_model.py` --- H4 bi-encoder: score (question, passage) in H4 geometric space
257
- - `rag/train_ranker.py` --- Bi-encoder contrastive training (InfoNCE) on SQuAD
258
- - `rag/cross_encoder.py` --- H4 cross-encoder reranker (joint question+passage attention)
259
- - `rag/train_cross_encoder.py` --- Cross-encoder fine-tuning with LM backbone
260
- - `rag/eval_rerankers.py` --- Head-to-head: H4 vs pre-trained MiniLM reranker
261
- - `rag/tokenizer.py` --- BPE tokenizer (tiktoken GPT-2 base, restricted vocab)
262
- - `rag/demo.py` --- Interactive CLI demo: point at documents, ask questions
263
- - `rag/cost_benchmark.py` --- H4 CPU vs GPU vs API cost comparison
264
-
265
- **How the production system works:**
266
- 1. Documents encode into 8D E8 Voronoi cells via `H4DocumentEncoder`
267
- 2. Questions project through E8->H4 (cos(pi/5) = phi/2) for geometric retrieval
268
- 3. H4 bi-encoder retrieves top-5 candidates --- **R@5 = 100%, 20ms** (the answer is always in the results)
269
- 4. Pre-trained cross-encoder (MiniLM-L6) reranks top-5 --- **R@1 = 98.5%, ~500ms**
270
- 5. Best candidate returned with source attribution
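
A minimal sketch of step 4 using an off-the-shelf cross-encoder via the `sentence-transformers` library (the checkpoint name below is an assumption; the repo's head-to-head comparison lives in `rag/eval_rerankers.py`):

```python
from sentence_transformers import CrossEncoder

# Assumed checkpoint: any MS MARCO MiniLM cross-encoder (~22M params) runs fine on CPU.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(question: str, top5_passages: list) -> str:
    """Rerank the H4 bi-encoder's top-5 candidates and return the best passage."""
    scores = reranker.predict([(question, p) for p in top5_passages])
    return top5_passages[int(scores.argmax())]
```
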
271
-
272
- **Head-to-head reranker comparison (same candidates from H4 bi-encoder):**
273
-
274
- | Reranker | R@1 | Params | Notes |
275
- |----------|-----|--------|-------|
276
- | Random baseline | 20.0% | --- | Chance on 5 candidates |
277
- | H4 cross-encoder (overnight) | **80% peak** (69% final) | 25M ternary | 5.9K SQuAD pairs, 8h CPU |
278
- | **Pre-trained MiniLM-L6** | **98.5%** | **22M float** | **Trained on 500K+ MS MARCO pairs** |
279
-
280
- The H4 geometric retrieval does the hard, novel part (finding the right documents via E8 lattice + ChamberTree, O(log t), ternary, CPU). The pre-trained model does the easy, proven part (picking the best one from 5 candidates). Our contribution is the retrieval geometry; the reranking uses proven off-the-shelf technology.
281
-
282
- **Autoresearch findings (12 bi-encoder experiments):**
283
- - Temperature is the dominant hyperparameter for ternary contrastive learning (0.15 optimal, 2x float default)
284
- - Bi-encoder R@1 peaks at ~40% regardless of scale (architectural ceiling, not training ceiling)
285
- - Bi-encoder R@5 = 100% at 3.7M params --- perfect retrieval
286
- - E8 lattice retrieval: 7.8ms per query, 240-neighbor Voronoi search
287
-
288
- **Full-scale language generation (overnight, 8 hours CPU):**
289
- - **Perplexity 10.0** on TinyStories (24M ternary params, d_model=512, 8 layers)
290
- - Beats TinyStories-33M published baseline (~15 PPL) at fewer params with ternary weights
291
- - Generates coherent stories: *"Once upon a time, there was a lazy cat named Tom. Tom liked to sleep all day..."*
292
-
293
- ---
294
-
295
- ## Mathematical Foundation
296
-
297
- ### The Golden Ratio
298
-
299
- phi = (1 + sqrt(5)) / 2 = 1.6180339887...
300
-
301
- It appears at every level:
302
-
303
- | Where | How |
304
- |-------|-----|
305
- | 600-cell vertices | Coordinates contain phi/2 and 1/(2*phi) |
306
- | Coxeter eigenvalues | cos(pi/5) = phi/2, cos(2*pi/5) = 1/(2*phi) |
307
- | E8 -> H4 projection | 4x8 matrix built from cos(k*pi/5) rotation blocks |
308
- | State encoding | Golden-angle spiral for well-separated IP directions |
309
- | Checkpoint spacing | Fibonacci-indexed levels grow with base phi |
310
- | ChamberTree rotation | Level angles: 0, pi/5, pi/5 * phi |
311
- | Attention scaling | Queries scaled by phi: q_vec * phi |
312
-
313
- ### The 600-Cell
314
-
315
- 120 vertices on the unit 3-sphere S3, in three orbits:
316
-
317
- ```
318
- Orbit 1: 8 vertices — permutations of (+-1, 0, 0, 0)
319
- Orbit 2: 16 vertices — (+-1/2, +-1/2, +-1/2, +-1/2)
320
- Orbit 3: 96 vertices — even permutations of (0, +-1/2, +-phi/2, +-1/(2*phi))
321
- Total: 120 vertices
322
- ```
323
-
324
- The dot products between any two vertices take exactly 8 distinct values:
325
-
326
- ```
327
- {-1, -phi/2, -1/2, -1/(2*phi), 0, 1/(2*phi), 1/2, phi/2}
328
- ```
329
-
330
Five of these (+-phi/2, +-1/(2*phi), and -1) are cosines of multiples of pi/5, reflecting the pentagonal symmetry of H4; the remaining values 0 and +-1/2 come from orthogonal and 60/120-degree vertex pairs.
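
Both the vertex count and the dot-product spectrum can be checked directly; a short NumPy sketch, independent of the repo's `h4.rs` / `h4_polytopic_attention.py`:

```python
import itertools
import numpy as np

PHI = (1 + 5 ** 0.5) / 2

def _parity(p):
    """+1 for even permutations, -1 for odd."""
    p, sign = list(p), 1
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]
            sign = -sign
    return sign

def cell600_vertices() -> np.ndarray:
    verts = set()
    # Orbit 1: 8 permutations of (+-1, 0, 0, 0)
    for i, s in itertools.product(range(4), (1.0, -1.0)):
        v = [0.0] * 4
        v[i] = s
        verts.add(tuple(v))
    # Orbit 2: 16 vertices (+-1/2, +-1/2, +-1/2, +-1/2)
    verts.update(itertools.product((0.5, -0.5), repeat=4))
    # Orbit 3: 96 even permutations of (0, +-1/2, +-phi/2, +-1/(2*phi))
    for p in itertools.permutations(range(4)):
        if _parity(p) != 1:
            continue
        for s1, s2, s3 in itertools.product((1, -1), repeat=3):
            vals = (0.0, 0.5 * s1, (PHI / 2) * s2, (1 / (2 * PHI)) * s3)
            verts.add(tuple(round(vals[k], 12) for k in p))
    return np.array(sorted(verts))

V = cell600_vertices()
assert len(V) == 120                                    # 8 + 16 + 96 vertices on S^3
dots = {round(float(a @ b), 6) for a, b in itertools.combinations(V, 2)}
print(sorted(dots))                                     # exactly 8 distinct values
```
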
331
-
332
- ### The H4 Reflection Group
333
-
334
- The 4 simple roots of H4 define reflection hyperplanes that partition S3 into Coxeter chambers:
335
-
336
- ```
337
- alpha_1 = (1, -1, 0, 0) / sqrt(2)
338
- alpha_2 = (0, 1, -1, 0) / sqrt(2)
339
- alpha_3 = (0, 0, 1, 0)
340
- alpha_4 = (-1/2, -1/2, -1/2, (-1/(2*phi) + phi/2)) / norm
341
- ```
342
-
343
- The sign pattern of a vector's dot products with these 4 roots gives a 4-bit bucket index (0-15), partitioning S3 into 16 regions.
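
A minimal sketch of this level-0 bucketing (illustrative; the repo's versions live in `chamber_tree.rs` and `utils/chamber_index.py`):

```python
import numpy as np

PHI = (1 + 5 ** 0.5) / 2

def h4_simple_roots() -> np.ndarray:
    """The four simple roots quoted above, each normalized to unit length."""
    roots = np.array([
        [1.0, -1.0, 0.0, 0.0],
        [0.0, 1.0, -1.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [-0.5, -0.5, -0.5, PHI / 2 - 1 / (2 * PHI)],
    ])
    return roots / np.linalg.norm(roots, axis=1, keepdims=True)

def chamber_bucket(v: np.ndarray, roots: np.ndarray) -> int:
    """4-bit bucket index from the sign pattern of <v, alpha_i>, i = 1..4."""
    signs = (v @ roots.T) >= 0.0
    return int((signs.astype(int) * np.array([1, 2, 4, 8])).sum())

q = np.random.default_rng(0).standard_normal(4)
q /= np.linalg.norm(q)                       # put the query on S^3
print(chamber_bucket(q, h4_simple_roots()))  # one of the 16 level-0 buckets
```
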
344
-
345
- ### ChamberTree: 3-Level Hierarchical Bucketing
346
-
347
- ```
348
- Level 0: 16 buckets (4 root splits, original roots)
349
- Level 1: 16 x 16 = 256 sub-buckets (roots rotated by pi/5)
350
- Level 2: 256 x 16 = 4,096 leaf buckets (roots rotated by pi/5 * phi)
351
- ```
352
-
353
- **Exact query:** visits all 16 buckets at each level (full scan)
354
- **Approximate query:** visits primary + 4 Hamming-1 neighbors = 5/16 per level
355
- - Over 3 levels: (5/16)^3 = 3.05% of keys scanned
356
- - Effective complexity: O(log t) for t cached entries
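
The 5/16 expansion per level and its 3-level product (the numbers quoted above) in a couple of lines:

```python
def hamming1_buckets(bucket: int) -> list:
    """Primary bucket plus the 4 buckets whose 4-bit index differs in one sign."""
    return [bucket] + [bucket ^ (1 << i) for i in range(4)]

print(hamming1_buckets(0b1010))  # [10, 11, 8, 14, 2] -> 5 of 16 buckets per level
print((5 / 16) ** 3)             # 0.0305... -> ~3% of keys scanned over 3 levels
```
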
357
-
358
- ### The E8 Lattice
359
-
360
- The densest sphere packing in 8D (Viazovska 2016). Decomposes as:
361
-
362
- ```
363
- E8 = D8 ∪ (D8 + [1/2]^8)
364
-
365
- where D8 = { x in Z^8 : x_1 + x_2 + ... + x_8 = 0 (mod 2) }
366
- ```
367
-
368
- **Closest-lattice-point decoder:** Given any point in R8, find the nearest E8 lattice point in O(1):
369
- 1. Round to nearest D8 point (integers with even sum, parity correction)
370
- 2. Round to nearest D8 + [1/2]^8 point (half-integers with even sum)
371
- 3. Return whichever is closer
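
A minimal NumPy sketch of this decoder, assuming the standard two-coset construction (the repo's implementations are `e8_lattice.rs` and `E8LatticeIndex.decode_to_lattice`):

```python
import numpy as np

def _decode_d8(x: np.ndarray) -> np.ndarray:
    """Nearest point of D8 = {z in Z^8 : sum(z) even} to x."""
    f = np.round(x)
    if int(f.sum()) % 2 != 0:
        # Parity fix: re-round the coordinate with the largest rounding error.
        i = int(np.argmax(np.abs(x - f)))
        f[i] += 1.0 if x[i] > f[i] else -1.0
    return f

def decode_e8(x: np.ndarray) -> np.ndarray:
    """Closest E8 lattice point to x, checking both cosets D8 and D8 + [1/2]^8."""
    c0 = _decode_d8(x)
    c1 = _decode_d8(x - 0.5) + 0.5
    return c0 if np.sum((x - c0) ** 2) <= np.sum((x - c1) ** 2) else c1

x = np.random.default_rng(1).standard_normal(8)
print(decode_e8(x))   # all-integer or all-half-integer coordinates, even coordinate sum
```
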
372
-
373
- **Kissing number = 240:** Each lattice point has exactly 240 nearest neighbors, in two orbits:
374
- - 112 vectors: +-e_i +- e_j for i < j (pairs of unit vectors)
375
- - 128 vectors: (+-1/2)^8 with even number of minus signs
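
Both orbits can be enumerated and checked in a few lines (illustrative; the repo's `kissing_vectors()` lives in `e8_lattice.rs`):

```python
import itertools
import numpy as np

shell = []
# 112 vectors: +-e_i +- e_j for i < j
for i, j in itertools.combinations(range(8), 2):
    for si, sj in itertools.product((1.0, -1.0), repeat=2):
        v = np.zeros(8)
        v[i], v[j] = si, sj
        shell.append(v)
# 128 vectors: (+-1/2)^8 with an even number of minus signs
for signs in itertools.product((0.5, -0.5), repeat=8):
    if sum(s < 0 for s in signs) % 2 == 0:
        shell.append(np.array(signs))

assert len(shell) == 240                            # the E8 kissing number
assert all(np.isclose(v @ v, 2.0) for v in shell)   # minimal vectors: norm^2 = 2
```
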
376
-
377
- ### E8 -> H4 Projection
378
-
379
- The 4x8 projection matrix uses rotation blocks built from the Coxeter element eigenvalues:
380
-
381
- ```
382
- [ cos(pi/5) sin(pi/5) cos(2pi/5) sin(2pi/5) 0 0 0 0 ]
383
- P = [-sin(pi/5) cos(pi/5) -sin(2pi/5) cos(2pi/5) 0 0 0 0 ]
384
- [ 0 0 0 0 cos(pi/5) sin(pi/5) cos(2pi/5) sin(2pi/5) ]
385
- [ 0 0 0 0 -sin(pi/5) cos(pi/5) -sin(2pi/5) cos(2pi/5) ]
386
-
387
- where cos(pi/5) = phi/2 = 0.80902...
388
- cos(2pi/5) = 1/(2*phi) = 0.30902...
389
- ```
390
-
391
- This is the same projection that connects E8 root systems to icosahedral symmetry in the GSM physics framework. It preserves the golden-ratio structure, ensuring that memory embeddings in 8D map cleanly to 4D attention queries.
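
A direct NumPy transcription of the matrix above, with the golden-ratio identities checked (illustrative; the repo builds this projection in `vec8.rs` and `h4_polytopic_attention.py`):

```python
import numpy as np

PHI = (1 + 5 ** 0.5) / 2
c1, s1 = np.cos(np.pi / 5), np.sin(np.pi / 5)            # cos(pi/5)  = phi/2
c2, s2 = np.cos(2 * np.pi / 5), np.sin(2 * np.pi / 5)    # cos(2pi/5) = 1/(2*phi)
assert np.isclose(c1, PHI / 2) and np.isclose(c2, 1 / (2 * PHI))

block = np.array([[ c1, s1,  c2, s2],
                  [-s1, c1, -s2, c2]])
P = np.zeros((4, 8))
P[0:2, 0:4] = block
P[2:4, 4:8] = block

v8 = np.random.default_rng(2).standard_normal(8)
v4 = P @ v8    # 8D memory embedding -> 4D attention query
```
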
392
-
393
- ### Phi-Recursive State Encoding
394
-
395
- Long execution traces are compressed using Fibonacci-indexed checkpoint levels:
396
-
397
- ```
398
- Level 0: stores every step (F_1 = 1 apart)
399
- Level 1: stores every phi steps (F_2 = 1 apart)
400
- Level 2: stores every phi^2 steps (F_3 = 2 apart)
401
- Level 3: stores every phi^3 steps (F_4 = 3 apart)
402
- ...
403
- Level k: stores every F_{k+1} steps
404
- ```
405
-
406
- **Storage:** O(t * log_phi(t)) instead of O(t^2)
407
- **Retrieval:** O(log_phi(t)) via Zeckendorf decomposition (every positive integer is a unique sum of non-consecutive Fibonacci numbers)
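
A minimal sketch of the greedy Zeckendorf decomposition behind that retrieval bound (per the API notes below, the same decomposition is used by `utils/phi_positional.py` for positions beyond its cache):

```python
def zeckendorf(n: int) -> list:
    """Greedy decomposition of n into non-consecutive Fibonacci numbers, largest first."""
    fibs = [1, 2]
    while fibs[-1] < n:
        fibs.append(fibs[-1] + fibs[-2])
    parts = []
    for f in reversed(fibs):
        if f <= n:
            parts.append(f)
            n -= f
    return parts

print(zeckendorf(100))   # [89, 8, 3] -> only O(log_phi t) checkpoint levels to touch
```
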
408
-
409
- ---
410
-
411
- ## Directory Structure
412
-
413
- ```
414
- h4-polytopic-attention/
415
- +-- README.md This file
416
- +-- RESULTS.md Autoresearch results (30 experiments)
417
- +-- h4_program.md Autonomous research protocol (Phases 5-7)
418
- +-- docs/
419
- | +-- PAPER.md Full arXiv paper draft
420
- | +-- ARCHITECTURE.md Detailed architecture guide
421
- | +-- h4_polytopic_attention_whitepaper.pdf Original whitepaper
422
- +-- python/
423
- | +-- h4_polytopic_attention.py Phase 1: Frozen geometry (600-cell, ChamberTree, E8)
424
- | +-- weight_compiler.py Phase 2: Analytical weights + H4Executor
425
- | +-- h4_mcp_server.py Phase 3: MCP server (5 tools)
426
- | +-- hybrid_llm.py Phase 3: Claude Agent SDK integration
427
- | +-- h4_hybrid_attention.py Phase 5: H4AttentionLayer + H4TransformerBlock
428
- | +-- h4_language_model.py Phase 5: Full LM architecture
429
- | +-- train_cpu.py Phase 5: Autoresearch training script (2-min CPU budget)
430
- | +-- benchmark_h4_vs_softmax.py Phase 5: Scaling comparison (Rust + Python + softmax)
431
- | +-- bitlinear.py Phase 6: BitLinear ternary {-1,0,+1} with STE
432
- | +-- ternary_diagnostics.py Phase 6: Chamber preservation + weight analysis
433
- | +-- export_ternary.py Phase 6: Export frozen ternary model
434
- | +-- prepare_data.py Data pipeline (synthetic, Shakespeare, TinyStories)
435
- | +-- baselines.py Softmax + linear attention baseline models
436
- | +-- compare_baselines.py Head-to-head comparison script
437
- | +-- utils/
438
- | +-- phi_positional.py Golden-angle positional encoding
439
- | +-- chamber_index.py ChamberTree bridge (Rust + Python fallback)
440
- | +-- rag/ Phase 7: Unified geometric RAG
441
- | +-- encoder.py Document encoding into E8 lattice memory
442
- | +-- pipeline.py End-to-end QA pipeline (retrieve + generate)
443
- | +-- ranking_model.py H4Ranker (contrastive scoring in H4 space)
444
- | +-- train_ranker.py Bi-encoder contrastive training on SQuAD
445
- | +-- cross_encoder.py Cross-encoder reranker (joint Q+P attention)
446
- | +-- train_cross_encoder.py Cross-encoder fine-tuning with LM backbone
447
- | +-- tokenizer.py BPE tokenizer (tiktoken, 4096 vocab)
448
- | +-- train_qa.py Generative QA training (F1 metric)
449
- | +-- prepare_qa.py SQuAD download + QA data preparation
450
- | +-- demo.py Interactive CLI demo
451
- | +-- cost_benchmark.py H4 CPU vs GPU vs API cost comparison
452
- +-- olympus/ Project Olympus: specialist system
453
- | +-- router.py Two-tier routing (100% accuracy)
454
- | +-- h4_swap.py Progressive H4 attention swap
455
- | +-- knowledge_index.py E8 lattice knowledge index (Wikipedia)
456
- | +-- train_specialist.py QLoRA training scaffold
457
- | +-- train_code_specialist.py Code specialist (training on GPU)
458
- | +-- train_math_specialist.py Math specialist (training on GPU)
459
- | +-- train_qa_specialist.py QA specialist (training on GPU)
460
- | +-- demo.py Interactive Olympus demo
461
- | +-- data/download_all.py Download all training data
462
- +-- PROJECT_OLYMPUS.md Full Olympus plan + legal audit
463
- +-- OLYMPUS_CONTINUOUS_LEARNING.md Self-improving system design
464
- +-- sample_docs/ Sample documents for RAG demo
465
- +-- checkpoints/ Saved model checkpoints
466
- +-- rust/
467
- | +-- Cargo.toml Dependencies: rayon, pyo3, numpy
468
- | +-- src/
469
- | +-- lib.rs PyO3 bridge: h4_rust Python module
470
- | +-- main.rs Benchmarks (Phases 1-4)
471
- | +-- vec4.rs 4D SIMD vector (AVX2-aligned)
472
- | +-- vec8.rs Phase 4: 8D E8 vector + projection
473
- | +-- h4.rs 600-cell generation + verification
474
- | +-- chamber_tree.rs 3-level ChamberTree with approx top-k
475
- | +-- attention.rs Multi-head H4 attention (rayon)
476
- | +-- e8_lattice.rs Phase 4: E8 decoder, 240 kissing vectors
477
- | +-- lattice_memory.rs Phase 4: Lattice-indexed RAM
478
- ```
479
-
480
- ---
481
-
482
- ## Installation
483
-
484
- ### Python
485
-
486
- ```bash
487
- # Clone the repository
488
- git clone https://github.com/grapheneaffiliate/h4-polytopic-attention.git
489
- cd h4-polytopic-attention
490
-
491
- # Install Python dependencies
492
- pip install numpy mcp
493
-
494
- # Phase 5 additionally requires PyTorch
495
- pip install torch
496
-
497
- # Run the proof-of-concept
498
- py python/h4_polytopic_attention.py
499
-
500
- # Run the weight compiler demo (Fibonacci)
501
- py python/weight_compiler.py
502
-
503
- # Run the Phase 5 training script (2-minute CPU budget)
504
- py python/train_cpu.py
505
-
506
- # Run the H4 vs softmax benchmark
507
- py python/benchmark_h4_vs_softmax.py
508
-
509
- # Run ternary diagnostics (Phase 6)
510
- py python/ternary_diagnostics.py
511
-
512
- # Train with ternary weights
513
- # Edit python/train_cpu.py: set USE_BITLINEAR = True
514
- py python/train_cpu.py
515
-
516
- # Phase 7: RAG — requires tiktoken for BPE tokenizer
517
- pip install tiktoken
518
-
519
- # Interactive RAG demo (point at any documents)
520
- py python/rag/demo.py --docs path/to/your/documents/
521
-
522
- # Train the passage ranker on SQuAD (10-minute CPU budget)
523
- py python/rag/train_ranker.py
524
-
525
- # Run cost benchmark (H4 CPU vs GPU vs API)
526
- py python/rag/cost_benchmark.py
527
- ```
528
-
529
- ### Rust
530
-
531
- ```bash
532
- cd rust
533
-
534
- # Build with optimizations (required for SIMD)
535
- cargo build --release
536
-
537
- # Run benchmarks (50k steps, ~2 minutes)
538
- cargo run --release
539
-
540
- # Run E8 lattice tests
541
- cargo test
542
-
543
- # Build the Python bridge (requires maturin)
544
- pip install maturin
545
- maturin develop --release
546
- # This installs h4_rust module — enables 10.6x speedup in Python benchmarks
547
- ```
548
-
549
- ### MCP Server (Claude Code Integration)
550
-
551
- Add to your Claude Code MCP settings (`~/.mcp.json` or settings):
552
-
553
- ```json
554
- {
555
- "mcpServers": {
556
- "h4-executor": {
557
- "command": "py",
558
- "args": ["C:/Users/atchi/h4-polytopic-attention/python/h4_mcp_server.py"]
559
- }
560
- }
561
- }
562
- ```
563
-
564
- After restarting Claude Code, you'll have access to `h4_fibonacci`, `h4_compile_and_run`, `h4_geometry_info`, `h4_benchmark`, and `h4_lattice_memory` tools.
565
-
566
- ---
567
-
568
- ## Usage
569
-
570
- ### Running Fibonacci through the H4 Executor (Python)
571
-
572
- ```python
573
- from weight_compiler import fibonacci_program, H4Executor
574
-
575
- prog = fibonacci_program(15) # Compute F(0)..F(16)
576
- executor = H4Executor(prog, d_model=32)
577
- result = executor.run(max_steps=200)
578
-
579
- print(f"F(16) = {int(result['registers'][1])}") # 987
580
- ```
581
-
582
- ### Using E8 Lattice Memory (Python)
583
-
584
- ```python
585
- from weight_compiler import Program, H4Executor
586
-
587
- prog = Program()
588
- prog.add("LOAD", a=42, dest=0) # R0 = 42
589
- prog.add("LOAD", a=100, dest=1) # R1 = 100 (address)
590
- prog.add("STORE_MEM", a=0, b=1) # mem[100] = 42 (via E8 lattice)
591
- prog.add("LOAD", a=0, dest=0) # R0 = 0 (clear)
592
- prog.add("LOAD_MEM", a=1, dest=0) # R0 = mem[100] (via E8 lookup)
593
- prog.add("HALT")
594
-
595
- executor = H4Executor(prog, d_model=32)
596
- result = executor.run(max_steps=50)
597
- print(f"R0 = {int(result['registers'][0])}") # 42
598
- print(f"Lattice stats: {result['lattice_memory']}")
599
- ```
600
-
601
- ### Querying the E8 Lattice Directly (Python)
602
-
603
- ```python
604
- from h4_polytopic_attention import E8LatticeIndex
605
- import numpy as np
606
-
607
- lattice = E8LatticeIndex()
608
-
609
- # Store 1000 values
610
- for i in range(1000):
611
- emb = np.random.randn(8)
612
- lattice.insert(emb, value=float(i), address=i)
613
-
614
- # Query: find nearest stored embedding
615
- query = np.random.randn(8)
616
- results = lattice.query_nearest(query, k=5)
617
- for dist, val, addr in results:
618
- print(f" dist^2={dist:.4f}, value={val}, addr={addr}")
619
-
620
- # Stats
621
- print(lattice.stats())
622
- # {'total_entries': 1000, 'occupied_cells': 412, 'utilization': 0.412, ...}
623
- ```
624
-
625
- ### Using the MCP Tools (Claude Code)
626
-
627
- Once the MCP server is configured, you can call tools directly:
628
-
629
- ```
630
- > Compute F(20) using the H4 executor
631
-
632
- Claude calls h4_fibonacci(n=20) -> returns F(21) = 10946, correct,
633
- computed in 147 steps through the 4D H4 attention transformer.
634
-
635
- > Run a program that stores 42 to memory address 7, then loads it back
636
-
637
- Claude calls h4_compile_and_run with STORE_MEM/LOAD_MEM instructions,
638
- returns registers showing the value was stored and retrieved via E8 lattice.
639
-
640
- > Show me the E8 lattice memory stats
641
-
642
- Claude calls h4_lattice_memory(action="info") -> returns kissing number 240,
643
- projection eigenvalues cos(pi/5) = phi/2, Voronoi cell structure.
644
- ```
645
-
646
- ### Using the H4 Hybrid Attention Layer (Phase 5, PyTorch)
647
-
648
- ```python
649
- import torch
650
- from h4_hybrid_attention import H4AttentionLayer, H4TransformerBlock
651
-
652
- # Drop-in attention layer
653
- layer = H4AttentionLayer(d_model=64, n_heads=8, d_value=16, top_k=32)
654
- x = torch.randn(2, 128, 64) # (batch, seq_len, d_model)
655
-
656
- # Full attention (for short sequences)
657
- out = layer(x, use_tree=False)
658
-
659
- # Tree-accelerated attention (O(log t) candidate filtering)
660
- out = layer(x, use_tree=True)
661
-
662
- # With geometric diagnostics
663
- out, diag = layer(x, use_tree=False, return_diagnostics=True)
664
- print(f"Chamber entropy: {diag['chamber_entropy']:.3f}")
665
- print(f"Nudge rank (per head): {diag['nudge_rank']}")
666
- print(f"Geo alignment (per head): {diag['geo_alignment']}")
667
- print(f"Scan ratio: {diag['scan_ratio']:.4f}")
668
- ```
669
-
670
- ### Training an H4 Language Model (Phase 5)
671
-
672
- ```python
673
import torch
from h4_language_model import H4LanguageModel
674
-
675
- model = H4LanguageModel(
676
- vocab_size=256, # character-level
677
- d_model=64,
678
- n_heads=8,
679
- n_layers=4,
680
- d_value=16,
681
- )
682
- print(model.count_params())
683
- # {'trainable': 116480, 'frozen': 0, 'buffers': 525401, 'total': 116480}
684
-
685
- # Forward pass
686
- input_ids = torch.randint(0, 256, (4, 128))
687
- logits = model(input_ids) # (4, 128, 256)
688
-
689
- # Autoregressive generation
690
- seed = torch.randint(0, 256, (1, 4))
691
- generated = model.generate(seed, max_new_tokens=100, temperature=0.8)
692
- ```
693
-
694
- ### Running the Autoresearch Loop (Phase 5)
695
-
696
- The `h4_program.md` defines an autonomous experiment protocol. Claude Code reads it and iterates:
697
-
698
- ```bash
699
- # Single experiment (2-minute budget)
700
- py python/train_cpu.py
701
-
702
- # Output includes parseable summary:
703
- # ---
704
- # val_bpb: 2.939237
705
- # chamber_entropy: 2.3290
706
- # avg_geo_alignment: 0.9652
707
- # num_params: 109760
708
- ```
709
-
710
- ### Geometric RAG Demo (Phase 7)
711
-
712
- ```bash
713
- # Interactive: point at documents, ask questions, get ranked passages
714
- py python/rag/demo.py --docs sample_docs/
715
-
716
- # Example output:
717
- # > What is the golden ratio?
718
- #
719
- # Rank 1 (score: 0.92): "The golden ratio, often denoted by phi..."
720
- # Source: golden_ratio.txt, chunk 0
721
- #
722
- # Retrieval: 7.8ms | Ranking: 12ms | Total: 20ms
723
- ```
724
-
725
- ```python
726
- # Programmatic usage
727
- from rag.pipeline import H4RAGPipeline
728
-
729
- pipeline = H4RAGPipeline(vocab_size=4096, stoi=stoi, itos=itos,
730
- d_model=128, n_layers=2, use_bitlinear=True)
731
- pipeline.index_directory('path/to/docs/')
732
- result = pipeline.answer("What is the golden ratio?")
733
- print(result.answer, result.sources, f"{result.total_time_ms:.0f}ms")
734
- ```
735
-
736
- ### Running Baseline Comparisons
737
-
738
- ```bash
739
- # Head-to-head: H4 vs softmax vs linear attention on Shakespeare
740
- py python/compare_baselines.py
741
-
742
- # Downloads Shakespeare automatically, trains all 4 configs (H4 float,
743
- # H4 ternary, softmax, linear) with identical model size and time budget,
744
- # prints ranked comparison table.
745
- ```
746
-
747
- ---
748
-
749
- ## Instruction Set Architecture
750
-
751
- | Opcode | Operands | Description | Encoding |
752
- |--------|----------|-------------|----------|
753
- | `LOAD` | a=imm, dest=reg | R[dest] = immediate value | 600-cell vertex[0] |
754
- | `ADD` | a=reg, b=reg, dest=reg | R[dest] = R[a] + R[b] | vertex[10] |
755
- | `SUB` | a=reg, b=reg, dest=reg | R[dest] = R[a] - R[b] | vertex[20] |
756
- | `MUL` | a=reg, b=reg, dest=reg | R[dest] = R[a] * R[b] | vertex[30] |
757
- | `STORE` | a=reg, dest=reg | R[dest] = R[a] (copy) | vertex[40] |
758
- | `JMP` | a=addr | IP = a | vertex[50] |
759
- | `JNZ` | a=reg, b=addr | if R[a] != 0: IP = b | vertex[60] |
760
- | `HALT` | - | Stop execution | vertex[70] |
761
- | `STORE_MEM` | a=reg, b=reg | mem[R[b]] = R[a] via E8 lattice | vertex[80] |
762
- | `LOAD_MEM` | a=reg, dest=reg | R[dest] = mem[R[a]] via E8 lattice | vertex[90] |
763
-
764
- **Registers:** 8 general-purpose (R0-R7), 64-bit floating point
765
-
766
- **State encoding:** Each opcode maps to a distinct 600-cell vertex, ensuring maximum angular separation between instruction types in 4D attention space.
767
-
768
- **Memory model:** STORE_MEM encodes the linear address as an 8D golden-angle spiral embedding, decodes it to the nearest E8 lattice point, and stores the value in that Voronoi cell. LOAD_MEM reverses the process, searching the primary cell plus 240 kissing neighbors.
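
A hypothetical sketch of the address path (the spiral construction here is an assumption for illustration, not the repo's exact encoding; the decode step is the nearest-point decoder described in the Mathematical Foundation section):

```python
import numpy as np

PHI = (1 + 5 ** 0.5) / 2

def address_embedding(addr: int) -> np.ndarray:
    """Hypothetical 8D golden-angle spiral embedding of a linear memory address."""
    angles = addr * (2 * np.pi / PHI) * np.arange(1, 5)
    return np.concatenate([np.cos(angles), np.sin(angles)])

# STORE_MEM: embed the address, decode to the nearest E8 point, store in that Voronoi cell.
# LOAD_MEM:  embed the query address, search that cell plus its 240 kissing neighbors.
emb = address_embedding(100)
```
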
769
-
770
- ---
771
-
772
- ## Benchmarks
773
-
774
- ### Rust (50,000 steps, release build)
775
-
776
- | Benchmark | steps/s | vs Python |
777
- |-----------|---------|-----------|
778
- | Random keys, exact (all 16 buckets) | 120 | 4x |
779
- | Random keys, approx (5/16 buckets) | 1,080 | 32x |
780
- | Random keys, exact + rayon parallel | 354 | 10x |
781
- | Random keys, approx + rayon parallel | 2,743 | 81x |
782
- | Structured (Wasm-like), exact | 113 | 3x |
783
- | Structured (Wasm-like), approx | 950 | 28x |
784
- | Structured (Wasm-like), approx + parallel | 2,600 | 76x |
785
-
786
- **Python PoC baseline:** ~34 steps/s
787
- **Theoretical O(log t) speedup vs O(t):** 1,765x at 50,000 steps
788
-
789
- ### Phase 4: E8 Lattice Memory (Rust, 10,000 steps)
790
-
791
- | Operation | Rate | Hit rate |
792
- |-----------|------|----------|
793
- | Store (E8 decode + bucket insert + H4 project) | 189,753 ops/s | - |
794
- | Load (E8 decode + 240-neighbor scan) | 68,122 ops/s | 100% |
795
- | Unified query (E8 -> H4 projected attention) | 143,313 ops/s | 100% |
796
-
797
- **Lattice utilization:**
798
- - Occupied cells: 106 (of ~10,000 entries)
799
- - Max bucket size: 240 (= kissing number)
800
- - Primary cell hit rate: 100%
801
- - All kissing vector norms verified: norm^2 = 2
802
-
803
- ### Python MCP Server (via Claude Code)
804
-
805
- | Operation | Latency |
806
- |-----------|---------|
807
- | Encoding throughput | ~32,550 states/s |
808
- | Forward pass (50 steps) | 0.001s |
809
- | Forward pass (100 steps) | 0.003s |
810
- | Forward pass (250 steps) | 0.020s |
811
- | Forward pass (500 steps) | 0.083s |
812
-
813
- ### Phase 5: H4 vs Softmax Attention (Python, CPU)
814
-
815
- ChamberTree scan ratio (fraction of keys examined per query) at various sequence lengths:
816
-
817
- | seq_len | Softmax (ms) | H4 full (ms) | H4 tree (ms) | Scan ratio | Notes |
818
- |---------|-------------|--------------|--------------|------------|-------|
819
- | 64 | 0.5 | 1.1 | 1.1 | 100% | Tree not used (too short) |
820
- | 128 | 0.5 | 1.3 | 932 | 43.6% | Python overhead dominates |
821
- | 256 | 1.3 | 2.6 | 1758 | 23.4% | Scan ratio halving per doubling |
822
- | 512 | 5.2 | 8.0 | 3831 | 12.1% | Logarithmic pruning confirmed |
823
- | 1024 | 22.1 | 29.2 | 8821 | 6.2% | Continuing log scaling |
824
- | 2048 | 82.6 | 104.4 | 23434 | 3.1% | 97% of keys pruned |
825
-
826
- **Key finding:** The scan ratio scales as O(log t / t), confirming logarithmic candidate pruning works. The Python ChamberTree has high per-node overhead; the compiled Rust implementation delivers the wall-clock speedup (see below).
827
-
828
- ### Rust ChamberTree Wall-Clock Benchmarks (256 queries, k=32, amortized tree build)
829
-
830
- | n_keys | Exact Brute-Force (ms) | ChamberTree Approx (ms) | Speedup | Top-k Recall |
831
- |--------|----------------------|------------------------|---------|-------------|
832
- | 1,024 | 10.2 | 2.6 | 3.9x | 82.5% |
833
- | 4,096 | 34.6 | 5.2 | 6.7x | 91.1% |
834
- | 16,384 | 155.9 | 18.0 | 8.7x | 95.4% |
835
- | 65,536 | 760.2 | 71.6 | **10.6x** | **98.3%** |
836
-
837
- **Key findings:** Speedup increases with sequence length (O(log t) vs O(t)). Recall *improves* with length --- the opposite of most approximate attention methods. At 65K keys, the tree examines 3.1% of candidates and finds 98.3% of the true top-k.
838
-
839
- ### Shakespeare Head-to-Head (120s CPU training, same infrastructure)
840
-
841
- | Model | Attention | Params | Val Loss | Perplexity |
842
- |-------|-----------|--------|----------|-----------|
843
- | Softmax | O(t^2) | 797K | **2.329** | 10.3 |
844
- | Linear | O(t) | 797K | 2.332 | 10.3 |
845
- | H4 Float | O(log t) | 699K | 2.376 | 10.8 |
846
- | H4 Ternary | O(log t) + 1.58-bit | 699K | 2.394 | 11.0 |
847
-
848
- H4 is 2% behind softmax with 13% fewer parameters. The gap is largely throughput-driven at short sequences; at longer contexts the O(log t) advantage reverses this.
849
-
850
- ### Phase 7: SQuAD Passage Ranking (870K ternary params, 10-min CPU)
851
-
852
- | Metric | H4 Ranker (870K) | Random Chance | Improvement |
853
- |--------|-------------------|--------------|-------------|
854
- | Recall@1 | **41.5%** | 3.1% | 12x |
855
- | Recall@5 | **75.9%** | 15.6% | 5x |
856
- | MRR | **0.57** | 0.13 | 4x |
857
-
858
- The 870K model was the minimum viable proof. At scale (3.7M params, overnight training):
859
-
860
- | Metric | 870K (10 min) | 3.7M (overnight) | Notes |
861
- |--------|-------------|-----------------|-------|
862
- | R@1 | 41.5% | ~37% | Bi-encoder ceiling |
863
- | R@5 | 75.9% | **100%** | Never misses the answer |
864
- | MRR | 0.57 | **0.93** | Answer averages rank 1-2 |
865
-
866
- The bi-encoder's job is retrieval, not precision ranking. R@5=100% and MRR=0.93 means the answer is always in the results. A pre-trained cross-encoder (MiniLM-L6, 22M params) reranks the top-5 candidates to **98.5% R@1**. The H4 geometry handles retrieval; the pre-trained model handles precision.
867
-
868
- ### Phase 5: Initial Training Diagnostics (d_model=64, 2-min CPU)
869
-
870
- | Metric | Start | End (2 min) | Target |
871
- |--------|-------|-------------|--------|
872
- | val_loss | 3.20 | 1.96 | lower |
873
- | val_bpb | 4.62 | 2.94 | lower |
874
- | chamber_entropy | - | 2.33 / 2.77 | high (uniform chamber usage) |
875
- | W_nudge rank | 1.0 | 1.68 | high (rank-1 = focused direction) |
876
- | geo_alignment | - | 0.965 | > 0.9 (aligns with 600-cell) |
877
-
878
- *Note: These are initial verification results at d_model=64. The autoresearch loop subsequently found val_bpb=0.062 at d_model=128. See [RESULTS.md](RESULTS.md) for the full 30-experiment sweep.*
879
-
880
- ### Autoresearch: Float vs Ternary (30 experiments, autonomous)
881
-
882
- | | Float32 best | Ternary best |
883
- |---|-------------|-------------|
884
- | val_bpb | **0.062** | **0.065** |
885
- | Gap | - | 0.003 (4.7%) |
886
- | d_model | 128 | 256 |
887
- | Layers | 6 | 4 |
888
- | Compression | 1x | ~17x |
889
- | Chamber preservation | - | 76.2% |
890
- | Experiments | 16 | 13 |
891
-
892
- *Full results, methodology, and findings in [RESULTS.md](RESULTS.md).*
893
-
894
- ### Phase 6: Initial Chamber Preservation (10,000 random queries per head, at initialization)
895
-
896
- | Head | Preservation | Status |
897
- |------|-------------|--------|
898
- | 0-7 | 97.2% - 98.8% | All OK (>90% threshold) |
899
- | Mean | **97.9%** | Near-lossless at initialization |
900
-
901
- *Note: After training at LR=5e-3, chamber preservation drops to 76.2% as the ternary model finds its own optimal geometric routing. Quality is preserved (0.003 bpb gap). See [RESULTS.md](RESULTS.md) for the full chamber preservation analysis.*
902
-
903
- ---
904
-
905
- ## MCP Server Integration
906
-
907
- The MCP server exposes 5 tools to Claude Code:
908
-
909
- ### h4_fibonacci(n)
910
-
911
- Compute Fibonacci sequence F(0)..F(n+1) through the H4 transformer executor. Returns the result, correctness verification, execution steps, and full sequence.
912
-
913
- ### h4_compile_and_run(instructions, max_steps)
914
-
915
- Execute a custom program. Each instruction is `{opcode, a, b, dest}`. Returns final register state, step count, halt status, and lattice memory statistics.
916
-
917
- ### h4_geometry_info(aspect)
918
-
919
- Query H4 polytope geometry. Aspects: `vertices`, `chambers`, `dot_products`, `golden_ratio`, `all`.
920
-
921
- ### h4_benchmark(n_steps)
922
-
923
- Profile encoding throughput and forward pass timing at different trace lengths.
924
-
925
- ### h4_lattice_memory(action, n_entries)
926
-
927
- Phase 4 E8 lattice diagnostics:
928
- - `action="info"`: Return E8 lattice constants, projection eigenvalues, ISA opcodes
929
- - `action="benchmark"`: Store + load n_entries, return utilization stats
930
-
931
- ---
932
-
933
- ## API Reference
934
-
935
- ### Python
936
-
937
- #### `E8LatticeIndex`
938
-
939
- ```python
940
- class E8LatticeIndex:
941
- def __init__(max_cell_size=240)
942
- def decode_to_lattice(point: ndarray) -> tuple # R8 -> E8 lattice point
943
- def insert(embedding_8d, value, address=None) # Store in Voronoi cell
944
- def project_to_h4(embedding_8d) -> ndarray # 8D -> 4D projection
945
- def query_nearest(query_8d, k=1, search_neighbors=True) -> List
946
- def load_by_address(address) -> Optional[tuple] # Linear address lookup
947
- def stats() -> Dict # Utilization statistics
948
- ```
949
-
950
- #### `H4Executor`
951
-
952
- ```python
953
- class H4Executor:
954
- def __init__(program: Program, d_model=32)
955
- def execute_instruction() # Single ISA step
956
- def run(max_steps=1000) -> Dict # Full execution loop
957
- # Attributes:
958
- # .registers: ndarray[8] # Register file
959
- # .lattice_memory: E8LatticeIndex # Phase 4 RAM
960
- # .trace: List[ndarray] # Execution trace
961
- ```
962
-
963
- #### `Program`
964
-
965
- ```python
966
- class Program:
967
- def add(opcode: str, a=0, b=0, dest=0)
968
- # Opcodes: LOAD, ADD, SUB, MUL, STORE, STORE_MEM, LOAD_MEM, JMP, JNZ, HALT
969
- ```
970
-
971
- #### `H4AttentionLayer` (Phase 5, PyTorch)
972
-
973
- ```python
974
- class H4AttentionLayer(nn.Module):
975
- def __init__(d_model, n_heads=8, d_value=16, top_k=32, dropout=0.0,
976
- use_bitlinear=False)
977
- def forward(x, use_tree=True, return_diagnostics=False)
978
- # x: (batch, seq_len, d_model) -> (batch, seq_len, d_model)
979
- # Frozen buffers: vertices (120x4), simple_roots (4x4), e8_h4_proj (4x8)
980
- # Trainable: W_q_proj, W_k_proj, W_v_proj, W_nudge, W_out, chamber_bonus
981
- # use_bitlinear=True swaps all Linear->BitLinear (ternary {-1,0,+1})
982
- ```
983
-
984
- #### `H4LanguageModel` (Phase 5, PyTorch)
985
-
986
- ```python
987
- class H4LanguageModel(nn.Module):
988
- def __init__(vocab_size, d_model=64, n_heads=8, n_layers=4, d_value=16,
989
- d_ffn=None, top_k=32, max_seq_len=8192, dropout=0.1,
990
- use_bitlinear=False)
991
- def forward(input_ids, use_tree=True, return_diagnostics=False)
992
- # input_ids: (batch, seq_len) -> logits: (batch, seq_len, vocab_size)
993
- def generate(input_ids, max_new_tokens=100, temperature=1.0, top_k_sample=0)
994
- def count_params() -> Dict # {'trainable': int, 'frozen': int, 'buffers': int}
995
- # use_bitlinear=True propagates ternary to all attention + FFN layers
996
- ```
997
-
998
- #### `PhiPositionalEncoding` (Phase 5, PyTorch)
999
-
1000
- ```python
1001
- class PhiPositionalEncoding(nn.Module):
1002
- def __init__(d_model, max_cached=8192)
1003
- def forward(seq_len, offset=0) -> Tensor # (seq_len, d_model)
1004
- def encode_position(position) -> Tensor # (d_model,)
1005
- # Uses golden-angle spiral: position n -> angle n * 2pi * phi^-1
1006
- # Beyond max_cached: Zeckendorf decomposition for O(log n) encoding
1007
- ```
1008
-
1009
- #### `BitLinear` (Phase 6, PyTorch)
1010
-
1011
- ```python
1012
- class BitLinear(nn.Module):
1013
- def __init__(in_features, out_features, bias=False)
1014
- def forward(x) -> Tensor # STE: quantized forward, float backward
1015
- def freeze() # Lock to pure ternary for inference
1016
- def unfreeze() # Return to training mode
1017
- @property
1018
- def ternary_stats -> Dict # {'neg1': float, 'zero': float, 'pos1': float}
1019
- # Weight quantization: scale = mean(|w|), w_q = RoundClip(w/scale, -1, +1)
1020
- # Activation quantization: per-token absmax to int8 [-127, 127]
1021
- ```
1022
-
1023
- ### Rust
1024
-
1025
- #### `LatticeMemory`
1026
-
1027
- ```rust
1028
- pub struct LatticeMemory {
1029
- pub fn new() -> Self
1030
- pub fn store(&mut self, embedding: Vec8, value: [f64; 4], address: u64)
1031
- pub fn load(&mut self, query: Vec8) -> Option<(f64, [f64; 4], u64, u64)>
1032
- pub fn load_by_address(&self, address: u64) -> Option<([f64; 4], u64)>
1033
- pub fn query_attention_exact(&self, query_8d: Vec8) -> Option<(f64, [f64; 4], u64)>
1034
- pub fn query_attention_approx(&self, query_8d: Vec8) -> Option<(f64, [f64; 4], u64)>
1035
- pub fn project(&self, embedding: Vec8) -> Vec4
1036
- pub fn stats(&self) -> LatticeMemoryStats
1037
- pub fn utilization(&self) -> f64
1038
- }
1039
- ```
1040
-
1041
- #### `LatticeAttention`
1042
-
1043
- ```rust
1044
- pub struct LatticeAttention {
1045
- pub fn new(d_model: usize) -> Self
1046
- pub fn insert(&mut self, embedding: &[f64])
1047
- pub fn store_mem(&mut self, embedding_8d: [f64; 8], value: [f64; 4], address: u64)
1048
- pub fn load_mem(&mut self, query_8d: [f64; 8]) -> Option<(f64, [f64; 4], u64, u64)>
1049
- pub fn query_exact(&self, embedding: &[f64]) -> Vec<Option<(f64, u64)>>
1050
- pub fn query_approx(&self, embedding: &[f64]) -> Vec<Option<(f64, u64)>>
1051
- pub fn query_unified(&self, query_8d: [f64; 8]) -> Option<(f64, [f64; 4], u64)>
1052
- pub fn memory_stats(&self) -> LatticeMemoryStats
1053
- }
1054
- ```
1055
-
1056
- #### `ChamberTree`
1057
-
1058
- ```rust
1059
- pub struct ChamberTree {
1060
- pub fn new(simple_roots: [Vec4; 4]) -> Self
1061
- pub fn insert(&mut self, key: Vec4, value: [f64; 4], timestamp: u64)
1062
- pub fn query_max_exact(&self, query: Vec4) -> Option<(f64, [f64; 4], u64)>
1063
- pub fn query_max_approx(&self, query: Vec4) -> Option<(f64, [f64; 4], u64)>
1064
- pub size: u64
1065
- }
1066
- ```
1067
-
1068
- #### E8 Lattice Functions
1069
-
1070
- ```rust
1071
- pub fn decode_to_e8(point: Vec8) -> LatticePoint // O(1) Voronoi cell decode
1072
- pub fn kissing_vectors() -> Vec<LatticePoint> // 240 nearest neighbors
1073
- pub fn lattice_add(a: LatticePoint, b: LatticePoint) -> LatticePoint
1074
- pub fn neighbor_shell(center: LatticePoint) -> Vec<LatticePoint>
1075
- ```
1076
-
1077
- ---
1078
-
1079
- ## Theory Deep Dive
1080
-
1081
- ### Why Transformers Are Computers
1082
-
1083
- The insight from Percepta ("Can LLMs Be Computers?") is that a transformer's components map directly to a von Neumann architecture:
1084
-
1085
- | Transformer | Computer |
1086
- |-------------|----------|
1087
- | KV cache | RAM |
1088
- | Attention query | Memory read (address decode) |
1089
- | KV insert | Memory write |
1090
- | FFN layer | ALU + instruction decode |
1091
- | Token sequence | Execution trace / clock |
1092
- | Softmax weights | Memory access pattern |
1093
-
1094
- H4 Polytopic Attention makes this concrete by replacing the O(t) softmax scan with O(log t) geometric lookup.
1095
-
1096
- ### Why the E8 -> H4 Projection Unifies Memory and Attention
1097
-
1098
- The E8 root system has a remarkable property: its Coxeter element has eigenvalues that are roots of unity of order 30 (the Coxeter number). When projected along the eigenspaces corresponding to cos(pi/5) and cos(2*pi/5), the 240 roots of E8 map to the vertices of the H4 polytope system.
1099
-
1100
- This means:
1101
- 1. An 8D memory embedding encodes a "full" representation in E8 space
1102
- 2. The projection to 4D preserves exactly the geometric structure that the H4 attention heads use
1103
- 3. Two memory entries that are "nearby" in E8 Voronoi geometry remain nearby after projection to H4 chamber space
1104
- 4. The kissing number 240 in E8 bounds the search space: you never need to check more than 240 neighbors
1105
-
1106
- This is not an arbitrary dimensionality reduction --- it is the unique projection that preserves the golden-ratio structure shared between E8 and H4.
1107
-
1108
- ### Lattice Memory vs. Hash Tables
1109
-
1110
- Traditional hash tables give O(1) lookup but destroy geometric locality --- similar keys can hash to wildly different buckets. E8 lattice decoding gives O(1) lookup while preserving locality:
1111
-
1112
- | Property | Hash Table | E8 Lattice Memory |
1113
- |----------|------------|-------------------|
1114
- | Lookup | O(1) | O(1) |
1115
- | Locality preservation | None | Voronoi cells are convex |
1116
- | Neighbor search | Not possible | 240 kissing vectors |
1117
- | Attention integration | Separate system | Same geometry via projection |
1118
- | Collision handling | Chaining/probing | Cell capacity (bounded by 240) |
1119
-
1120
- ### The Significance of 240
1121
-
1122
- The kissing number of E8 is 240 --- this is the maximum number of non-overlapping unit spheres that can touch a central unit sphere in 8D. It is also the number of roots of the E8 root system, the number of minimal vectors in the E8 lattice, and the bound on how many neighbor cells you ever need to check for a Voronoi cell query.
1123
-
1124
- The 240 roots decompose as:
1125
- - 112 = C(8,2) * 4 vectors of the form +-e_i +- e_j
1126
- - 128 = 2^8 / 2 half-integer vectors with even parity
1127
-
1128
This decomposition mirrors the two cosets of E8 = D8 ∪ (D8 + [1/2]^8).
1129
-
1130
- ### Concurrent Work: Percepta — "Can LLMs Be Computers?"
1131
-
1132
- **Percepta** (Tzamos et al., 2026) independently arrived at O(log t) attention through 2D convex hull geometry. They execute compiled C programs inside transformer weights at 32,000 tok/s on CPU --- millions of exact steps, zero errors.
1133
-
1134
- **The convergence:** Two independent groups identified the same bottleneck (linear attention cost) and arrived at the same solution (geometric sublinear lookup). They use 2D convex hull queries. We use 4D Coxeter chamber navigation. Both achieve O(log t).
1135
-
1136
- | | Percepta (2D) | H4 Polytopic (4D) |
1137
- |---|---|---|
1138
- | Geometry | 2D convex hull | 4D H4 polytope |
1139
- | Complexity | O(log t) | O(log t) |
1140
- | Purpose | Exact program execution | Language generation + RAG |
1141
- | Throughput | 32,000 tok/s (deterministic) | 585 tok/s (language) |
1142
-
1143
- **Why this matters:** Independent validation that geometric attention enables sublinear lookup. Not a task-specific trick --- a fundamental improvement.
1144
-
1145
- **Synthesis:** Their 2D execution path (exact arithmetic at 32K tok/s) + our 4D language path (generation + retrieval) = a hybrid system where the model computes 15 x 23 exactly in its own forward pass, then explains the answer in natural language. No external calculator needed.
1146
-
1147
- ### Lila-E8
1148
-
1149
- **Lila-E8** (concurrent work, 2025-2026) also uses the E8 lattice for attention, but in a fundamentally different way:
1150
-
1151
- | | Lila-E8 | H4 Polytopic Attention (this project) |
1152
- |---|---|---|
1153
- | **E8 role** | Attention bias (additive term in score matrix) | Memory addressing + algorithmic routing via E8->H4 projection |
1154
- | **Complexity** | O(t^2) --- full attention matrix still computed | **O(log t)** --- ChamberTree prunes 97% of candidates |
1155
- | **Mechanism** | E8 structure tells which tokens to upweight | E8->H4 projection partitions S^3 into navigable chambers |
1156
- | **Speed benefit** | Better quality at same cost | **10.6x wall-clock speedup** at 65K keys |
1157
-
1158
- Both approaches validate that E8 geometry is useful for attention. Lila-E8 improves attention quality within O(t^2). We use the E8->H4 projection (cos(pi/5) = phi/2) to make attention fundamentally faster. The approaches are complementary --- Lila-E8's bias could be applied within the candidate set our ChamberTree selects, combining quality with speed.
1159
-
1160
- ---
1161
-
1162
- ## Autoresearch Results
1163
-
1164
- Autonomous agents ran 42+ experiments across three tasks (language modeling, ternary optimization, passage ranking), all on CPU with no human intervention after launch.
1165
-
1166
- **Language modeling (30 experiments, ~56 min):**
1167
- - **1.752 -> 0.062 bpb** (float, 16 experiments): LR was the biggest lever (10x increase), followed by depth and dropout removal
1168
- - **0.088 -> 0.065 bpb** (ternary, 13 experiments): 2x width closed the quantization gap to 0.003
1169
- - **Dropout=0 is optimal**: the frozen geometric backbone IS the regularizer
1170
- - **Ternary wants 1.7x float LR**: STE quantization noise provides implicit regularization
1171
- - **Chamber preservation cliff at ~70%**: below this, geometric routing breaks down
1172
-
1173
- **Passage ranking (12 experiments, ~2 hrs):**
1174
- - **36.6% -> 41.5% R@1** on SQuAD: temperature was the only lever that mattered
1175
- - **Ternary contrastive learning needs 2x higher temperature** (0.15 vs 0.07) --- noisier ternary similarities need softer distributions
1176
- - **Throughput x quality-per-step** is the real objective on fixed time budgets (consistent finding across all three task types)
1177
-
Full methodology and experiment logs in [RESULTS.md](RESULTS.md).

---

## How This Was Built: Claude Code as Research Partner

This entire project --- from the first 600-cell vertex generation through the final PPL 10.0 model on Hugging Face --- was built in a single extended session using [Claude Code](https://claude.com/claude-code), Anthropic's CLI agent for software engineering. The workflow demonstrates what's possible when an AI assistant has the right tools and the right human guidance.

### The Claude Code Workflow

**Phase 1-6 (Architecture):** Claude Code wrote the core implementation files, verified shapes and gradients, ran benchmarks, and committed each phase with descriptive messages. Every piece was tested before moving to the next.

**Autoresearch (42+ experiments):** Claude Code launched autonomous subagents that each ran 2-minute training experiments, parsed results, decided keep/discard, committed improvements, and moved to the next hypothesis. The float sweep (16 experiments) and ternary sweep (13 experiments) ran back-to-back. The ranking sweep (12 experiments) ran separately. No human intervention during sweeps --- the agents found LR scaling, dropout removal, temperature tuning, and the BitNet 2x-width law on their own.

**Parallel tracks:** Three subagents ran simultaneously on non-overlapping files:
- Track A: Rust PyO3 ChamberTree bridge (compiled, benchmarked 10.6x speedup)
- Track B: Shakespeare baselines + data pipeline (head-to-head comparison)
- Track C: Full arXiv paper draft (~7,500 words with LaTeX math)

**Overnight training:** Claude Code configured and launched 8-hour CPU training runs, saving hourly checkpoints with evaluation and generated samples. The PPL 10.0 result came from an overnight run that Claude Code set up, monitored at 30 minutes to de-risk, then let complete autonomously.

**Cross-encoder reranker:** Based on an engineer's analysis of bi-encoder limitations (R@5=100% but R@1 plateauing), Claude Code built a cross-encoder that uses the PPL 10.0 checkpoint as its backbone and fine-tunes it on SQuAD binary classification --- a two-phase training strategy (freeze the backbone, then unfreeze) implemented and running within minutes of the suggestion.

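The freeze-then-unfreeze schedule is a standard pattern. A minimal PyTorch sketch, assuming a model with `.backbone` and `.head` submodules and a caller-supplied `train_one_epoch` function (all names illustrative, not the repo's actual training script):

```python
import torch

def two_phase_finetune(model, train_one_epoch, freeze_epochs=1, total_epochs=3):
    """Phase 1: train only the classification head on a frozen backbone.
    Phase 2: unfreeze the backbone and fine-tune everything at a lower LR."""
    for p in model.backbone.parameters():
        p.requires_grad = False
    opt = torch.optim.AdamW(model.head.parameters(), lr=1e-3)
    for _ in range(freeze_epochs):
        train_one_epoch(model, opt)

    for p in model.backbone.parameters():
        p.requires_grad = True
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(total_epochs - freeze_epochs):
        train_one_epoch(model, opt)
```
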
### Reproducibility

Every step is reproducible:
1. Clone the repo
2. Install dependencies (`pip install numpy torch tiktoken`)
3. Run any training script (`train_cpu.py`, `train_full_scale.py`, `rag/train_ranker.py`)
4. The autoresearch protocol is documented in `h4_program.md`
5. All experiment results are in git history with descriptive commit messages
6. The HF model page has a 7-step guide from zero to trained model

### Acknowledgments

This project was made possible by [Anthropic](https://anthropic.com) and the Claude model family. The ability of Claude Code (powered by Claude Opus) to write, test, debug, and iterate on complex mathematical software --- managing Rust FFI bridges, PyTorch autograd, E8 lattice geometry, and autonomous experiment loops --- is a testament to the work of Dario Amodei and the entire Anthropic team in building AI systems that are genuinely useful for technical work.

The autoresearch methodology was inspired by [Andrej Karpathy's autoresearch](https://github.com/karpathy/autoresearch) project, adapted for CPU-only training with frozen geometric backbones.

**Author:** Timothy McGirl
**AI Research Partner:** Claude Code (Claude Opus 4.6, Anthropic)

---

## Why This Matters: The RAG Cost Elimination

RAG (retrieval-augmented generation) is what most companies actually pay for right now. Not creative writing, not code generation --- document search. "Find the answer to this question in our 10,000 internal documents." That's the workload, and it's expensive.

### Current enterprise RAG stack

| Component | Service | Cost |
|-----------|---------|------|
| Embedding model | OpenAI ada-002 | $0.10/M tokens |
| Vector database | Pinecone / Weaviate | $70-300/month |
| LLM for generation | GPT-3.5/4 | $0.50-10/M tokens |
| **Total (mid-size company)** | | **$500-2,000/month** |

### H4 geometric RAG stack

| Component | Implementation | Monthly Cost |
|-----------|---------------|-------------|
| Document retrieval | E8 lattice memory, 7.8ms/query | **$0** |
| Passage ranking | H4 bi-encoder (R@5=100%) + MiniLM reranker (R@1=98.5%) | **$0** |
| Text generation | Ternary H4 model, 585 tok/s, PPL 10.0 | **$0** |
| Vector database | Not needed --- the E8 lattice IS the database | **$0** |
| **Total** | Runs on existing laptop | **$0/month** |

That's not a cost reduction. That's eliminating the cost entirely.

### Why the architecture matters for RAG

In a standard RAG system, retrieval and generation are completely separate systems. You pay for an embedding model to encode documents, pay for a vector database to store and search them, then pay for a different LLM to read the results and generate an answer. Three separate systems, three separate costs, three separate failure points.

H4 Polytopic Attention unifies all three through the E8->H4 projection. Documents go into the E8 lattice. Questions project through the same geometry to find relevant documents. The H4 attention model reads those documents and generates answers. **One geometry, one system, zero external dependencies.**

### The business case

A company with 50,000 internal documents currently pays ~$500/month ($6,000/year) for hosted RAG. With H4 on a single office server:
- $0/month ongoing
- ~$1,500 one-time for a CPU server (if they don't have one)
- **Payback in 3 months**

For companies with 500,000+ documents or heavy query volume ($2,000-5,000/month currently), payback is under a month.

### What's proven, what's next

**Proven:**
- E8 retrieval: R@5=100%, 20ms (Voronoi cell + 240-neighbor search; see the decoder sketch after this list)
- H4 cross-encoder: **80% R@1 peak** on 5.9K SQuAD pairs (25M ternary, breakthrough)
- MiniLM reranking: R@1=98.5% on same candidates (production accuracy)
- Language generation: PPL 10.0 on TinyStories (24M ternary, beats 33M baseline)
- ChamberTree: 10.6x wall-clock speedup at 65K keys, 98.3% recall
- Router: 100% on 50 test cases (keyword classifier + ChamberTree sub-routing)
- Ternary quantization: 0.003 bpb gap, ~17x compression

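The Voronoi-cell step relies on the standard fast nearest-point decoder for E8, which exploits the decomposition E8 = D8 ∪ (D8 + 1/2). A reference sketch of that decoder; the repo's own implementation may differ in detail:

```python
import numpy as np

def _decode_Dn(x):
    """Nearest point in D_n: integer vectors with an even coordinate sum."""
    f = np.rint(x)
    if int(f.sum()) % 2 != 0:
        # flip the coordinate that was rounded with the largest error
        i = np.argmax(np.abs(x - f))
        f[i] += 1.0 if x[i] > f[i] else -1.0
    return f

def decode_e8(x):
    """Nearest E8 lattice point, via E8 = D8 union (D8 + 1/2)."""
    a = _decode_Dn(x)
    b = _decode_Dn(x - 0.5) + 0.5
    return a if np.sum((x - a) ** 2) <= np.sum((x - b) ** 2) else b
```

Each decode is just two rounding passes over eight coordinates, so the cell lookup itself is cheap; most of the 20ms retrieval budget presumably goes to the 240-neighbor search and scoring rather than the decode.
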
**Currently training (3 GPU pods in parallel):**
- Code specialist: SmolLM3-3B + QLoRA on 49K code examples (CodeAlpaca + CodeFeedback)
- Math specialist: SmolLM3-3B + QLoRA on 49K math examples (MetaMathQA + GSM8K)
- QA specialist: SmolLM3-3B + QLoRA on 78K QA examples (SQuAD + Natural Questions)
- All three on RunPod RTX 4080 SUPER, ~$10 total, finishing overnight

**The shippable system:** Router (100%, <1ms) -> specialist (SmolLM3-3B + LoRA) -> E8 retrieval (R@5=100%, 20ms) -> MiniLM reranking (R@1=98.5%) -> answer. No GPU, no API, $0/month.

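End to end, that pipeline is a few lines of glue. A control-flow sketch; every interface name here (`router.classify`, `e8_index.retrieve`, `reranker.rank`, `specialists[...]`) is a placeholder for the components listed above, not the project's actual API:

```python
def answer(question, router, specialists, e8_index, reranker, k=5):
    """Route -> retrieve -> rerank -> generate. Interface names are illustrative."""
    domain = router.classify(question)                 # <1ms keyword + chamber routing
    passages = e8_index.retrieve(question, top_k=k)    # E8 lattice lookup, ~20ms
    best = reranker.rank(question, passages)[0]        # reranker picks the top passage
    prompt = f"Context:\n{best}\n\nQuestion: {question}\nAnswer:"
    return specialists[domain].generate(prompt)
```
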
### Roadmap

| Phase | Status | What |
|-------|--------|------|
| **Now** | Training overnight | 3 specialists on GPU (code, math, QA) |
| **Tomorrow** | Next | Wire specialists into router, validate, benchmark |
| **This week** | Planned | GGUF conversion for fast CPU inference, E8 Wikipedia index, end-to-end demo |
| **Next week** | Planned | Add tools (web search, PDF reader, calculator) |
| **After that** | Designed | [Continuous learning loop](OLYMPUS_CONTINUOUS_LEARNING.md) --- system identifies its own gaps and trains new specialists autonomously |

---

## Project Olympus: Frontier-Quality AI on CPU

Everything above is the foundation. **[Project Olympus](PROJECT_OLYMPUS.md)** is the vision: a system that approaches frontier model quality running entirely on CPU, for the billions of people who can't afford GPU compute and API subscriptions.

**The core insight:** Claude Opus memorizes everything in 200B+ params (~400GB). We build 4 focused specialists based on **SmolLM3-3B** (3B ternary each, ~600MB) that know their domain deeply and retrieve everything else from the E8 knowledge index in 20ms. A 3B model that can look up any fact is functionally equivalent to a 200B model that memorized those facts --- for the user, the answer is the same.

**Base model:** SmolLM3-3B-Instruct (Apache 2.0, 11.2T training tokens, 128K context, dual-mode reasoning). The strongest open-source model at this size as of 2025-2026.

**The 4 specialists:**

| Specialist | Fine-tuning | Purpose |
|-----------|-------------|---------|
| General | None (SmolLM3 as-is) | Conversation, instructions, creative |
| Code | The Stack v2 + CodeAlpaca | Code generation, debugging |
| Math | MetaMathQA + GSM8K | Problem solving, reasoning |
| QA | SQuAD + NQ + TriviaQA | Factual answers from retrieved context |

**What's already proven:**
- E8 retrieval: R@5=100% (the answer is always in the results)
- MiniLM reranking: R@1=98.5% (the right answer is picked first)
- H4 geometric routing: <1ms per query (ChamberTree chamber classification)
- Ternary quantization: 17x compression (3B model fits in ~600MB)
- CPU training via QLoRA: 3-5 days for all specialists

**What it enables:**
- Factual QA at 85-90% (retrieval advantage --- looks up facts instead of hallucinating)
- Instruction following at 75-85% (good enough for most tasks)
- $0/month, private, local, runs on any laptop with 32GB RAM
- 100% legally clean: Apache 2.0 models + open datasets, no distillation

**Cost to build:** ~$50-100 in cloud compute (or $0 with a laptop and ~14 days)

**The 14-day plan:** Download SmolLM3 (day 1) -> fine-tune 3 specialists via QLoRA (days 2-4) -> progressive H4 attention swap (days 5-10) -> router + knowledge index + integration (days 11-14).

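For concreteness, a QLoRA configuration sketch in the style of the Hugging Face `transformers` + `peft` stack. The model id, target module names, and hyperparameters are illustrative assumptions, not the configs used for the published specialist checkpoints:

```python
# Sketch only: requires transformers, peft, and bitsandbytes (CUDA GPU for 4-bit).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "HuggingFaceTB/SmolLM3-3B"   # illustrative model id

bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb)
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)      # only the LoRA adapters are trainable
```

Because the base weights stay frozen in 4-bit and only the low-rank adapters train, per-specialist fine-tuning stays in the few-dollar range quoted above.
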
**Beyond the initial 4 specialists:** The system is designed to grow. **[Continuous Learning](OLYMPUS_CONTINUOUS_LEARNING.md)** describes how the system identifies its own weaknesses (low confidence scores on a domain), automatically curates training data, trains new specialists via QLoRA, validates they outperform the general model, and deploys them --- all autonomously, using the same autoresearch pattern that already ran 42+ experiments without human intervention. Each new specialist costs $2-3 in GPU time.

See **[PROJECT_OLYMPUS.md](PROJECT_OLYMPUS.md)** for the full plan: SmolLM3-3B selection rationale, complete legal audit of every dataset, QLoRA training configs, H4 progressive swap strategy, and honest quality expectations.

*This is not a replacement for frontier models. It's an alternative for the billions of people who can't afford them.*

---

## Citation

```bibtex
@software{mcgirl2026h4polytopic,
  author = {McGirl, Timothy},
  title  = {H4 Polytopic Attention: 4D Geometric Attention with O(log t) Queries via Coxeter Chamber Navigation and E8 Lattice-Indexed RAM},
  year   = {2026},
  url    = {https://github.com/grapheneaffiliate/h4-polytopic-attention},
}
```

---

## License

See repository for license details.

---
license: apache-2.0
tags:
- geometric-attention
- h4-polytope
- ternary-quantization
- project-olympus
- transformer-vm
- hilbert-modular-form
- e8-lattice
- percepta
language:
- en
pipeline_tag: text-generation
---

# H4 Polytopic Attention + Project Olympus

Geometric attention using the 600-cell polytope for O(log t) token lookup, with ternary quantization and multi-specialist architecture.

## What's Here

### Checkpoints

| Model | Size | Description |
|-------|------|-------------|
| h4_fullscale_final.pt | 94MB | 24M ternary H4 LM, PPL 10.0 on TinyStories |
| h4_cross_encoder.pt | 98MB | 80% R@1 cross-encoder for reranking |
| olympus_code/final/ | 116MB | SmolLM3-3B LoRA, code specialist (loss 0.768) |
| olympus_math/final/ | 116MB | SmolLM3-3B LoRA, math specialist (loss 0.235) |
| olympus_qa/final/ | 116MB | SmolLM3-3B LoRA, QA specialist (loss 1.39) |

### Verified Results

| Result | Value |
|--------|-------|
| H4 attention scan ratio | 3.1% at T=2048 (O(log t)) |
| Rust ChamberTree speedup | 10.6x at 65K keys, 98.3% recall |
| Ternary quantization gap | 0.003 bpb |
| Language generation | PPL 10.0 (beats 33M baseline) |
| Router accuracy | 100% on 50 test cases |
| Compiled arithmetic | 30/30 exact |
| Transformer-VM | 10.7K tok/s with OpenBLAS |
| Code verifier | Catches DP backtracking bugs via property checking |

### Mathematical Discovery

**Galois Conjugation Theorem (machine-verified in Lean 4):** For every E8 lattice vector v, the H4 and H4' projected norms are Galois conjugates in Q(sqrt(5)). The E8 theta series decomposes as a Hilbert modular form of weight (4,4) with palindromic coefficients.

Formal proof: [GSMLean/GaloisConjugation.lean](https://github.com/grapheneaffiliate/gsm-lean)

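Stated in symbols, with notation assumed here: pi and pi' denote the two golden-ratio projections of E8 onto the H4 and H4' copies of R^4, and sigma is the nontrivial field automorphism of Q(sqrt(5)):

```latex
\forall v \in E_8:\qquad
\lVert \pi'(v) \rVert^{2} \;=\; \sigma\!\left( \lVert \pi(v) \rVert^{2} \right),
\qquad \sigma(\sqrt{5}) = -\sqrt{5}.
```
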
### Architecture

Three-tier compute: transformer-vm (exact, 10.7K tok/s) -> compiled arithmetic (fallback) -> specialist LLM (language). Code verification via property checking (sprint contract pattern).

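A dispatch sketch of that fallback order; `vm_execute`, `compiled_eval`, and `specialist` are placeholder callables standing in for the three tiers, not functions from this repo:

```python
def three_tier_answer(query, vm_execute, compiled_eval, specialist):
    """Try the exact transformer-VM path first, fall back to compiled
    arithmetic, and use the specialist LLM only for open-ended language."""
    result = vm_execute(query)          # exact symbolic execution (~10.7K tok/s)
    if result is not None:
        return result
    result = compiled_eval(query)       # compiled arithmetic fallback
    if result is not None:
        return result
    return specialist.generate(query)   # natural-language generation
```
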
## Quick Start

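A minimal, illustrative sketch for inspecting one of the `.pt` checkpoints above; the actual model class and checkpoint layout are defined in the GitHub repository, so treat the commented constructor and key names as placeholders:

```python
import torch

# Download h4_fullscale_final.pt from this repo, then inspect the checkpoint.
# Depending on your torch version you may need weights_only=False.
ckpt = torch.load("h4_fullscale_final.pt", map_location="cpu")
print(type(ckpt), list(ckpt.keys()) if isinstance(ckpt, dict) else "")

# Typical next steps (class and key names are placeholders --- see the GitHub repo):
# model = H4LanguageModel(**ckpt["config"])
# model.load_state_dict(ckpt["model"])
# model.eval()
```
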
## Research Plan

See [docs/GEOMETRIC_INFERENCE.md](docs/GEOMETRIC_INFERENCE.md) for the roadmap to Opus-level AI at 50 tok/s on consumer CPU via five multiplicative geometric optimizations.

## License

Apache 2.0. Transformer-VM integration uses Percepta's Apache 2.0 code.