grapheneaffiliates committed (verified)
Commit 73ccf45 · Parent: 9fccde5

Upload OLYMPUS_STATE.md with huggingface_hub

Files changed (1):
  1. OLYMPUS_STATE.md (+185 -0)
OLYMPUS_STATE.md ADDED
@@ -0,0 +1,185 @@
# Olympus Complete State — Session Handoff Document

**Last updated:** 2026-03-25 (evening)
**Purpose:** Everything a new Claude Code session needs to continue from exactly where we left off. Read this file first.

---

## Training: COMPLETE. Pods: STOPPED. Checkpoints: LOCAL.

All three specialists finished training; checkpoints are verified and downloaded, and the pods are stopped.

| Specialist | Final Loss | Runtime | Checkpoint | GGUF |
|-----------|-----------|---------|------------|------|
| **Code** | 0.768 | 7h24m | `checkpoints/olympus_code/final/` (116MB LoRA) | `checkpoints/gguf/olympus-code-q4_k_m.gguf` (1.8GB) |
| **Math** | 0.235 | 7h29m | `checkpoints/olympus_math/final/` (116MB LoRA) | `checkpoints/gguf/olympus-math-q4_k_m.gguf` (1.8GB) |
| **QA** | 1.39 | 7h52m | `checkpoints/olympus_qa/final/` (116MB LoRA) | `checkpoints/gguf/olympus-qa-q4_k_m.gguf` (1.8GB) |

**Upgraded code specialist:** `checkpoints/gguf/qwen2.5-coder-7b-instruct-q4_k_m.gguf` (4.4GB)
- Qwen2.5-Coder-7B-Instruct, Q4_K_M quantized
- Correctly implements predecessor tracking in DP (the bug SmolLM3-3B couldn't fix)
- ~3.9 tok/s on CPU (vs 7.7 tok/s for the 3B, but correct code on the first shot)
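For reference, the DP-plus-predecessor pattern at issue looks like the following minimal Python sketch. This illustrates the technique itself, not the specialist's actual output:

```python
# LIS via DP with predecessor tracking: store each element's predecessor
# index so the subsequence itself (not just its length) can be rebuilt.
def longest_increasing_subsequence(nums):
    n = len(nums)
    if n == 0:
        return []
    length = [1] * n   # length of the best LIS ending at index i
    prev = [-1] * n    # predecessor index of i in that LIS (-1 = none)
    for i in range(n):
        for j in range(i):
            if nums[j] < nums[i] and length[j] + 1 > length[i]:
                length[i] = length[j] + 1
                prev[i] = j
    # Walk predecessors back from the best endpoint.
    best = max(range(n), key=length.__getitem__)
    out = []
    while best != -1:
        out.append(nums[best])
        best = prev[best]
    return out[::-1]

print(longest_increasing_subsequence([10, 9, 2, 5, 3, 7, 101, 18]))  # → [2, 5, 7, 101]
```

Dropping the `prev` bookkeeping (the 3B model's failure mode) still yields the right length but makes the subsequence unrecoverable.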
**RunPod:** All pods stopped. Rotate the API key.

---

## What's Running (Lattice App)

```bash
# Launch
export CLANG_PATH="C:\Users\atchi\h4-polytopic-attention\transformer-vm\wasi-sdk\bin\clang.exe"
export PATH="/c/Users/atchi/h4-polytopic-attention/transformer-vm/openblas/bin:$PATH"
py olympus/app.py
# Open http://127.0.0.1:7860
```

### Three-Tier Compute Engine

| Priority | Engine | Speed | Scope |
|----------|--------|-------|-------|
| 1 | **transformer-vm** | 10.7K tok/s | Exact: arithmetic, fib, prime, GCD, collatz, LIS |
| 2 | **compiled_arithmetic** | ~5ms | Fallback: basic arithmetic, zero dependencies |
| 3 | **Specialist LLMs (GGUF)** | 3-8 tok/s | Language: code, math reasoning, QA |

### Smart Routing

- Pure computation ("what is 15*23") → transformer-vm, instant, exact
- Code request ("write a function for LIS") → transformer-vm computes the ground truth, the code specialist generates code, and the property checker verifies it
- Math reasoning ("solve x^2+3x-4=0") → math specialist
- Factual questions → QA specialist
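The routing order can be sketched as a priority cascade. The function and patterns below are illustrative stand-ins, not the actual `olympus/router.py` implementation:

```python
import re

# Illustrative three-tier cascade: exact engines are consulted before any LLM.
# All names and regexes here are hypothetical, not copied from router.py.
def route(query: str) -> str:
    # Tier 1: pure arithmetic goes to the exact engine.
    # (Tier 2, compiled_arithmetic, would catch arithmetic if tier 1 fails.)
    if re.fullmatch(r"(?i)\s*what is [\d\s+\-*/%^().]+\??\s*", query):
        return "transformer-vm"
    # Tier 3: language tasks fall through to the specialist LLMs.
    if re.search(r"(?i)\b(write|implement|function|code)\b", query):
        return "code-specialist"   # generated code is then property-checked
    if re.search(r"(?i)\b(solve|equation|derivative|integral)\b", query):
        return "math-specialist"
    return "qa-specialist"         # everything else: factual QA

print(route("what is 15*23"))             # → transformer-vm
print(route("write a function for LIS"))  # → code-specialist
```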
### Code Verification Pipeline (Sprint Contract Pattern)

1. **Generate** — the specialist writes Python
2. **Execute** — runs in a subprocess sandbox
3. **Properties** — checks mathematical invariants (increasing? subsequence? sorted?)
4. **Fix** — if properties fail, the violation is fed back for a second attempt
5. **Ground truth** — transformer-vm provides the correct answer for comparison
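As a concrete illustration of step 3, here is a minimal property check for LIS output. The helper names are hypothetical; the real checks live in `olympus/code_verifier.py`:

```python
# Invariant checks for a claimed LIS answer: the output must be strictly
# increasing AND an in-order subsequence of the input.
def is_strictly_increasing(seq):
    return all(a < b for a, b in zip(seq, seq[1:]))

def is_subsequence(sub, seq):
    it = iter(seq)               # membership tests consume the iterator,
    return all(x in it for x in sub)  # so order is enforced

def check_lis_properties(candidate, data):
    violations = []
    if not is_strictly_increasing(candidate):
        violations.append("not strictly increasing")
    if not is_subsequence(candidate, data):
        violations.append("not a subsequence of the input")
    return violations

data = [10, 9, 2, 5, 3, 7, 101, 18]
print(check_lis_properties([2, 3, 7, 101], data))  # → []
print(check_lis_properties([5, 3, 7, 101], data))  # → ['not strictly increasing']
```

A non-empty violation list is what step 4 feeds back to the specialist for the second attempt.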
60
+ ---
61
+
62
+ ## Transformer-VM Integration
63
+
64
+ **Repo:** `transformer-vm/` (cloned from Percepta-Core/transformer-vm, Apache 2.0)
65
+ **C++ engine:** Compiled with clang++ + OpenBLAS, 10.7K tok/s (was 7K without BLAS)
66
+ **wasi-sdk:** `transformer-vm/wasi-sdk/` for C-to-WASM compilation
67
+
68
+ ### Compiled C Tools (exact computation)
69
+
70
+ ```
71
+ olympus/wasm_tools/math/arithmetic.c β€” +, -, *, /, %, ^ on integers
72
+ olympus/wasm_tools/math/fibonacci.c β€” fib(n)
73
+ olympus/wasm_tools/math/prime_check.c β€” primality test with smallest factor
74
+ olympus/wasm_tools/math/gcd.c β€” GCD + LCM via Euclidean algorithm
75
+ olympus/wasm_tools/math/collatz.c β€” Collatz sequence
76
+ olympus/wasm_tools/code/lis.c β€” Longest Increasing Subsequence (DP + predecessor)
77
+ ```
78
+
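A pure-Python mirror of what `gcd.c` computes is handy when sanity-checking the compiled tool's output (a sketch of the algorithm, not the C source itself):

```python
# Euclidean algorithm, mirroring gcd.c: GCD by repeated remainder,
# LCM derived from it. Dividing before multiplying keeps the
# intermediate value small (which avoids overflow in the C version).
def gcd(a: int, b: int) -> int:
    while b:
        a, b = b, a % b
    return a

def lcm(a: int, b: int) -> int:
    return a // gcd(a, b) * b

print(gcd(48, 18), lcm(48, 18))  # → 6 144
```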
### Adding New Compiled Tools

1. Write C with the `void compute(const char *input)` interface (see `runtime.h`)
2. Put it in `olympus/wasm_tools/<domain>/`
3. Register it in the `NAMED_OPS` dict in `olympus/tvm_engine.py`
4. Done — exact execution, ~300ms per query
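Step 3 amounts to one dict entry. A sketch of the shape, assuming `NAMED_OPS` maps op names to C sources (the real keys and values in `tvm_engine.py` may differ):

```python
# Hypothetical shape of the op-name → C-source registry described above.
NAMED_OPS = {
    "fib":   "olympus/wasm_tools/math/fibonacci.c",
    "gcd":   "olympus/wasm_tools/math/gcd.c",
    "prime": "olympus/wasm_tools/math/prime_check.c",
}

# Registering a new tool is a single mapping; per steps 1-4 above, the
# engine compiles the C source to WASM and executes it exactly.
NAMED_OPS["sort"] = "olympus/wasm_tools/code/sort.c"  # hypothetical new tool

print(sorted(NAMED_OPS))
```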
### OpenBLAS Speedup

Prebuilt OpenBLAS lives at `transformer-vm/openblas/`. The C++ engine was patched (`transformer_blas.cpp`) to use `cblas_dgemv` instead of scalar loops. Projection time dropped from 80.9s to 28.4s (2.85x), lifting total throughput from 7.1K to 10.7K tok/s.
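The scalar-loop vs BLAS gap is easy to demonstrate in miniature with numpy, whose `@` operator dispatches to BLAS internally. This is an analogy for what the patch does, not the engine's code, and the sizes are arbitrary:

```python
import numpy as np

# A matrix-vector product written as explicit scalar loops vs delegated
# to BLAS — the same substitution cblas_dgemv makes in transformer_blas.cpp.
rng = np.random.default_rng(0)
W = rng.random((256, 256))
x = rng.random(256)

def scalar_gemv(W, x):
    out = np.zeros(W.shape[0])
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            out[i] += W[i, j] * x[j]
    return out

# Identical result; at engine scale the BLAS path is the 2.85x win above.
assert np.allclose(scalar_gemv(W, x), W @ x)
```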
Remaining bottleneck: hull attention at 69% of runtime (std::set allocator pressure). That's Percepta's optimization to make.

---

## GGUF Conversion Pipeline

```bash
# Already done, but to reconvert:
py olympus/convert_gguf.py                            # Convert all specialists
py olympus/convert_gguf.py --check                    # Verify outputs exist
py olympus/convert_gguf.py --specialist code --force  # Reconvert one
```

Requires: `peft`, `transformers`, `llama.cpp/` (cloned), `gguf` package.

---
## New Files This Session

```
olympus/tvm_engine.py — Transformer-VM wrapper (compile C→WASM→execute)
olympus/gguf_inference.py — GGUF model loading + generation (SmolLM3 + Qwen)
olympus/convert_gguf.py — LoRA merge + GGUF conversion + quantization
olympus/code_verifier.py — Code execution sandbox + property checker
olympus/wasm_tools/math/*.c — 5 exact computation tools
olympus/wasm_tools/code/lis.c — Longest Increasing Subsequence
```

## Modified Files This Session

```
olympus/app.py — Lattice UI: transformer-vm + GGUF + verification pipeline
olympus/router.py — Three-tier priority: tvm → compiled_arithmetic → specialist
.gitignore — Exclude transformer-vm/, llama.cpp/, compiled WASM tokens
```

---
## Verified Results (updated)

| Result | Value | How to reproduce |
|--------|-------|------------------|
| Transformer-VM throughput | 10.7K tok/s (OpenBLAS) | `cd transformer-vm && py -m uv run wasm-run` |
| All 6 TVM examples | 6/6 PASS (hello, addition, collatz, fib, matching, sudoku) | Same as above |
| Our compiled tools | 12/12 PASS | `py -c "from olympus.tvm_engine import TVMEngine; ..."` |
| Code specialist (Qwen 7B) | Correct LIS with predecessor tracking | Lattice UI |
| Math specialist | Correct garden area + fence posts | Lattice UI |
| QA specialist | Correct tidal explanation | Lattice UI |
| Property checker | Catches `[5,3,7,101]` as not increasing | `py -c "from olympus.code_verifier import check_output_properties; ..."` |
| Router accuracy | 50/50 (100%) | `py olympus/router.py` |
| Compiled arithmetic | 30/30 exact | `py olympus/compiled_arithmetic.py` |

---
## What To Do Next

### Immediate:
1. **Upload specialist LoRA adapters + GGUF to HuggingFace**
2. **Add more compiled C tools** — sort, binary search, matrix operations
3. **Build E8 Wikipedia index** for real knowledge retrieval in QA

### This week:
4. **Continuous learning loop** (OLYMPUS_CONTINUOUS_LEARNING.md)
5. **Web search via Crawl4AI** for live information
6. **String operations** compiled into C tools (regex, parsing)

### Architecture improvements:
7. **Hybrid code generation** — the specialist generates structure and calls transformer-vm for algorithms
8. **Evaluator model** — a larger model checks the smaller specialist's output (Anthropic harness pattern)
9. **GGUF for general specialist** — convert base SmolLM3-3B (no LoRA) for general chat

---
## Key External Dependencies

| Dependency | Location | Purpose |
|-----------|----------|---------|
| transformer-vm | `transformer-vm/` (git clone) | Exact computation engine |
| wasi-sdk | `transformer-vm/wasi-sdk/` | C-to-WASM compiler |
| OpenBLAS | `transformer-vm/openblas/` | BLAS acceleration for C++ engine |
| llama.cpp | `llama.cpp/` (git clone) | GGUF conversion + quantization |
| Qwen2.5-Coder-7B | `checkpoints/gguf/qwen2.5-coder-7b-instruct-q4_k_m.gguf` | Code specialist (4.4GB) |

---
## How to Resume in a New Session

```
1. Read this file: OLYMPUS_STATE.md
2. Training is DONE. Pods are STOPPED. Checkpoints are LOCAL.
3. To launch Lattice:
   export CLANG_PATH="C:\Users\atchi\h4-polytopic-attention\transformer-vm\wasi-sdk\bin\clang.exe"
   export PATH="/c/Users/atchi/h4-polytopic-attention/transformer-vm/openblas/bin:$PATH"
   py olympus/app.py
   Open http://127.0.0.1:7860
4. Continue with the "What To Do Next" list
```