
Olympus Complete State: Session Handoff Document

Last updated: 2026-03-25 (evening)

Purpose: Everything a new Claude Code session needs to continue from exactly where we left off. Read this file first.


Training: COMPLETE. Pods: STOPPED. Checkpoints: LOCAL.

All three specialists finished training, checkpoints verified and downloaded, pods stopped.

| Specialist | Final loss | Runtime | Checkpoint | GGUF |
|---|---|---|---|---|
| Code | 0.768 | 7h24m | checkpoints/olympus_code/final/ (116MB LoRA) | checkpoints/gguf/olympus-code-q4_k_m.gguf (1.8GB) |
| Math | 0.235 | 7h29m | checkpoints/olympus_math/final/ (116MB LoRA) | checkpoints/gguf/olympus-math-q4_k_m.gguf (1.8GB) |
| QA | 1.39 | 7h52m | checkpoints/olympus_qa/final/ (116MB LoRA) | checkpoints/gguf/olympus-qa-q4_k_m.gguf (1.8GB) |

Upgraded code specialist: checkpoints/gguf/qwen2.5-coder-7b-instruct-q4_k_m.gguf (4.4GB)

  • Qwen2.5-Coder-7B-Instruct, Q4_K_M quantized
  • Correctly implements predecessor tracking in DP (the bug SmolLM3-3B couldn't fix)
  • ~3.9 tok/s on CPU, vs. 7.7 tok/s for the 3B specialists, but it produces correct code on the first attempt

RunPod: all pods stopped. Still to do: rotate the RunPod API key.


What's Running (Lattice App)

```bash
# Launch (run from the repo root)
export CLANG_PATH="C:\Users\atchi\h4-polytopic-attention\transformer-vm\wasi-sdk\bin\clang.exe"
export PATH="/c/Users/atchi/h4-polytopic-attention/transformer-vm/openblas/bin:$PATH"
py olympus/app.py
# Open http://127.0.0.1:7860
```

Three-Tier Compute Engine

| Priority | Engine | Speed | Scope |
|---|---|---|---|
| 1 | transformer-vm | 10.7K tok/s | Exact: arithmetic, fib, prime, GCD, collatz, LIS |
| 2 | compiled_arithmetic | ~5ms | Fallback: basic arithmetic, zero dependencies |
| 3 | Specialist LLMs (GGUF) | 3-8 tok/s | Language: code, math reasoning, QA |
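The priority order above works as a fallback chain. Each tier either returns an exact answer or declines, and when it declines the next tier is tried. The Python sketch below makes several assumptions. `exact_arithmetic` is a hypothetical, dependency-free stand-in for `compiled_arithmetic`, not its actual code. `/` is assumed to mean exact rational division. The transformer-vm and specialist tiers are stubbed out as plain callables.

```python
import ast
import operator
from fractions import Fraction
from typing import Callable, List, Optional, Tuple


def _pow(base, exp):
    # Only small non-negative integer exponents: keeps results exact and bounded.
    if not isinstance(exp, int) or not 0 <= exp <= 1000:
        raise ValueError("unsupported exponent")
    return base ** exp


# Hypothetical stand-in for compiled_arithmetic: + - * / % ^ on integers.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: lambda a, b: Fraction(a) / Fraction(b),  # exact, no float rounding
    ast.Mod: operator.mod,
    ast.Pow: _pow,
}


def exact_arithmetic(expr: str) -> Optional[str]:
    """Evaluate a plain integer expression exactly; return None to decline."""
    try:
        # Treat ^ as exponentiation (Python would otherwise parse it as XOR).
        tree = ast.parse(expr.replace("^", "**"), mode="eval")
    except SyntaxError:
        return None

    def ev(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("not plain arithmetic")

    try:
        return str(ev(tree.body))
    except (ValueError, ZeroDivisionError):
        return None


Engine = Callable[[str], Optional[str]]


def answer(query: str, engines: List[Tuple[str, Engine]]) -> Tuple[str, str]:
    """Try engines in priority order; the first non-None result wins."""
    for name, engine in engines:
        result = engine(query)
        if result is not None:
            return name, result
    raise LookupError("no engine produced an answer")


# Stubs standing in for the real tiers (both hypothetical here).
def tvm_stub(query: str) -> Optional[str]:
    return None  # e.g. transformer-vm unavailable or no matching compiled tool


def specialist_stub(query: str) -> Optional[str]:
    return "<specialist LLM response>"


TIERS = [
    ("transformer-vm", tvm_stub),
    ("compiled_arithmetic", exact_arithmetic),
    ("specialist", specialist_stub),
]
```

With these stubs, `answer("15*23", TIERS)` falls through the declining transformer-vm stub and returns `("compiled_arithmetic", "345")`. Python's `%` uses floor semantics for negative operands, so it can differ from C's truncating `%`.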

Smart Routing

  • Pure computation ("what is 15*23") → transformer-vm, instant, exact
  • Code request ("write a function for LIS") → transformer-vm computes ground truth + code specialist generates code + property checker verifies
  • Math reasoning ("solve x^2+3x-4=0") → math specialist
  • Factual questions → QA specialist
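A keyword-and-regex classifier is one simple way to implement routing rules like these. This is a minimal sketch and does not reproduce the logic in olympus/router.py, which is not shown here. The patterns and the returned labels (`transformer-vm`, `code`, `math`, `qa`) are assumptions.

```python
import re

# Hypothetical patterns; olympus/router.py may use different rules.
# Pure arithmetic, optionally prefixed with "what is".
_ARITH = re.compile(r"^\s*(?:what\s+is\s+)?[-+*/%^()\d\s]+\??\s*$", re.IGNORECASE)
# Named exact operations that have compiled tools, with at least one number present.
_NAMED = re.compile(r"(?=.*\d).*\b(?:fib(?:onacci)?|prime|gcd|lcm|collatz)\b", re.IGNORECASE)
_CODE = re.compile(r"\b(?:write|implement|code|function|program)\b", re.IGNORECASE)
_MATH = re.compile(r"\b(?:solve|equation|derivative|integral|prove)\b|x\^\d", re.IGNORECASE)


def route(query: str) -> str:
    """Return the tier that should handle `query`, most exact tier first."""
    if _ARITH.match(query):
        return "transformer-vm"
    if _CODE.search(query):
        # Code requests also get transformer-vm ground truth and property checks.
        return "code"
    if _NAMED.match(query):
        return "transformer-vm"
    if _MATH.search(query):
        return "math"
    return "qa"
```

The code check runs before the named-operation check. As a result, "write a function that tests whether 97 is prime" goes to the code specialist, while "is 97 prime?" goes straight to transformer-vm.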

Code Verification Pipeline (Sprint Contract Pattern)

  1. Generate: specialist writes Python
  2. Execute: runs in subprocess sandbox
  3. Properties: checks mathematical invariants (increasing? subsequence? sorted?)
  4. Fix: if properties fail, feeds violation back for second attempt
  5. Ground truth: transformer-vm provides correct answer for comparison
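Here is a minimal Python sketch of steps 1-5 for the LIS case. It assumes the generated code defines a function `lis(xs)`. The names `run_sandboxed`, `lis_violations`, and `verify_with_retry` are hypothetical. The real checker in olympus/code_verifier.py is `check_output_properties`, whose signature is not shown here. A subprocess with a timeout isolates crashes and hangs, but it is not a security sandbox.

```python
import json
import subprocess
import sys
from typing import Callable, List, Optional


def run_sandboxed(code: str, func: str, arg, timeout: float = 5.0):
    """Step 2: run generated code in a fresh interpreter and return func(arg)."""
    # The harness appends a call that prints the result as JSON on the last line.
    script = f"{code}\nimport json, sys\nprint(json.dumps({func}(json.loads(sys.argv[1]))))\n"
    proc = subprocess.run(
        [sys.executable, "-c", script, json.dumps(arg)],
        capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip() or "non-zero exit")
    return json.loads(proc.stdout.strip().splitlines()[-1])


def lis_violations(seq: List[int], result: List[int],
                   expected_len: Optional[int]) -> List[str]:
    """Steps 3 and 5: check invariants of a claimed longest increasing subsequence."""
    problems = []
    if any(a >= b for a, b in zip(result, result[1:])):
        problems.append("not strictly increasing")
    remaining = iter(seq)  # each `in` consumes the iterator, enforcing order
    if not all(x in remaining for x in result):
        problems.append("not a subsequence of the input")
    if expected_len is not None and len(result) != expected_len:
        problems.append(f"length {len(result)} != ground truth {expected_len}")
    return problems


def verify_with_retry(generate: Callable[[Optional[str]], str], seq: List[int],
                      expected_len: Optional[int], max_attempts: int = 2) -> List[int]:
    """Steps 1-4: generate, execute, check; feed violations back on failure."""
    feedback = None
    for _ in range(max_attempts):
        code = generate(feedback)  # step 1 (feedback is None on the first attempt)
        try:
            result = run_sandboxed(code, "lis", seq)
        except (RuntimeError, subprocess.TimeoutExpired) as exc:
            feedback = f"execution failed: {exc}"
            continue
        problems = lis_violations(seq, result, expected_len)
        if not problems:
            return result
        feedback = "property violations: " + "; ".join(problems)  # step 4
    raise ValueError(f"still failing after {max_attempts} attempts: {feedback}")
```

Take the input `[10, 9, 2, 5, 3, 7, 101, 18]`. If the first attempt returns `[5, 3, 7, 101]`, the checker produces the feedback `property violations: not strictly increasing`. That message is passed into the second `generate` call.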

Transformer-VM Integration

  • Repo: transformer-vm/ (cloned from Percepta-Core/transformer-vm, Apache 2.0)
  • C++ engine: compiled with clang++ + OpenBLAS, 10.7K tok/s (was 7.1K without BLAS)
  • wasi-sdk: transformer-vm/wasi-sdk/ for C-to-WASM compilation

Compiled C Tools (exact computation)

olympus/wasm_tools/math/arithmetic.c   - +, -, *, /, %, ^ on integers
olympus/wasm_tools/math/fibonacci.c    - fib(n)
olympus/wasm_tools/math/prime_check.c  - primality test with smallest factor
olympus/wasm_tools/math/gcd.c          - GCD + LCM via Euclidean algorithm
olympus/wasm_tools/math/collatz.c      - Collatz sequence
olympus/wasm_tools/code/lis.c          - Longest Increasing Subsequence (DP + predecessor)
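Pure-Python reference versions of the same computations can serve as an independent oracle when adding tests for the compiled tools. This is a sketch and the function names are hypothetical. Some conventions here are assumptions that may not match the C implementations, such as fib(0) = 0 and which of several equally long LIS answers gets returned.

```python
from typing import List, Optional, Tuple


def lis(xs: List[int]) -> List[int]:
    """O(n^2) DP: length[i] = LIS ending at i; prev[i] = predecessor index."""
    if not xs:
        return []
    n = len(xs)
    length = [1] * n
    prev = [-1] * n
    for i in range(n):
        for j in range(i):
            if xs[j] < xs[i] and length[j] + 1 > length[i]:
                length[i] = length[j] + 1
                prev[i] = j
    # Walk predecessor links back from the first index with maximal length.
    i = max(range(n), key=length.__getitem__)
    out = []
    while i != -1:
        out.append(xs[i])
        i = prev[i]
    return out[::-1]


def fib(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def prime_check(n: int) -> Tuple[bool, Optional[int]]:
    """Return (is_prime, smallest factor > 1); (False, None) for n < 2."""
    if n < 2:
        return False, None
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False, d
        d += 1
    return True, n


def gcd_lcm(a: int, b: int) -> Tuple[int, int]:
    """Euclidean algorithm; lcm(0, 0) is taken as 0."""
    x, y = abs(a), abs(b)
    while y:
        x, y = y, x % y
    return x, (abs(a * b) // x if x else 0)


def collatz(n: int) -> List[int]:
    if n < 1:
        raise ValueError("collatz needs n >= 1")
    seq = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        seq.append(n)
    return seq
```

The predecessor array is what lets `lis` return the subsequence itself and not just its length. This is the same detail the DP tool and the Qwen code specialist are checked on.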

Adding New Compiled Tools

  1. Write a C file exposing the void compute(const char *input) interface (see runtime.h)
  2. Put it in olympus/wasm_tools/<domain>/
  3. Register it in the NAMED_OPS dict in olympus/tvm_engine.py
  4. Done: exact execution, ~300ms per query
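Step 3 could be guarded by a small check that the source actually exposes the required entry point before it is registered. This sketch assumes NAMED_OPS maps an op name to a C source path. The real dict in olympus/tvm_engine.py may hold different values, and `has_compute_entry` and `register_tool` are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import Dict

# Assumed shape: op name -> C source file (hypothetical; see tvm_engine.py for the real one).
NAMED_OPS: Dict[str, Path] = {}

# Matches `void compute(const char *input)` with any parameter name and spacing.
_COMPUTE_SIG = re.compile(r"\bvoid\s+compute\s*\(\s*const\s+char\s*\*\s*\w+\s*\)")


def has_compute_entry(c_source: str) -> bool:
    """True if the C text declares or defines the runtime.h entry point."""
    return _COMPUTE_SIG.search(c_source) is not None


def register_tool(name: str, source: Path) -> None:
    """Add a tool to NAMED_OPS after basic sanity checks."""
    if name in NAMED_OPS:
        raise ValueError(f"op {name!r} is already registered")
    if not has_compute_entry(source.read_text(encoding="utf-8")):
        raise ValueError(f"{source} does not define void compute(const char *input)")
    NAMED_OPS[name] = source
```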

OpenBLAS Speedup

Prebuilt OpenBLAS lives at transformer-vm/openblas/. The C++ engine was patched (transformer_blas.cpp) to use cblas_dgemv instead of scalar loops. Projection time went from 80.9s → 28.4s (2.85x), and total throughput from 7.1K → 10.7K tok/s (about 1.5x).

Remaining bottleneck: hull attention takes 69% of runtime (std::set allocator pressure). That optimization is upstream work for Percepta.
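Amdahl's law gives a quick bound on what fixing that bottleneck could buy. Assume the 69% share is of total wall-clock time and everything else stays fixed. Speeding hull attention up by a factor $s$ then gives an overall speedup of:

$$
S(s) = \frac{1}{(1 - p) + p/s}, \qquad p = 0.69
$$

$$
S(2) = \frac{1}{0.31 + 0.345} \approx 1.53, \qquad \lim_{s \to \infty} S(s) = \frac{1}{0.31} \approx 3.2
$$

Under those assumptions, a 2x faster hull attention would lift the current 10.7K tok/s to about 16K tok/s. Even removing it entirely would cap throughput near 34K tok/s. The same reasoning explains the earlier result: the 2.85x projection speedup raised total throughput by only about 1.5x, because projection was only part of the runtime.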


GGUF Conversion Pipeline

```bash
# Already done, but to reconvert:
py olympus/convert_gguf.py              # Convert all specialists
py olympus/convert_gguf.py --check      # Verify outputs exist
py olympus/convert_gguf.py --specialist code --force  # Reconvert one
```

Requires: peft, transformers, a local clone of llama.cpp at llama.cpp/, and the gguf Python package.
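For reference, the two stages that convert_gguf.py automates can be sketched as below. The merge stage assumes the standard Hugging Face `peft` path, `PeftModel.from_pretrained(...).merge_and_unload()`. The conversion stage assumes llama.cpp's `convert_hf_to_gguf.py` script and a CMake-built `llama-quantize` binary at `llama.cpp/build/bin/`. On Windows MSVC builds that binary is typically under `build/bin/Release/`. The helper names and output naming scheme are hypothetical, not the script's actual code.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def merge_lora(base_id: str, adapter_dir: Path, out_dir: Path) -> None:
    """Fold LoRA weights into the base model and save a plain HF checkpoint."""
    # Heavy imports kept local so the command builder works without them installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id)
    merged = PeftModel.from_pretrained(base, str(adapter_dir)).merge_and_unload()
    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)


def gguf_commands(merged_dir: Path, out_dir: Path, name: str,
                  llama_cpp: Path = Path("llama.cpp")) -> List[List[str]]:
    """Commands to convert a merged checkpoint to f16 GGUF, then quantize to Q4_K_M."""
    f16 = out_dir / f"{name}-f16.gguf"
    q4 = out_dir / f"{name}-q4_k_m.gguf"
    return [
        [sys.executable, str(llama_cpp / "convert_hf_to_gguf.py"), str(merged_dir),
         "--outfile", str(f16), "--outtype", "f16"],
        [str(llama_cpp / "build" / "bin" / "llama-quantize"), str(f16), str(q4), "Q4_K_M"],
    ]


def run_pipeline(cmds: List[List[str]]) -> None:
    for cmd in cmds:
        subprocess.run(cmd, check=True)  # stop at the first failing stage
```

As an example, first call `merge_lora(base_id, Path("checkpoints/olympus_code/final"), Path("merged/code"))`. Then call `run_pipeline(gguf_commands(Path("merged/code"), Path("checkpoints/gguf"), "olympus-code"))`. The result is a file named like the existing `olympus-code-q4_k_m.gguf`. `base_id` must be the base model the LoRA was trained on.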


New Files This Session

olympus/tvm_engine.py              - Transformer-VM wrapper (compile C→WASM→execute)
olympus/gguf_inference.py          - GGUF model loading + generation (SmolLM3 + Qwen)
olympus/convert_gguf.py            - LoRA merge + GGUF conversion + quantization
olympus/code_verifier.py           - Code execution sandbox + property checker
olympus/wasm_tools/math/*.c        - 5 exact computation tools
olympus/wasm_tools/code/lis.c      - Longest Increasing Subsequence

Modified Files This Session

olympus/app.py                     - Lattice UI: transformer-vm + GGUF + verification pipeline
olympus/router.py                  - Three-tier priority: tvm → compiled_arithmetic → specialist
.gitignore                         - Exclude transformer-vm/, llama.cpp/, compiled WASM tokens

Verified Results (updated)

| Result | Value | How to reproduce |
|---|---|---|
| Transformer-VM throughput | 10.7K tok/s (OpenBLAS) | `cd transformer-vm && py -m uv run wasm-run` |
| All 6 TVM examples | 6/6 PASS (hello, addition, collatz, fib, matching, sudoku) | Same as above |
| Our compiled tools | 12/12 PASS | `py -c "from olympus.tvm_engine import TVMEngine; ..."` |
| Code specialist (Qwen 7B) | Correct LIS with predecessor tracking | Lattice UI |
| Math specialist | Correct garden area + fence posts | Lattice UI |
| QA specialist | Correct tidal explanation | Lattice UI |
| Property checker | Catches [5,3,7,101] as not increasing | `py -c "from olympus.code_verifier import check_output_properties; ..."` |
| Router accuracy | 50/50 (100%) | `py olympus/router.py` |
| Compiled arithmetic | 30/30 exact | `py olympus/compiled_arithmetic.py` |

What To Do Next

Immediate:

  1. Upload specialist LoRA adapters + GGUF to HuggingFace
  2. Add more compiled C tools: sort, binary search, matrix operations
  3. Build E8 Wikipedia index for real knowledge retrieval in QA

This week:

  1. Continuous learning loop (OLYMPUS_CONTINUOUS_LEARNING.md)
  2. Web search via Crawl4AI for live information
  3. String operations compiled into C tools (regex, parsing)

Architecture improvements:

  1. Hybrid code generation: specialist generates structure, calls transformer-vm for algorithms
  2. Evaluator model: larger model checks smaller specialist output (Anthropic harness pattern)
  3. GGUF for general specialist: convert base SmolLM3-3B (no LoRA) for general chat

Key External Dependencies

| Dependency | Location | Purpose |
|---|---|---|
| transformer-vm | transformer-vm/ (git clone) | Exact computation engine |
| wasi-sdk | transformer-vm/wasi-sdk/ | C-to-WASM compiler |
| OpenBLAS | transformer-vm/openblas/ | BLAS acceleration for C++ engine |
| llama.cpp | llama.cpp/ (git clone) | GGUF conversion + quantization |
| Qwen2.5-Coder-7B | checkpoints/gguf/qwen2.5-coder-7b-instruct-q4_k_m.gguf | Code specialist (4.4GB) |

How to Resume in a New Session

1. Read this file: OLYMPUS_STATE.md
2. Training is DONE. Pods are STOPPED. Checkpoints are LOCAL.
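3. To launch Lattice (from the repo root):
   export CLANG_PATH="C:\Users\atchi\h4-polytopic-attention\transformer-vm\wasi-sdk\bin\clang.exe"
   export PATH="/c/Users/atchi/h4-polytopic-attention/transformer-vm/openblas/bin:$PATH"
   py olympus/app.py
   Open http://127.0.0.1:7860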
4. Continue with "What To Do Next" list