grapheneaffiliates committed (verified)
Commit 73ccf45 · Parent: 9fccde5

Upload OLYMPUS_STATE.md with huggingface_hub

Files changed (1):
  1. OLYMPUS_STATE.md (+185 -0)
OLYMPUS_STATE.md ADDED
@@ -0,0 +1,185 @@
# Olympus Complete State — Session Handoff Document

**Last updated:** 2026-03-25 (evening)
**Purpose:** Everything a new Claude Code session needs to continue from exactly where we left off. Read this file first.

---

## Training: COMPLETE. Pods: STOPPED. Checkpoints: LOCAL.

All three specialists finished training; checkpoints are verified and downloaded, and the pods are stopped.

| Specialist | Final Loss | Runtime | Checkpoint | GGUF |
|-----------|-----------|---------|------------|------|
| **Code** | 0.768 | 7h24m | `checkpoints/olympus_code/final/` (116MB LoRA) | `checkpoints/gguf/olympus-code-q4_k_m.gguf` (1.8GB) |
| **Math** | 0.235 | 7h29m | `checkpoints/olympus_math/final/` (116MB LoRA) | `checkpoints/gguf/olympus-math-q4_k_m.gguf` (1.8GB) |
| **QA** | 1.39 | 7h52m | `checkpoints/olympus_qa/final/` (116MB LoRA) | `checkpoints/gguf/olympus-qa-q4_k_m.gguf` (1.8GB) |

**Upgraded code specialist:** `checkpoints/gguf/qwen2.5-coder-7b-instruct-q4_k_m.gguf` (4.4GB)
- Qwen2.5-Coder-7B-Instruct, Q4_K_M quantized
- Correctly implements predecessor tracking in DP (the bug SmolLM3-3B couldn't fix)
- ~3.9 tok/s on CPU (vs 7.7 tok/s for the 3B, but correct code on the first shot)
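For reference, the DP-plus-predecessor pattern at issue looks like the following minimal Python sketch. This illustrates the technique itself, not the specialist's actual output:

```python
# LIS via DP with predecessor tracking: store each element's predecessor
# index so the subsequence itself (not just its length) can be rebuilt.
def longest_increasing_subsequence(nums):
    n = len(nums)
    if n == 0:
        return []
    length = [1] * n   # length of the best LIS ending at index i
    prev = [-1] * n    # predecessor index of i in that LIS (-1 = none)
    for i in range(n):
        for j in range(i):
            if nums[j] < nums[i] and length[j] + 1 > length[i]:
                length[i] = length[j] + 1
                prev[i] = j
    # Walk predecessors back from the best endpoint.
    best = max(range(n), key=length.__getitem__)
    out = []
    while best != -1:
        out.append(nums[best])
        best = prev[best]
    return out[::-1]

print(longest_increasing_subsequence([10, 9, 2, 5, 3, 7, 101, 18]))  # → [2, 5, 7, 101]
```

Dropping the `prev` bookkeeping (the 3B model's failure mode) still yields the right length but makes the subsequence unrecoverable.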
**RunPod:** All pods stopped. Rotate the API key.

---

## What's Running (Lattice App)

```bash
# Launch
export CLANG_PATH="C:\Users\atchi\h4-polytopic-attention\transformer-vm\wasi-sdk\bin\clang.exe"
export PATH="/c/Users/atchi/h4-polytopic-attention/transformer-vm/openblas/bin:$PATH"
py olympus/app.py
# Open http://127.0.0.1:7860
```

### Three-Tier Compute Engine

| Priority | Engine | Speed | Scope |
|----------|--------|-------|-------|
| 1 | **transformer-vm** | 10.7K tok/s | Exact: arithmetic, fib, prime, GCD, collatz, LIS |
| 2 | **compiled_arithmetic** | ~5ms | Fallback: basic arithmetic, zero dependencies |
| 3 | **Specialist LLMs (GGUF)** | 3-8 tok/s | Language: code, math reasoning, QA |

### Smart Routing

- Pure computation ("what is 15*23") → transformer-vm, instant, exact
- Code request ("write a function for LIS") → transformer-vm computes the ground truth, the code specialist generates code, and the property checker verifies it
- Math reasoning ("solve x^2+3x-4=0") → math specialist
- Factual questions → QA specialist
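The routing order can be sketched as a priority cascade. The function and patterns below are illustrative stand-ins, not the actual `olympus/router.py` implementation:

```python
import re

# Illustrative three-tier cascade: exact engines are consulted before any LLM.
# All names and regexes here are hypothetical, not copied from router.py.
def route(query: str) -> str:
    # Tier 1: pure arithmetic goes to the exact engine.
    # (Tier 2, compiled_arithmetic, would catch arithmetic if tier 1 fails.)
    if re.fullmatch(r"(?i)\s*what is [\d\s+\-*/%^().]+\??\s*", query):
        return "transformer-vm"
    # Tier 3: language tasks fall through to the specialist LLMs.
    if re.search(r"(?i)\b(write|implement|function|code)\b", query):
        return "code-specialist"   # generated code is then property-checked
    if re.search(r"(?i)\b(solve|equation|derivative|integral)\b", query):
        return "math-specialist"
    return "qa-specialist"         # everything else: factual QA

print(route("what is 15*23"))             # → transformer-vm
print(route("write a function for LIS"))  # → code-specialist
```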
### Code Verification Pipeline (Sprint Contract Pattern)

1. **Generate** — the specialist writes Python
2. **Execute** — runs in a subprocess sandbox
3. **Properties** — checks mathematical invariants (increasing? subsequence? sorted?)
4. **Fix** — if properties fail, the violation is fed back for a second attempt
5. **Ground truth** — transformer-vm provides the correct answer for comparison
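As a concrete illustration of step 3, here is a minimal property check for LIS output. The helper names are hypothetical; the real checks live in `olympus/code_verifier.py`:

```python
# Invariant checks for a claimed LIS answer: the output must be strictly
# increasing AND an in-order subsequence of the input.
def is_strictly_increasing(seq):
    return all(a < b for a, b in zip(seq, seq[1:]))

def is_subsequence(sub, seq):
    it = iter(seq)               # membership tests consume the iterator,
    return all(x in it for x in sub)  # so order is enforced

def check_lis_properties(candidate, data):
    violations = []
    if not is_strictly_increasing(candidate):
        violations.append("not strictly increasing")
    if not is_subsequence(candidate, data):
        violations.append("not a subsequence of the input")
    return violations

data = [10, 9, 2, 5, 3, 7, 101, 18]
print(check_lis_properties([2, 3, 7, 101], data))  # → []
print(check_lis_properties([5, 3, 7, 101], data))  # → ['not strictly increasing']
```

A non-empty violation list is what step 4 feeds back to the specialist for the second attempt.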
60
+ ---
61
+
62
+ ## Transformer-VM Integration
63
+
64
+ **Repo:** `transformer-vm/` (cloned from Percepta-Core/transformer-vm, Apache 2.0)
65
+ **C++ engine:** Compiled with clang++ + OpenBLAS, 10.7K tok/s (was 7K without BLAS)
66
+ **wasi-sdk:** `transformer-vm/wasi-sdk/` for C-to-WASM compilation
67
+
68
+ ### Compiled C Tools (exact computation)
69
+
70
+ ```
71
+ olympus/wasm_tools/math/arithmetic.c β€” +, -, *, /, %, ^ on integers
72
+ olympus/wasm_tools/math/fibonacci.c β€” fib(n)
73
+ olympus/wasm_tools/math/prime_check.c β€” primality test with smallest factor
74
+ olympus/wasm_tools/math/gcd.c β€” GCD + LCM via Euclidean algorithm
75
+ olympus/wasm_tools/math/collatz.c β€” Collatz sequence
76
+ olympus/wasm_tools/code/lis.c β€” Longest Increasing Subsequence (DP + predecessor)
77
+ ```
78
+
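A pure-Python mirror of what `gcd.c` computes is handy when sanity-checking the compiled tool's output (a sketch of the algorithm, not the C source itself):

```python
# Euclidean algorithm, mirroring gcd.c: GCD by repeated remainder,
# LCM derived from it. Dividing before multiplying keeps the
# intermediate value small (which avoids overflow in the C version).
def gcd(a: int, b: int) -> int:
    while b:
        a, b = b, a % b
    return a

def lcm(a: int, b: int) -> int:
    return a // gcd(a, b) * b

print(gcd(48, 18), lcm(48, 18))  # → 6 144
```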
### Adding New Compiled Tools

1. Write C with the `void compute(const char *input)` interface (see `runtime.h`)
2. Put it in `olympus/wasm_tools/<domain>/`
3. Register it in the `NAMED_OPS` dict in `olympus/tvm_engine.py`
4. Done — exact execution, ~300ms per query
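Step 3 amounts to one dict entry. A sketch of the shape, assuming `NAMED_OPS` maps op names to C sources (the real keys and values in `tvm_engine.py` may differ):

```python
# Hypothetical shape of the op-name → C-source registry described above.
NAMED_OPS = {
    "fib":   "olympus/wasm_tools/math/fibonacci.c",
    "gcd":   "olympus/wasm_tools/math/gcd.c",
    "prime": "olympus/wasm_tools/math/prime_check.c",
}

# Registering a new tool is a single mapping; per steps 1-4 above, the
# engine compiles the C source to WASM and executes it exactly.
NAMED_OPS["sort"] = "olympus/wasm_tools/code/sort.c"  # hypothetical new tool

print(sorted(NAMED_OPS))
```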
### OpenBLAS Speedup

Prebuilt OpenBLAS lives at `transformer-vm/openblas/`. The C++ engine was patched (`transformer_blas.cpp`) to use `cblas_dgemv` instead of scalar loops. Projection time dropped from 80.9s to 28.4s (2.85x), lifting total throughput from 7.1K to 10.7K tok/s.
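The scalar-loop vs BLAS gap is easy to demonstrate in miniature with numpy, whose `@` operator dispatches to BLAS internally. This is an analogy for what the patch does, not the engine's code, and the sizes are arbitrary:

```python
import numpy as np

# A matrix-vector product written as explicit scalar loops vs delegated
# to BLAS — the same substitution cblas_dgemv makes in transformer_blas.cpp.
rng = np.random.default_rng(0)
W = rng.random((256, 256))
x = rng.random(256)

def scalar_gemv(W, x):
    out = np.zeros(W.shape[0])
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            out[i] += W[i, j] * x[j]
    return out

# Identical result; at engine scale the BLAS path is the 2.85x win above.
assert np.allclose(scalar_gemv(W, x), W @ x)
```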
Remaining bottleneck: hull attention at 69% of runtime (std::set allocator pressure). That's Percepta's optimization to make.

---

## GGUF Conversion Pipeline

```bash
# Already done, but to reconvert:
py olympus/convert_gguf.py                            # Convert all specialists
py olympus/convert_gguf.py --check                    # Verify outputs exist
py olympus/convert_gguf.py --specialist code --force  # Reconvert one
```

Requires: `peft`, `transformers`, `llama.cpp/` (cloned), `gguf` package.

---
## New Files This Session

```
olympus/tvm_engine.py — Transformer-VM wrapper (compile C→WASM→execute)
olympus/gguf_inference.py — GGUF model loading + generation (SmolLM3 + Qwen)
olympus/convert_gguf.py — LoRA merge + GGUF conversion + quantization
olympus/code_verifier.py — Code execution sandbox + property checker
olympus/wasm_tools/math/*.c — 5 exact computation tools
olympus/wasm_tools/code/lis.c — Longest Increasing Subsequence
```

## Modified Files This Session

```
olympus/app.py — Lattice UI: transformer-vm + GGUF + verification pipeline
olympus/router.py — Three-tier priority: tvm → compiled_arithmetic → specialist
.gitignore — Exclude transformer-vm/, llama.cpp/, compiled WASM tokens
```

---
## Verified Results (updated)

| Result | Value | How to reproduce |
|--------|-------|------------------|
| Transformer-VM throughput | 10.7K tok/s (OpenBLAS) | `cd transformer-vm && py -m uv run wasm-run` |
| All 6 TVM examples | 6/6 PASS (hello, addition, collatz, fib, matching, sudoku) | Same as above |
| Our compiled tools | 12/12 PASS | `py -c "from olympus.tvm_engine import TVMEngine; ..."` |
| Code specialist (Qwen 7B) | Correct LIS with predecessor tracking | Lattice UI |
| Math specialist | Correct garden area + fence posts | Lattice UI |
| QA specialist | Correct tidal explanation | Lattice UI |
| Property checker | Catches `[5,3,7,101]` as not increasing | `py -c "from olympus.code_verifier import check_output_properties; ..."` |
| Router accuracy | 50/50 (100%) | `py olympus/router.py` |
| Compiled arithmetic | 30/30 exact | `py olympus/compiled_arithmetic.py` |

---
## What To Do Next

### Immediate:
1. **Upload specialist LoRA adapters + GGUF to HuggingFace**
2. **Add more compiled C tools** — sort, binary search, matrix operations
3. **Build E8 Wikipedia index** for real knowledge retrieval in QA

### This week:
4. **Continuous learning loop** (OLYMPUS_CONTINUOUS_LEARNING.md)
5. **Web search via Crawl4AI** for live information
6. **String operations** compiled into C tools (regex, parsing)

### Architecture improvements:
7. **Hybrid code generation** — the specialist generates structure and calls transformer-vm for algorithms
8. **Evaluator model** — a larger model checks the smaller specialist's output (Anthropic harness pattern)
9. **GGUF for general specialist** — convert base SmolLM3-3B (no LoRA) for general chat

---
## Key External Dependencies

| Dependency | Location | Purpose |
|-----------|----------|---------|
| transformer-vm | `transformer-vm/` (git clone) | Exact computation engine |
| wasi-sdk | `transformer-vm/wasi-sdk/` | C-to-WASM compiler |
| OpenBLAS | `transformer-vm/openblas/` | BLAS acceleration for C++ engine |
| llama.cpp | `llama.cpp/` (git clone) | GGUF conversion + quantization |
| Qwen2.5-Coder-7B | `checkpoints/gguf/qwen2.5-coder-7b-instruct-q4_k_m.gguf` | Code specialist (4.4GB) |

---
## How to Resume in a New Session

```
1. Read this file: OLYMPUS_STATE.md
2. Training is DONE. Pods are STOPPED. Checkpoints are LOCAL.
3. To launch Lattice:
   export CLANG_PATH="C:\Users\atchi\h4-polytopic-attention\transformer-vm\wasi-sdk\bin\clang.exe"
   export PATH="/c/Users/atchi/h4-polytopic-attention/transformer-vm/openblas/bin:$PATH"
   py olympus/app.py
   Open http://127.0.0.1:7860
4. Continue with the "What To Do Next" list
```