# Project Olympus: Frontier-Quality AI on CPU

## Goal

Build a system that approaches frontier model quality (Claude Opus, GPT-4 class) running entirely on CPU hardware, using only legally clean open-source models and data. No GPU. No API dependency. No monthly cost. No legal risk.

**This is for the billions of people who can't afford frontier AI subscriptions and GPU compute.** Good-enough answers on free hardware beat perfect answers on expensive hardware --- for education, small business, developing nations, and anyone who values privacy and independence.

## The Core Insight

Claude Opus is one giant model that memorizes everything in its weights. We build focused specialists that know their domain deeply and retrieve everything else from a geometric knowledge index.

The difference:

- Opus: 200B+ params x 16 bits = ~400GB weights. Needs GPU cluster.
- Ours: 4 specialists x 3B params x 1.58 bits = ~2.4GB total. Runs on laptop.
- The gap is filled by E8 lattice retrieval (R@5=100%) from a knowledge index.

A 3B model that can look up any fact in 20ms is functionally equivalent to a 200B model that memorized those facts --- for the user, the answer is the same.
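
The size comparison above is plain arithmetic and worth sanity-checking; 1.58 bits is log2(3), the information content of one ternary weight. A quick sketch (not project code):

```python
def weight_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB at a given precision."""
    return params * bits_per_param / 8 / 1e9

# Opus-class: 200B params at 16-bit precision -> needs a GPU cluster
opus_gb = weight_gb(200e9, 16)          # ~400 GB

# Ours: 4 specialists x 3B params, ternary weights -> laptop territory
ours_gb = weight_gb(4 * 3e9, 1.58)      # ~2.4 GB
```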

## What's Already Proven

This project builds on the H4 Polytopic Attention foundation (7 phases, all tested):

| Component | Status | Evidence |
|-----------|--------|----------|
| CPU training | Proven | 24M ternary params, 8 hours, coherent English |
| Autoresearch | Proven | 42+ autonomous experiments, finds optimal configs |

## The Base Model: SmolLM3-3B-Instruct

**HuggingFace ID:** `HuggingFaceTB/SmolLM3-3B-Instruct`

SmolLM3-3B (July 2025) is the correct base model. Using anything smaller would leave performance on the table:

- **11.2T training tokens** (vs 2T for SmolLM2)
- **128K context window** (vs 8K for SmolLM2)
- **Dual-mode reasoning** (thinking + direct)
- **Outperforms** Llama 3.2 3B and Qwen 2.5 3B on every benchmark
- **Apache 2.0 license** --- full commercial use
- **Full training recipe published** (data mixtures, hyperparameters, ablations)
- **Tool calling support** built in

### Why SmolLM3-3B over other options

| Model | Params | License | Context | Trained on | Notes |
|-------|--------|---------|---------|------------|-------|
| **SmolLM3-3B** | **3B** | **Apache 2.0** | **128K** | **11.2T tokens** | **Best in class, fully open** |
| Phi-4-mini | 3.8B | MIT | 128K | Proprietary mix | Slightly larger; MIT is fine too |
| Qwen2.5-3B | 3B | Apache 2.0 | 32K | Unknown size | Older, lower benchmarks |
| Llama 3.2 3B | 3B | Llama License | 128K | ~10T? | Meta license has usage limits |
| SmolLM2-1.7B | 1.7B | Apache 2.0 | 8K | 2T tokens | Obsoleted by SmolLM3 |

### Ternary size

- Float32: 3B x 4 bytes = 12 GB
- Float16: 3B x 2 bytes = 6 GB
- **Ternary (1.58-bit): 3B x ~0.2 bytes = ~600 MB**
- With optimizer states for fine-tuning: ~4-8 GB total in RAM
- **Fits comfortably in 32 GB RAM for fine-tuning on CPU**
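
The figures above follow from bits-per-parameter arithmetic; the quantization step itself can be sketched in the absmean style of BitNet b1.58 (the project's exact BitLinear recipe may differ --- this is an illustrative sketch):

```python
def ternarize(w, eps=1e-8):
    """Absmean ternarization (BitNet b1.58 style): scale by the mean
    absolute weight, then round-and-clip every entry to {-1, 0, +1}."""
    flat = [abs(v) for row in w for v in row]
    gamma = sum(flat) / len(flat) + eps          # per-tensor scale
    q = lambda v: max(-1, min(1, round(v / gamma)))
    return [[q(v) for v in row] for row in w], gamma

w = [[0.9, -0.05, -1.2], [0.4, 0.0, -0.6]]
w_t, gamma = ternarize(w)   # w_t == [[1, 0, -1], [1, 0, -1]]
```

Each stored weight then needs only log2(3) ~ 1.58 bits plus one float scale per tensor for dequantization.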

## Legal Foundation

**This is NOT distillation.** We do not use outputs from proprietary models as training data. Every component is legally clean.

### Base Models (all Apache 2.0)

| Model | Params | License | HuggingFace ID |
|-------|--------|---------|----------------|
| SmolLM3-3B-Instruct | 3B | Apache 2.0 | HuggingFaceTB/SmolLM3-3B-Instruct |
| ms-marco-MiniLM-L-6-v2 | 22M | Apache 2.0 | cross-encoder/ms-marco-MiniLM-L-6-v2 |

### Fine-tuning Data (all openly licensed)

**Code Specialist:**

| Dataset | Size | License | HuggingFace ID |
|---------|------|---------|----------------|
| The Stack v2 (filtered) | ~100M tokens | Per-file | bigcode/the-stack-v2 |
| CodeAlpaca 20K | 20K instructions | Apache 2.0 | sahil2801/CodeAlpaca-20k |
| CodeFeedback | 66K examples | Apache 2.0 | m-a-p/CodeFeedback-Filtered-Instruction |
| Evol-Instruct-Code | 110K | Apache 2.0 | nickrosh/Evol-Instruct-Code-80k-v1 |

**Math/Reasoning Specialist:**

| Dataset | Size | License | HuggingFace ID |
|---------|------|---------|----------------|
| MetaMathQA | 395K | MIT | meta-math/MetaMathQA |
| OpenMathInstruct v2 | 1.8M | Permissive | nvidia/OpenMathInstruct-2 |
| GSM8K | 8.5K | MIT | openai/gsm8k |
| MATH | 12.5K | MIT | hendrycks/competition_math |
| ARC | 7.7K | CC-BY-SA | allenai/ai2_arc |

**QA/Retrieval Specialist:**

| Dataset | Size | License | HuggingFace ID |
|---------|------|---------|----------------|
| Natural Questions | 307K | CC-BY-SA | google-research-datasets/nq_open |
| SQuAD 2.0 | 150K | CC-BY-SA | rajpurkar/squad_v2 |
| TriviaQA | 95K | Apache 2.0 | mandarjoshi/trivia_qa |
| HotpotQA | 113K | CC-BY-SA | hotpot_qa |

**Knowledge Index:**

| Source | Size | License | Notes |
|--------|------|---------|-------|
| Wikipedia EN | ~4B tokens | CC-BY-SA | All human knowledge |
| Stack Overflow | ~10GB | CC-BY-SA | Programming Q&A |
| Project Gutenberg | 70K books | Public domain | Literature |
| User's own docs | Variable | N/A | Custom knowledge base |
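
The document doesn't spell out how embedding vectors get snapped onto the E8 lattice for indexing. As an illustrative sketch, this is the standard nearest-point decoder for E8 viewed as D8 union (D8 + 1/2), following Conway-Sloane --- the usual way to quantize a vector to this lattice:

```python
def nearest_e8(x):
    """Snap an 8-dim vector to the nearest E8 lattice point.
    E8 = D8 (integer vectors with even coordinate sum) union D8 + 1/2."""
    def nearest_d8(y):
        r = [round(v) for v in y]
        if sum(r) % 2:                  # enforce the even-sum constraint
            i = max(range(8), key=lambda j: abs(y[j] - r[j]))
            r[i] += 1 if y[i] >= r[i] else -1
        return r

    a = nearest_d8(x)                                        # integer candidate
    b = [v + 0.5 for v in nearest_d8([v - 0.5 for v in x])]  # half-integer candidate
    dist = lambda p: sum((u - v) ** 2 for u, v in zip(x, p))
    return a if dist(a) <= dist(b) else b
```

Decoding is a handful of rounds and comparisons per vector, which is why lattice-based lookup can stay in the low-millisecond range.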

## Architecture

```
               User query
                   |
                   v
        +----------+----------+
        | ChamberTree Router  |
        |    (16 chambers)    |   Coxeter chamber classification
        +----------+----------+
                   |
           +-------+-------+-------+
           |       |       |       |
           v       v       v       v
      +--------+ +------+ +------+ +------+
      |General | | Code | | Math | |  QA  |   4 specialists
      | (3B)   | | (3B) | | (3B) | | (3B) |   SmolLM3-3B base
      | as-is  | | FT'd | | FT'd | | FT'd |   Ternary weights
      +---+----+ +--+---+ +--+---+ +--+---+
          |        |        |        |
          +--------+--------+--------+
                   |
                   v
         +---------------------+
         |  E8 Knowledge Index |
         +---------------------+
                   |
                   v
                Response
```
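
Reading the diagram top to bottom, the serving loop can be sketched with stubbed components (every function and name below is a hypothetical stand-in, not the project's actual API):

```python
CHAMBER_TO_SPECIALIST = {0: "general", 1: "code", 2: "math", 3: "qa"}  # 16 chambers in reality

def classify_chamber(query_vec):
    """Stub for ChamberTree routing; the real version uses H4 Coxeter geometry."""
    return 0

def answer(query, embed, load_specialist, retrieve):
    chamber = classify_chamber(embed(query))            # <1ms, pure geometry
    name = CHAMBER_TO_SPECIALIST.get(chamber, "general")
    model = load_specialist(name)                       # ~600 MB ternary specialist, cached if hot
    context = retrieve(query) if name == "qa" else []   # E8 index lookup, ~20ms
    return model("\n".join(context + [query]))

# Wiring with trivial stand-ins:
result = answer(
    "Tell me about E8 lattices",
    embed=lambda q: [0.0] * 4,
    load_specialist=lambda n: (lambda prompt: f"[{n}] {prompt}"),
    retrieve=lambda q: [],
)
# result == "[general] Tell me about E8 lattices"
```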

### Why 4 Specialists Instead of 6

With SmolLM3-3B as the base (much stronger than SmolLM2-1.7B), we don't need 6 specialists. The base model is already strong at conversation, creative writing, and summarization. We only specialize where it matters:

| # | Specialist | Base | Fine-tuning | Why Separate |
|---|------------|------|-------------|--------------|
| 0 | General | SmolLM3-3B-Instruct AS-IS | None needed | Already instruction-tuned |
| 1 | Code | SmolLM3-3B + code data | ~200M tokens | Code needs 80%+ domain data |
| 2 | Math | SmolLM3-3B + math data | ~100M tokens | Weakest area for small models |
| 3 | QA | SmolLM3-3B + retrieval QA | ~150M tokens | Learn to answer FROM context |

**Total active RAM: ~600MB** (one specialist loaded at a time) + 90MB MiniLM + E8 index
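
The router's 16 chambers come from sign patterns: with 4 simple roots, the sign of a query vector's inner product with each root gives one of 2^4 = 16 chambers, which then map onto the 4 specialists. A sketch with placeholder roots (H4's actual simple roots are specific 4-dim vectors from its Coxeter diagram, and the chamber-to-specialist mapping below is hypothetical):

```python
# Illustrative placeholders, NOT H4's actual simple roots.
ROOTS = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]

def chamber_id(v):
    """One bit per simple root, set when v lies on the root's positive side.
    4 roots -> 2^4 = 16 chambers."""
    bits = 0
    for i, root in enumerate(ROOTS):
        if sum(r * x for r, x in zip(root, v)) >= 0:
            bits |= 1 << i
    return bits

# Hypothetical chamber -> specialist assignment (the real one would be measured).
SPECIALISTS = ("general", "code", "math", "qa")
route = lambda v: SPECIALISTS[chamber_id(v) % 4]
```

No routing model, no extra parameters: four dot products and a table lookup.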
## H4 Attention Integration
|
| 167 |
|
| 168 |
+
SmolLM3 uses GQA with 4 groups --- maps naturally to H4's 4 Coxeter simple roots.
|
|
|
|
|
|
|
|
|
|
| 169 |
|
| 170 |
+
**Progressive swap in 4 phases:**
|
|
|
|
|
|
|
| 171 |
|
| 172 |
+
1. **Adapter (Days 1-3):** Freeze SmolLM3, add H4 adapter parallel to each GQA layer. Gate starts at 0. Train only H4 params.
|
| 173 |
+
2. **Hybrid (Days 3-7):** Unfreeze SmolLM3 attention. Both paths train. Monitor which layers prefer H4.
|
| 174 |
+
3. **Selective swap (Days 7-10):** Layers with gate >0.8 keep only H4. Layers with gate <0.3 keep only original. Others stay hybrid.
|
| 175 |
+
4. **Ternary (Day 10):** Apply BitLinear to H4 layers. Export final model.
|
| 176 |
|
| 177 |
+
**What this gives you:** O(log t) attention for long sequences (SmolLM3's 128K context is O(t^2) via Flash Attention), ternary attention weights (600MB), and E8 lattice integration for retrieval.
|
|
|
|
|
|
|
| 178 |
|
| 179 |
+
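
The gating in phases 1-3 can be sketched as a per-layer scalar blend (`orig_attn` and `h4_attn` below are stand-ins for the real attention paths, not the project's implementation):

```python
import math

def gated_attention(x, orig_attn, h4_attn, gate_logit):
    """Phase 1-3 hybrid: blend the frozen GQA output with the H4 adapter path.
    gate ~ 0 -> pure original attention; gate ~ 1 -> full swap to H4."""
    gate = 1.0 / (1.0 + math.exp(-gate_logit))   # sigmoid, one scalar per layer
    return (1.0 - gate) * orig_attn(x) + gate * h4_attn(x)

orig = lambda t: 2.0 * t      # stand-in for the original GQA output
h4 = lambda t: 10.0 * t       # stand-in for the H4 adapter output

y0 = gated_attention(1.0, orig, h4, gate_logit=-30.0)  # phase 1 start: ~= original
y1 = gated_attention(1.0, orig, h4, gate_logit=+30.0)  # after full swap: ~= H4
```

Because the learned `gate_logit` is per layer, the phase-3 thresholds (>0.8 keep H4, <0.3 keep original) fall out directly from the trained gate values.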

## Fine-Tuning: QLoRA on CPU

Full fine-tuning of 3B params on CPU is slow. QLoRA is 3-6x faster because only 1-2% of the parameters get gradients:

| Method | Step time | Steps/day | Trainable params |
|--------|-----------|-----------|------------------|
| Full fine-tune 3B on CPU | ~3s | ~28K | 3B (100%) |
| **QLoRA 3B on CPU** | **~0.5-1s** | **~86-170K** | **~20-50M (1-2%)** |
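
The 20-50M trainable figure is consistent with LoRA's parameter count: each adapted weight matrix of shape (d_out, d_in) adds rank x (d_in + d_out) parameters. A sketch with illustrative dimensions (the layer shapes below are assumptions, not SmolLM3's published config):

```python
def lora_params(shapes, rank):
    """Trainable parameters when a LoRA adapter of the given rank is
    attached to each (d_out, d_in) weight matrix in `shapes`."""
    return sum(rank * (d_in + d_out) for d_out, d_in in shapes)

# Illustrative 3B-class transformer: 32 layers, hidden size 3072,
# adapting only the four attention projections per layer (assumed shapes).
hidden = 3072
shapes = [(hidden, hidden)] * 4 * 32      # q, k, v, o for each of 32 layers

trainable = lora_params(shapes, rank=32)  # ~25.2M -> inside the 20-50M range
```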

### Per-specialist training budget

| Specialist | Tokens | Steps | Time |
|------------|--------|-------|------|
| Code | 200M | ~50K | 1-2 days |
| Math | 100M | ~25K | 0.5-1 day |
| QA | 150M | ~37K | 1-1.5 days |
| **Total** | **450M** | **~112K** | **3-5 days** |

## The 14-Day Plan

| Day | Task | Validation |
|-----|------|------------|
| 1 | Download SmolLM3, verify, set up QLoRA | Generates text OK |
| 2 | Fine-tune code specialist | Writes Python functions |
| 3 | Fine-tune math specialist | Solves GSM8K problems |
| 4 | Fine-tune QA specialist | Answers from context |
| 5-6 | H4 progressive swap Phase 1 | Perplexity within 5% |
| 7-8 | H4 progressive swap Phase 2 | Gate values meaningful |
| 9-10 | H4 selective swap + ternary | Chamber preservation >80% |
| 11 | ChamberTree router | Routes correctly |
| 12 | E8 knowledge index (Wikipedia) | Retrieval finds facts |
| 13 | Integration + demo | End-to-end works |
| 14 | Benchmarks + upload to HF | Numbers documented |

**Cost:** 3-5 days specialist training + 6-9 days H4 swap = ~9-14 days total. On cloud: ~$50-100. On laptops: $0.

## Honest Quality Expectations

| Task | SmolLM3-3B base | + Specialist FT | + E8 Retrieval | Opus |
|------|-----------------|-----------------|----------------|------|
| MMLU | ~60% | ~62% | ~70-75% | ~88% |
| HumanEval | ~45% | ~55-65% | N/A | ~85% |
| GSM8K | ~55% | ~65-75% | N/A | ~95% |
| TriviaQA | ~50% | ~55% | **~85-90%** | ~90% |
| Instruction | ~80% | ~82% | N/A | ~95% |
| Long context | Good to 128K | Same | Better | 200K |
| **Cost** | **$0** | **$0** | **$0** | **$$$** |
| **Privacy** | **Local** | **Local** | **Local** | **Cloud** |

The retrieval-augmented factual QA (~85-90%) is where we compete directly with frontier models. Everything else is 60-85% of Opus.

**This is NOT Claude Opus quality across the board.** It IS:

- 85-90% on factual QA (retrieval advantage --- the model looks up facts instead of hallucinating)
- 75-85% on instruction following (good enough for most tasks)
- 55-75% on code and math (honest gap --- complex reasoning needs more params)
- Free, private, local, legally clean, and improvable by the community

## The Vision

A laptop running 4 focused specialists, routed by H4 geometry in <1ms, backed by unlimited knowledge retrieval from E8 lattice memory in 20ms, reranked to 98.5% accuracy. Not as good as Claude Opus at everything. But good enough at most things, free to run, private by default, and available to anyone with a computer.

**That's not a replacement for frontier models. It's an alternative for the billions of people who can't afford them.**

---