# Olympus Continuous Learning: Self-Improving Specialist System

## The Core Idea

The system identifies its own weaknesses, generates its own training data, trains its own specialists, and integrates them — with zero human intervention. The autoresearch pattern that discovered optimal configs in 30 experiments now discovers and fills capability gaps autonomously.

Individual specialists are frozen after training. But the system as a whole evolves continuously — new specialists appear, the router adapts, the knowledge index grows. Like a brain where individual neurons stabilize but circuits reshape constantly.

## What Already Exists

Every component of this loop is already built and proven:

| Component | Status | What it does |
|-----------|--------|--------------|
| Autoresearch loop | Proven (42+ experiments) | Autonomous try → measure → keep/discard |
| QLoRA training | Proven (3 specialists training now) | Fine-tune SmolLM3-3B on any domain |
| Router | Proven (100% on test set) | Classify queries to specialists |
| E8 knowledge index | Proven (R@5=100%) | Store and retrieve any knowledge |
| Confidence scoring | Proven (MRR, R@1, perplexity) | Measure response quality |
| ChamberTree geometry | Proven (16 chambers, <1ms) | Geometric sub-routing |

## The Continuous Learning Loop

```
FOREVER:
  1. SERVE    — Answer queries, track confidence on every response
  2. DETECT   — Identify weak domains (low confidence, user corrections)
  3. CURATE   — Generate training data for weak domains
  4. TRAIN    — Fine-tune new specialist (QLoRA, automated)
  5. VALIDATE — Does the specialist outperform the general model?
  6. DEPLOY   — If yes: add to router. If no: discard, try again.
  7. ADAPT    — ChamberTree reorganizes for new specialist
  8. GOTO 1
```
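Wired together in Python, the loop is a short driver. This is a control-flow sketch only: the `system` methods mirror the step names above, and the `ToySystem` stub exists purely to make the flow concrete (every name here is an assumption, not the shipped Olympus API).

```python
def continuous_learning_cycle(system, max_cycles=1):
    """One or more passes of the SERVE -> DETECT -> ... -> ADAPT loop."""
    deployed = []
    for _ in range(max_cycles):
        log = system.serve()                    # 1. SERVE: answer queries, log confidence
        gaps = system.detect_gaps(log)          # 2. DETECT: find weak domains
        for domain in gaps:
            data = system.curate(domain)        # 3. CURATE: gather training data
            ckpt = system.train(domain, data)   # 4. TRAIN: QLoRA fine-tune
            verdict, delta = system.validate(ckpt, domain)  # 5. VALIDATE
            if verdict == 'deploy':             # 6. DEPLOY, or discard and retry later
                system.deploy(domain, ckpt)
                system.adapt(domain)            # 7. ADAPT: chamber reassignment
                deployed.append(domain)
    return deployed


class ToySystem:
    """Toy stand-in for the real system, just to exercise the control flow."""
    def serve(self):
        # (query, confidence) pairs; chemistry queries score low
        return [('balance this equation', 0.3), ('what is 2+2', 0.9)]
    def detect_gaps(self, log):
        return ['chemistry'] if any(c < 0.5 for _, c in log) else []
    def curate(self, domain):
        return [f'{domain} example'] * 3
    def train(self, domain, data):
        return f'checkpoints/{domain}.ckpt'
    def validate(self, ckpt, domain):
        return 'deploy', 0.15
    def deploy(self, domain, ckpt):
        pass
    def adapt(self, domain):
        pass
```

Running `continuous_learning_cycle(ToySystem())` walks one full cycle and returns the list of newly deployed domains.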
### Step 1: SERVE — Confidence Tracking

Every response includes a confidence score computed from:

```python
def compute_confidence(query, response, retrieval_results):
    signals = {
        # Model confidence: how sure is the LM about its tokens?
        # (assumes mean_token_entropy is normalized to [0, 1])
        'generation_entropy': mean_token_entropy(response),

        # Retrieval confidence: did we find good context?
        'retrieval_score': retrieval_results.top_score,
        'retrieval_gap': retrieval_results.score[0] - retrieval_results.score[1],

        # Router confidence: was the specialist choice clear?
        # (router_result comes from the routing step for this query)
        'router_confidence': router_result.confidence,

        # Length signal: very short responses often mean uncertainty
        'response_length': len(response.tokens),
    }

    # Weighted combination
    confidence = (
        0.3 * (1 - signals['generation_entropy']) +
        0.3 * signals['retrieval_score'] +
        0.2 * signals['router_confidence'] +
        0.2 * min(signals['response_length'] / 50, 1.0)
    )

    return confidence, signals
```

Low-confidence responses get logged with the query, domain, and failure signals.
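The exact log schema isn't fixed anywhere yet; one minimal shape, offered here as an assumption, is a JSON-lines file of flat records:

```python
from dataclasses import dataclass, field, asdict
import json
import time

@dataclass
class ConfidenceRecord:
    """One logged response (field names are hypothetical, not a fixed schema)."""
    query: str
    domain: str          # specialist the router chose
    confidence: float    # combined score from compute_confidence()
    signals: dict        # raw per-signal breakdown
    timestamp: float = field(default_factory=time.time)

def log_low_confidence(record, path='confidence_log.jsonl', threshold=0.5):
    """Append only below-threshold responses, one JSON object per line."""
    if record.confidence < threshold:
        with open(path, 'a') as f:
            f.write(json.dumps(asdict(record)) + '\n')
        return True
    return False
```

The gap detector in Step 2 can then stream this file without holding the full history in memory.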
### Step 2: DETECT — Gap Identification

```python
from collections import Counter

def detect_gaps(confidence_log, threshold=0.5, min_failures=20):
    """
    Identify domains where the system consistently underperforms.

    A 'domain' is identified by:
    - Keyword clustering of low-confidence queries
    - Router chamber distribution of failures
    - User correction patterns (if available)
    """
    # Cluster low-confidence queries by topic
    weak_queries = [q for q, conf in confidence_log if conf < threshold]

    # Simple keyword extraction for domain identification
    domain_counts = Counter()
    for query in weak_queries:
        keywords = extract_keywords(query)  # TF-IDF or simple frequency
        for kw in keywords:
            domain_counts[kw] += 1

    # Domains with enough failures to justify a specialist
    gaps = [
        domain for domain, count in domain_counts.most_common(10)
        if count >= min_failures
    ]

    return gaps  # e.g., ['chemistry', 'legal', 'spanish']
```
### Step 3: CURATE — Automated Data Collection

```python
from datasets import load_dataset  # Hugging Face `datasets` library

def curate_training_data(domain, target_examples=10000):
    """
    Automatically gather training data for a new specialist.

    Sources (in order of preference):
    1. Existing QA datasets on HuggingFace for this domain
    2. Wikipedia articles on this topic (already in E8 index)
    3. Filtered web text from open datasets (FineWeb-Edu, etc.)
    """
    data = []

    # Check HuggingFace for domain-specific datasets
    hf_datasets = search_huggingface(f"{domain} QA instruction")
    for ds_name in hf_datasets[:3]:
        ds = load_dataset(ds_name)
        data.extend(format_as_instruction_pairs(ds))

    # Pull relevant passages from E8 knowledge index
    domain_passages = knowledge_index.query(domain, k=1000)
    data.extend(generate_qa_from_passages(domain_passages))

    # Filter for quality and dedup
    data = deduplicate(data)
    data = filter_quality(data, min_length=50)

    return data[:target_examples]
```
### Step 4: TRAIN — Automated QLoRA

```python
def train_specialist(domain, training_data):
    """
    Same recipe as the 3 specialists training now.
    QLoRA on SmolLM3-3B, automated, no human intervention.
    """
    # Identical to olympus/train_specialist.py
    config = {
        'base_model': 'HuggingFaceTB/SmolLM3-3B',
        'lora_r': 16,
        'lr': 2e-4,
        'epochs': 2,
        'max_seq_len': 1024,
    }

    # Train (GPU: ~2 hours, CPU: ~2 days)
    checkpoint = run_qlora_training(config, training_data)

    return checkpoint
```
### Step 5: VALIDATE — Does It Actually Help?

```python
from statistics import mean

def validate_specialist(new_specialist, domain, test_queries):
    """
    Compare new specialist vs general model on domain-specific queries.

    The specialist must BEAT the general model to be deployed.
    This prevents regression — bad training data doesn't ship.
    """
    general_scores = []
    specialist_scores = []

    for query in test_queries:
        # Score both responses
        general_response = general_model.generate(query)
        specialist_response = new_specialist.generate(query)

        # Compare on multiple metrics
        general_scores.append(score_response(query, general_response))
        specialist_scores.append(score_response(query, specialist_response))

    improvement = mean(specialist_scores) - mean(general_scores)

    if improvement > 0.05:  # 5% threshold
        return 'deploy', improvement
    else:
        return 'discard', improvement
```
### Step 6: DEPLOY — Hot-Add to Router

```python
def deploy_specialist(domain, checkpoint):
    """
    Add new specialist to the running system.

    1. Add domain keywords to router
    2. Assign ChamberTree chambers
    3. Load specialist (or keep on disk for lazy loading)
    """
    # Update router keywords
    router.add_domain(domain, keywords=extract_domain_keywords(domain))

    # Assign chambers (take from general's allocation or split)
    router.assign_chambers(domain, chambers=[next_available_chamber()])

    # Register checkpoint path
    specialist_registry[domain] = checkpoint

    print(f"Deployed {domain} specialist: {checkpoint}")
```
### Step 7: ADAPT — ChamberTree Reorganization

As new specialists are added, the 16-chamber space gets redistributed. The ChamberTree geometry naturally supports this — each specialist gets the chambers whose geometric encoding best matches its domain queries.

Over time, the chamber assignments are learned from real routing data rather than hard-coded. A tiny classifier trained on (query_chamber, correct_specialist) pairs replaces the static mapping.
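A first cut at that learned mapping doesn't even need a neural classifier: a per-chamber majority vote over the logged pairs already replaces the static table. The function name and log format below are assumptions for illustration, not the shipped router API.

```python
from collections import Counter, defaultdict

def learn_chamber_map(routing_log, fallback='general'):
    """Learn chamber -> specialist from (query_chamber, correct_specialist) pairs.

    Majority vote per chamber; chambers with no routing history fall back
    to the general model. A small trained classifier over chamber features
    would be the natural next step up from this.
    """
    votes = defaultdict(Counter)
    for chamber, specialist in routing_log:
        votes[chamber][specialist] += 1

    # Most common specialist per chamber
    table = {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}

    def route(chamber):
        return table.get(chamber, fallback)
    return route
```

For example, `learn_chamber_map([(10, 'chemistry'), (10, 'chemistry'), (10, 'general'), (3, 'code')])` yields a router that sends chamber 10 to chemistry, chamber 3 to code, and everything unseen to the general model.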
## Example: System Learns Chemistry

**Week 1:** User asks chemistry questions. System routes to general specialist. Answers are mediocre. Confidence scores average 0.35 on chemistry queries.

**Week 2:** Gap detection triggers: "chemistry" has 50+ low-confidence queries. System searches HuggingFace, finds chemistry QA datasets. Curates 8,000 instruction-response pairs.

**Week 3:** System runs QLoRA training on SmolLM3-3B with chemistry data. Takes 3 hours on GPU or 2 days on CPU. Validation shows 15% improvement over general model on chemistry questions.

**Week 3 (deploy):** Chemistry specialist added to router. Keywords: "molecule", "element", "reaction", "compound", "pH", "electron", "bond", etc. Assigned ChamberTree chambers 10-11 (split from creative's allocation).

**Week 4+:** Chemistry questions route to specialist. Confidence scores average 0.75. System works on the next gap (maybe legal, maybe medical, maybe the user's specific codebase).
## The Scaling Path

| Specialists | RAM (ternary) | Disk | Coverage |
|-------------|---------------|------|----------|
| 4 (initial) | 600MB active | 2.4GB | General + code + math + QA |
| 8 | 600MB active | 4.8GB | + chemistry + legal + medical + language |
| 16 | 600MB active | 9.6GB | + domain-specific (user's docs, codebase) |
| 32 | 600MB active | 19.2GB | Comprehensive coverage |

Active RAM never grows — only one specialist loads at a time. Disk grows linearly at roughly 600MB per specialist, so a 1TB drive holds over 1,600 specialists. Each one costs $2-3 in GPU time to train.
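"Only one specialist loads at a time" amounts to a one-slot cache over the on-disk checkpoint registry. A minimal sketch, where the loader signature and eviction policy are assumptions:

```python
class SpecialistCache:
    """Keep at most one specialist resident; load others from disk on demand."""

    def __init__(self, registry, load_fn):
        self.registry = registry      # domain -> checkpoint path on disk
        self.load_fn = load_fn        # checkpoint path -> loaded model
        self.active_domain = None
        self.active_model = None

    def get(self, domain):
        if domain != self.active_domain:
            # Dropping the reference to the previous specialist before
            # loading the next keeps active RAM at one model regardless
            # of how many checkpoints sit in the registry.
            self.active_model = self.load_fn(self.registry[domain])
            self.active_domain = domain
        return self.active_model
```

Repeated queries to the same domain reuse the resident model; switching domains pays one load from disk, which is the trade that keeps the RAM row in the table flat.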
## What Needs to Be Built

1. **Confidence scorer** — wrap every response with quality signals (partially exists in eval code)
2. **Gap detector** — cluster low-confidence queries by domain (new, ~100 lines)
3. **Data curator** — search HuggingFace + pull from E8 index (new, ~200 lines)
4. **Training trigger** — auto-launch QLoRA when gap exceeds threshold (new, wraps existing training script)
5. **Validation pipeline** — A/B test new specialist vs general (new, ~150 lines)
6. **Hot deployment** — add specialist to router without restart (router already supports dynamic keyword addition)

Total new code: ~500 lines. Everything else reuses existing infrastructure.
## Why This Works

The autoresearch loop already proved the pattern:
- **Autonomous experimentation**: 42+ experiments, zero human intervention
- **Quality-gated deployment**: keep what works, discard what doesn't
- **Incremental improvement**: each experiment builds on the best so far

Continuous specialist learning is the same pattern applied at the system level instead of the hyperparameter level. The infrastructure is identical. The difference is what's being optimized: not learning rate, but capability.

**The system doesn't just answer questions. It learns to answer questions it couldn't answer before.**

---

*This document describes the continuous learning roadmap for Project Olympus. The foundation (specialists, router, retrieval, training pipeline) is built. The continuous learning loop is the next phase.*