# Olympus Continuous Learning: Self-Improving Specialist System

## The Core Idea

The system identifies its own weaknesses, generates its own training data, trains its own specialists, and integrates them, all with zero human intervention. The autoresearch pattern that discovered optimal configs in 30 experiments now discovers and fills capability gaps autonomously.

Individual specialists are frozen after training, but the system as a whole evolves continuously: new specialists appear, the router adapts, the knowledge index grows. Like a brain where individual neurons stabilize but circuits reshape constantly.

## What Already Exists

Every component of this loop is already built and proven:

| Component | Status | What it does |
|-----------|--------|--------------|
| Autoresearch loop | Proven (42+ experiments) | Autonomous try → measure → keep/discard |
| QLoRA training | Proven (3 specialists training now) | Fine-tune SmolLM3-3B on any domain |
| Router | Proven (100% on test set) | Classify queries to specialists |
| E8 knowledge index | Proven (R@5 = 100%) | Store and retrieve any knowledge |
| Confidence scoring | Proven (MRR, R@1, perplexity) | Measure response quality |
| ChamberTree geometry | Proven (16 chambers, <1 ms) | Geometric sub-routing |

## The Continuous Learning Loop

```
FOREVER:
  1. SERVE    - Answer queries, track confidence on every response
  2. DETECT   - Identify weak domains (low confidence, user corrections)
  3. CURATE   - Generate training data for weak domains
  4. TRAIN    - Fine-tune a new specialist (QLoRA, automated)
  5. VALIDATE - Does the specialist outperform the general model?
  6. DEPLOY   - If yes: add to router. If no: discard, try again.
  7. ADAPT    - ChamberTree reorganizes for the new specialist
  8. GOTO 1
```

### Step 1: SERVE - Confidence Tracking

Every response includes a confidence score computed from several signals:

```python
def compute_confidence(query, response, retrieval_results, router_result):
    signals = {
        # Model confidence: how sure is the LM about its tokens?
        # (entropy assumed normalized to [0, 1])
        'generation_entropy': mean_token_entropy(response),

        # Retrieval confidence: did we find good context?
        'retrieval_score': retrieval_results.scores[0],
        'retrieval_gap': retrieval_results.scores[0] - retrieval_results.scores[1],

        # Router confidence: was the specialist choice clear?
        'router_confidence': router_result.confidence,

        # Length signal: very short responses often mean uncertainty
        'response_length': len(response.tokens),
    }

    # Weighted combination
    confidence = (
        0.3 * (1 - signals['generation_entropy']) +
        0.3 * signals['retrieval_score'] +
        0.2 * signals['router_confidence'] +
        0.2 * min(signals['response_length'] / 50, 1.0)
    )

    return confidence, signals
```

Low-confidence responses get logged with the query, domain, and failure signals.

### Step 2: DETECT - Gap Identification

```python
from collections import Counter

def detect_gaps(confidence_log, threshold=0.5, min_failures=20):
    """
    Identify domains where the system consistently underperforms.

    A 'domain' is identified by:
    - Keyword clustering of low-confidence queries
    - Router chamber distribution of failures
    - User correction patterns (if available)
    """
    # Cluster low-confidence queries by topic
    weak_queries = [q for q, conf in confidence_log if conf < threshold]

    # Simple keyword extraction for domain identification
    domain_counts = Counter()
    for query in weak_queries:
        keywords = extract_keywords(query)  # TF-IDF or simple frequency
        for kw in keywords:
            domain_counts[kw] += 1

    # Domains with enough failures to justify a specialist
    gaps = [
        domain for domain, count in domain_counts.most_common(10)
        if count >= min_failures
    ]

    return gaps  # e.g., ['chemistry', 'legal', 'spanish']
```
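
`extract_keywords` is left undefined above. A minimal frequency-based stand-in might look like the following; the stopword list and regex tokenizer here are illustrative assumptions, not the production TF-IDF path:

```python
import re
from collections import Counter

# Illustrative stopword list; a real deployment would use a fuller one
STOPWORDS = {'the', 'a', 'an', 'is', 'of', 'to', 'in', 'what', 'how',
             'do', 'does', 'i', 'you', 'it', 'for', 'and', 'or', 'on'}

def extract_keywords(query, top_k=3):
    """Return up to top_k content words from a query, most frequent first.

    Sketch only: gap detection in production would more likely use
    TF-IDF weights or embedding clusters rather than raw frequency.
    """
    tokens = re.findall(r"[a-z]+", query.lower())
    content = [t for t in tokens if t not in STOPWORDS and len(t) > 1]
    return [word for word, _ in Counter(content).most_common(top_k)]
```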

### Step 3: CURATE - Automated Data Collection

```python
def curate_training_data(domain, target_examples=10000):
    """
    Automatically gather training data for a new specialist.

    Sources (in order of preference):
    1. Existing QA datasets on HuggingFace for this domain
    2. Wikipedia articles on this topic (already in the E8 index)
    3. Filtered web text from open datasets (FineWeb-Edu, etc.)
    """
    data = []

    # Check HuggingFace for domain-specific datasets
    hf_datasets = search_huggingface(f"{domain} QA instruction")
    for ds_name in hf_datasets[:3]:
        ds = load_dataset(ds_name)
        data.extend(format_as_instruction_pairs(ds))

    # Pull relevant passages from the E8 knowledge index
    domain_passages = knowledge_index.query(domain, k=1000)
    data.extend(generate_qa_from_passages(domain_passages))

    # Deduplicate and filter for quality
    data = deduplicate(data)
    data = filter_quality(data, min_length=50)

    return data[:target_examples]
```

### Step 4: TRAIN - Automated QLoRA

```python
def train_specialist(domain, training_data):
    """
    Same recipe as the 3 specialists training now:
    QLoRA on SmolLM3-3B, automated, no human intervention.
    """
    # Identical to olympus/train_specialist.py
    config = {
        'base_model': 'HuggingFaceTB/SmolLM3-3B',
        'lora_r': 16,
        'lr': 2e-4,
        'epochs': 2,
        'max_seq_len': 1024,
    }

    # Train (GPU: ~2 hours, CPU: ~2 days)
    checkpoint = run_qlora_training(config, training_data)

    return checkpoint
```

### Step 5: VALIDATE - Does It Actually Help?

```python
def validate_specialist(new_specialist, domain, test_queries):
    """
    Compare the new specialist against the general model on
    domain-specific queries.

    The specialist must BEAT the general model to be deployed.
    This prevents regression: bad training data doesn't ship.
    """
    general_scores = []
    specialist_scores = []

    for query in test_queries:
        # Generate a response from each model
        general_response = general_model.generate(query)
        specialist_response = new_specialist.generate(query)

        # Score both responses with the same metric
        general_scores.append(score_response(query, general_response))
        specialist_scores.append(score_response(query, specialist_response))

    improvement = mean(specialist_scores) - mean(general_scores)

    if improvement > 0.05:  # 5% absolute threshold
        return 'deploy', improvement
    else:
        return 'discard', improvement
```
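
`score_response` is similarly left open. One possibility is a cheap token-overlap proxy against reference passages pulled from the knowledge index; the extra `reference_passages` argument and the Jaccard metric below are assumptions for illustration, not the system's actual scorer:

```python
def score_response(query, response, reference_passages):
    """Token-overlap proxy score in [0, 1].

    Sketch: a production validator would more likely use perplexity
    under a reference model or an LLM judge; Jaccard overlap against
    retrieved passages is only a cheap, deterministic stand-in.
    """
    resp_tokens = set(response.lower().split())
    if not resp_tokens:
        return 0.0
    best = 0.0
    for passage in reference_passages:
        ref_tokens = set(passage.lower().split())
        if ref_tokens:
            # Jaccard similarity between response and passage tokens
            overlap = len(resp_tokens & ref_tokens) / len(resp_tokens | ref_tokens)
            best = max(best, overlap)
    return best
```

Whatever metric is chosen, the key property is that it is applied identically to both models, so the 5% threshold compares like with like.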

### Step 6: DEPLOY - Hot-Add to Router

```python
def deploy_specialist(domain, checkpoint):
    """
    Add a new specialist to the running system:

    1. Add domain keywords to the router
    2. Assign ChamberTree chambers
    3. Load the specialist (or keep it on disk for lazy loading)
    """
    # Update router keywords
    router.add_domain(domain, keywords=extract_domain_keywords(domain))

    # Assign chambers (take from the general allocation, or split)
    router.assign_chambers(domain, chambers=[next_available_chamber()])

    # Register the checkpoint path
    specialist_registry[domain] = checkpoint

    print(f"Deployed {domain} specialist: {checkpoint}")
```

### Step 7: ADAPT - ChamberTree Reorganization

As new specialists are added, the 16-chamber space gets redistributed. The ChamberTree geometry naturally supports this: each specialist gets the chambers whose geometric encoding best matches its domain queries.

Over time, the chamber assignments are learned from real routing data rather than hard-coded: a tiny classifier trained on (query_chamber, correct_specialist) pairs replaces the static mapping.
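
That "tiny classifier" can start as nothing more than a per-chamber majority vote over logged routing outcomes. The `(chamber, specialist)` log format below is an assumption; a minimal sketch:

```python
from collections import Counter, defaultdict

def learn_chamber_map(routing_log):
    """Build a chamber -> specialist map from (chamber, specialist) pairs.

    Sketch: a per-chamber majority vote. A real system might weight
    votes by response confidence, or train a small logistic classifier
    on richer features instead.
    """
    votes = defaultdict(Counter)
    for chamber, specialist in routing_log:
        votes[chamber][specialist] += 1
    # For each chamber, pick the specialist that won it most often
    return {chamber: counts.most_common(1)[0][0]
            for chamber, counts in votes.items()}
```

Chambers absent from the log simply keep their current static assignment, so the learned map can be rolled out incrementally.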

## Example: The System Learns Chemistry

**Week 1:** Users ask chemistry questions. The system routes them to the general specialist. Answers are mediocre; confidence scores average 0.35 on chemistry queries.

**Week 2:** Gap detection triggers: "chemistry" has 50+ low-confidence queries. The system searches HuggingFace, finds chemistry QA datasets, and curates 8,000 instruction-response pairs.

**Week 3:** The system runs QLoRA training on SmolLM3-3B with the chemistry data. This takes ~3 hours on GPU or ~2 days on CPU. Validation shows a 15% improvement over the general model on chemistry questions.

**Week 3 (deploy):** The chemistry specialist is added to the router. Keywords: "molecule", "element", "reaction", "compound", "pH", "electron", "bond", etc. It is assigned ChamberTree chambers 10-11 (split off from the creative specialist's allocation).

**Week 4+:** Chemistry questions route to the specialist. Confidence scores average 0.75. The system moves on to the next gap (maybe legal, maybe medical, maybe the user's specific codebase).

## The Scaling Path

| Specialists | RAM (ternary) | Disk | Coverage |
|-------------|---------------|---------|----------|
| 4 (initial) | 600 MB active | 2.4 GB | General + code + math + QA |
| 8 | 600 MB active | 4.8 GB | + chemistry + legal + medical + language |
| 16 | 600 MB active | 9.6 GB | + domain-specific (user's docs, codebase) |
| 32 | 600 MB active | 19.2 GB | Comprehensive coverage |

Active RAM never grows: only one specialist loads at a time. Disk grows linearly at roughly 600 MB per specialist, so a 1 TB drive holds around 1,600 specialists. Each one costs $2-3 in GPU time to train.
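
The table's arithmetic can be checked directly (assuming the 600 MB-per-ternary-specialist figure from the table and 1 TB ≈ 1000 GB):

```python
def scaling_row(n_specialists, mb_per_specialist=600, active_mb=600):
    """Footprint for n ternary specialists with one loaded at a time."""
    return {
        'specialists': n_specialists,
        'active_ram_mb': active_mb,  # constant: lazy loading
        'disk_gb': round(n_specialists * mb_per_specialist / 1000, 1),
    }

def drive_capacity(drive_gb, mb_per_specialist=600):
    """How many specialists fit on a drive of the given size."""
    return int(drive_gb * 1000 // mb_per_specialist)
```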

## What Needs to Be Built

1. **Confidence scorer** - wrap every response with quality signals (partially exists in eval code)
2. **Gap detector** - cluster low-confidence queries by domain (new, ~100 lines)
3. **Data curator** - search HuggingFace + pull from the E8 index (new, ~200 lines)
4. **Training trigger** - auto-launch QLoRA when a gap exceeds the threshold (new, wraps the existing training script)
5. **Validation pipeline** - A/B test the new specialist vs the general model (new, ~150 lines)
6. **Hot deployment** - add a specialist to the router without a restart (the router already supports dynamic keyword addition)

Total new code: ~500 lines. Everything else reuses existing infrastructure.

## Why This Works

The autoresearch loop already proved the pattern:

- **Autonomous experimentation**: 42+ experiments, zero human intervention
- **Quality-gated deployment**: keep what works, discard what doesn't
- **Incremental improvement**: each experiment builds on the best so far

Continuous specialist learning is the same pattern applied at the system level instead of the hyperparameter level. The infrastructure is identical. The difference is what's being optimized: not the learning rate, but capability.

**The system doesn't just answer questions. It learns to answer questions it couldn't answer before.**

---

*This document describes the continuous learning roadmap for Project Olympus. The foundation (specialists, router, retrieval, training pipeline) is built. The continuous learning loop is the next phase.*