# Phase 7 Local Testing Guide
## Quick Start: Test Phase 7 Without Web Server
Run this command to see Phase 7 routing in action **in real time**:
```bash
python run_phase7_demo.py
```
This script demonstrates Phase 7 Executive Controller routing for different query types without needing the full web server.
---
## What You'll See
### SIMPLE Queries (Factual - Fast)
```
Query: What is the speed of light?
Complexity: SIMPLE
Routing Decision:
- Estimated Latency: 150ms ← 2-3x faster than full machinery
- Estimated Correctness: 95.0% ← High confidence on factual answers
- Compute Cost: 3 units ← 94% savings vs. full stack
- Reasoning: SIMPLE factual query - avoided heavy machinery for speed
Components SKIPPED: debate, semantic_tension, preflight_predictor, etc.
```
**What happened**: Phase 7 detected a simple factual question and skipped ForgeEngine entirely. The query goes straight to the orchestrator for a direct answer. ~150ms total.
---
### MEDIUM Queries (Conceptual - Balanced)
```
Query: How does quantum mechanics relate to reality?
Complexity: MEDIUM (classifier found "relate" → multi-domain thinking)
Routing Decision:
- Estimated Latency: 900ms
- Estimated Correctness: 80.0%
- Compute Cost: 25 units ← 50% of full machinery
- Reasoning: MEDIUM query - selective components for balanced reasoning
Components ACTIVATED: debate (1 round), semantic_tension, specialization_tracking
Components SKIPPED: preflight_predictor (not needed for medium complexity)
```
**What happened**: Query needs some reasoning depth but doesn't need maximum machinery. Uses 1-round debate with selective components. ~900ms total.
---
### COMPLEX Queries (Philosophical - Deep)
```
Query: Can machines be truly conscious?
Complexity: COMPLEX (classifier found "conscious" + "machine" → philosophical keywords)
Routing Decision:
- Estimated Latency: 2500ms
- Estimated Correctness: 85.0%
- Compute Cost: 50+ units ← Full machinery activated
- Reasoning: COMPLEX query - full Phase 1-6 machinery for deep synthesis
Components ACTIVATED: debate (3 rounds), semantic_tension, specialization_tracking, preflight_predictor
```
**What happened**: Deep philosophical question needs full reasoning. All Phase 1-6 components activated. 3-round debate explores multiple perspectives. ~2500ms total.
---
## The Three Routes
| Complexity | Classification | Latency | Cost | Components | Use Case |
|-----------|----------------|---------|------|------------|----------|
| SIMPLE | Factual questions | ~150ms | 3 units | None (direct answer) | "What is X?" "Define Y" |
| MEDIUM | Conceptual/multi-domain | ~900ms | 25 units | Debate (1 round) + Semantic | "How does X relate to Y?" |
| COMPLEX | Philosophical/ambiguous | ~2500ms | 50+ units | Full Phase 1-6 + Debate (3) | "Should we do X?" "Is X possible?" |
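The table above can be expressed as a static routing map. The sketch below is illustrative only: the figures are copied from the table, but the real ExecutiveController derives its estimates dynamically, and the `Route` type and `route()` function are hypothetical names, not the project's API.

```python
from dataclasses import dataclass
from enum import Enum

class Complexity(Enum):
    SIMPLE = "simple"
    MEDIUM = "medium"
    COMPLEX = "complex"

@dataclass
class Route:
    est_latency_ms: int    # expected end-to-end latency
    compute_cost: int      # relative compute units
    debate_rounds: int     # 0 = debate skipped entirely
    components: tuple      # Phase 1-6 components to activate

# Illustrative routing table built from the figures above.
ROUTES = {
    Complexity.SIMPLE: Route(150, 3, 0, ()),
    Complexity.MEDIUM: Route(900, 25, 1,
        ("debate", "semantic_tension", "specialization_tracking")),
    Complexity.COMPLEX: Route(2500, 50, 3,
        ("debate", "semantic_tension", "specialization_tracking",
         "preflight_predictor")),
}

def route(complexity: Complexity) -> Route:
    """Look up the routing decision for a classified query."""
    return ROUTES[complexity]
```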
---
## Real-Time Testing Workflow
### 1. Test Phase 7 Routing Logic (No Web Server Needed)
```bash
python run_phase7_demo.py
```
Shows all routing decisions instantly. Good for validating which queries route where.
### 2. Test Phase 7 with Actual ForgeEngine (Web Server)
```bash
codette_web.bat
```
Opens web UI at http://localhost:7860. Front-end shows:
- Response from query
- `phase7_routing` metadata in response (shows routing decision + transparency)
- Latency measurements (estimated vs actual)
- Component activation breakdown
### 3. Measure Performance (Post-MVP)
TODO: Create benchmarking script that measures:
- Real latency improvements (target: 2-3x on SIMPLE)
- Correctness preservation (target: no degradation)
- Compute savings (target: 40-50%)
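The benchmarking script does not exist yet; a minimal starting point for the latency half might look like the sketch below. The `benchmark` helper and its shape are assumptions, not project code: it times any callable (e.g. a function that sends a query through the bridge) and reports the median latency per query in milliseconds.

```python
import time
import statistics

def benchmark(run_query, queries, repeats=5):
    """Time each query `repeats` times; return {query: median latency in ms}.

    `run_query` is any callable taking a query string, e.g. a wrapper
    around the bridge's query path (hypothetical integration point).
    """
    results = {}
    for q in queries:
        samples = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            run_query(q)
            samples.append((time.perf_counter() - t0) * 1000.0)
        results[q] = statistics.median(samples)
    return results

# Usage with a stub in place of the real engine:
latencies = benchmark(lambda q: None, ["What is the speed of light?"], repeats=3)
```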
---
## Understanding the Classifier
Phase 7 uses QueryClassifier (from Phase 6) to detect complexity:
```
QueryClassifier.classify(query) -> QueryComplexity enum
SIMPLE patterns:
- "What is ..."
- "Define ..."
- "Who is ..."
- Direct factual questions
MEDIUM patterns:
- "How does ... relate to"
- "What are the implications of"
- Balanced reasoning needed
COMPLEX patterns:
- "Should we..." (ethical)
- "Can ... be..." (philosophical)
- "Why..." (explanation)
- Multi-domain concepts
```
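A runnable sketch of a pattern-based classifier along these lines is shown below. The pattern sets and the fall-back-to-COMPLEX default are illustrative assumptions; the real QueryClassifier in `query_classifier` may use different patterns and logic.

```python
import re
from enum import Enum

class QueryComplexity(Enum):
    SIMPLE = "simple"
    MEDIUM = "medium"
    COMPLEX = "complex"

# Hypothetical pattern sets mirroring the behavior described above.
COMPLEX_PATTERNS = [r"^should we\b", r"^can .+ be\b", r"^why\b"]
MEDIUM_PATTERNS = [r"\brelates? to\b", r"\bimplications of\b"]
SIMPLE_PATTERNS = [r"^what is\b", r"^define\b", r"^who is\b"]

def classify(query: str) -> QueryComplexity:
    """Check COMPLEX patterns first so philosophical phrasing wins."""
    q = query.strip().lower()
    if any(re.search(p, q) for p in COMPLEX_PATTERNS):
        return QueryComplexity.COMPLEX
    if any(re.search(p, q) for p in MEDIUM_PATTERNS):
        return QueryComplexity.MEDIUM
    if any(re.search(p, q) for p in SIMPLE_PATTERNS):
        return QueryComplexity.SIMPLE
    # When unsure, default to deep reasoning rather than a shallow answer.
    return QueryComplexity.COMPLEX
```

Checking COMPLEX patterns first is the important design choice: a query like "Can machines be truly conscious?" must not be shortcut to a fast factual answer.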
---
## Transparency Metadata
When Phase 7 is enabled, every response includes routing information:
```python
response = {
    "response": "The speed of light is...",
    "phase6_used": True,
    "phase7_used": True,
    # Phase 7 transparency:
    "phase7_routing": {
        "query_complexity": "simple",
        "components_activated": {
            "debate": False,
            "semantic_tension": False,
            "preflight_predictor": False,
            # ... remaining components
        },
        "reasoning": "SIMPLE factual query - avoided heavy machinery for speed",
        "latency_analysis": {
            "estimated_ms": 150,
            "actual_ms": 148,
            "savings_ms": 2
        },
        "metrics": {
            "conflicts_detected": 0,
            "gamma_coherence": 0.95
        }
    }
}
```
This transparency helps users understand *why* the system made certain decisions.
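As a sketch of how a consumer might surface this metadata, the hypothetical helper below condenses `phase7_routing` into a one-line summary (the helper name and output format are assumptions, not part of the project):

```python
def summarize_routing(response: dict) -> str:
    """Render phase7_routing metadata as a one-line, human-readable summary."""
    routing = response.get("phase7_routing")
    if not routing:
        return "Phase 7 routing metadata not present"
    lat = routing["latency_analysis"]
    # Names of components that were actually switched on for this query.
    active = [name for name, on in routing["components_activated"].items() if on]
    return (f"{routing['query_complexity']}: {lat['actual_ms']}ms "
            f"(est. {lat['estimated_ms']}ms), active: {active or 'none'}")
```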
---
## Next Steps After Local Testing
1. **Validate routing works**: Run `python run_phase7_demo.py` ← You are here
2. **Test with ForgeEngine**: Launch `codette_web.bat`
3. **Measure improvements**: Create real-world benchmarks
4. **Deploy to production**: Update memory.md with Phase 7 status
5. **Phase 7B planning**: Discuss learning router implementation
---
## Troubleshooting
**Problem**: Demo shows all queries as COMPLEX
**Cause**: Likely QueryComplexity enum mismatch
**Solution**: Ensure `executive_controller.py` imports QueryComplexity from `query_classifier` instead of defining its own copy of the enum
**Problem**: Web server not loading Phase 7
**Cause**: ForgeEngine import failed
**Solution**: Check that `reasoning_forge/executive_controller.py` exists and imports correctly
**Problem**: Latencies not improving
**Cause**: Phase 7 disabled or bypassed
**Solution**: Check that `CodetteForgeBridge.__init__()` sets `use_phase7=True` and that ExecutiveController initializes successfully
---
## File Locations
- **Executive Controller**: `reasoning_forge/executive_controller.py`
- **Local Demo**: `run_phase7_demo.py`
- **Bridge Integration**: `inference/codette_forge_bridge.py`
- **Web Launcher**: `codette_web.bat`
- **Tests**: `test_phase7_executive_controller.py`
- **Documentation**: `PHASE7_EXECUTIVE_CONTROL.md`
---
## Questions Before Next Session?
1. Should I test Phase 7 + Phase 6 together before deploying to web?
2. Want me to create phase7_benchmark.py to measure real improvements?
3. Ready to plan Phase 7B (learning router from historical data)?
4. Should Phase 7 routing decisions be logged to living_memory for analysis?
---
**Status**: Phase 7 MVP ready for real-time testing. All routing logic validated. Next: Integration testing with Phase 6 ForgeEngine.