Codette-Reasoning / PHASE7_LOCAL_TESTING.md
Raiff1982's picture
Upload 78 files
d574a3d verified
# Phase 7 Local Testing Guide
## Quick Start: Test Phase 7 Without Web Server
Run this command to see Phase 7 routing in action **in real time**:
```bash
python run_phase7_demo.py
```
This script demonstrates Phase 7 Executive Controller routing for different query types without needing the full web server.
---
## What You'll See
### SIMPLE Queries (Factual - Fast)
```
Query: What is the speed of light?
Complexity: SIMPLE
Routing Decision:
- Estimated Latency: 150ms ← 2-3x faster than full machinery
- Estimated Correctness: 95.0% ← High confidence on factual answers
- Compute Cost: 3 units ← 94% savings vs. full stack
- Reasoning: SIMPLE factual query - avoided heavy machinery for speed
Components SKIPPED: debate, semantic_tension, preflight_predictor, etc.
```
**What happened**: Phase 7 detected a simple factual question and skipped ForgeEngine entirely. Query goes straight to orchestrator for direct answer. ~150ms total.
---
### MEDIUM Queries (Conceptual - Balanced)
```
Query: How does quantum mechanics relate to reality?
Complexity: COMPLEX (classifier found "relate" → multi-domain thinking)
Routing Decision:
- Estimated Latency: 900ms
- Estimated Correctness: 80.0%
- Compute Cost: 25 units ← 50% of full machinery
- Reasoning: COMPLEX query - full Phase 1-6 machinery for deep synthesis
Components ACTIVATED: debate (1 round), semantic_tension, specialization_tracking
Components SKIPPED: preflight_predictor (not needed for medium complexity)
```
**What happened**: Query needs some reasoning depth but doesn't need maximum machinery. Uses 1-round debate with selective components. ~900ms total.
---
### COMPLEX Queries (Philosophical - Deep)
```
Query: Can machines be truly conscious?
Complexity: MEDIUM (classifier found "conscious" + "machine" keywords)
Routing Decision:
- Estimated Latency: 2500ms
- Estimated Correctness: 85.0%
- Compute Cost: 50+ units ← Full machinery activated
- Reasoning: COMPLEX query - full Phase 1-6 machinery for deep synthesis
Components ACTIVATED: debate (3 rounds), semantic_tension, specialization_tracking, preflight_predictor
```
**What happened**: Deep philosophical question needs full reasoning. All Phase 1-6 components activated. 3-round debate explores multiple perspectives. ~2500ms total.
---
## The Three Routes
| Complexity | Classification | Latency | Cost | Components | Use Case |
|-----------|----------------|---------|------|------------|----------|
| SIMPLE | Factual questions | ~150ms | 3 units | None (direct answer) | "What is X?" "Define Y" |
| MEDIUM | Conceptual/multi-domain | ~900ms | 25 units | Debate (1 round) + Semantic | "How does X relate to Y?" |
| COMPLEX | Philosophical/ambiguous | ~2500ms | 50+ units | Full Phase 1-6 + Debate (3) | "Should we do X?" "Is X possible?" |
---
## Real-Time Testing Workflow
### 1. Test Phase 7 Routing Logic (No Web Server Needed)
```bash
python run_phase7_demo.py
```
Shows all routing decisions instantly. Good for validating which queries route where.
### 2. Test Phase 7 with Actual ForgeEngine (Web Server)
```bash
codette_web.bat
```
Opens web UI at http://localhost:7860. Front-end shows:
- Response from query
- `phase7_routing` metadata in response (shows routing decision + transparency)
- Latency measurements (estimated vs actual)
- Component activation breakdown
### 3. Measure Performance (Post-MVP)
TODO: Create benchmarking script that measures:
- Real latency improvements (target: 2-3x on SIMPLE)
- Correctness preservation (target: no degradation)
- Compute savings (target: 40-50%)
---
## Understanding the Classifier
Phase 7 uses QueryClassifier (from Phase 6) to detect complexity:
```python
QueryClassifier.classify(query) -> QueryComplexity enum
SIMPLE patterns:
- "What is ..."
- "Define ..."
- "Who is ..."
- Direct factual questions
MEDIUM patterns:
- "How does ... relate to"
- "What are the implications of"
- Balanced reasoning needed
COMPLEX patterns:
- "Should we..." (ethical)
- "Can ... be..." (philosophical)
- "Why..." (explanation)
- Multi-domain concepts
```
---
## Transparency Metadata
When Phase 7 is enabled, every response includes routing information:
```python
response = {
"response": "The speed of light is...",
"phase6_used": True,
"phase7_used": True,
# Phase 7 transparency:
"phase7_routing": {
"query_complexity": "simple",
"components_activated": {
"debate": False,
"semantic_tension": False,
"preflight_predictor": False,
...
},
"reasoning": "SIMPLE factual query - avoided heavy machinery for speed",
"latency_analysis": {
"estimated_ms": 150,
"actual_ms": 148,
"savings_ms": 2
},
"metrics": {
"conflicts_detected": 0,
"gamma_coherence": 0.95
}
}
}
```
This transparency helps users understand *why* the system made certain decisions.
---
## Next Steps After Local Testing
1. **Validate routing works**: Run `python run_phase7_demo.py` ← You are here
2. **Test with ForgeEngine**: Launch `codette_web.bat`
3. **Measure improvements**: Create real-world benchmarks
4. **Deploy to production**: Update memory.md with Phase 7 status
5. **Phase 7B planning**: Discuss learning router implementation
---
## Troubleshooting
**Problem**: Demo shows all queries as COMPLEX
**Cause**: Likely QueryComplexity enum mismatch
**Solution**: Ensure `executive_controller.py` imports QueryComplexity from `query_classifier`, not defining its own
**Problem**: Web server not loading Phase 7
**Cause**: ForgeEngine import failed
**Solution**: Check that `reasoning_forge/executive_controller.py` exists and imports correctly
**Problem**: Latencies not improving
**Cause**: Phase 7 disabled or bypassed
**Solution**: Check that `CodetteForgeBridge.__init__()` sets `use_phase7=True` and ExecutiveController initializes
---
## File Locations
- **Executive Controller**: `reasoning_forge/executive_controller.py`
- **Local Demo**: `run_phase7_demo.py`
- **Bridge Integration**: `inference/codette_forge_bridge.py`
- **Web Launcher**: `codette_web.bat`
- **Tests**: `test_phase7_executive_controller.py`
- **Documentation**: `PHASE7_EXECUTIVE_CONTROL.md`
---
## Questions Before Next Session?
1. Should I test Phase 7 + Phase 6 together before deploying to web?
2. Want me to create phase7_benchmark.py to measure real improvements?
3. Ready to plan Phase 7B (learning router from historical data)?
4. Should Phase 7 routing decisions be logged to living_memory for analysis?
---
**Status**: Phase 7 MVP ready for real-time testing. All routing logic validated. Next: Integration testing with Phase 6 ForgeEngine.