# Phase 7 Local Testing Guide

## Quick Start: Test Phase 7 Without Web Server

Run this command to see Phase 7 routing in action **in real time**:

```bash
python run_phase7_demo.py
```

This script demonstrates Phase 7 Executive Controller routing for different query types without needing the full web server.

---

## What You'll See
### SIMPLE Queries (Factual - Fast)

```
Query: What is the speed of light?
Complexity: SIMPLE
Routing Decision:
- Estimated Latency: 150ms ← 2-3x faster than full machinery
- Estimated Correctness: 95.0% ← High confidence on factual answers
- Compute Cost: 3 units ← 94% savings vs. full stack
- Reasoning: SIMPLE factual query - avoided heavy machinery for speed
Components SKIPPED: debate, semantic_tension, preflight_predictor, etc.
```

**What happened**: Phase 7 detected a simple factual question and skipped ForgeEngine entirely. The query goes straight to the orchestrator for a direct answer. ~150ms total.

---
### MEDIUM Queries (Conceptual - Balanced)

```
Query: How does quantum mechanics relate to reality?
Complexity: MEDIUM (classifier found "relate" → multi-domain thinking)
Routing Decision:
- Estimated Latency: 900ms
- Estimated Correctness: 80.0%
- Compute Cost: 25 units ← 50% of full machinery
- Reasoning: MEDIUM query - selective components for balanced depth
Components ACTIVATED: debate (1 round), semantic_tension, specialization_tracking
Components SKIPPED: preflight_predictor (not needed for medium complexity)
```

**What happened**: The query needs some reasoning depth but doesn't need maximum machinery. Phase 7 uses a 1-round debate with selective components. ~900ms total.

---
### COMPLEX Queries (Philosophical - Deep)

```
Query: Can machines be truly conscious?
Complexity: COMPLEX (classifier found "conscious" + "machine" keywords)
Routing Decision:
- Estimated Latency: 2500ms
- Estimated Correctness: 85.0%
- Compute Cost: 50+ units ← Full machinery activated
- Reasoning: COMPLEX query - full Phase 1-6 machinery for deep synthesis
Components ACTIVATED: debate (3 rounds), semantic_tension, specialization_tracking, preflight_predictor
```

**What happened**: A deep philosophical question needs full reasoning. All Phase 1-6 components are activated, and a 3-round debate explores multiple perspectives. ~2500ms total.

---
## The Three Routes

| Complexity | Classification | Latency | Cost | Components | Use Case |
|------------|-------------------------|---------|-----------|------------------------------|------------------------------------|
| SIMPLE | Factual questions | ~150ms | 3 units | None (direct answer) | "What is X?" "Define Y" |
| MEDIUM | Conceptual/multi-domain | ~900ms | 25 units | Debate (1 round) + Semantic | "How does X relate to Y?" |
| COMPLEX | Philosophical/ambiguous | ~2500ms | 50+ units | Full Phase 1-6 + Debate (3) | "Should we do X?" "Is X possible?" |
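The table above can be sketched as a minimal routing dispatcher. This is a hedged illustration only: `ROUTE_PROFILES` and `route()` are hypothetical names, not the real ExecutiveController API, and the profile values simply mirror the table.

```python
from enum import Enum

class QueryComplexity(Enum):
    SIMPLE = "simple"
    MEDIUM = "medium"
    COMPLEX = "complex"

# Hypothetical route profiles mirroring the table above; the real
# ExecutiveController derives these from Phase 1-6 component configs.
ROUTE_PROFILES = {
    QueryComplexity.SIMPLE: {
        "latency_ms": 150, "cost_units": 3, "debate_rounds": 0,
        "components": [],
    },
    QueryComplexity.MEDIUM: {
        "latency_ms": 900, "cost_units": 25, "debate_rounds": 1,
        "components": ["debate", "semantic_tension", "specialization_tracking"],
    },
    QueryComplexity.COMPLEX: {
        "latency_ms": 2500, "cost_units": 50, "debate_rounds": 3,
        "components": ["debate", "semantic_tension",
                       "specialization_tracking", "preflight_predictor"],
    },
}

def route(complexity: QueryComplexity) -> dict:
    """Pick the component profile for a classified query."""
    return ROUTE_PROFILES[complexity]

print(route(QueryComplexity.SIMPLE)["cost_units"])  # → 3
```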
---
## Real-Time Testing Workflow

### 1. Test Phase 7 Routing Logic (No Web Server Needed)

```bash
python run_phase7_demo.py
```

Shows all routing decisions instantly. Good for validating which queries route where.

### 2. Test Phase 7 with Actual ForgeEngine (Web Server)

```bash
codette_web.bat
```

Opens the web UI at http://localhost:7860. The front-end shows:

- Response from the query
- `phase7_routing` metadata in the response (shows routing decision + transparency)
- Latency measurements (estimated vs. actual)
- Component activation breakdown

### 3. Measure Performance (Post-MVP)

TODO: Create a benchmarking script that measures:

- Real latency improvements (target: 2-3x on SIMPLE)
- Correctness preservation (target: no degradation)
- Compute savings (target: 40-50%)
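A benchmarking script along these lines could cover the latency target. This is a sketch under assumptions: `answer_fn` stands in for whatever entry point eventually exposes Phase 7 (e.g. a bridge method), and is not a real API in this codebase.

```python
import statistics
import time

def benchmark(answer_fn, queries, runs=5):
    """Time answer_fn over each query; return median latency in ms per query.

    answer_fn is a placeholder for the real Phase 7 entry point
    (assumption, not an existing API).
    """
    results = {}
    for query in queries:
        samples = []
        for _ in range(runs):
            start = time.perf_counter()
            answer_fn(query)
            samples.append((time.perf_counter() - start) * 1000)
        results[query] = statistics.median(samples)
    return results

# Usage with a stub in place of the real engine:
latencies = benchmark(lambda q: len(q), ["What is the speed of light?"])
```

Comparing these medians against the `estimated_ms` values in `phase7_routing` would cover the "estimated vs. actual" check without the web UI.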
---
## Understanding the Classifier

Phase 7 uses QueryClassifier (from Phase 6) to detect complexity:

```python
QueryClassifier.classify(query) -> QueryComplexity enum
```

SIMPLE patterns:
- "What is ..."
- "Define ..."
- "Who is ..."
- Direct factual questions

MEDIUM patterns:
- "How does ... relate to"
- "What are the implications of"
- Balanced reasoning needed

COMPLEX patterns:
- "Should we..." (ethical)
- "Can ... be..." (philosophical)
- "Why..." (explanation)
- Multi-domain concepts
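For intuition, the pattern lists above can be approximated with first-match regexes. This is a minimal sketch, not the actual Phase 6 QueryClassifier, which is assumed to be more nuanced than keyword matching.

```python
import re
from enum import Enum

class QueryComplexity(Enum):
    SIMPLE = "simple"
    MEDIUM = "medium"
    COMPLEX = "complex"

# Keyword sketch of the pattern lists above (illustrative only).
_COMPLEX = re.compile(r"^(should we|why)\b|\bcan\b.*\bbe\b", re.I)
_MEDIUM = re.compile(r"\brelate(s|d)? to\b|implications of", re.I)
_SIMPLE = re.compile(r"^(what is|define|who is)\b", re.I)

def classify(query: str) -> QueryComplexity:
    # Check the most expensive route first so ambiguous queries
    # err toward deeper reasoning.
    if _COMPLEX.search(query):
        return QueryComplexity.COMPLEX
    if _MEDIUM.search(query):
        return QueryComplexity.MEDIUM
    if _SIMPLE.search(query):
        return QueryComplexity.SIMPLE
    return QueryComplexity.MEDIUM  # default to the balanced route

print(classify("Can machines be truly conscious?").value)  # → complex
```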
---
## Transparency Metadata

When Phase 7 is enabled, every response includes routing information:

```python
response = {
    "response": "The speed of light is...",
    "phase6_used": True,
    "phase7_used": True,
    # Phase 7 transparency:
    "phase7_routing": {
        "query_complexity": "simple",
        "components_activated": {
            "debate": False,
            "semantic_tension": False,
            "preflight_predictor": False,
            # ...
        },
        "reasoning": "SIMPLE factual query - avoided heavy machinery for speed",
        "latency_analysis": {
            "estimated_ms": 150,
            "actual_ms": 148,
            "savings_ms": 2
        },
        "metrics": {
            "conflicts_detected": 0,
            "gamma_coherence": 0.95
        }
    }
}
```

This transparency helps users understand *why* the system made certain decisions.
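Consuming that metadata can be as simple as the helper below. The `routing_summary` function is a hypothetical example, not part of the codebase; it only assumes the response shape shown above.

```python
# A response dict shaped like the example above (illustrative values).
response = {
    "response": "The speed of light is...",
    "phase7_used": True,
    "phase7_routing": {
        "query_complexity": "simple",
        "reasoning": "SIMPLE factual query - avoided heavy machinery for speed",
        "latency_analysis": {"estimated_ms": 150, "actual_ms": 148, "savings_ms": 2},
    },
}

def routing_summary(resp: dict) -> str:
    """Render a one-line summary of the Phase 7 routing decision."""
    routing = resp.get("phase7_routing")
    if not routing:
        return "Phase 7 routing metadata not present"
    lat = routing.get("latency_analysis", {})
    return (f"{routing['query_complexity'].upper()}: "
            f"{lat.get('actual_ms', '?')}ms actual "
            f"(estimated {lat.get('estimated_ms', '?')}ms)")

print(routing_summary(response))  # → SIMPLE: 148ms actual (estimated 150ms)
```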
---
## Next Steps After Local Testing

1. **Validate routing works**: Run `python run_phase7_demo.py` ← You are here
2. **Test with ForgeEngine**: Launch `codette_web.bat`
3. **Measure improvements**: Create real-world benchmarks
4. **Deploy to production**: Update memory.md with Phase 7 status
5. **Phase 7B planning**: Discuss learning router implementation

---
## Troubleshooting

**Problem**: Demo shows all queries as COMPLEX
**Cause**: Likely a QueryComplexity enum mismatch
**Solution**: Ensure `executive_controller.py` imports QueryComplexity from `query_classifier` instead of defining its own

**Problem**: Web server not loading Phase 7
**Cause**: ForgeEngine import failed
**Solution**: Check that `reasoning_forge/executive_controller.py` exists and imports correctly

**Problem**: Latencies not improving
**Cause**: Phase 7 disabled or bypassed
**Solution**: Check that `CodetteForgeBridge.__init__()` sets `use_phase7=True` and that the ExecutiveController initializes

---
## File Locations

- **Executive Controller**: `reasoning_forge/executive_controller.py`
- **Local Demo**: `run_phase7_demo.py`
- **Bridge Integration**: `inference/codette_forge_bridge.py`
- **Web Launcher**: `codette_web.bat`
- **Tests**: `test_phase7_executive_controller.py`
- **Documentation**: `PHASE7_EXECUTIVE_CONTROL.md`

---
## Questions Before Next Session?

1. Should I test Phase 7 + Phase 6 together before deploying to web?
2. Want me to create `phase7_benchmark.py` to measure real improvements?
3. Ready to plan Phase 7B (learning router from historical data)?
4. Should Phase 7 routing decisions be logged to living_memory for analysis?

---

**Status**: Phase 7 MVP ready for real-time testing. All routing logic validated. Next: integration testing with the Phase 6 ForgeEngine.