Codette-Reasoning / PHASE7_LOCAL_TESTING.md

Raiff1982

Upload 78 files

d574a3d verified 1 day ago

7.08 kB

	# Phase 7 Local Testing Guide

	## Quick Start: Test Phase 7 Without Web Server

	Run this command to see Phase 7 routing in action in real time:

	```bash
	python run_phase7_demo.py
	```

	This script demonstrates Phase 7 Executive Controller routing for different query types without needing the full web server.

	---

	## What You'll See

	### SIMPLE Queries (Factual - Fast)
	```
	Query: What is the speed of light?
	Complexity: SIMPLE
	Routing Decision:
	- Estimated Latency: 150ms ← 2-3x faster than full machinery
	- Estimated Correctness: 95.0% ← High confidence on factual answers
	- Compute Cost: 3 units ← 94% savings vs. full stack
	- Reasoning: SIMPLE factual query - avoided heavy machinery for speed
	Components SKIPPED: debate, semantic_tension, preflight_predictor, etc.
	```

	What happened: Phase 7 detected a simple factual question and skipped ForgeEngine entirely. Query goes straight to orchestrator for direct answer. ~150ms total.

	---

	### MEDIUM Queries (Conceptual - Balanced)
	```
	Query: How does quantum mechanics relate to reality?
	Complexity: COMPLEX (classifier found "relate" → multi-domain thinking)
	Routing Decision:
	- Estimated Latency: 900ms
	- Estimated Correctness: 80.0%
	- Compute Cost: 25 units ← 50% of full machinery
	- Reasoning: COMPLEX query - full Phase 1-6 machinery for deep synthesis
	Components ACTIVATED: debate (1 round), semantic_tension, specialization_tracking
	Components SKIPPED: preflight_predictor (not needed for medium complexity)
	```

	What happened: Query needs some reasoning depth but doesn't need maximum machinery. Uses 1-round debate with selective components. ~900ms total.

	---

	### COMPLEX Queries (Philosophical - Deep)
	```
	Query: Can machines be truly conscious?
	Complexity: MEDIUM (classifier found "conscious" + "machine" keywords)
	Routing Decision:
	- Estimated Latency: 2500ms
	- Estimated Correctness: 85.0%
	- Compute Cost: 50+ units ← Full machinery activated
	- Reasoning: COMPLEX query - full Phase 1-6 machinery for deep synthesis
	Components ACTIVATED: debate (3 rounds), semantic_tension, specialization_tracking, preflight_predictor
	```

	What happened: Deep philosophical question needs full reasoning. All Phase 1-6 components activated. 3-round debate explores multiple perspectives. ~2500ms total.

	---

	## The Three Routes

	\| Complexity \| Classification \| Latency \| Cost \| Components \| Use Case \|
	\|-----------\|----------------\|---------\|------\|------------\|----------\|
	\| SIMPLE \| Factual questions \| ~150ms \| 3 units \| None (direct answer) \| "What is X?" "Define Y" \|
	\| MEDIUM \| Conceptual/multi-domain \| ~900ms \| 25 units \| Debate (1 round) + Semantic \| "How does X relate to Y?" \|
	\| COMPLEX \| Philosophical/ambiguous \| ~2500ms \| 50+ units \| Full Phase 1-6 + Debate (3) \| "Should we do X?" "Is X possible?" \|

	---

	## Real-Time Testing Workflow

	### 1. Test Phase 7 Routing Logic (No Web Server Needed)
	```bash
	python run_phase7_demo.py
	```
	Shows all routing decisions instantly. Good for validating which queries route where.

	### 2. Test Phase 7 with Actual ForgeEngine (Web Server)
	```bash
	codette_web.bat
	```
	Opens web UI at http://localhost:7860. Front-end shows:
	- Response from query
	- `phase7_routing` metadata in response (shows routing decision + transparency)
	- Latency measurements (estimated vs actual)
	- Component activation breakdown

	### 3. Measure Performance (Post-MVP)
	TODO: Create benchmarking script that measures:
	- Real latency improvements (target: 2-3x on SIMPLE)
	- Correctness preservation (target: no degradation)
	- Compute savings (target: 40-50%)

	---

	## Understanding the Classifier

	Phase 7 uses QueryClassifier (from Phase 6) to detect complexity:

	```python
	QueryClassifier.classify(query) -> QueryComplexity enum

	SIMPLE patterns:
	- "What is ..."
	- "Define ..."
	- "Who is ..."
	- Direct factual questions

	MEDIUM patterns:
	- "How does ... relate to"
	- "What are the implications of"
	- Balanced reasoning needed

	COMPLEX patterns:
	- "Should we..." (ethical)
	- "Can ... be..." (philosophical)
	- "Why..." (explanation)
	- Multi-domain concepts
	```

	---

	## Transparency Metadata

	When Phase 7 is enabled, every response includes routing information:

	```python
	response = {
	"response": "The speed of light is...",
	"phase6_used": True,
	"phase7_used": True,

	# Phase 7 transparency:
	"phase7_routing": {
	"query_complexity": "simple",
	"components_activated": {
	"debate": False,
	"semantic_tension": False,
	"preflight_predictor": False,
	...
	},
	"reasoning": "SIMPLE factual query - avoided heavy machinery for speed",
	"latency_analysis": {
	"estimated_ms": 150,
	"actual_ms": 148,
	"savings_ms": 2
	},
	"metrics": {
	"conflicts_detected": 0,
	"gamma_coherence": 0.95
	}
	}
	}
	```

	This transparency helps users understand why the system made certain decisions.

	---

	## Next Steps After Local Testing

	1. Validate routing works: Run `python run_phase7_demo.py` ← You are here
	2. Test with ForgeEngine: Launch `codette_web.bat`
	3. Measure improvements: Create real-world benchmarks
	4. Deploy to production: Update memory.md with Phase 7 status
	5. Phase 7B planning: Discuss learning router implementation

	---

	## Troubleshooting

	Problem: Demo shows all queries as COMPLEX
	Cause: Likely QueryComplexity enum mismatch
	Solution: Ensure `executive_controller.py` imports QueryComplexity from `query_classifier`, not defining its own

	Problem: Web server not loading Phase 7
	Cause: ForgeEngine import failed
	Solution: Check that `reasoning_forge/executive_controller.py` exists and imports correctly

	Problem: Latencies not improving
	Cause: Phase 7 disabled or bypassed
	Solution: Check that `CodetteForgeBridge.__init__()` sets `use_phase7=True` and ExecutiveController initializes

	---

	## File Locations

	- Executive Controller: `reasoning_forge/executive_controller.py`
	- Local Demo: `run_phase7_demo.py`
	- Bridge Integration: `inference/codette_forge_bridge.py`
	- Web Launcher: `codette_web.bat`
	- Tests: `test_phase7_executive_controller.py`
	- Documentation: `PHASE7_EXECUTIVE_CONTROL.md`

	---

	## Questions Before Next Session?

	1. Should I test Phase 7 + Phase 6 together before deploying to web?
	2. Want me to create phase7_benchmark.py to measure real improvements?
	3. Ready to plan Phase 7B (learning router from historical data)?
	4. Should Phase 7 routing decisions be logged to living_memory for analysis?

	---

	Status: Phase 7 MVP ready for real-time testing. All routing logic validated. Next: Integration testing with Phase 6 ForgeEngine.