# Agent LLM Integration — Real Inference via Adapters
## What Changed
All reasoning agents in Codette now use **real LLM inference** via trained LoRA adapters instead of template substitution.
### Before
```python
# Template-based (generic)
def analyze(self, concept: str) -> str:
    template = self.select_template(concept)
    return template.replace("{concept}", concept)
```
**Problem**: Agents generated the same generic text for ANY concept, just with the concept name substituted. This produced non-specific, often contradictory reasoning that actually reduced correctness in debate.
### After
```python
# LLM-based (specific)
def analyze(self, concept: str) -> str:
    if self.orchestrator and self.adapter_name:
        # Call LLM with this agent's specific adapter
        return self._analyze_with_llm(concept)
    # Fallback to templates if LLM unavailable
    return self._analyze_with_template(concept)
```
**Benefit**: Agents now reason using the actual concept content, generating domain-specific insights that strengthen debate quality.
## Files Modified
### Core Agent Files
- **`reasoning_forge/agents/base_agent.py`**
- Added `orchestrator` parameter to `__init__`
- Implemented `_analyze_with_llm()` for real inference
- Kept `_analyze_with_template()` as fallback
- `analyze()` now tries LLM first, falls back to templates
- **All agent subclasses**: Added `adapter_name` attribute
- `newton_agent.py`: `adapter_name = "newton"`
- `quantum_agent.py`: `adapter_name = "quantum"`
- `davinci_agent.py`: `adapter_name = "davinci"`
- `philosophy_agent.py`: `adapter_name = "philosophy"`
- `empathy_agent.py`: `adapter_name = "empathy"`
- `ethics_agent.py`: `adapter_name = "philosophy"` (shared)
- `critic_agent.py`: `adapter_name = "multi_perspective"` + new `evaluate_ensemble_with_llm()` method
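A minimal sketch of the subclass pattern described above. The class and attribute names follow this summary, but `BaseAgent` is a simplified stand-in here, not the real `base_agent.py` implementation:

```python
class BaseAgent:
    """Simplified stand-in for reasoning_forge/agents/base_agent.py."""
    adapter_name = None  # subclasses override this with their adapter

    def __init__(self, orchestrator=None):
        self.orchestrator = orchestrator


class NewtonAgent(BaseAgent):
    adapter_name = "newton"       # physics / causal reasoning adapter


class EthicsAgent(BaseAgent):
    adapter_name = "philosophy"   # shares the philosophy adapter
```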
### Orchestrator Integration
- **`reasoning_forge/forge_engine.py`**
- Added `orchestrator` parameter to `__init__`
- Lazy-loads `CodetteOrchestrator` if not provided
- Passes orchestrator to all agent constructors
- Graceful fallback to template mode if LLM unavailable
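The lazy-load-and-fallback behavior above can be sketched as follows. `CodetteOrchestrator` is stubbed here and the real constructor signature may differ:

```python
import logging


class CodetteOrchestrator:
    """Stub standing in for the real orchestrator."""
    def generate(self, query, adapter_name, system_prompt=None, enable_tools=False):
        return f"[{adapter_name}] analysis of {query!r}"


class ForgeEngine:
    def __init__(self, orchestrator=None):
        if orchestrator is None:
            try:
                # First call is the slow one (~60s with the real LLM)
                orchestrator = CodetteOrchestrator()
            except Exception:
                logging.warning("LLM unavailable; agents fall back to templates")
                orchestrator = None
        self.orchestrator = orchestrator
        # The real engine then passes self.orchestrator into each agent constructor
```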
## How It Works
### Startup Flow
```
ForgeEngine.__init__()
  → Lazy-load CodetteOrchestrator (first call ~60s)
  → Instantiate agents with orchestrator

forge_with_debate(query)
  → For each agent: agent.analyze(concept)
      → If orchestrator available: call LLM with adapter
      → Else: use templates (backward compatible)
```
### LLM Inference Flow
```
agent.analyze(concept)
  1. Check: do we have orchestrator + adapter_name?
  2. If yes: orchestrator.generate(
         query=concept,
         adapter_name="newton",   # Newton-specific reasoning
         system_prompt=template,  # guides the reasoning
         enable_tools=False,
     )
  3. If no: fall back to template substitution
  4. Return domain-specific analysis
```
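The flow above, as a runnable sketch. The orchestrator and its `generate` signature mirror the example in this document but are stubbed, so treat this as illustrative rather than the shipped code:

```python
class StubOrchestrator:
    def generate(self, query, adapter_name, system_prompt=None, enable_tools=False):
        return f"{adapter_name}: domain-specific analysis of {query}"


class Agent:
    adapter_name = "newton"
    template = "Analyze {concept} from first principles."

    def __init__(self, orchestrator=None):
        self.orchestrator = orchestrator

    def analyze(self, concept):
        if self.orchestrator and self.adapter_name:
            # Real inference path: LLM call with this agent's adapter
            return self.orchestrator.generate(
                query=concept,
                adapter_name=self.adapter_name,
                system_prompt=self.template,
                enable_tools=False,
            )
        # Fallback path: template substitution
        return self.template.replace("{concept}", concept)
```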
## Adapter Mapping
| Agent | Adapter | Purpose |
|-------|---------|---------|
| Newton | `newton` | Physics, mathematics, causal reasoning |
| Quantum | `quantum` | Probabilistic, uncertainty, superposition |
| DaVinci | `davinci` | Creative invention, cross-domain synthesis |
| Philosophy | `philosophy` | Epistemology, ontology, conceptual foundations |
| Empathy | `empathy` | Emotional intelligence, human impact |
| Ethics | `philosophy` | Moral reasoning, consequences (shared adapter) |
| Critic | `multi_perspective` | Meta-evaluation, ensemble critique |
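The mapping above, expressed as a lookup table. This dict is illustrative; in the codebase the mapping lives as `adapter_name` attributes on the agent classes:

```python
# Agent name -> LoRA adapter name (mirrors the table above)
AGENT_ADAPTERS = {
    "newton": "newton",
    "quantum": "quantum",
    "davinci": "davinci",
    "philosophy": "philosophy",
    "empathy": "empathy",
    "ethics": "philosophy",          # shared adapter
    "critic": "multi_perspective",
}
```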
## Testing
Run the integration test:
```bash
python test_agent_llm_integration.py
```
This verifies:
1. ForgeEngine loads with orchestrator
2. Agents receive orchestrator instance
3. Single agent generates real LLM response
4. Multi-agent ensemble works
5. Debate mode produces coherent synthesis
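A minimal sketch of what a check like item 3 might assert, with everything stubbed locally; the real test file exercises the actual ForgeEngine and LLM:

```python
class StubOrchestrator:
    def generate(self, query, adapter_name, **kwargs):
        return f"[{adapter_name}] {query}"


class StubAgent:
    adapter_name = "quantum"

    def __init__(self, orchestrator):
        self.orchestrator = orchestrator

    def analyze(self, concept):
        return self.orchestrator.generate(query=concept,
                                          adapter_name=self.adapter_name)


agent = StubAgent(StubOrchestrator())
result = agent.analyze("entanglement")
assert result.startswith("[quantum]")   # agent used its own adapter
assert "entanglement" in result         # response references the concept
```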
## Performance Impact
- **First debate**: ~60s (orchestrator initialization)
- **Subsequent debates**: ~30-60s (LLM inference time)
- **Agent initialization**: <1ms (orchestrator already loaded)
## Backward Compatibility
If the LLM/orchestrator is unavailable:
1. ForgeEngine logs a warning
2. Agents automatically fall back to templates
3. System continues to work (with lower quality)
This allows:
- Testing without the LLM loaded
- Fast template-based iteration
- Graceful degradation
## Expected Quality Improvements
With real LLM-based agents:
- **Correctness**: Should increase (domain-specific reasoning)
- **Depth**: Should increase (richer debate fuel)
- **Synthesis**: Should improve (agents actually understand concepts)
- **Contradictions**: Should decrease (coherent reasoning per adapter)
## Next Steps
1. Run `test_agent_llm_integration.py` to verify setup
2. Run evaluation: `python evaluation/run_evaluation_sprint.py --questions 5`
3. Compare results to previous template-based baseline
4. Iterate on Phase 6 control mechanisms with real agents
## Files Available
- **Test**: `test_agent_llm_integration.py` — Integration validation
- **Models**:
- Base: `bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf`
- Adapters: `adapters/*.gguf` (8 LoRA adapters, ~27 MB each)
- Alternative: `hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF/llama-3.2-1b-instruct-q8_0.gguf`