# Agent LLM Integration — Real Inference via Adapters
## What Changed
All reasoning agents in Codette now use **real LLM inference** via trained LoRA adapters instead of template substitution.
### Before
```python
# Template-based (generic)
def analyze(self, concept: str) -> str:
    template = self.select_template(concept)
    return template.replace("{concept}", concept)
```
**Problem**: Agents generated the same generic text for ANY concept, just with the concept name substituted. This produced non-specific, often contradictory reasoning that actually reduced correctness in debate.
### After
```python
# LLM-based (specific)
def analyze(self, concept: str) -> str:
    if self.orchestrator and self.adapter_name:
        # Call LLM with this agent's specific adapter
        return self._analyze_with_llm(concept)
    # Fallback to templates if LLM unavailable
    return self._analyze_with_template(concept)
```
**Benefit**: Agents now reason using the actual concept content, generating domain-specific insights that strengthen debate quality.
## Files Modified
### Core Agent Files
- **`reasoning_forge/agents/base_agent.py`**
- Added `orchestrator` parameter to `__init__`
- Implemented `_analyze_with_llm()` for real inference
- Kept `_analyze_with_template()` as fallback
- `analyze()` now tries LLM first, falls back to templates
- **All agent subclasses**: Added `adapter_name` attribute
- `newton_agent.py`: `adapter_name = "newton"`
- `quantum_agent.py`: `adapter_name = "quantum"`
- `davinci_agent.py`: `adapter_name = "davinci"`
- `philosophy_agent.py`: `adapter_name = "philosophy"`
- `empathy_agent.py`: `adapter_name = "empathy"`
- `ethics_agent.py`: `adapter_name = "philosophy"` (shared)
- `critic_agent.py`: `adapter_name = "multi_perspective"` + new `evaluate_ensemble_with_llm()` method
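A minimal sketch of the subclass pattern described above. The class and attribute names follow this summary, but `BaseAgent` is a simplified stand-in here, not the real `base_agent.py` implementation:

```python
class BaseAgent:
    """Simplified stand-in for reasoning_forge/agents/base_agent.py."""
    adapter_name = None  # subclasses override this with their adapter

    def __init__(self, orchestrator=None):
        self.orchestrator = orchestrator


class NewtonAgent(BaseAgent):
    adapter_name = "newton"       # physics / causal reasoning adapter


class EthicsAgent(BaseAgent):
    adapter_name = "philosophy"   # shares the philosophy adapter
```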
### Orchestrator Integration
- **`reasoning_forge/forge_engine.py`**
- Added `orchestrator` parameter to `__init__`
- Lazy-loads `CodetteOrchestrator` if not provided
- Passes orchestrator to all agent constructors
- Graceful fallback to template mode if LLM unavailable
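The lazy-load-and-fallback behavior above can be sketched as follows. `CodetteOrchestrator` is stubbed here and the real constructor signature may differ:

```python
import logging


class CodetteOrchestrator:
    """Stub standing in for the real orchestrator."""
    def generate(self, query, adapter_name, system_prompt=None, enable_tools=False):
        return f"[{adapter_name}] analysis of {query!r}"


class ForgeEngine:
    def __init__(self, orchestrator=None):
        if orchestrator is None:
            try:
                # First call is the slow one (~60s with the real LLM)
                orchestrator = CodetteOrchestrator()
            except Exception:
                logging.warning("LLM unavailable; agents fall back to templates")
                orchestrator = None
        self.orchestrator = orchestrator
        # The real engine then passes self.orchestrator into each agent constructor
```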
## How It Works
### Startup Flow
```
ForgeEngine.__init__()
  → Lazy-load CodetteOrchestrator (first call ~60s)
  → Instantiate agents with orchestrator

forge_with_debate(query)
  → For each agent: agent.analyze(concept)
      → If orchestrator available: call LLM with adapter
      → Else: use templates (backward compatible)
```
### LLM Inference Flow
```
agent.analyze(concept)
  1. Check: do we have orchestrator + adapter_name?
  2. If yes: orchestrator.generate(
         query=concept,
         adapter_name="newton",   # Newton-specific reasoning
         system_prompt=template,  # guides the reasoning
         enable_tools=False,
     )
  3. If no: fall back to template substitution
  4. Return domain-specific analysis
```
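The flow above, as a runnable sketch. The orchestrator and its `generate` signature mirror the example in this document but are stubbed, so treat this as illustrative rather than the shipped code:

```python
class StubOrchestrator:
    def generate(self, query, adapter_name, system_prompt=None, enable_tools=False):
        return f"{adapter_name}: domain-specific analysis of {query}"


class Agent:
    adapter_name = "newton"
    template = "Analyze {concept} from first principles."

    def __init__(self, orchestrator=None):
        self.orchestrator = orchestrator

    def analyze(self, concept):
        if self.orchestrator and self.adapter_name:
            # Real inference path: LLM call with this agent's adapter
            return self.orchestrator.generate(
                query=concept,
                adapter_name=self.adapter_name,
                system_prompt=self.template,
                enable_tools=False,
            )
        # Fallback path: template substitution
        return self.template.replace("{concept}", concept)
```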
## Adapter Mapping
| Agent | Adapter | Purpose |
|-------|---------|---------|
| Newton | `newton` | Physics, mathematics, causal reasoning |
| Quantum | `quantum` | Probabilistic, uncertainty, superposition |
| DaVinci | `davinci` | Creative invention, cross-domain synthesis |
| Philosophy | `philosophy` | Epistemology, ontology, conceptual foundations |
| Empathy | `empathy` | Emotional intelligence, human impact |
| Ethics | `philosophy` | Moral reasoning, consequences (shared adapter) |
| Critic | `multi_perspective` | Meta-evaluation, ensemble critique |
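The mapping above, expressed as a lookup table. This dict is illustrative; in the codebase the mapping lives as `adapter_name` attributes on the agent classes:

```python
# Agent name -> LoRA adapter name (mirrors the table above)
AGENT_ADAPTERS = {
    "newton": "newton",
    "quantum": "quantum",
    "davinci": "davinci",
    "philosophy": "philosophy",
    "empathy": "empathy",
    "ethics": "philosophy",          # shared adapter
    "critic": "multi_perspective",
}
```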
## Testing
Run the integration test:
```bash
python test_agent_llm_integration.py
```
This verifies:
1. ForgeEngine loads with orchestrator
2. Agents receive orchestrator instance
3. Single agent generates real LLM response
4. Multi-agent ensemble works
5. Debate mode produces coherent synthesis
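A minimal sketch of what a check like item 3 might assert, with everything stubbed locally; the real test file exercises the actual ForgeEngine and LLM:

```python
class StubOrchestrator:
    def generate(self, query, adapter_name, **kwargs):
        return f"[{adapter_name}] {query}"


class StubAgent:
    adapter_name = "quantum"

    def __init__(self, orchestrator):
        self.orchestrator = orchestrator

    def analyze(self, concept):
        return self.orchestrator.generate(query=concept,
                                          adapter_name=self.adapter_name)


agent = StubAgent(StubOrchestrator())
result = agent.analyze("entanglement")
assert result.startswith("[quantum]")   # agent used its own adapter
assert "entanglement" in result         # response references the concept
```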
## Performance Impact
- **First debate**: ~60s (orchestrator initialization)
- **Subsequent debates**: ~30-60s (LLM inference time)
- **Agent initialization**: <1ms (orchestrator already loaded)
## Backward Compatibility
If the LLM/orchestrator is unavailable:
1. ForgeEngine logs a warning
2. Agents automatically fall back to templates
3. System continues to work (with lower quality)
This allows:
- Testing without the LLM loaded
- Fast template-based iteration
- Graceful degradation
## Expected Quality Improvements
With real LLM-based agents:
- **Correctness**: Should increase (domain-specific reasoning)
- **Depth**: Should increase (richer debate fuel)
- **Synthesis**: Should improve (agents actually understand concepts)
- **Contradictions**: Should decrease (coherent reasoning per adapter)
## Next Steps
1. Run `test_agent_llm_integration.py` to verify setup
2. Run evaluation: `python evaluation/run_evaluation_sprint.py --questions 5`
3. Compare results to previous template-based baseline
4. Iterate on Phase 6 control mechanisms with real agents
## Files Available
- **Test**: `test_agent_llm_integration.py` — Integration validation
- **Models**:
- Base: `bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf`
- Adapters: `adapters/*.gguf` (8 LoRA adapters, ~27 MB each)
- Alternative: `hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF/llama-3.2-1b-instruct-q8_0.gguf`