# Agent LLM Integration — Real Inference via Adapters

## What Changed

All reasoning agents in Codette now use **real LLM inference** via trained LoRA adapters instead of template substitution.

### Before
```python
# Template-based (generic)
def analyze(self, concept: str) -> str:
    template = self.select_template(concept)
    return template.replace("{concept}", concept)
```

**Problem**: Agents produced the same generic text for any concept, with only the concept name substituted. The resulting reasoning was non-specific and often contradictory, which reduced correctness in debate.

### After
```python
# LLM-based (specific)
def analyze(self, concept: str) -> str:
    if self.orchestrator and self.adapter_name:
        # Call LLM with this agent's specific adapter
        return self._analyze_with_llm(concept)
    # Fallback to templates if LLM unavailable
    return self._analyze_with_template(concept)
```

**Benefit**: Agents now reason using the actual concept content, generating domain-specific insights that strengthen debate quality.

## Files Modified

### Core Agent Files
- **`reasoning_forge/agents/base_agent.py`**
  - Added `orchestrator` parameter to `__init__`
  - Implemented `_analyze_with_llm()` for real inference
  - Kept `_analyze_with_template()` as fallback
  - `analyze()` now tries LLM first, falls back to templates

- **All agent subclasses**: Added `adapter_name` attribute
  - `newton_agent.py`: `adapter_name = "newton"`
  - `quantum_agent.py`: `adapter_name = "quantum"`
  - `davinci_agent.py`: `adapter_name = "davinci"`
  - `philosophy_agent.py`: `adapter_name = "philosophy"`
  - `empathy_agent.py`: `adapter_name = "empathy"`
  - `ethics_agent.py`: `adapter_name = "philosophy"` (shared)
  - `critic_agent.py`: `adapter_name = "multi_perspective"` + new `evaluate_ensemble_with_llm()` method

### Orchestrator Integration
- **`reasoning_forge/forge_engine.py`**
  - Added `orchestrator` parameter to `__init__`
  - Lazy-loads `CodetteOrchestrator` if not provided
  - Passes orchestrator to all agent constructors
  - Graceful fallback to template mode if LLM unavailable
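
The lazy-load and fallback behavior listed above could look roughly like the sketch below. It is illustrative only: `ForgeEngineSketch`, the `orchestrator_factory` hook, and the `mode` property are hypothetical names standing in for the real `ForgeEngine` and the code that constructs `CodetteOrchestrator`.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("forge_engine_sketch")


class ForgeEngineSketch:
    """Illustrative stand-in for ForgeEngine; not the project's actual class."""

    def __init__(
        self,
        orchestrator: Optional[Any] = None,
        orchestrator_factory: Optional[Callable[[], Any]] = None,
    ) -> None:
        # orchestrator_factory is a hypothetical hook that stands in for
        # "import and construct CodetteOrchestrator".
        if orchestrator is None and orchestrator_factory is not None:
            try:
                orchestrator = orchestrator_factory()
            except Exception as exc:  # e.g. missing model file or import error
                logger.warning("LLM unavailable, using template mode: %s", exc)
                orchestrator = None
        # Agents would receive this value; None means template mode.
        self.orchestrator = orchestrator

    @property
    def mode(self) -> str:
        return "llm" if self.orchestrator is not None else "template"
```

Passing an explicit orchestrator skips the load entirely, which is how a caller could share one loaded model across several engines.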

## How It Works

### Startup Flow
```
ForgeEngine.__init__()
  → Lazy-load CodetteOrchestrator (first call ~60s)
  → Instantiate agents with orchestrator

forge_with_debate(query)
  → For each agent: agent.analyze(concept)
    → If orchestrator available: Call LLM with adapter
    → Else: Use templates (backward compatible)
```

### LLM Inference Flow
```
agent.analyze(concept)
  1. Check: do we have orchestrator + adapter_name?
  2. If yes: orchestrator.generate(
       query=concept,
       adapter_name=self.adapter_name,  # e.g. "newton" for Newton-specific reasoning
       system_prompt=template,  # Guides the reasoning
       enable_tools=False
     )
  3. If no: Fall back to template substitution
  4. Return the analysis (domain-specific on the LLM path, generic on the fallback path)
```
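
As a minimal sketch of the flow above, the following runs end to end without a model. `EchoOrchestrator`, `AgentSketch`, `NewtonSketch`, and the template text are hypothetical; only the `generate()` keyword arguments are taken from this document, and the assumption that `generate()` returns a plain string is mine.

```python
from typing import Optional


class EchoOrchestrator:
    """Hypothetical stand-in for CodetteOrchestrator, for illustration only."""

    def generate(self, query: str, adapter_name: str,
                 system_prompt: str, enable_tools: bool = False) -> str:
        # A real orchestrator would run inference; this just echoes its input.
        return f"[{adapter_name}] {query}"


class AgentSketch:
    # Subclasses set this; None keeps the agent in template mode.
    adapter_name: Optional[str] = None
    # Hypothetical template, used as system prompt and as fallback text.
    template = "Consider {concept} in terms of forces and causes."

    def __init__(self, orchestrator=None):
        self.orchestrator = orchestrator

    def analyze(self, concept: str) -> str:
        # Steps 1-2: use the LLM only when both pieces are present.
        if self.orchestrator and self.adapter_name:
            return self.orchestrator.generate(
                query=concept,
                adapter_name=self.adapter_name,
                system_prompt=self.template,
                enable_tools=False,
            )
        # Step 3: template substitution.
        return self.template.replace("{concept}", concept)


class NewtonSketch(AgentSketch):
    adapter_name = "newton"
```

With the stub, `NewtonSketch(EchoOrchestrator()).analyze("entropy")` takes the LLM path, while `NewtonSketch().analyze("entropy")` falls back to the template.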

## Adapter Mapping

| Agent | Adapter | Purpose |
|-------|---------|---------|
| Newton | `newton` | Physics, mathematics, causal reasoning |
| Quantum | `quantum` | Probabilistic, uncertainty, superposition |
| DaVinci | `davinci` | Creative invention, cross-domain synthesis |
| Philosophy | `philosophy` | Epistemology, ontology, conceptual foundations |
| Empathy | `empathy` | Emotional intelligence, human impact |
| Ethics | `philosophy` | Moral reasoning, consequences (shared adapter) |
| Critic | `multi_perspective` | Meta-evaluation, ensemble critique |
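
The table can also be read as a plain lookup. The sketch below restates it as a dictionary and groups agents by adapter, which makes the shared `philosophy` adapter explicit. The names `AGENT_ADAPTERS` and `agents_by_adapter` are illustrative and are not taken from the codebase.

```python
from typing import Dict, List

# Agent name -> adapter name, copied from the table above.
AGENT_ADAPTERS: Dict[str, str] = {
    "Newton": "newton",
    "Quantum": "quantum",
    "DaVinci": "davinci",
    "Philosophy": "philosophy",
    "Empathy": "empathy",
    "Ethics": "philosophy",  # shared adapter
    "Critic": "multi_perspective",
}


def agents_by_adapter(mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the mapping so each adapter lists the agents that use it."""
    grouped: Dict[str, List[str]] = {}
    for agent, adapter in mapping.items():
        grouped.setdefault(adapter, []).append(agent)
    return grouped
```

Grouping this way shows that the seven agents listed here draw on six distinct adapters.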

## Testing

Run the integration test:
```bash
python test_agent_llm_integration.py
```

This verifies:
1. ForgeEngine loads with orchestrator
2. Agents receive orchestrator instance
3. Single agent generates real LLM response
4. Multi-agent ensemble works
5. Debate mode produces coherent synthesis
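
For quick checks that do not need the model loaded, a mock orchestrator can confirm that the right adapter name reaches `generate()`. This is a sketch using the standard library's `unittest.mock`, not an excerpt from `test_agent_llm_integration.py`; `run_agent` and `check_agent_uses_adapter` are hypothetical helpers that mirror the `generate()` call described above.

```python
from unittest.mock import Mock


def run_agent(orchestrator, adapter_name: str, concept: str) -> str:
    """Hypothetical helper that mirrors the generate() call from this document."""
    return orchestrator.generate(
        query=concept,
        adapter_name=adapter_name,
        system_prompt="(agent template)",
        enable_tools=False,
    )


def check_agent_uses_adapter() -> None:
    fake = Mock()
    fake.generate.return_value = "stubbed analysis"

    result = run_agent(fake, "quantum", "measurement")

    # The stubbed text comes back unchanged...
    assert result == "stubbed analysis"
    # ...and generate() was called exactly once with the expected arguments.
    fake.generate.assert_called_once_with(
        query="measurement",
        adapter_name="quantum",
        system_prompt="(agent template)",
        enable_tools=False,
    )
```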

## Performance Impact

- **First debate**: ~60s (orchestrator initialization)
- **Subsequent debates**: ~30-60s (LLM inference time)
- **Agent initialization**: <1ms (orchestrator already loaded)
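
To check these figures on your own hardware, a small timing helper is enough. The sketch below uses `time.perf_counter` from the standard library; `time_calls` is a hypothetical name, and the commented usage assumes an `engine` object exposing `forge_with_debate`.

```python
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], repeats: int = 3) -> List[float]:
    """Call fn repeatedly and return the wall-clock duration of each call in seconds."""
    durations: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Hypothetical usage: the first entry would include orchestrator loading,
# later entries only inference time.
# timings = time_calls(lambda: engine.forge_with_debate("What is entropy?"))
```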

## Backward Compatibility

If the LLM/orchestrator is unavailable:
1. ForgeEngine logs a warning
2. Agents automatically fall back to templates
3. The system keeps working, with lower-quality output

This allows:
- Testing without the LLM loaded
- Fast template-based iteration
- Graceful degradation

## Expected Quality Improvements

With real LLM-based agents, the following changes are expected but still need to be confirmed by evaluation:
- **Correctness**: Should increase (domain-specific reasoning)
- **Depth**: Should increase (agents supply more substantive material for debate)
- **Synthesis**: Should improve (agents reason over the concept's actual content)
- **Contradictions**: Should decrease (coherent reasoning per adapter)

## Next Steps

1. Run `test_agent_llm_integration.py` to verify setup
2. Run evaluation: `python evaluation/run_evaluation_sprint.py --questions 5`
3. Compare results to previous template-based baseline
4. Iterate on Phase 6 control mechanisms with real agents

## Files Available

- **Test**: `test_agent_llm_integration.py` — Integration validation
- **Models**:
  - Base: `bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf`
  - Adapters: `adapters/*.gguf` (8 LoRA adapters, ~27 MB each)
  - Alternative: `hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF/llama-3.2-1b-instruct-q8_0.gguf`