Agent LLM Integration — Real Inference via Adapters

What Changed

All reasoning agents in Codette now use real LLM inference via trained LoRA adapters instead of template substitution.

Before

# Template-based (generic)
def analyze(self, concept: str) -> str:
    template = self.select_template(concept)
    return template.replace("{concept}", concept)

Problem: Agents produced the same generic text for any concept, with only the concept name substituted. The output was non-specific and often contradictory, which reduced correctness in debate.

After

# LLM-based (specific)
def analyze(self, concept: str) -> str:
    if self.orchestrator and self.adapter_name:
        # Call LLM with this agent's specific adapter
        return self._analyze_with_llm(concept)
    # Fallback to templates if LLM unavailable
    return self._analyze_with_template(concept)

Benefit: Agents now reason over the actual concept content and generate domain-specific insights, which should strengthen debate quality.

Files Modified

Core Agent Files

reasoning_forge/agents/base_agent.py
- Added orchestrator parameter to __init__
- Implemented _analyze_with_llm() for real inference
- Kept _analyze_with_template() as fallback
- analyze() now tries LLM first, falls back to templates

All agent subclasses: Added adapter_name attribute
- newton_agent.py: adapter_name = "newton"
- quantum_agent.py: adapter_name = "quantum"
- davinci_agent.py: adapter_name = "davinci"
- philosophy_agent.py: adapter_name = "philosophy"
- empathy_agent.py: adapter_name = "empathy"
- ethics_agent.py: adapter_name = "philosophy" (shared)
- critic_agent.py: adapter_name = "multi_perspective" + new evaluate_ensemble_with_llm() method
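
The changes listed above can be sketched as a small, self-contained class hierarchy. This is an illustration only, not the code in base_agent.py: the class names ReasoningAgent and EchoOrchestrator, the template text, and the assumption that generate() returns a plain string are all hypothetical.

```python
from typing import Any, Optional


class ReasoningAgent:
    """Hypothetical stand-in for the base agent described above."""

    adapter_name: Optional[str] = None  # subclasses set this
    template = "Consider {concept} from a general perspective."

    def __init__(self, orchestrator: Optional[Any] = None):
        self.orchestrator = orchestrator

    def analyze(self, concept: str) -> str:
        # LLM first, templates as the fallback
        if self.orchestrator and self.adapter_name:
            return self._analyze_with_llm(concept)
        return self._analyze_with_template(concept)

    def _analyze_with_llm(self, concept: str) -> str:
        # Assumption: generate() returns the model's text as a str
        return self.orchestrator.generate(
            query=concept,
            adapter_name=self.adapter_name,
            system_prompt=self.template,
            enable_tools=False,
        )

    def _analyze_with_template(self, concept: str) -> str:
        return self.template.replace("{concept}", concept)


class NewtonAgent(ReasoningAgent):
    adapter_name = "newton"


class EthicsAgent(ReasoningAgent):
    adapter_name = "philosophy"  # shared with the philosophy agent


class EchoOrchestrator:
    """Hypothetical stub: echoes which adapter was requested."""

    def generate(self, query, adapter_name, system_prompt, enable_tools):
        return f"[{adapter_name}] {query}"


if __name__ == "__main__":
    print(NewtonAgent(EchoOrchestrator()).analyze("entropy"))
    print(NewtonAgent().analyze("entropy"))  # no orchestrator: template path
```

Keeping adapter_name as a class attribute means a new agent only has to declare which adapter it uses; the dispatch logic stays in the base class.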

Orchestrator Integration

reasoning_forge/forge_engine.py
- Added orchestrator parameter to __init__
- Lazy-loads CodetteOrchestrator if not provided
- Passes orchestrator to all agent constructors
- Graceful fallback to template mode if LLM unavailable
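
A minimal sketch of the engine-side behavior listed above, under stated assumptions: ForgeEngineSketch, StubAgent, the loader callable, and the llm_enabled property are hypothetical names, and the real forge_engine.py may load CodetteOrchestrator differently. The point is only the shape: use the orchestrator if one is passed in, otherwise try to load it, and drop to template mode with a warning if loading fails.

```python
import logging
from typing import Any, Callable, Optional, Sequence

logger = logging.getLogger("forge_sketch")


class StubAgent:
    """Hypothetical agent that just stores what it was given."""

    def __init__(self, orchestrator: Optional[Any] = None):
        self.orchestrator = orchestrator


class ForgeEngineSketch:
    def __init__(
        self,
        orchestrator: Optional[Any] = None,
        loader: Optional[Callable[[], Any]] = None,
        agent_classes: Sequence[type] = (),
    ):
        # Only load when the caller did not supply an orchestrator
        if orchestrator is None and loader is not None:
            try:
                orchestrator = loader()
            except Exception as exc:  # assumption: any load error means "no LLM"
                logger.warning("LLM unavailable, using template mode: %s", exc)
                orchestrator = None
        self.orchestrator = orchestrator
        # Every agent receives the same orchestrator (or None)
        self.agents = [cls(orchestrator=orchestrator) for cls in agent_classes]

    @property
    def llm_enabled(self) -> bool:
        return self.orchestrator is not None


def failing_loader():
    """Simulates a missing model or backend."""
    raise ImportError("inference backend not installed")


if __name__ == "__main__":
    engine = ForgeEngineSketch(loader=failing_loader, agent_classes=[StubAgent])
    print("LLM enabled:", engine.llm_enabled)
```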

How It Works

Startup Flow

ForgeEngine.__init__()
  → Lazy-load CodetteOrchestrator (first call ~60s)
  → Instantiate agents with orchestrator
forge_with_debate(query)
  → For each agent: agent.analyze(concept)
    → If orchestrator available: Call LLM with adapter
    → Else: Use templates (backward compatible)

LLM Inference Flow

agent.analyze(concept)
  1. Check: do we have orchestrator + adapter_name?
  2. If yes: orchestrator.generate(
       query=concept,
       adapter_name="newton",  # Newton-specific reasoning
       system_prompt=template,  # Guides the reasoning
       enable_tools=False
     )
  3. If no: Fall back to template substitution
  4. Return the analysis (domain-specific on the LLM path, generic on the template path)
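
The four steps above can be sketched as one function. The keyword arguments to generate() are taken from the flow above; everything else is an assumption made for illustration: the stub orchestrators, the string return type, and in particular the try/except, which treats an inference error the same as a missing orchestrator. Whether the real code fills in the {concept} placeholder before passing the template as system_prompt is not stated, so the sketch passes it unchanged.

```python
from typing import Any, Optional


def analyze(
    concept: str,
    template: str,
    orchestrator: Optional[Any] = None,
    adapter_name: Optional[str] = None,
) -> str:
    # Step 1: do we have an orchestrator and an adapter name?
    if orchestrator is not None and adapter_name:
        try:
            # Step 2: real inference through the agent's adapter
            return orchestrator.generate(
                query=concept,
                adapter_name=adapter_name,
                system_prompt=template,
                enable_tools=False,
            )
        except Exception:
            # Assumption: inference errors fall through to templates
            pass
    # Step 3: template substitution
    return template.replace("{concept}", concept)


class RecordingOrchestrator:
    """Hypothetical stub that records the arguments it receives."""

    def __init__(self):
        self.calls = []

    def generate(self, **kwargs):
        self.calls.append(kwargs)
        return f"analysis of {kwargs['query']} via {kwargs['adapter_name']}"


class BrokenOrchestrator:
    """Hypothetical stub that simulates a failed inference call."""

    def generate(self, **kwargs):
        raise RuntimeError("model not loaded")


if __name__ == "__main__":
    template = "Analyze {concept} causally."
    print(analyze("gravity", template, RecordingOrchestrator(), "newton"))
    print(analyze("gravity", template, BrokenOrchestrator(), "newton"))
```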

Adapter Mapping

Agent	Adapter	Purpose
Newton	`newton`	Physics, mathematics, causal reasoning
Quantum	`quantum`	Probabilistic, uncertainty, superposition
DaVinci	`davinci`	Creative invention, cross-domain synthesis
Philosophy	`philosophy`	Epistemology, ontology, conceptual foundations
Empathy	`empathy`	Emotional intelligence, human impact
Ethics	`philosophy`	Moral reasoning, consequences (shared adapter)
Critic	`multi_perspective`	Meta-evaluation, ensemble critique
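
The same mapping written as data, as a sketch. AGENT_ADAPTERS and agents_by_adapter are hypothetical names, not identifiers from the codebase. Grouping by adapter makes the sharing explicit: the seven agents in the table use six distinct adapters.

```python
from collections import defaultdict
from typing import Dict, List

# Agent -> adapter, copied from the table above
AGENT_ADAPTERS: Dict[str, str] = {
    "Newton": "newton",
    "Quantum": "quantum",
    "DaVinci": "davinci",
    "Philosophy": "philosophy",
    "Empathy": "empathy",
    "Ethics": "philosophy",  # shared adapter
    "Critic": "multi_perspective",
}


def agents_by_adapter(mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the mapping so each adapter lists the agents that use it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for agent, adapter in mapping.items():
        grouped[adapter].append(agent)
    return dict(grouped)


if __name__ == "__main__":
    for adapter, agents in agents_by_adapter(AGENT_ADAPTERS).items():
        print(f"{adapter}: {', '.join(agents)}")
```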

Testing

Run the integration test:

python test_agent_llm_integration.py

This verifies:

- ForgeEngine loads with orchestrator
- Agents receive orchestrator instance
- Single agent generates real LLM response
- Multi-agent ensemble works
- Debate mode produces coherent synthesis

Performance Impact

- First debate: ~60s (orchestrator initialization)
- Subsequent debates: ~30-60s (LLM inference time)
- Agent initialization: <1ms (orchestrator already loaded)
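
The one-time initialization cost can be illustrated with a cached loader. This is a sketch, not how ForgeEngine caches the orchestrator: get_orchestrator, timed, and LOAD_COUNT are hypothetical, and a 50 ms sleep stands in for the ~60s model load.

```python
import time
from functools import lru_cache

LOAD_COUNT = 0  # how many times the expensive load actually ran


@lru_cache(maxsize=1)
def get_orchestrator() -> object:
    """Hypothetical loader; the sleep stands in for loading the model."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    time.sleep(0.05)
    return object()


def timed(fn):
    """Return (result, elapsed seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


first, first_s = timed(get_orchestrator)    # pays the load cost
second, second_s = timed(get_orchestrator)  # served from the cache

print(f"first call:  {first_s * 1000:.1f} ms")
print(f"second call: {second_s * 1000:.3f} ms")
```

Only the load is cached here. Per-debate inference time is unaffected, which matches the ~30-60s figure for subsequent debates.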

Backward Compatibility

If the LLM/orchestrator is unavailable:

- ForgeEngine logs a warning
- Agents automatically fall back to templates
- The system keeps working, at lower output quality

This allows:

- Testing without the LLM loaded
- Fast template-based iteration
- Graceful degradation

Expected Quality Improvements

With real LLM-based agents:

- Correctness: Should increase (domain-specific reasoning)
- Depth: Should increase (richer debate fuel)
- Synthesis: Should improve (agents actually understand concepts)
- Contradictions: Should decrease (coherent reasoning per adapter)

Next Steps

1. Run test_agent_llm_integration.py to verify setup
2. Run evaluation: python evaluation/run_evaluation_sprint.py --questions 5
3. Compare results to previous template-based baseline
4. Iterate on Phase 6 control mechanisms with real agents

Files Available

Test: test_agent_llm_integration.py — Integration validation
Models:
- Base: bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
- Adapters: adapters/*.gguf (8 LoRA adapters, ~27 MB each)
- Alternative: hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF/llama-3.2-1b-instruct-q8_0.gguf