Codette-Reasoning / AGENT_LLM_INTEGRATION_SUMMARY.md
Raiff1982's picture
Upload 78 files
d574a3d verified

Agent LLM Integration — Real Inference via Adapters

What Changed

All reasoning agents in Codette now use real LLM inference via trained LoRA adapters instead of template substitution.

Before

# Template-based (generic)
def analyze(self, concept: str) -> str:
    template = self.select_template(concept)
    return template.replace("{concept}", concept)

Problem: Agents generated the same generic text for ANY concept, just with the concept name substituted. This produced non-specific, often contradictory reasoning that actually reduced correctness in debate.

After

# LLM-based (specific)
def analyze(self, concept: str) -> str:
    if self.orchestrator and self.adapter_name:
        # Call LLM with this agent's specific adapter
        return self._analyze_with_llm(concept)
    # Fallback to templates if LLM unavailable
    return self._analyze_with_template(concept)

Benefit: Agents now reason using the actual concept content, generating domain-specific insights that strengthen debate quality.

Files Modified

Core Agent Files

  • reasoning_forge/agents/base_agent.py

    • Added orchestrator parameter to __init__
    • Implemented _analyze_with_llm() for real inference
    • Kept _analyze_with_template() as fallback
    • analyze() now tries LLM first, falls back to templates
  • All agent subclasses: Added adapter_name attribute

    • newton_agent.py: adapter_name = "newton"
    • quantum_agent.py: adapter_name = "quantum"
    • davinci_agent.py: adapter_name = "davinci"
    • philosophy_agent.py: adapter_name = "philosophy"
    • empathy_agent.py: adapter_name = "empathy"
    • ethics_agent.py: adapter_name = "philosophy" (shared)
    • critic_agent.py: adapter_name = "multi_perspective" + new evaluate_ensemble_with_llm() method

Orchestrator Integration

  • reasoning_forge/forge_engine.py
    • Added orchestrator parameter to __init__
    • Lazy-loads CodetteOrchestrator if not provided
    • Passes orchestrator to all agent constructors
    • Graceful fallback to template mode if LLM unavailable

How It Works

Startup Flow

ForgeEngine.__init__()
  → Lazy-load CodetteOrchestrator (first call ~60s)
  → Instantiate agents with orchestrator
  → forge_with_debate(query)
    → For each agent: agent.analyze(concept)
      → If orchestrator available: Call LLM with adapter
      → Else: Use templates (backward compatible)

LLM Inference Flow

agent.analyze(concept)
  1. Check: do we have orchestrator + adapter_name?
  2. If yes: orchestrator.generate(
       query=concept,
       adapter_name="newton",  # Newton-specific reasoning
       system_prompt=template,  # Guides the reasoning
       enable_tools=False
     )
  3. If no: Fall back to template substitution
  4. Return domain-specific analysis

Adapter Mapping

Agent Adapter Purpose
Newton newton Physics, mathematics, causal reasoning
Quantum quantum Probabilistic, uncertainty, superposition
DaVinci davinci Creative invention, cross-domain synthesis
Philosophy philosophy Epistemology, ontology, conceptual foundations
Empathy empathy Emotional intelligence, human impact
Ethics philosophy Moral reasoning, consequences (shared adapter)
Critic multi_perspective Meta-evaluation, ensemble critique

Testing

Run the integration test:

python test_agent_llm_integration.py

This verifies:

  1. ForgeEngine loads with orchestrator
  2. Agents receive orchestrator instance
  3. Single agent generates real LLM response
  4. Multi-agent ensemble works
  5. Debate mode produces coherent synthesis

Performance Impact

  • First debate: ~60s (orchestrator initialization)
  • Subsequent debates: ~30-60s (LLM inference time)
  • Agent initialization: <1ms (orchestrator already loaded)

Backward Compatibility

If the LLM/orchestrator is unavailable:

  1. ForgeEngine logs a warning
  2. Agents automatically fall back to templates
  3. System continues to work (with lower quality)

This allows:

  • Testing without the LLM loaded
  • Fast template-based iteration
  • Graceful degradation

Expected Quality Improvements

With real LLM-based agents:

  • Correctness: Should increase (domain-specific reasoning)
  • Depth: Should increase (richer debate fuel)
  • Synthesis: Should improve (agents actually understand concepts)
  • Contradictions: Should decrease (coherent reasoning per adapter)

Next Steps

  1. Run test_agent_llm_integration.py to verify setup
  2. Run evaluation: python evaluation/run_evaluation_sprint.py --questions 5
  3. Compare results to previous template-based baseline
  4. Iterate on Phase 6 control mechanisms with real agents

Files Available

  • Test: test_agent_llm_integration.py — Integration validation
  • Models:
    • Base: bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
    • Adapters: adapters/*.gguf (8 LoRA adapters, ~27 MB each)
    • Alternative: hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF/llama-3.2-1b-instruct-q8_0.gguf