# Agent LLM Integration — Real Inference via Adapters

## What Changed

All reasoning agents in Codette now use **real LLM inference** via trained LoRA adapters instead of template substitution.

### Before

```python
# Template-based (generic)
def analyze(self, concept: str) -> str:
    template = self.select_template(concept)
    return template.replace("{concept}", concept)
```

**Problem**: Agents generated the same generic text for ANY concept, just with the concept name substituted. This produced non-specific, often contradictory reasoning that actually reduced correctness in debate.

### After

```python
# LLM-based (specific)
def analyze(self, concept: str) -> str:
    if self.orchestrator and self.adapter_name:
        # Call the LLM with this agent's specific adapter
        return self._analyze_with_llm(concept)
    # Fall back to templates if the LLM is unavailable
    return self._analyze_with_template(concept)
```

**Benefit**: Agents now reason over the actual concept content, generating domain-specific insights that strengthen debate quality.

## Files Modified

### Core Agent Files

- **`reasoning_forge/agents/base_agent.py`**
  - Added an `orchestrator` parameter to `__init__`
  - Implemented `_analyze_with_llm()` for real inference
  - Kept `_analyze_with_template()` as a fallback
  - `analyze()` now tries the LLM first, then falls back to templates
- **All agent subclasses**: added an `adapter_name` attribute
  - `newton_agent.py`: `adapter_name = "newton"`
  - `quantum_agent.py`: `adapter_name = "quantum"`
  - `davinci_agent.py`: `adapter_name = "davinci"`
  - `philosophy_agent.py`: `adapter_name = "philosophy"`
  - `empathy_agent.py`: `adapter_name = "empathy"`
  - `ethics_agent.py`: `adapter_name = "philosophy"` (shared)
  - `critic_agent.py`: `adapter_name = "multi_perspective"`, plus a new `evaluate_ensemble_with_llm()` method
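The subclass pattern above can be sketched as a class attribute that each agent overrides (a minimal illustration; only the `adapter_name` values and the `orchestrator` parameter come from the list above, the rest is scaffolding):

```python
# Minimal sketch of the adapter_name pattern described above.
class BaseAgent:
    adapter_name = None  # subclasses override with their LoRA adapter name

    def __init__(self, orchestrator=None):
        self.orchestrator = orchestrator  # None means template-only mode


class NewtonAgent(BaseAgent):
    adapter_name = "newton"  # physics / causal reasoning adapter


class EthicsAgent(BaseAgent):
    adapter_name = "philosophy"  # shares the philosophy adapter
```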
### Orchestrator Integration

- **`reasoning_forge/forge_engine.py`**
  - Added an `orchestrator` parameter to `__init__`
  - Lazy-loads `CodetteOrchestrator` if one is not provided
  - Passes the orchestrator to all agent constructors
  - Falls back gracefully to template mode if the LLM is unavailable

## How It Works

### Startup Flow

```
ForgeEngine.__init__()
  → Lazy-load CodetteOrchestrator (first call ~60s)
  → Instantiate agents with the orchestrator
forge_with_debate(query)
  → For each agent: agent.analyze(concept)
    → If orchestrator available: call the LLM with its adapter
    → Else: use templates (backward compatible)
```
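The lazy-load step in the flow above can be sketched as a cached property, so constructing `ForgeEngine` stays cheap and the ~60s model load only happens on first use (illustrative; the real `CodetteOrchestrator` construction is stubbed out here):

```python
class ForgeEngine:
    """Sketch of the lazy-loading pattern above (names illustrative)."""

    def __init__(self, orchestrator=None):
        # An injected orchestrator (e.g. in tests) skips the expensive load.
        self._orchestrator = orchestrator

    @property
    def orchestrator(self):
        # First access triggers the ~60s load; later accesses reuse it.
        if self._orchestrator is None:
            self._orchestrator = self._load_orchestrator()
        return self._orchestrator

    def _load_orchestrator(self):
        # Stand-in for CodetteOrchestrator(); returns a placeholder here.
        return object()
```

Injecting a prebuilt orchestrator also enables template-mode testing without ever touching the model weights.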
### LLM Inference Flow

```
agent.analyze(concept)
  1. Check: do we have orchestrator + adapter_name?
  2. If yes: orchestrator.generate(
         query=concept,
         adapter_name="newton",   # Newton-specific reasoning
         system_prompt=template,  # guides the reasoning
         enable_tools=False
     )
  3. If no: fall back to template substitution
  4. Return domain-specific analysis
```
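The four steps above translate into a short method. This is a runnable sketch with a stub orchestrator; the `generate(...)` keyword arguments are taken from the flow diagram rather than from the real API:

```python
class StubOrchestrator:
    """Stands in for the real orchestrator so the sketch is runnable."""

    def generate(self, query, adapter_name, system_prompt, enable_tools):
        return f"[{adapter_name}] analysis of {query}"


class Agent:
    adapter_name = "newton"

    def __init__(self, orchestrator=None):
        self.orchestrator = orchestrator

    def analyze(self, concept: str) -> str:
        # Steps 1-2: real inference when orchestrator + adapter_name exist
        if self.orchestrator is not None and self.adapter_name:
            return self.orchestrator.generate(
                query=concept,
                adapter_name=self.adapter_name,
                system_prompt="Reason about: {concept}",  # guides the reasoning
                enable_tools=False,
            )
        # Step 3: template fallback keeps the system working without the LLM
        return f"Template analysis of {concept}"
```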
## Adapter Mapping

| Agent | Adapter | Purpose |
|-------|---------|---------|
| Newton | `newton` | Physics, mathematics, causal reasoning |
| Quantum | `quantum` | Probabilistic reasoning, uncertainty, superposition |
| DaVinci | `davinci` | Creative invention, cross-domain synthesis |
| Philosophy | `philosophy` | Epistemology, ontology, conceptual foundations |
| Empathy | `empathy` | Emotional intelligence, human impact |
| Ethics | `philosophy` | Moral reasoning, consequences (shared adapter) |
| Critic | `multi_perspective` | Meta-evaluation, ensemble critique |
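The table above can also be kept as a plain lookup, e.g. for validating agent configuration at startup (illustrative; the dictionary name is not from the codebase):

```python
# Agent → LoRA adapter, mirroring the table above.
AGENT_ADAPTERS = {
    "newton": "newton",
    "quantum": "quantum",
    "davinci": "davinci",
    "philosophy": "philosophy",
    "empathy": "empathy",
    "ethics": "philosophy",        # shared adapter
    "critic": "multi_perspective",
}
```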
## Testing

Run the integration test:

```bash
python test_agent_llm_integration.py
```

This verifies:

1. ForgeEngine loads with the orchestrator
2. Agents receive the orchestrator instance
3. A single agent generates a real LLM response
4. The multi-agent ensemble works
5. Debate mode produces a coherent synthesis

## Performance Impact

- **First debate**: ~60s (orchestrator initialization)
- **Subsequent debates**: ~30-60s (LLM inference time)
- **Agent initialization**: <1ms (orchestrator already loaded)

## Backward Compatibility

If the LLM/orchestrator is unavailable:

1. ForgeEngine logs a warning
2. Agents automatically fall back to templates
3. The system continues to work (with lower quality)

This allows:

- Testing without the LLM loaded
- Fast template-based iteration
- Graceful degradation
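The warn-and-degrade behavior can be sketched as a guarded import (the module path here is hypothetical; only the warning-then-template-fallback behavior comes from the list above):

```python
import logging

logger = logging.getLogger(__name__)


def load_orchestrator_or_none():
    """Return an orchestrator, or None so agents use template mode."""
    try:
        # Hypothetical import path; substitute the real one.
        from codette.orchestrator import CodetteOrchestrator
        return CodetteOrchestrator()
    except Exception as exc:
        logger.warning("LLM unavailable, falling back to templates: %s", exc)
        return None
```

Because agents already branch on `orchestrator is None`, returning `None` here is all the degradation logic the call sites need.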
## Expected Quality Improvements

With real LLM-based agents:

- **Correctness**: should increase (domain-specific reasoning)
- **Depth**: should increase (richer debate fuel)
- **Synthesis**: should improve (agents actually understand concepts)
- **Contradictions**: should decrease (coherent reasoning per adapter)

## Next Steps

1. Run `test_agent_llm_integration.py` to verify the setup
2. Run an evaluation: `python evaluation/run_evaluation_sprint.py --questions 5`
3. Compare results to the previous template-based baseline
4. Iterate on Phase 6 control mechanisms with real agents

## Files Available

- **Test**: `test_agent_llm_integration.py` — integration validation
- **Models**:
  - Base: `bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf`
  - Adapters: `adapters/*.gguf` (8 LoRA adapters, ~27 MB each)
  - Alternative: `hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF/llama-3.2-1b-instruct-q8_0.gguf`