DeepBoner/docs/bugs/P0_SYNTHESIS_PROVIDER_MISMATCH.md
Fixed in commit 599a754: "fix: P0 provider mismatch and code quality audit fixes" (#102)


# P0 - Systemic Provider Mismatch Across All Modes

**Status:** RESOLVED
**Priority:** P0 (blocker for Free Tier / demo)
**Found:** 2025-11-30 (during audit)
**Resolved:** 2025-11-30
**Component:** Multiple files across orchestrators, agents, and services

## Resolution Summary

The critical provider mismatch bug has been fixed by implementing auto-detection in `src/agent_factory/judges.py`. The `get_model()` function now checks actual API key availability (the `has_openai_key`, `has_anthropic_key`, and `has_huggingface_key` settings properties) instead of relying on the static `settings.llm_provider` configuration.

## Fix Details

- **Auto-detection implemented:** `get_model()` prioritizes OpenAI > Anthropic > HuggingFace based on which API keys are available.
- **Fail-fast on missing keys:** If no API keys are configured, `get_model()` raises `ConfigurationError` with a clear message.
- **HuggingFace requires a token:** The Free Tier path via `HuggingFaceModel` requires `HF_TOKEN` (a PydanticAI requirement).
- **Synthesis fallback:** When `get_model()` fails, synthesis gracefully falls back to the template-based summary.
- **Audit fixes applied:**
  - Replaced manual `os.getenv` checks with centralized settings properties in `src/app.py`.
  - Added logging to `src/services/statistical_analyzer.py` (fixed a silent `pass`).
  - Narrowed exception handling in `src/tools/pubmed.py`.
  - Optimized string search in `src/tools/code_execution.py`.
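The key-availability checks referenced above can be sketched as simple properties. This is a hypothetical `Settings` fragment for illustration only; the real class lives in the project's config module and may differ in detail:

```python
import os
from typing import Optional


class Settings:
    """Hypothetical fragment: key-availability properties over environment variables."""

    @property
    def openai_api_key(self) -> Optional[str]:
        return os.getenv("OPENAI_API_KEY")

    @property
    def anthropic_api_key(self) -> Optional[str]:
        return os.getenv("ANTHROPIC_API_KEY")

    @property
    def hf_token(self) -> Optional[str]:
        return os.getenv("HF_TOKEN")

    @property
    def has_openai_key(self) -> bool:
        return bool(self.openai_api_key)

    @property
    def has_anthropic_key(self) -> bool:
        return bool(self.anthropic_api_key)

    @property
    def has_huggingface_key(self) -> bool:
        return bool(self.hf_token)


settings = Settings()
```

Centralizing these checks is what lets `app.py` and `get_model()` agree on which provider is actually usable.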

## Key Clarification

The Free Tier in Simple Mode uses `HFInferenceJudgeHandler` (built on `huggingface_hub.InferenceClient`) for judging; this does NOT require `HF_TOKEN`. Synthesis via `get_model()`, however, uses PydanticAI's `HuggingFaceModel`, which DOES require `HF_TOKEN`. When no tokens are configured, synthesis falls back to the template-based summary (which is still useful).

## Verification

- **Unit tests:** 5 new TDD tests in `tests/unit/agent_factory/test_get_model_auto_detect.py` pass.
- **All tests:** 309 tests pass (`make check` succeeds).
- **Regression tests:** Fixed and verified `tests/unit/agent_factory/test_judges_factory.py`.

## Symptom (Archive)

When running in Simple Mode (Free Tier, no API key), the synthesis step fails to generate a narrative and falls back to the structured summary template. The user sees:

> ⚠️ Note: AI narrative synthesis unavailable. Showing structured summary.
> _Error: OpenAIError_

## Affected Files (Comprehensive Audit)

### Files Calling `get_model()` Directly

| File | Line | Context | Impact |
| --- | --- | --- | --- |
| `simple.py` | 547 | Synthesis step | Free Tier broken |
| `statistical_analyzer.py` | 75 | Analysis agent | Free Tier broken |
| `judge_agent_llm.py` | 18 | LLM Judge | Free Tier broken |
| `graph/nodes.py` | 177 | LangGraph hypothesis | Free Tier broken |
| `graph/nodes.py` | 249 | LangGraph synthesis | Free Tier broken |
| `report_agent.py` | 45 | Report generation | Free Tier broken |
| `hypothesis_agent.py` | 44 | Hypothesis generation | Free Tier broken |
| `judges.py` | 100 | JudgeHandler default | OK (accepts param) |

### Files Hardcoding `OpenAIChatClient` (Architecturally OpenAI-Only)

| File | Lines | Context |
| --- | --- | --- |
| `advanced.py` | 100, 121 | Manager client |
| `magentic_agents.py` | 29, 70, 129, 173 | All 4 agents |
| `retrieval_agent.py` | 62 | Retrieval agent |
| `code_executor_agent.py` | 52 | Code executor |
| `llm_factory.py` | 42 | Factory default |

Note: Advanced mode is architecturally locked to OpenAI via `agent_framework.openai.OpenAIChatClient`. This is by design; see `app.py:188-194`, which falls back to Simple mode when no OpenAI key is present. However, users are not clearly informed of this limitation.

## Root Cause

**Settings/runtime sync gap: two separate backend selection systems.**

The codebase has two independent systems for selecting the LLM backend:

1. `settings.llm_provider` (`config.py` default: `"openai"`)
2. `app.py` runtime detection via `os.getenv()` checks

These are never synchronized, so the Judge and Synthesis steps can end up using different backends.
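A toy version of the two systems makes the gap concrete. Names here are illustrative stand-ins for the real logic in `app.py` and `config.py`:

```python
import os

# System 1: static configuration -- get_model() used to read only this.
class StaticSettings:
    llm_provider = "openai"  # config.py default, never updated at runtime

settings = StaticSettings()

# System 2: runtime detection -- a sketch of what app.py does for JudgeHandler.
def detect_judge_backend() -> str:
    if os.getenv("OPENAI_API_KEY"):
        return "openai"
    if os.getenv("ANTHROPIC_API_KEY"):
        return "anthropic"
    return "hf_inference"  # Free Tier path

# Free Tier scenario: no keys configured -> the two systems disagree.
for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"):
    os.environ.pop(key, None)

judge_backend = detect_judge_backend()     # "hf_inference"
synthesis_backend = settings.llm_provider  # still "openai" -> OpenAIError downstream
```

The fix below collapses the two systems into one by making `get_model()` perform the same key-availability detection.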

## Detailed Call Chain

1. `src/app.py:115-126` (runtime detection):

   ```python
   # app.py bypasses settings entirely for JudgeHandler selection
   elif os.getenv("OPENAI_API_KEY"):
       judge_handler = JudgeHandler(model=None, domain=domain)
   elif os.getenv("ANTHROPIC_API_KEY"):
       judge_handler = JudgeHandler(model=None, domain=domain)
   else:
       judge_handler = HFInferenceJudgeHandler(domain=domain)  # Free Tier
   ```

   Note: this creates the correct handler but does NOT update `settings.llm_provider`.

2. `src/orchestrators/simple.py:546-552` (synthesis step):

   ```python
   from src.agent_factory.judges import get_model

   agent: Agent[None, str] = Agent(model=get_model(), ...)  # <-- BUG!
   ```

   Synthesis calls `get_model()` directly instead of using the injected judge's model.

3. `src/agent_factory/judges.py:56-78` (`get_model()`, before the fix):

   ```python
   def get_model() -> Any:
       llm_provider = settings.llm_provider  # <-- reads settings (still "openai")
       # ...
       openai_provider = OpenAIProvider(api_key=settings.openai_api_key)  # <-- None!
       return OpenAIChatModel(settings.openai_model, provider=openai_provider)
   ```

   Result: creates an OpenAI model with `api_key=None` → `OpenAIError`.

## Why Free Tier Fails

| Step | System Used | Backend Selected |
| --- | --- | --- |
| JudgeHandler | `app.py` runtime detection | `HFInferenceJudgeHandler` ✅ |
| Synthesis | `settings.llm_provider` | OpenAI (default) ❌ |

The Judge works because `app.py` explicitly creates `HFInferenceJudgeHandler`. Synthesis fails because it calls `get_model()`, which reads `settings.llm_provider = "openai"` (unchanged from the default).

## Impact

- **User experience:** Free Tier (demo) users never see the high-quality narrative synthesis, only the fallback.
- **System integrity:** The orchestrator ignores the runtime backend selection.

## Implemented Fix

**Strategy:** fix `get_model()` to auto-detect the available provider.

### Actual Implementation (Merged)

File: `src/agent_factory/judges.py`

This single fix resolves all 7 broken `get_model()` call sites.

```python
def get_model() -> Any:
    """Get the LLM model based on available API keys.

    Priority order:
    1. OpenAI (if OPENAI_API_KEY set)
    2. Anthropic (if ANTHROPIC_API_KEY set)
    3. HuggingFace (if HF_TOKEN set)

    Raises:
        ConfigurationError: If no API keys are configured.

    Note: settings.llm_provider is ignored in favor of actual key availability.
    This ensures the model matches what app.py selected for JudgeHandler.
    """
    from src.utils.exceptions import ConfigurationError

    # Priority 1: OpenAI (most common, best tool calling)
    if settings.has_openai_key:
        openai_provider = OpenAIProvider(api_key=settings.openai_api_key)
        return OpenAIChatModel(settings.openai_model, provider=openai_provider)

    # Priority 2: Anthropic
    if settings.has_anthropic_key:
        provider = AnthropicProvider(api_key=settings.anthropic_api_key)
        return AnthropicModel(settings.anthropic_model, provider=provider)

    # Priority 3: HuggingFace (requires HF_TOKEN)
    if settings.has_huggingface_key:
        model_name = settings.huggingface_model or "meta-llama/Llama-3.1-70B-Instruct"
        hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
        return HuggingFaceModel(model_name, provider=hf_provider)

    # No keys configured - fail fast with clear error
    raise ConfigurationError(
        "No LLM API key configured. Set one of: OPENAI_API_KEY, ANTHROPIC_API_KEY, or HF_TOKEN"
    )
```

**Why this works:**

- A single fix location updates all 7 broken call sites.
- It matches `app.py`'s detection logic (key availability, not `settings.llm_provider`).
- HuggingFace works when `HF_TOKEN` is available.
- It raises a clear error when no keys are configured (callers can catch it and fall back).
- No changes are needed to orchestrators, agents, or services.
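Caller-side, the catch-and-fall-back pattern can be sketched like this. `ConfigurationError` and `get_model` below are simplified stand-ins; the real synthesis step lives in `src/orchestrators/simple.py`:

```python
import os


class ConfigurationError(Exception):
    """Stand-in for src.utils.exceptions.ConfigurationError."""


def get_model() -> str:
    """Stand-in: the real function returns a PydanticAI model or raises."""
    for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "HF_TOKEN"):
        if os.getenv(key):
            return key
    raise ConfigurationError("No LLM API key configured.")


def synthesize(findings: list[str]) -> str:
    try:
        model = get_model()
        return f"[narrative synthesis via {model}]"  # real code runs a PydanticAI Agent
    except ConfigurationError:
        # Graceful degradation: structured template instead of a narrative.
        header = "⚠️ Note: AI narrative synthesis unavailable. Showing structured summary."
        return "\n".join([header, ""] + [f"- {item}" for item in findings])
```

With no keys set, `synthesize()` returns the structured summary instead of raising, which is exactly the fallback the Free Tier user sees.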

## What This Does NOT Fix (By Design)

Advanced mode remains OpenAI-only. The following files use `agent_framework.openai.OpenAIChatClient`, which only supports OpenAI:

- `advanced.py` (Manager + agents)
- `magentic_agents.py` (SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent)
- `retrieval_agent.py`, `code_executor_agent.py`

This is by design: the Microsoft Agent Framework library (`agent-framework-core`) only provides `OpenAIChatClient`. Supporting other providers in Advanced mode would require either:

1. waiting for agent-framework to add Anthropic/HuggingFace clients, or
2. writing our own `ChatClient` implementations (significant effort).

The current `app.py` behavior is correct: it falls back to Simple mode when no OpenAI key is present (lines 188-194). The UI message could be clearer about why.
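That fallback can be sketched as a single guard. `select_mode` is a hypothetical helper; the real check is inline in `src/app.py`:

```python
def select_mode(requested: str, has_openai_key: bool) -> str:
    """Advanced mode is OpenAI-only (agent_framework's OpenAIChatClient),
    so fall back to Simple mode when no OpenAI key is configured."""
    if requested == "advanced" and not has_openai_key:
        return "simple"
    return requested
```

A clearer UI message would surface this rule explicitly, e.g. "Advanced mode requires an OpenAI key; running in Simple mode instead."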

## Test Plan (Implemented)

### Unit Tests (Verified Passing)

```python
# tests/unit/agent_factory/test_get_model_auto_detect.py

import pytest

# Model classes from PydanticAI (import paths may vary slightly by version).
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.models.huggingface import HuggingFaceModel
from pydantic_ai.models.openai import OpenAIChatModel

from src.agent_factory.judges import get_model
from src.utils.config import settings
from src.utils.exceptions import ConfigurationError


class TestGetModelAutoDetect:
    """Test that get_model() auto-detects available providers."""

    def test_returns_openai_when_key_present(self, monkeypatch):
        """OpenAI key present → OpenAI model."""
        monkeypatch.setattr(settings, "openai_api_key", "sk-test")
        monkeypatch.setattr(settings, "anthropic_api_key", None)
        monkeypatch.setattr(settings, "hf_token", None)
        model = get_model()
        assert isinstance(model, OpenAIChatModel)

    def test_returns_anthropic_when_only_anthropic_key(self, monkeypatch):
        """Only Anthropic key → Anthropic model."""
        monkeypatch.setattr(settings, "openai_api_key", None)
        monkeypatch.setattr(settings, "anthropic_api_key", "sk-ant-test")
        monkeypatch.setattr(settings, "hf_token", None)
        model = get_model()
        assert isinstance(model, AnthropicModel)

    def test_returns_huggingface_when_hf_token_present(self, monkeypatch):
        """HF_TOKEN present (no paid keys) → HuggingFace model."""
        monkeypatch.setattr(settings, "openai_api_key", None)
        monkeypatch.setattr(settings, "anthropic_api_key", None)
        monkeypatch.setattr(settings, "hf_token", "hf_test_token")
        model = get_model()
        assert isinstance(model, HuggingFaceModel)

    def test_raises_error_when_no_keys(self, monkeypatch):
        """No keys at all → ConfigurationError."""
        monkeypatch.setattr(settings, "openai_api_key", None)
        monkeypatch.setattr(settings, "anthropic_api_key", None)
        monkeypatch.setattr(settings, "hf_token", None)
        with pytest.raises(ConfigurationError) as exc_info:
            get_model()
        assert "No LLM API key configured" in str(exc_info.value)

    def test_openai_takes_priority_over_anthropic(self, monkeypatch):
        """Both keys present → OpenAI wins."""
        monkeypatch.setattr(settings, "openai_api_key", "sk-test")
        monkeypatch.setattr(settings, "anthropic_api_key", "sk-ant-test")
        model = get_model()
        assert isinstance(model, OpenAIChatModel)
```

### Full Test Suite

```console
$ make check
# 309 passed in 238.16s (0:03:58)
# All checks passed!
```

### Manual Verification

1. Unset all API keys: `unset OPENAI_API_KEY ANTHROPIC_API_KEY HF_TOKEN`
2. Run the app: `uv run python -m src.app`
3. Submit a query, e.g. "What drugs improve female libido?"
4. Verify that synthesis falls back to the template: the logs show `ConfigurationError`, while the user sees the structured summary.