P0 - Systemic Provider Mismatch Across All Modes
Status: RESOLVED
Priority: P0 (Blocker for Free Tier/Demo)
Found: 2025-11-30 (during audit)
Resolved: 2025-11-30
Component: Multiple files across orchestrators, agents, and services
Resolution Summary
The critical provider mismatch bug has been fixed by implementing auto-detection in src/agent_factory/judges.py.
The get_model() function now checks for actual API key availability (has_openai_key, has_anthropic_key, has_huggingface_key)
instead of relying on the static settings.llm_provider configuration.
Fix Details
- Auto-Detection Implemented: `get_model()` prioritizes OpenAI > Anthropic > HuggingFace based on available keys.
- Fail-Fast on No Keys: If no API keys are configured, `get_model()` raises `ConfigurationError` with a clear message.
- HuggingFace Requires Token: Free Tier via `HuggingFaceModel` requires `HF_TOKEN` (a PydanticAI requirement).
- Synthesis Fallback: When `get_model()` fails, synthesis gracefully falls back to the template.
- Audit Fixes Applied:
  - Replaced manual `os.getenv` checks with centralized `settings` properties in `src/app.py`.
  - Added logging to `src/services/statistical_analyzer.py` (fixed a silent `pass`).
  - Narrowed exception handling in `src/tools/pubmed.py`.
  - Optimized string search in `src/tools/code_execution.py`.
Key Clarification
The Free Tier in Simple Mode uses `HFInferenceJudgeHandler` (backed by `huggingface_hub.InferenceClient`) for judging; this does NOT require `HF_TOKEN`. However, synthesis via `get_model()` uses PydanticAI's `HuggingFaceModel`, which DOES require `HF_TOKEN`. When no tokens are configured, synthesis falls back to the template-based summary (which is still useful).
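The fallback path described above can be sketched as follows. This is a self-contained illustration, not the project's real code: `ConfigurationError`, `get_model`, `run_llm_synthesis`, and `synthesize` here are simplified stand-ins for what lives in `src/utils/exceptions.py`, `src/agent_factory/judges.py`, and `src/orchestrators/simple.py`:

```python
class ConfigurationError(Exception):
    """Stand-in for src.utils.exceptions.ConfigurationError."""


def get_model():
    """Stand-in: behaves like the fixed get_model() when no keys are set."""
    raise ConfigurationError(
        "No LLM API key configured. Set one of: OPENAI_API_KEY, ANTHROPIC_API_KEY, or HF_TOKEN"
    )


def run_llm_synthesis(model, findings):
    """Placeholder for the PydanticAI Agent call in the real code."""
    raise NotImplementedError


def synthesize(findings):
    """Try AI narrative synthesis; fall back to a structured template."""
    try:
        model = get_model()
        return run_llm_synthesis(model, findings)
    except ConfigurationError:
        # Graceful degradation: template-based structured summary
        bullets = "\n".join(f"- {f}" for f in findings)
        return (
            "Note: AI narrative synthesis unavailable. Showing structured summary.\n"
            + bullets
        )
```

The key property is that the caller catches `ConfigurationError` specifically, so a missing key degrades to the template instead of surfacing a raw `OpenAIError`.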
Verification
- Unit Tests: 5 new TDD tests in `tests/unit/agent_factory/test_get_model_auto_detect.py` pass.
- All Tests: 309 tests pass (`make check` succeeds).
- Regression Tests: Fixed and verified `tests/unit/agent_factory/test_judges_factory.py`.
Symptom (Archive)
When running in "Simple Mode" (Free Tier / No API Key), the synthesis step fails to generate a narrative and falls back to a structured summary template. The user sees:
> ⚠️ Note: AI narrative synthesis unavailable. Showing structured summary.
> _Error: OpenAIError_
Affected Files (COMPREHENSIVE AUDIT)
Files Calling get_model() Directly (9 locations)

| File | Line | Context | Impact |
|---|---|---|---|
| `simple.py` | 547 | Synthesis step | Free Tier broken |
| `statistical_analyzer.py` | 75 | Analysis agent | Free Tier broken |
| `judge_agent_llm.py` | 18 | LLM Judge | Free Tier broken |
| `graph/nodes.py` | 177 | LangGraph hypothesis | Free Tier broken |
| `graph/nodes.py` | 249 | LangGraph synthesis | Free Tier broken |
| `report_agent.py` | 45 | Report generation | Free Tier broken |
| `hypothesis_agent.py` | 44 | Hypothesis generation | Free Tier broken |
| `judges.py` | 100 | JudgeHandler default | OK (accepts param) |
Files Hardcoding OpenAIChatClient (Architecturally OpenAI-Only)

| File | Lines | Context |
|---|---|---|
| `advanced.py` | 100, 121 | Manager client |
| `magentic_agents.py` | 29, 70, 129, 173 | All 4 agents |
| `retrieval_agent.py` | 62 | Retrieval agent |
| `code_executor_agent.py` | 52 | Code executor |
| `llm_factory.py` | 42 | Factory default |
Note: Advanced mode is architecturally locked to OpenAI via `agent_framework.openai.OpenAIChatClient`. This is by design; see `app.py:188-194`, which falls back to Simple mode when no OpenAI key is present. However, users are not clearly informed of this limitation.
Root Cause
Settings/Runtime Sync Gap - Two Separate Backend Selection Systems.
The codebase has two independent systems for selecting the LLM backend:
1. `settings.llm_provider` (`config.py` default: `"openai"`)
2. `app.py` runtime detection via `os.getenv()` checks
These are never synchronized, causing the Judge and Synthesis steps to use different backends.
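The desynchronization can be demonstrated with a standalone sketch. `Settings` and `select_judge_backend` below are simplified stand-ins for `config.py` and `app.py`'s detection logic, not the project's real code:

```python
import os


class Settings:
    """Stand-in for config.py: a static default, never updated at runtime."""

    llm_provider = "openai"


settings = Settings()


def select_judge_backend() -> str:
    """Mirrors app.py's runtime env-var detection (simplified)."""
    if os.getenv("OPENAI_API_KEY"):
        return "openai"
    if os.getenv("ANTHROPIC_API_KEY"):
        return "anthropic"
    return "hf_inference"  # Free Tier judge


# Free Tier scenario: no keys configured
for var in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "HF_TOKEN"):
    os.environ.pop(var, None)

judge_backend = select_judge_backend()     # runtime detection picks HF Inference
synthesis_backend = settings.llm_provider  # static default still says "openai"
```

The two variables disagree: the judge runs on the free HF Inference path while synthesis builds an OpenAI model with `api_key=None`, which is exactly the `OpenAIError` in the symptom above.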
Detailed Call Chain
1. `src/app.py:115-126` (runtime detection):

   ```python
   # app.py bypasses settings entirely for JudgeHandler selection
   elif os.getenv("OPENAI_API_KEY"):
       judge_handler = JudgeHandler(model=None, domain=domain)
   elif os.getenv("ANTHROPIC_API_KEY"):
       judge_handler = JudgeHandler(model=None, domain=domain)
   else:
       judge_handler = HFInferenceJudgeHandler(domain=domain)  # Free Tier
   ```

   Note: This creates the correct handler but does NOT update `settings.llm_provider`.

2. `src/orchestrators/simple.py:546-552` (synthesis step):

   ```python
   from src.agent_factory.judges import get_model

   agent: Agent[None, str] = Agent(model=get_model(), ...)  # <-- BUG!
   ```

   Synthesis calls `get_model()` directly instead of using the injected judge's model.

3. `src/agent_factory/judges.py:56-78` (`get_model()`):

   ```python
   def get_model() -> Any:
       llm_provider = settings.llm_provider  # <-- Reads from settings (still "openai")
       # ...
       openai_provider = OpenAIProvider(api_key=settings.openai_api_key)  # <-- None!
       return OpenAIChatModel(settings.openai_model, provider=openai_provider)
   ```

   Result: creates an OpenAI model with `api_key=None` → `OpenAIError`.
Why Free Tier Fails
| Step | System Used | Backend Selected |
|---|---|---|
| JudgeHandler | `app.py` runtime | `HFInferenceJudgeHandler` ✅ |
| Synthesis | `settings.llm_provider` | OpenAI (default) ❌ |
The Judge works because app.py explicitly creates HFInferenceJudgeHandler.
Synthesis fails because it calls get_model() which reads settings.llm_provider = "openai" (unchanged from default).
Impact
- User Experience: Free tier users (Demo users) never see the high-quality narrative synthesis, only the fallback.
- System Integrity: The orchestrator ignores the runtime backend selection.
Implemented Fix
Strategy: Fix get_model() to Auto-Detect Available Provider
Actual Implementation (Merged)
File: src/agent_factory/judges.py
This is the single point of fix that resolves all 7 broken get_model() call sites.
```python
# (OpenAIProvider, AnthropicProvider, HuggingFaceProvider and the model
# classes are imported at the top of judges.py; excerpt shown here.)
def get_model() -> Any:
    """Get the LLM model based on available API keys.

    Priority order:
    1. OpenAI (if OPENAI_API_KEY set)
    2. Anthropic (if ANTHROPIC_API_KEY set)
    3. HuggingFace (if HF_TOKEN set)

    Raises:
        ConfigurationError: If no API keys are configured.

    Note: settings.llm_provider is ignored in favor of actual key availability.
    This ensures the model matches what app.py selected for JudgeHandler.
    """
    from src.utils.exceptions import ConfigurationError

    # Priority 1: OpenAI (most common, best tool calling)
    if settings.has_openai_key:
        openai_provider = OpenAIProvider(api_key=settings.openai_api_key)
        return OpenAIChatModel(settings.openai_model, provider=openai_provider)

    # Priority 2: Anthropic
    if settings.has_anthropic_key:
        provider = AnthropicProvider(api_key=settings.anthropic_api_key)
        return AnthropicModel(settings.anthropic_model, provider=provider)

    # Priority 3: HuggingFace (requires HF_TOKEN)
    if settings.has_huggingface_key:
        model_name = settings.huggingface_model or "meta-llama/Llama-3.1-70B-Instruct"
        hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
        return HuggingFaceModel(model_name, provider=hf_provider)

    # No keys configured - fail fast with clear error
    raise ConfigurationError(
        "No LLM API key configured. Set one of: OPENAI_API_KEY, ANTHROPIC_API_KEY, or HF_TOKEN"
    )
```
Why this works:
- Single fix location updates all 7 broken call sites
- Matches app.py's detection logic (key availability, not settings.llm_provider)
- HuggingFace works when HF_TOKEN is available
- Raises clear error when no keys configured (callers can catch and fallback)
- No changes needed to orchestrators, agents, or services
What This Does NOT Fix (By Design)
Advanced Mode remains OpenAI-only. The following files use agent_framework.openai.OpenAIChatClient which only supports OpenAI:
- `advanced.py` (Manager + agents)
- `magentic_agents.py` (SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent)
- `retrieval_agent.py`, `code_executor_agent.py`
This is by design: the Microsoft Agent Framework library (`agent-framework-core`) only provides `OpenAIChatClient`. Supporting other providers in Advanced mode would require one of:

- Waiting for `agent-framework` to add Anthropic/HuggingFace clients, OR
- Writing our own `ChatClient` implementations (significant effort)
The current app.py behavior is correct: it falls back to Simple mode when no OpenAI key is present (lines 188-194). The UI message could be clearer about why.
Test Plan (Implemented)
Unit Tests (Verified Passing)
```python
# tests/unit/agent_factory/test_get_model_auto_detect.py
import pytest
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.models.huggingface import HuggingFaceModel
from pydantic_ai.models.openai import OpenAIChatModel

from src.agent_factory.judges import get_model
from src.utils.config import settings
from src.utils.exceptions import ConfigurationError


class TestGetModelAutoDetect:
    """Test that get_model() auto-detects available providers."""

    def test_returns_openai_when_key_present(self, monkeypatch):
        """OpenAI key present -> OpenAI model."""
        monkeypatch.setattr(settings, "openai_api_key", "sk-test")
        monkeypatch.setattr(settings, "anthropic_api_key", None)
        monkeypatch.setattr(settings, "hf_token", None)
        model = get_model()
        assert isinstance(model, OpenAIChatModel)

    def test_returns_anthropic_when_only_anthropic_key(self, monkeypatch):
        """Only Anthropic key -> Anthropic model."""
        monkeypatch.setattr(settings, "openai_api_key", None)
        monkeypatch.setattr(settings, "anthropic_api_key", "sk-ant-test")
        monkeypatch.setattr(settings, "hf_token", None)
        model = get_model()
        assert isinstance(model, AnthropicModel)

    def test_returns_huggingface_when_hf_token_present(self, monkeypatch):
        """HF_TOKEN present (no paid keys) -> HuggingFace model."""
        monkeypatch.setattr(settings, "openai_api_key", None)
        monkeypatch.setattr(settings, "anthropic_api_key", None)
        monkeypatch.setattr(settings, "hf_token", "hf_test_token")
        model = get_model()
        assert isinstance(model, HuggingFaceModel)

    def test_raises_error_when_no_keys(self, monkeypatch):
        """No keys at all -> ConfigurationError."""
        monkeypatch.setattr(settings, "openai_api_key", None)
        monkeypatch.setattr(settings, "anthropic_api_key", None)
        monkeypatch.setattr(settings, "hf_token", None)
        with pytest.raises(ConfigurationError) as exc_info:
            get_model()
        assert "No LLM API key configured" in str(exc_info.value)

    def test_openai_takes_priority_over_anthropic(self, monkeypatch):
        """Both keys present -> OpenAI wins."""
        monkeypatch.setattr(settings, "openai_api_key", "sk-test")
        monkeypatch.setattr(settings, "anthropic_api_key", "sk-ant-test")
        model = get_model()
        assert isinstance(model, OpenAIChatModel)
```
Full Test Suite
```shell
$ make check
# 309 passed in 238.16s (0:03:58)
# All checks passed!
```
Manual Verification
1. Unset all API keys: `unset OPENAI_API_KEY ANTHROPIC_API_KEY HF_TOKEN`
2. Run the app: `uv run python -m src.app`
3. Submit query: "What drugs improve female libido?"
4. Verify: synthesis falls back to the template (`ConfigurationError` appears in the logs, but the user sees the structured summary).