# P0 - Systemic Provider Mismatch Across All Modes

**Status:** RESOLVED
**Priority:** P0 (Blocker for Free Tier/Demo)
**Found:** 2025-11-30 (during Audit)
**Resolved:** 2025-11-30
**Component:** Multiple files across orchestrators, agents, services

## Resolution Summary

The critical provider mismatch bug has been fixed by implementing auto-detection in `src/agent_factory/judges.py`.
The `get_model()` function now checks for actual API key availability (`has_openai_key`, `has_anthropic_key`, `has_huggingface_key`)
instead of relying on the static `settings.llm_provider` configuration.

### Fix Details

- **Auto-Detection Implemented**: `get_model()` prioritizes OpenAI > Anthropic > HuggingFace based on *available keys*.
- **Fail-Fast on No Keys**: If no API keys are configured, `get_model()` raises `ConfigurationError` with a clear message.
- **HuggingFace Requires Token**: Free Tier via `HuggingFaceModel` requires `HF_TOKEN` (PydanticAI requirement).
- **Synthesis Fallback**: When `get_model()` fails, synthesis gracefully falls back to template.
- **Audit Fixes Applied**:
    - Replaced manual `os.getenv` checks with centralized `settings` properties in `src/app.py`.
    - Added logging to `src/services/statistical_analyzer.py` (fixed silent `pass`).
    - Narrowed exception handling in `src/tools/pubmed.py`.
    - Optimized string search in `src/tools/code_execution.py`.
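The fallback behavior in the last bullets can be sketched as follows. This is illustrative only: `run_llm_synthesis` and this `synthesize` wrapper are hypothetical stand-ins, not the actual orchestrator code.

```python
class ConfigurationError(Exception):
    """Raised when no LLM API key is configured (mirrors src/utils/exceptions)."""


def get_model():
    # Stand-in for judges.get_model() in the no-keys case: fail fast.
    raise ConfigurationError("No LLM API key configured")


def run_llm_synthesis(model, evidence):
    # Hypothetical LLM call; never reached when get_model() raises.
    raise NotImplementedError


def synthesize(evidence: list[str]) -> str:
    """Try LLM synthesis; degrade to a structured template on config errors."""
    try:
        model = get_model()
        return run_llm_synthesis(model, evidence)
    except ConfigurationError:
        # Graceful degradation: template-based structured summary
        bullets = "\n".join(f"- {item}" for item in evidence)
        return "AI narrative synthesis unavailable.\n\nFindings:\n" + bullets
```

The key property is that a missing key degrades the output rather than crashing the run.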

### Key Clarification

The **Free Tier** in Simple Mode uses `HFInferenceJudgeHandler` (which uses `huggingface_hub.InferenceClient`)
for judging - this does NOT require `HF_TOKEN`. However, synthesis via `get_model()` uses PydanticAI's
`HuggingFaceModel` which DOES require `HF_TOKEN`. When no tokens are configured, synthesis falls back to
the template-based summary (which is still useful).

### Verification

- **Unit Tests**: 5 new TDD tests in `tests/unit/agent_factory/test_get_model_auto_detect.py` pass.
- **All Tests**: 309 tests pass (`make check` succeeds).
- **Regression Tests**: Fixed and verified `tests/unit/agent_factory/test_judges_factory.py`.

---

## Symptom (Archive)

When running in "Simple Mode" (Free Tier / No API Key), the synthesis step fails to generate a narrative and falls back to a structured summary template. The user sees:

```text
> ⚠️ Note: AI narrative synthesis unavailable. Showing structured summary.
> _Error: OpenAIError_
```

## Affected Files (COMPREHENSIVE AUDIT)

### Files Calling `get_model()` Directly (8 locations)

| File | Line | Context | Impact |
|------|------|---------|--------|
| `simple.py` | 547 | Synthesis step | Free Tier broken |
| `statistical_analyzer.py` | 75 | Analysis agent | Free Tier broken |
| `judge_agent_llm.py` | 18 | LLM Judge | Free Tier broken |
| `graph/nodes.py` | 177 | LangGraph hypothesis | Free Tier broken |
| `graph/nodes.py` | 249 | LangGraph synthesis | Free Tier broken |
| `report_agent.py` | 45 | Report generation | Free Tier broken |
| `hypothesis_agent.py` | 44 | Hypothesis generation | Free Tier broken |
| `judges.py` | 100 | JudgeHandler default | OK (accepts param) |

### Files Hardcoding `OpenAIChatClient` (Architecturally OpenAI-Only)

| File | Lines | Context |
|------|-------|---------|
| `advanced.py` | 100, 121 | Manager client |
| `magentic_agents.py` | 29, 70, 129, 173 | All 4 agents |
| `retrieval_agent.py` | 62 | Retrieval agent |
| `code_executor_agent.py` | 52 | Code executor |
| `llm_factory.py` | 42 | Factory default |

**Note:** Advanced mode is architecturally locked to OpenAI via `agent_framework.openai.OpenAIChatClient`. This is by design - see `app.py:188-194` which falls back to Simple mode if no OpenAI key. However, users are not clearly informed of this limitation.

## Root Cause

**Settings/Runtime Sync Gap - Two Separate Backend Selection Systems.**

The codebase has **two independent** systems for selecting the LLM backend:
1. `settings.llm_provider` (config.py default: "openai")
2. `app.py` runtime detection via `os.getenv()` checks

These are **never synchronized**, causing the Judge and Synthesis steps to use different backends.

### Detailed Call Chain

1.  **`src/app.py:115-126`** (runtime detection):
    ```python
    # app.py bypasses settings entirely for JudgeHandler selection
    elif os.getenv("OPENAI_API_KEY"):
        judge_handler = JudgeHandler(model=None, domain=domain)
    elif os.getenv("ANTHROPIC_API_KEY"):
        judge_handler = JudgeHandler(model=None, domain=domain)
    else:
        judge_handler = HFInferenceJudgeHandler(domain=domain)  # Free Tier
    ```
    **Note:** This creates the correct handler but does NOT update `settings.llm_provider`.

2.  **`src/orchestrators/simple.py:546-552`** (synthesis step):
    ```python
    from src.agent_factory.judges import get_model
    agent: Agent[None, str] = Agent(model=get_model(), ...)  # <-- BUG!
    ```
    Synthesis calls `get_model()` directly instead of using the injected judge's model.

3.  **`src/agent_factory/judges.py:56-78`** (`get_model()`):
    ```python
    def get_model() -> Any:
        llm_provider = settings.llm_provider  # <-- Reads from settings (still "openai")
        # ...
        openai_provider = OpenAIProvider(api_key=settings.openai_api_key)  # <-- None!
        return OpenAIChatModel(settings.openai_model, provider=openai_provider)
    ```
    **Result:** Creates OpenAI model with `api_key=None` β†’ `OpenAIError`

### Why Free Tier Fails

| Step | System Used | Backend Selected |
|------|-------------|------------------|
| JudgeHandler | `app.py` runtime | HFInferenceJudgeHandler βœ… |
| Synthesis | `settings.llm_provider` | OpenAI (default) ❌ |

The Judge works because app.py explicitly creates `HFInferenceJudgeHandler`.
Synthesis fails because it calls `get_model()` which reads `settings.llm_provider = "openai"` (unchanged from default).
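The desync is easy to reproduce in isolation. The following is a simplified sketch with stand-ins for `src/app.py` and `src/utils/config.py`, not the real classes:

```python
import os


class Settings:
    llm_provider = "openai"  # static default; nothing updates it at runtime


def runtime_judge_backend() -> str:
    # app.py-style detection: reads the environment directly
    if os.getenv("OPENAI_API_KEY"):
        return "openai"
    if os.getenv("ANTHROPIC_API_KEY"):
        return "anthropic"
    return "hf_inference"  # Free Tier handler


def synthesis_backend(settings: Settings) -> str:
    # pre-fix get_model(): trusts the static setting, ignores actual keys
    return settings.llm_provider
```

With no keys in the environment, the judge selects `hf_inference` while synthesis still reports `openai`, which is exactly the mismatch shown in the table.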

## Impact

-   **User Experience:** Free tier users (Demo users) never see the high-quality narrative synthesis, only the fallback.
-   **System Integrity:** The orchestrator ignores the runtime backend selection.

## Implemented Fix

**Strategy: Fix `get_model()` to Auto-Detect Available Provider**

### Actual Implementation (Merged)

**File:** `src/agent_factory/judges.py`

This is the **single point of fix** that resolves all 7 broken `get_model()` call sites.

```python
def get_model() -> Any:
    """Get the LLM model based on available API keys.

    Priority order:
    1. OpenAI (if OPENAI_API_KEY set)
    2. Anthropic (if ANTHROPIC_API_KEY set)
    3. HuggingFace (if HF_TOKEN set)

    Raises:
        ConfigurationError: If no API keys are configured.

    Note: settings.llm_provider is ignored in favor of actual key availability.
    This ensures the model matches what app.py selected for JudgeHandler.
    """
    from src.utils.exceptions import ConfigurationError

    # Priority 1: OpenAI (most common, best tool calling)
    if settings.has_openai_key:
        openai_provider = OpenAIProvider(api_key=settings.openai_api_key)
        return OpenAIChatModel(settings.openai_model, provider=openai_provider)

    # Priority 2: Anthropic
    if settings.has_anthropic_key:
        provider = AnthropicProvider(api_key=settings.anthropic_api_key)
        return AnthropicModel(settings.anthropic_model, provider=provider)

    # Priority 3: HuggingFace (requires HF_TOKEN)
    if settings.has_huggingface_key:
        model_name = settings.huggingface_model or "meta-llama/Llama-3.1-70B-Instruct"
        hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
        return HuggingFaceModel(model_name, provider=hf_provider)

    # No keys configured - fail fast with clear error
    raise ConfigurationError(
        "No LLM API key configured. Set one of: OPENAI_API_KEY, ANTHROPIC_API_KEY, or HF_TOKEN"
    )
```

**Why this works:**
- Single fix location updates all 7 broken call sites
- Matches app.py's detection logic (key availability, not settings.llm_provider)
- HuggingFace works when HF_TOKEN is available
- Raises clear error when no keys configured (callers can catch and fallback)
- No changes needed to orchestrators, agents, or services

### What This Does NOT Fix (By Design)

**Advanced Mode remains OpenAI-only.** The following files use `agent_framework.openai.OpenAIChatClient` which only supports OpenAI:

- `advanced.py` (Manager + agents)
- `magentic_agents.py` (SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent)
- `retrieval_agent.py`, `code_executor_agent.py`

This is **by design** - the Microsoft Agent Framework library (`agent-framework-core`) only provides `OpenAIChatClient`. To support other providers in Advanced mode would require:
1. Wait for `agent-framework` to add Anthropic/HuggingFace clients, OR
2. Write our own `ChatClient` implementations (significant effort)

**The current app.py behavior is correct:** it falls back to Simple mode when no OpenAI key is present (lines 188-194). The UI message could be clearer about why.

## Test Plan (Implemented)

### Unit Tests (Verified Passing)

```python
# tests/unit/agent_factory/test_get_model_auto_detect.py

import pytest

# Model classes asserted below (import paths assume current pydantic-ai layout)
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.models.huggingface import HuggingFaceModel
from pydantic_ai.models.openai import OpenAIChatModel

from src.agent_factory.judges import get_model
from src.utils.config import settings
from src.utils.exceptions import ConfigurationError

class TestGetModelAutoDetect:
    """Test that get_model() auto-detects available providers."""

    def test_returns_openai_when_key_present(self, monkeypatch):
        """OpenAI key present β†’ OpenAI model."""
        monkeypatch.setattr(settings, "openai_api_key", "sk-test")
        monkeypatch.setattr(settings, "anthropic_api_key", None)
        monkeypatch.setattr(settings, "hf_token", None)
        model = get_model()
        assert isinstance(model, OpenAIChatModel)

    def test_returns_anthropic_when_only_anthropic_key(self, monkeypatch):
        """Only Anthropic key β†’ Anthropic model."""
        monkeypatch.setattr(settings, "openai_api_key", None)
        monkeypatch.setattr(settings, "anthropic_api_key", "sk-ant-test")
        monkeypatch.setattr(settings, "hf_token", None)
        model = get_model()
        assert isinstance(model, AnthropicModel)

    def test_returns_huggingface_when_hf_token_present(self, monkeypatch):
        """HF_TOKEN present (no paid keys) β†’ HuggingFace model."""
        monkeypatch.setattr(settings, "openai_api_key", None)
        monkeypatch.setattr(settings, "anthropic_api_key", None)
        monkeypatch.setattr(settings, "hf_token", "hf_test_token")
        model = get_model()
        assert isinstance(model, HuggingFaceModel)

    def test_raises_error_when_no_keys(self, monkeypatch):
        """No keys at all β†’ ConfigurationError."""
        monkeypatch.setattr(settings, "openai_api_key", None)
        monkeypatch.setattr(settings, "anthropic_api_key", None)
        monkeypatch.setattr(settings, "hf_token", None)
        with pytest.raises(ConfigurationError) as exc_info:
            get_model()
        assert "No LLM API key configured" in str(exc_info.value)

    def test_openai_takes_priority_over_anthropic(self, monkeypatch):
        """Both keys present β†’ OpenAI wins."""
        monkeypatch.setattr(settings, "openai_api_key", "sk-test")
        monkeypatch.setattr(settings, "anthropic_api_key", "sk-ant-test")
        model = get_model()
        assert isinstance(model, OpenAIChatModel)
```

### Full Test Suite

```bash
$ make check
# 309 passed in 238.16s (0:03:58)
# All checks passed!
```

### Manual Verification

1. **Unset all API keys**: `unset OPENAI_API_KEY ANTHROPIC_API_KEY HF_TOKEN`
2. **Run app**: `uv run python -m src.app`
3. **Submit query**: "What drugs improve female libido?"
4. **Verify**: Synthesis falls back to template (shows `ConfigurationError` in logs, but user sees structured summary)