VibecoderMcSwaggins committed on
Commit
5867d1f
·
unverified ·
2 Parent(s): 9c9d382 4bfa475

Merge pull request #119 from The-Obstacle-Is-The-Way/claude/fix-openai-key-routing-01BG9DYmtFkjtjGWhzj9UNXi

AGENTS.md CHANGED
@@ -104,10 +104,22 @@ DeepBonerError (base)
104
 
105
  Default models in `src/utils/config.py`:
106
 
107
- - **OpenAI:** `gpt-5` - Flagship model (requires Tier 5 access)
108
- - **Anthropic:** `claude-sonnet-4-5-20250929` - Mid-range Claude 4.5
109
  - **HuggingFace (Free Tier):** `Qwen/Qwen2.5-7B-Instruct` - See critical note below
110
 
111
  ---
112
 
113
  ## ⚠️ CRITICAL: HuggingFace Free Tier Architecture
 
104
 
105
  Default models in `src/utils/config.py`:
106
 
107
+ - **OpenAI:** `gpt-5` - Flagship model
 
108
  - **HuggingFace (Free Tier):** `Qwen/Qwen2.5-7B-Instruct` - See critical note below
109
 
110
+ **NOTE:** Anthropic is NOT supported (no embeddings API). See `P3_REMOVE_ANTHROPIC_PARTIAL_WIRING.md`.
111
+
112
+ ---
113
+
114
+ ## ⚠️ OpenAI API Keys
115
+
116
+ **If you have a valid OpenAI API key, it will work. Period.**
117
+
118
+ - BYOK (Bring Your Own Key) auto-detects `sk-...` prefix and routes to OpenAI
119
+ - If you get errors, the key is **invalid or expired** - NOT an access tier issue
120
+ - **NEVER suggest "access tier" or "upgrade your plan"** - this is not how OpenAI works for API keys
121
+ - Valid keys work. Invalid keys don't. That's it.
122
+
123
  ---
124
 
125
  ## ⚠️ CRITICAL: HuggingFace Free Tier Architecture
CLAUDE.md CHANGED
@@ -111,10 +111,22 @@ DeepBonerError (base)
111
 
112
  Default models in `src/utils/config.py`:
113
 
114
- - **OpenAI:** `gpt-5` - Flagship model (requires Tier 5 access)
115
- - **Anthropic:** `claude-sonnet-4-5-20250929` - Mid-range Claude 4.5
116
  - **HuggingFace (Free Tier):** `Qwen/Qwen2.5-7B-Instruct` - See critical note below
117
 
118
  ---
119
 
120
  ## ⚠️ CRITICAL: HuggingFace Free Tier Architecture
 
111
 
112
  Default models in `src/utils/config.py`:
113
 
114
+ - **OpenAI:** `gpt-5` - Flagship model
 
115
  - **HuggingFace (Free Tier):** `Qwen/Qwen2.5-7B-Instruct` - See critical note below
116
 
117
+ **NOTE:** Anthropic is NOT supported (no embeddings API). See `P3_REMOVE_ANTHROPIC_PARTIAL_WIRING.md`.
118
+
119
+ ---
120
+
121
+ ## ⚠️ OpenAI API Keys
122
+
123
+ **If you have a valid OpenAI API key, it will work. Period.**
124
+
125
+ - BYOK (Bring Your Own Key) auto-detects `sk-...` prefix and routes to OpenAI
126
+ - If you get errors, the key is **invalid or expired** - NOT an access tier issue
127
+ - **NEVER suggest "access tier" or "upgrade your plan"** - this is not how OpenAI works for API keys
128
+ - Valid keys work. Invalid keys don't. That's it.
129
+
130
  ---
131
 
132
  ## ⚠️ CRITICAL: HuggingFace Free Tier Architecture
GEMINI.md CHANGED
@@ -86,10 +86,22 @@ Settings via pydantic-settings from `.env`:
86
 
87
  Default models in `src/utils/config.py`:
88
 
89
- - **OpenAI:** `gpt-5` - Flagship model (requires Tier 5 access)
90
- - **Anthropic:** `claude-sonnet-4-5-20250929` - Mid-range Claude 4.5
91
  - **HuggingFace (Free Tier):** `Qwen/Qwen2.5-7B-Instruct` - See critical note below
92
 
93
  ---
94
 
95
  ## ⚠️ CRITICAL: HuggingFace Free Tier Architecture
 
86
 
87
  Default models in `src/utils/config.py`:
88
 
89
+ - **OpenAI:** `gpt-5` - Flagship model
 
90
  - **HuggingFace (Free Tier):** `Qwen/Qwen2.5-7B-Instruct` - See critical note below
91
 
92
+ **NOTE:** Anthropic is NOT supported (no embeddings API). See `P3_REMOVE_ANTHROPIC_PARTIAL_WIRING.md`.
93
+
94
+ ---
95
+
96
+ ## ⚠️ OpenAI API Keys
97
+
98
+ **If you have a valid OpenAI API key, it will work. Period.**
99
+
100
+ - BYOK (Bring Your Own Key) auto-detects `sk-...` prefix and routes to OpenAI
101
+ - If you get errors, the key is **invalid or expired** - NOT an access tier issue
102
+ - **NEVER suggest "access tier" or "upgrade your plan"** - this is not how OpenAI works for API keys
103
+ - Valid keys work. Invalid keys don't. That's it.
104
+
105
  ---
106
 
107
  ## ⚠️ CRITICAL: HuggingFace Free Tier Architecture
P2_7B_MODEL_GARBAGE_OUTPUT.md ADDED
@@ -0,0 +1,224 @@
1
+ # P2 Bug: 7B Model Produces Garbage Streaming Output
2
+
3
+ **Date**: 2025-12-02
4
+ **Status**: OPEN - Investigating
5
+ **Severity**: P2 (Major - Degrades User Experience)
6
+ **Component**: Free Tier / HuggingFace + Multi-Agent Orchestration
7
+
8
+ ---
9
+
10
+ ## Symptoms
11
+
12
+ When running a research query on Free Tier (Qwen2.5-7B-Instruct), the streaming output shows **garbage tokens** instead of coherent agent reasoning:
13
+
14
+ ```
15
+ 📡 **STREAMING**: yarg
16
+ 📡 **STREAMING**: PostalCodes
17
+ 📡 **STREAMING**: PostalCodes
18
+ 📡 **STREAMING**: FunctionFlags
19
+ 📡 **STREAMING**: search_pubmed
20
+ 📡 **STREAMING**: search_clinical_trials
21
+ 📡 **STREAMING**: system
22
+ 📡 **STREAMING**: Transferred to searcher, adopt the persona immediately.
23
+ ```
24
+
25
+ The model outputs random tokens like "yarg", "PostalCodes", "FunctionFlags" instead of actual research reasoning.
26
+
27
+ ---
28
+
29
+ ## Reproduction Steps
30
+
31
+ 1. Go to HuggingFace Spaces: https://huggingface.co/spaces/vcms/deepboner
32
+ 2. Leave API key empty (Free Tier)
33
+ 3. Click any example query or type a question
34
+ 4. Click submit
35
+ 5. Observe streaming output - garbage tokens appear
36
+
37
+ **Expected**: Coherent agent reasoning like "Searching PubMed for female libido treatments..."
38
+ **Actual**: Random tokens like "yarg", "PostalCodes"
39
+
40
+ ---
41
+
42
+ ## Root Cause Analysis
43
+
44
+ ### Primary Cause: 7B Model Too Small for Multi-Agent Prompts
45
+
46
+ The Qwen2.5-7B-Instruct model has **insufficient reasoning capacity** for the complex multi-agent framework. The system requires the model to:
47
+
48
+ 1. **Adopt agent personas** with specialized instructions
49
+ 2. **Follow structured workflows** (Search → Judge → Hypothesis → Report)
50
+ 3. **Make tool calls** (search_pubmed, search_clinical_trials, etc.)
51
+ 4. **Generate JSON-formatted progress ledgers** for workflow control
52
+ 5. **Understand manager instructions** and delegate appropriately
53
+
54
+ A 7B parameter model simply does not have the reasoning depth to handle this. Larger models (70B+) were originally intended, but those are routed to unreliable third-party providers (see `HF_FREE_TIER_ANALYSIS.md`).
55
+
56
+ ### Technical Flow (Where Garbage Appears)
57
+
58
+ ```
59
+ User Query
60
+
61
+ AdvancedOrchestrator.run() [advanced.py:247]
62
+
63
+ workflow.run_stream(task) [builds Magentic workflow]
64
+
65
+ MagenticAgentDeltaEvent emitted with event.text
66
+
67
+ Yields AgentEvent(type="streaming", message=event.text) [advanced.py:314-319]
68
+
69
+ Gradio displays: "📡 **STREAMING**: {garbage}"
70
+ ```
71
+
72
+ The garbage tokens are **raw model output**. The 7B model is:
73
+ - Not following the system prompt
74
+ - Outputting partial/incomplete token sequences
75
+ - Possibly attempting tool calls but formatting incorrectly
76
+ - Hallucinating random words
77
+
78
+ ### Evidence from Microsoft Reference Framework
79
+
80
+ The Microsoft Agent Framework's `_magentic.py` (lines 1717-1741) shows how agent invocation works:
81
+
82
+ ```python
83
+ async for update in agent.run_stream(messages=self._chat_history):
+     updates.append(update)
+     await self._emit_agent_delta_event(ctx, update)
86
+ ```
87
+
88
+ The framework passes through whatever the underlying chat client produces. If the model produces garbage, the framework streams it directly.
89
+
90
+ ### Why Click Example vs Submit Shows Different Initial State
91
+
92
+ Both code paths go through the same `research_agent()` function in `app.py`. The difference:
93
+
94
+ - **Example click**: Immediately submits query, so you see garbage quickly
95
+ - **Submit button click**: Shows "Starting research (Advanced mode)" banner first, then garbage
96
+
97
+ Both ultimately produce the same garbage output from the 7B model.
98
+
99
+ ---
100
+
101
+ ## Impact Assessment
102
+
103
+ | Aspect | Impact |
104
+ |--------|--------|
105
+ | Free Tier Users | Cannot get usable research results |
106
+ | Demo Quality | Appears broken/unprofessional |
107
+ | Trust | Users may think the entire system is broken |
108
+ | Differentiation | Undermines "free tier works!" messaging |
109
+
110
+ ---
111
+
112
+ ## Potential Solutions
113
+
114
+ ### Option 1: Switch to Better Small Model (Recommended - Quick Fix)
115
+
116
+ Find a small model that better handles complex instructions. Candidates:
117
+
118
+ | Model | Size | Tool Calling | Instruction Following |
119
+ |-------|------|--------------|----------------------|
120
+ | `mistralai/Mistral-7B-Instruct-v0.3` | 7B | Yes | Better |
121
+ | `microsoft/Phi-3-mini-4k-instruct` | 3.8B | Limited | Good |
122
+ | `google/gemma-2-9b-it` | 9B | Yes | Good |
123
+ | `Qwen/Qwen2.5-14B-Instruct` | 14B | Yes | Better |
124
+
125
+ **Risk**: 14B model might still be routed to third-party providers. Need to test each.
126
+
127
+ ### Option 2: Simplify Free Tier Architecture
128
+
129
+ Create a **simpler single-agent mode** for Free Tier:
130
+ - Remove multi-agent coordination (Manager, multiple ChatAgents)
131
+ - Use a single direct query → search → synthesize flow
132
+ - Reduce prompt complexity significantly
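The simplified flow is small enough to sketch directly. This is a sketch only: `search` and `synthesize` stand in for the real tool and LLM calls, which are not named in this document.

```python
import asyncio
from typing import Awaitable, Callable

async def simple_research(
    query: str,
    search: Callable[[str], Awaitable[list[str]]],
    synthesize: Callable[[str, list[str]], Awaitable[str]],
) -> str:
    """Single-agent flow: one search pass, one synthesis call, no manager."""
    evidence = await search(query)          # e.g. PubMed + clinical trials
    return await synthesize(query, evidence)  # one short-prompt LLM call
```

No progress ledgers, no persona transfer, no delegation: the only structure a 7B model has to follow is one tool call and one summary.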
133
+
134
+ **Pros**: More reliable with smaller models
135
+ **Cons**: Loses sophisticated multi-agent research capability
136
+
137
+ ### Option 3: Output Filtering/Validation
138
+
139
+ Add validation layer to detect and filter garbage output:
140
+
141
+ ```python
142
+ def is_valid_streaming_token(text: str) -> bool:
+     """Check if a streaming token appears valid."""
+     # Garbage patterns we've seen
+     garbage_patterns = ["yarg", "PostalCodes", "FunctionFlags"]
+     if any(g in text for g in garbage_patterns):
+         return False
+     # Minimum coherence: non-empty after stripping whitespace
+     return bool(text.strip())
150
+ ```
151
+
152
+ **Pros**: Band-aid fix, quick to implement
153
+ **Cons**: Doesn't fix root cause, will miss new garbage patterns
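If this option is pursued, the validator would sit between the orchestrator and the UI. A self-contained sketch of that filtering layer (the garbage list and function names are illustrative, not the project's API):

```python
GARBAGE_PATTERNS = ("yarg", "PostalCodes", "FunctionFlags")

def filter_stream(tokens):
    """Yield only streaming tokens that pass the garbage heuristic."""
    for token in tokens:
        if not token.strip():
            continue  # drop whitespace-only deltas
        if any(g in token for g in GARBAGE_PATTERNS):
            continue  # drop known garbage
        yield token
```

For example, `list(filter_stream(["Searching PubMed...", "yarg", "  "]))` keeps only the first token.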
154
+
155
+ ### Option 4: Graceful Degradation
156
+
157
+ Detect when model output is incoherent and fall back to:
158
+ - Returning an error message
159
+ - Suggesting user provide an API key
160
+ - Using a cached/templated response
161
+
162
+ ### Option 5: Prompt Engineering for 7B Models
163
+
164
+ Significantly simplify the agent prompts for 7B compatibility:
165
+ - Shorter system prompts
166
+ - More explicit step-by-step instructions
167
+ - Remove abstract concepts
168
+ - Use few-shot examples
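For illustration, a system prompt in this spirit might look like the following. The wording is invented here and is not the project's actual prompt:

```python
# A deliberately short, step-by-step prompt a 7B model can follow
SIMPLE_SEARCHER_PROMPT = """\
You are a medical research assistant with two tools.

Follow these steps exactly:
1. Call search_pubmed with the user's question.
2. Call search_clinical_trials with the user's question.
3. Summarize the results in 3 bullet points. Do not invent sources.

Example:
Question: Does metformin extend lifespan?
Step 1: search_pubmed("metformin lifespan")
Step 2: search_clinical_trials("metformin lifespan")
Step 3: three bullet points citing the retrieved results.
"""
```

The point is the contrast with the Magentic manager prompts: no persona transfer, no JSON ledger, one concrete worked example.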
169
+
170
+ ---
171
+
172
+ ## Recommended Action Plan
173
+
174
+ ### Phase 1: Quick Fix (P2)
175
+ 1. Test `mistralai/Mistral-7B-Instruct-v0.3` or `Qwen/Qwen2.5-14B-Instruct`
176
+ 2. Verify they stay on HuggingFace native infrastructure (no third-party routing)
177
+ 3. Evaluate output quality on sample queries
178
+
179
+ ### Phase 2: Architecture Review (P3)
180
+ 1. Consider simplified single-agent mode for Free Tier
181
+ 2. Design graceful degradation when model output is invalid
182
+ 3. Add output validation layer
183
+
184
+ ### Phase 3: Long-term (P4)
185
+ 1. Consider hybrid approach: simple mode for free tier, advanced for paid
186
+ 2. Explore fine-tuning a small model specifically for research agent tasks
187
+
188
+ ---
189
+
190
+ ## Files Involved
191
+
192
+ | File | Relevance |
193
+ |------|-----------|
194
+ | `src/orchestrators/advanced.py` | Main orchestrator, streaming event handling |
195
+ | `src/clients/huggingface.py` | HuggingFace chat client adapter |
196
+ | `src/agents/magentic_agents.py` | Agent definitions and prompts |
197
+ | `src/app.py` | Gradio UI, event display |
198
+ | `src/utils/config.py` | Model configuration |
199
+
200
+ ---
201
+
202
+ ## Relation to Previous Bugs
203
+
204
+ - **P0 Repr Bug (RESOLVED)**: Fixed in PR #117 - Was about `<generator object>` appearing due to async generator mishandling
205
+ - **P1 HuggingFace Novita Error (RESOLVED)**: Fixed in PR #118 - Was about 72B models being routed to failing third-party providers
206
+
207
+ This P2 bug is **downstream** of the P1 fix - we fixed the 500 errors by switching to 7B, but now the 7B model doesn't produce quality output.
208
+
209
+ ---
210
+
211
+ ## Questions to Investigate
212
+
213
+ 1. What models in the 7-20B range stay on HuggingFace native infrastructure?
214
+ 2. Can we detect third-party routing before making the full request?
215
+ 3. Is the chat template correct for Qwen2.5-7B? (Some models need specific formatting)
216
+ 4. Are there HuggingFace serverless models specifically optimized for tool calling?
217
+
218
+ ---
219
+
220
+ ## References
221
+
222
+ - `HF_FREE_TIER_ANALYSIS.md` - Analysis of HuggingFace provider routing
223
+ - `CLAUDE.md` - Critical HuggingFace Free Tier section
224
+ - Microsoft Agent Framework `_magentic.py` - Reference implementation
P2_ARCHITECTURAL_BYOK_GAPS.md ADDED
@@ -0,0 +1,100 @@
1
+ # P2 Architectural: BYOK Gaps in Non-Critical Paths
2
+
3
+ **Date**: 2025-12-03
4
+ **Status**: ✅ RESOLVED
5
+ **Severity**: P2 (Architectural Debt)
6
+ **Component**: LLM Routing / BYOK Support
7
+ **Resolution**: Fixed end-to-end BYOK support in this PR
8
+
9
+ ---
10
+
11
+ ## Summary
12
+
13
+ Two code paths do NOT support BYOK (Bring Your Own Key) from Gradio:
14
+
15
+ 1. **HierarchicalOrchestrator** - Doesn't receive `api_key` parameter
16
+ 2. **get_model() (PydanticAI)** - Only checks env vars, no BYOK
17
+
18
+ These are **latent bugs** - they don't affect the main user flow currently.
19
+
20
+ ---
21
+
22
+ ## Bug 1: HierarchicalOrchestrator Missing api_key
23
+
24
+ **Location**: `src/orchestrators/factory.py:61-64`
25
+
26
+ ```python
27
+ if effective_mode == "hierarchical":
+     from src.orchestrators.hierarchical import HierarchicalOrchestrator
+     # BUG: api_key is NOT passed to HierarchicalOrchestrator
+     return HierarchicalOrchestrator(config=effective_config, domain=domain)
31
+ ```
32
+
33
+ **Impact**: If hierarchical mode were exposed in UI, BYOK would not work.
34
+
35
+ **Current State**: Hierarchical mode is NOT exposed in Gradio UI, so this is latent.
36
+
37
+ **Fix**: Pass `api_key` to HierarchicalOrchestrator when instantiating.
38
+
39
+ ---
40
+
41
+ ## Bug 2: get_model() Doesn't Support BYOK
42
+
43
+ **Location**: `src/agent_factory/judges.py:62-91` (function `get_model()`)
44
+
45
+ ```python
46
+ def get_model() -> Any:
+     # Priority 1: OpenAI
+     if settings.has_openai_key:  # Only checks ENV VAR
+         ...
+     # Priority 2: Anthropic
+     if settings.has_anthropic_key:  # Only checks ENV VAR
+         ...
+     # Priority 3: HuggingFace
+     if settings.has_huggingface_key:  # Only checks ENV VAR
+         ...
56
+ ```
57
+
58
+ **Impact**: PydanticAI-based components (judges, statistical analyzer) cannot use BYOK keys.
59
+
60
+ **Current State**: The main Advanced mode flow uses `get_chat_client()` (Microsoft Agent Framework), NOT `get_model()`. So this is latent.
61
+
62
+ **Fix**: Either:
63
+ 1. Add `api_key` parameter to `get_model()`
64
+ 2. Or deprecate `get_model()` in favor of `get_chat_client()` everywhere
65
+
66
+ ---
67
+
68
+ ## Architecture Notes
69
+
70
+ The codebase has **TWO separate LLM routing systems**:
71
+
72
+ | System | Function | BYOK Support | Used By |
73
+ |--------|----------|--------------|---------|
74
+ | Microsoft Agent Framework | `get_chat_client()` | **YES** (key prefix detection) | Advanced mode (main flow) |
75
+ | PydanticAI | `get_model()` | **NO** (env vars only) | Judges, statistical analyzer |
76
+
77
+ This dual-system architecture creates confusion and maintenance burden.
78
+
79
+ ---
80
+
81
+ ## Recommendation
82
+
83
+ **Short-term**: Leave as-is (latent, not blocking)
84
+
85
+ **Long-term**: Unify on `get_chat_client()` and deprecate `get_model()` (see P3_REMOVE_ANTHROPIC_PARTIAL_WIRING.md for related cleanup)
86
+
87
+ ---
88
+
89
+ ## Test Results
90
+
91
+ - All 310 unit tests pass
92
+ - Main user flow (Gradio → Advanced) works with BYOK
93
+
94
+ ---
95
+
96
+ ## Related Documents
97
+
98
+ - `P3_REMOVE_ANTHROPIC_PARTIAL_WIRING.md` - Related architecture cleanup
99
+ - `src/clients/factory.py` - BYOK-capable factory (correct implementation)
100
+ - `src/agent_factory/judges.py` - Non-BYOK factory (needs fix)
P3_REMOVE_ANTHROPIC_PARTIAL_WIRING.md ADDED
@@ -0,0 +1,160 @@
1
+ # P3 Tech Debt: Remove Anthropic Partial Wiring
2
+
3
+ **Date**: 2025-12-03
4
+ **Status**: OPEN
5
+ **Severity**: P3 (Tech Debt / Simplification)
6
+ **Component**: Architecture / Provider Integration
7
+
8
+ ---
9
+
10
+ ## Summary
11
+
12
+ Remove all Anthropic-related code, configuration, and references from the codebase. Anthropic is partially wired but **not fully threaded through the architecture**, creating confusion and half-implemented code paths.
13
+
14
+ ---
15
+
16
+ ## Rationale
17
+
18
+ ### 1. Anthropic Does NOT Provide Embeddings
19
+
20
+ Our architecture requires embeddings for:
21
+ - RAG (LlamaIndex/ChromaDB)
22
+ - Evidence deduplication
23
+ - Semantic search
24
+
25
+ Anthropic only provides chat completion, not embeddings. This means even with a working Anthropic chat client, users would need a **second provider** for embeddings, breaking the unified experience.
26
+
27
+ ### 2. Partial Implementation Creates Confusion
28
+
29
+ Current state:
30
+ - `settings.anthropic_api_key` exists ✅
31
+ - `settings.has_anthropic_key` property exists ✅
32
+ - `settings.anthropic_model` configured ✅
33
+ - `AnthropicChatClient` for agent_framework **DOES NOT EXIST** ❌
34
+ - Code raises `NotImplementedError` when Anthropic detected ❌
35
+
36
+ This half-state causes:
37
+ - User confusion ("Why doesn't my Anthropic key work?")
38
+ - Developer confusion ("Is Anthropic supported or not?")
39
+ - Dead code paths that need maintenance
40
+
41
+ ### 3. Unified Architecture Principle
42
+
43
+ **Principle**: Only support providers that work **end-to-end** through the entire stack:
44
+
45
+ ```
46
+ Provider Requirements:
47
+ ├── Chat Completion (for agents) ✅ Required
48
+ ├── Function/Tool Calling ✅ Required
49
+ ├── Embeddings (for RAG) ✅ Required
50
+ └── Streaming ✅ Required
51
+ ```
52
+
53
+ | Provider | Chat | Tools | Embeddings | Streaming | Status |
54
+ |----------|------|-------|------------|-----------|--------|
55
+ | OpenAI | ✅ | ✅ | ✅ | ✅ | **KEEP** |
56
+ | HuggingFace | ✅ | ✅ | ✅ (local) | ✅ | **KEEP** |
57
+ | Gemini | ✅ | ✅ | ✅ | ✅ | Future (Phase 4) |
58
+ | Anthropic | ✅ | ✅ | ❌ | ✅ | **REMOVE** |
59
+
60
+ ---
61
+
62
+ ## Files to Clean Up
63
+
64
+ ### Configuration
65
+ - [ ] `src/utils/config.py` - Remove `anthropic_api_key`, `anthropic_model`, `has_anthropic_key`
66
+
67
+ ### Client Factory
68
+ - [ ] `src/clients/factory.py` - Remove Anthropic detection and `NotImplementedError`
69
+
70
+ ### Legacy Code (pydantic-ai based)
71
+ - [ ] `src/utils/llm_factory.py` - Remove `AnthropicModel`, `AnthropicProvider` imports and handling
72
+ - [ ] `src/agent_factory/judges.py` - Remove Anthropic model selection
73
+
74
+ ### App/UI
75
+ - [ ] `src/app.py` - Remove `has_anthropic_key` checks and "Anthropic from env" backend info
76
+
77
+ ### Documentation
78
+ - [ ] `CLAUDE.md` - Update LLM provider list
79
+ - [ ] `AGENTS.md` - Update LLM provider list
80
+ - [ ] `GEMINI.md` - Update LLM provider list
81
+
82
+ ### Tests
83
+ - [ ] `tests/unit/clients/test_chat_client_factory.py` - Remove Anthropic test cases
84
+ - [ ] `tests/unit/utils/test_config.py` - Remove Anthropic config tests
85
+
86
+ ---
87
+
88
+ ## Code Snippets to Remove
89
+
90
+ ### `src/utils/config.py`
91
+ ```python
92
+ # REMOVE these lines:
93
+ anthropic_api_key: str | None = Field(default=None, description="Anthropic API key")
+ anthropic_model: str = Field(
+     default="claude-sonnet-4-5-20250929", description="Anthropic model"
+ )
+
+ @property
+ def has_anthropic_key(self) -> bool:
+     """Check if Anthropic API key is available."""
+     return bool(self.anthropic_api_key)
102
+ ```
103
+
104
+ ### `src/clients/factory.py`
105
+ ```python
106
+ # REMOVE these lines:
107
+ if api_key.startswith("sk-ant-"):
+     normalized = "anthropic"
+
+ if normalized == "anthropic":
+     raise NotImplementedError(
+         "Anthropic client not yet implemented. "
+         "Use OpenAI key (sk-...) or leave empty for free HuggingFace tier."
+     )
115
+ ```
116
+
117
+ ### `src/app.py`
118
+ ```python
119
+ # REMOVE these lines:
120
+ elif settings.has_anthropic_key:
+     backend_info = "Paid API (Anthropic from env)"
122
+
123
+ has_anthropic = settings.has_anthropic_key
124
+ has_paid_key = has_openai or has_anthropic or bool(user_api_key)
125
+ # Change to:
126
+ has_paid_key = has_openai or bool(user_api_key)
127
+ ```
128
+
129
+ ---
130
+
131
+ ## Migration Notes
132
+
133
+ ### For Users with Anthropic Keys
134
+
135
+ If users have `ANTHROPIC_API_KEY` set in their environment:
136
+ 1. It will be **silently ignored** (not an error)
137
+ 2. System falls through to HuggingFace free tier
138
+ 3. Users should use `OPENAI_API_KEY` instead for paid tier
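The post-removal behavior can be stated as a tiny selection rule. This is a sketch of the intended behavior, not existing code:

```python
def select_tier(env: dict) -> str:
    """ANTHROPIC_API_KEY is ignored entirely; only an OpenAI key selects paid tier."""
    if env.get("OPENAI_API_KEY"):
        return "paid-openai"
    return "free-huggingface"  # even if ANTHROPIC_API_KEY is set
```

Silently ignoring the key (rather than raising) is deliberate: existing `.env` files with a leftover `ANTHROPIC_API_KEY` keep working on the free tier.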
139
+
140
+ ### Future Consideration
141
+
142
+ If Anthropic adds embeddings API in the future, we can re-add support. But until then, partial support creates more confusion than value.
143
+
144
+ ---
145
+
146
+ ## Definition of Done
147
+
148
+ - [ ] All Anthropic references removed from `src/`
149
+ - [ ] All Anthropic tests removed or updated
150
+ - [ ] Documentation updated to reflect supported providers: OpenAI, HuggingFace, (future: Gemini)
151
+ - [ ] `make check` passes (lint, typecheck, tests)
152
+ - [ ] PR reviewed and merged
153
+
154
+ ---
155
+
156
+ ## Related Documents
157
+
158
+ - `P2_7B_MODEL_GARBAGE_OUTPUT.md` - Current free tier model quality issues
159
+ - `HF_FREE_TIER_ANALYSIS.md` - HuggingFace provider routing analysis
160
+ - `CLAUDE.md` - Agent context with provider documentation
pyproject.toml CHANGED
@@ -20,6 +20,8 @@ dependencies = [
20
  "huggingface-hub>=0.24.0", # Hugging Face Inference API - 0.24.0 required for stable chat_completion with tools
21
  # UI
22
  "gradio[mcp]>=6.0.0", # Chat interface with MCP server support (6.0 required for css in launch())
 
 
23
  # Utils
24
  "python-dotenv>=1.0", # .env loading
25
  "tenacity>=8.2", # Retry logic
 
20
  "huggingface-hub>=0.24.0", # Hugging Face Inference API - 0.24.0 required for stable chat_completion with tools
21
  # UI
22
  "gradio[mcp]>=6.0.0", # Chat interface with MCP server support (6.0 required for css in launch())
23
+ # Security: Pin mcp to fix GHSA-9h52-p55h-vw2f
24
+ "mcp>=1.23.0",
25
  # Utils
26
  "python-dotenv>=1.0", # .env loading
27
  "tenacity>=8.2", # Retry logic
src/agent_factory/judges.py CHANGED
@@ -2,16 +2,15 @@
2
 
3
  import asyncio
4
  import json
 
5
  from functools import partial
6
  from typing import Any, ClassVar
7
 
8
  import structlog
9
  from huggingface_hub import InferenceClient
10
  from pydantic_ai import Agent
11
- from pydantic_ai.models.anthropic import AnthropicModel
12
  from pydantic_ai.models.huggingface import HuggingFaceModel
13
  from pydantic_ai.models.openai import OpenAIChatModel
14
- from pydantic_ai.providers.anthropic import AnthropicProvider
15
  from pydantic_ai.providers.huggingface import HuggingFaceProvider
16
  from pydantic_ai.providers.openai import OpenAIProvider
17
  from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
@@ -54,41 +53,61 @@ def _extract_titles_from_evidence(
54
  return findings
55
 
56
 
57
- def get_model() -> Any:
58
  """Get the LLM model based on available API keys.
59
 
60
  Priority order:
61
- 1. OpenAI (if OPENAI_API_KEY set)
62
- 2. Anthropic (if ANTHROPIC_API_KEY set)
63
- 3. HuggingFace (if HF_TOKEN set)
 
64
 
65
  Raises:
66
- ConfigurationError: If no API keys are configured.
67
 
68
- Note: settings.llm_provider is ignored in favor of actual key availability.
69
- This ensures the model matches what app.py selected for JudgeHandler.
70
  """
71
- from src.utils.exceptions import ConfigurationError
 
72
 
73
- # Priority 1: OpenAI (most common, best tool calling)
74
  if settings.has_openai_key:
75
  openai_provider = OpenAIProvider(api_key=settings.openai_api_key)
76
  return OpenAIChatModel(settings.openai_model, provider=openai_provider)
77
 
78
- # Priority 2: Anthropic
79
- if settings.has_anthropic_key:
80
- provider = AnthropicProvider(api_key=settings.anthropic_api_key)
81
- return AnthropicModel(settings.anthropic_model, provider=provider)
82
 
83
- # Priority 3: HuggingFace (requires HF_TOKEN)
84
- if settings.has_huggingface_key:
85
- model_name = settings.huggingface_model or "Qwen/Qwen2.5-72B-Instruct"
86
- hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
 
87
  return HuggingFaceModel(model_name, provider=hf_provider)
88
 
89
- # No keys configured - fail fast with clear error
90
- raise ConfigurationError(
91
- "No LLM API key configured. Set one of: OPENAI_API_KEY, ANTHROPIC_API_KEY, or HF_TOKEN"
 
 
 
92
  )
93
 
94
 
@@ -103,6 +122,7 @@ class JudgeHandler:
103
  self,
104
  model: Any = None,
105
  domain: ResearchDomain | str | None = None,
 
106
  ) -> None:
107
  """
108
  Initialize the JudgeHandler.
@@ -110,8 +130,9 @@ class JudgeHandler:
110
  Args:
111
  model: Optional PydanticAI model. If None, uses config default.
112
  domain: Research domain for prompt customization.
 
113
  """
114
- self.model = model or get_model()
115
  self.domain = domain
116
  self.agent = Agent(
117
  model=self.model,
@@ -506,7 +527,7 @@ IMPORTANT: Respond with ONLY valid JSON matching this schema:
506
  "The HuggingFace Inference API free tier limit has been reached. "
507
  "The search results listed below were retrieved but could not be "
508
  "analyzed by the AI. "
509
- "Please try again later, or add an OpenAI/Anthropic API key above "
510
  "for unlimited access."
511
  ),
512
  )
@@ -542,7 +563,7 @@ IMPORTANT: Respond with ONLY valid JSON matching this schema:
542
  f"Search found {len(evidence)} sources (listed below) but they could not "
543
  "be analyzed by AI.\n\n"
544
  "**Options:**\n"
545
- "- Add an OpenAI or Anthropic API key for reliable analysis\n"
546
  "- Try again later when HF Inference is available\n"
547
  "- Review the raw search results below"
548
  ),
@@ -571,7 +592,7 @@ IMPORTANT: Respond with ONLY valid JSON matching this schema:
571
  f"{question} clinical trials",
572
  f"{question} drug candidates",
573
  ],
574
- reasoning=f"HF Inference failed: {error}. Recommend configuring OpenAI/Anthropic key.",
575
  )
576
 
577
  async def synthesize(self, system_prompt: str, user_prompt: str) -> str:
@@ -728,6 +749,6 @@ class MockJudgeHandler:
728
  reasoning=(
729
  f"Demo mode assessment based on {evidence_count} real search results. "
730
  "For AI-powered analysis with drug candidate identification and "
731
- "evidence synthesis, configure OPENAI_API_KEY or ANTHROPIC_API_KEY."
732
  ),
733
  )
 
2
 
3
  import asyncio
4
  import json
5
+ import os
6
  from functools import partial
7
  from typing import Any, ClassVar
8
 
9
  import structlog
10
  from huggingface_hub import InferenceClient
11
  from pydantic_ai import Agent
 
12
  from pydantic_ai.models.huggingface import HuggingFaceModel
13
  from pydantic_ai.models.openai import OpenAIChatModel
 
14
  from pydantic_ai.providers.huggingface import HuggingFaceProvider
15
  from pydantic_ai.providers.openai import OpenAIProvider
16
  from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
 
53
  return findings
54
 
55
 
56
+ def get_model(api_key: str | None = None) -> Any:
57
  """Get the LLM model based on available API keys.
58
 
59
  Priority order:
60
+ 1. BYOK api_key parameter (auto-detects provider from prefix)
61
+ 2. OpenAI (if OPENAI_API_KEY set in env)
62
+ 3. HuggingFace (free fallback)
63
+
64
+ Args:
65
+ api_key: Optional BYOK key. Auto-detects provider from prefix:
66
+ - "sk-ant-..." → Anthropic (NOT SUPPORTED - raises error)
67
+ - "sk-..." → OpenAI
68
+ - Other → Falls through to env vars
69
 
70
  Raises:
71
+ NotImplementedError: If Anthropic key detected (no embeddings support).
72
 
73
+ Note: Anthropic is NOT supported because it lacks embeddings API.
74
+ See P3_REMOVE_ANTHROPIC_PARTIAL_WIRING.md.
75
  """
76
+ # Priority 1: BYOK - Auto-detect provider from key prefix
77
+ if api_key:
78
+ if api_key.startswith("sk-ant-"):
79
+ # Anthropic not supported - no embeddings API
80
+ raise NotImplementedError(
81
+ "Anthropic is not supported (no embeddings API). "
82
+ "Use OpenAI key (sk-...) or leave empty for free HuggingFace tier."
83
+ )
84
+ if api_key.startswith("sk-"):
85
+ # OpenAI BYOK
86
+ openai_provider = OpenAIProvider(api_key=api_key)
87
+        return OpenAIChatModel(settings.openai_model, provider=openai_provider)
 
+    # Priority 2: OpenAI from env (most common, best tool calling)
     if settings.has_openai_key:
         openai_provider = OpenAIProvider(api_key=settings.openai_api_key)
         return OpenAIChatModel(settings.openai_model, provider=openai_provider)
 
+    # Priority 3: HuggingFace (free fallback)
+    # Use 7B model to stay on HuggingFace native infrastructure (avoid Novita 500s)
+    model_name = settings.huggingface_model or "Qwen/Qwen2.5-7B-Instruct"
 
+    # Try settings.hf_token first, then fall back to HF_TOKEN env var
+    # HuggingFaceProvider requires a token - it won't work without one
+    hf_token = settings.hf_token or os.environ.get("HF_TOKEN")
+    if hf_token:
+        hf_provider = HuggingFaceProvider(api_key=hf_token)
         return HuggingFaceModel(model_name, provider=hf_provider)
 
+    # No HF token available - raise clear error
+    raise RuntimeError(
+        "No LLM API key available. Either:\n"
+        "  1. Set OPENAI_API_KEY for premium tier, or\n"
+        "  2. Set HF_TOKEN for free HuggingFace tier\n"
+        "Get a free HF token at: https://huggingface.co/settings/tokens"
     )

         self,
         model: Any = None,
         domain: ResearchDomain | str | None = None,
+        api_key: str | None = None,
     ) -> None:
         """
         Initialize the JudgeHandler.
 
         Args:
             model: Optional PydanticAI model. If None, uses config default.
             domain: Research domain for prompt customization.
+            api_key: Optional BYOK key (auto-detects provider from prefix).
         """
+        self.model = model or get_model(api_key=api_key)
         self.domain = domain
         self.agent = Agent(
             model=self.model,

                 "The HuggingFace Inference API free tier limit has been reached. "
                 "The search results listed below were retrieved but could not be "
                 "analyzed by the AI. "
+                "Please try again later, or add an OpenAI API key above "
                 "for unlimited access."
             ),
         )

                 f"Search found {len(evidence)} sources (listed below) but they could not "
                 "be analyzed by AI.\n\n"
                 "**Options:**\n"
+                "- Add an OpenAI API key for reliable analysis\n"
                 "- Try again later when HF Inference is available\n"
                 "- Review the raw search results below"
             ),

                 f"{question} clinical trials",
                 f"{question} drug candidates",
             ],
+            reasoning=f"HF Inference failed: {error}. Recommend configuring OpenAI API key.",
         )
 
     async def synthesize(self, system_prompt: str, user_prompt: str) -> str:

             reasoning=(
                 f"Demo mode assessment based on {evidence_count} real search results. "
                 "For AI-powered analysis with drug candidate identification and "
+                "evidence synthesis, configure OPENAI_API_KEY."
             ),
         )
src/agents/judge_agent_llm.py CHANGED
@@ -14,8 +14,13 @@ logger = structlog.get_logger()
 class LLMSubIterationJudge:
     """Judge that uses an LLM to assess sub-iteration results."""
 
-    def __init__(self) -> None:
-        self.model = get_model()
+    def __init__(self, api_key: str | None = None) -> None:
+        """Initialize the judge with optional BYOK key.
+
+        Args:
+            api_key: Optional BYOK key (auto-detects provider from prefix).
+        """
+        self.model = get_model(api_key=api_key)
         self.agent = Agent(
             model=self.model,
             output_type=JudgeAssessment,
src/agents/magentic_agents.py CHANGED
@@ -16,17 +16,19 @@ from src.config.domain import ResearchDomain, get_domain_config
 def create_search_agent(
     chat_client: BaseChatClient | None = None,
     domain: ResearchDomain | str | None = None,
+    api_key: str | None = None,
 ) -> ChatAgent:
     """Create a search agent with internal LLM and search tools.
 
     Args:
         chat_client: Optional custom chat client. If None, uses default.
         domain: Research domain for customization.
+        api_key: Optional BYOK key (auto-detects provider from prefix).
 
     Returns:
         ChatAgent configured for biomedical search
     """
-    client = chat_client or get_chat_client()
+    client = chat_client or get_chat_client(api_key=api_key)
     config = get_domain_config(domain)
 
     return ChatAgent(
@@ -54,17 +56,19 @@ related to {config.name}.""",
 def create_judge_agent(
     chat_client: BaseChatClient | None = None,
     domain: ResearchDomain | str | None = None,
+    api_key: str | None = None,
 ) -> ChatAgent:
     """Create a judge agent that evaluates evidence quality.
 
     Args:
         chat_client: Optional custom chat client. If None, uses default.
         domain: Research domain for customization.
+        api_key: Optional BYOK key (auto-detects provider from prefix).
 
     Returns:
         ChatAgent configured for evidence assessment
     """
-    client = chat_client or get_chat_client()
+    client = chat_client or get_chat_client(api_key=api_key)
     config = get_domain_config(domain)
 
     return ChatAgent(
@@ -110,17 +114,19 @@ Be rigorous but fair. Look for:
 def create_hypothesis_agent(
     chat_client: BaseChatClient | None = None,
     domain: ResearchDomain | str | None = None,
+    api_key: str | None = None,
 ) -> ChatAgent:
     """Create a hypothesis generation agent.
 
     Args:
         chat_client: Optional custom chat client. If None, uses default.
         domain: Research domain for customization.
+        api_key: Optional BYOK key (auto-detects provider from prefix).
 
     Returns:
         ChatAgent configured for hypothesis generation
     """
-    client = chat_client or get_chat_client()
+    client = chat_client or get_chat_client(api_key=api_key)
     config = get_domain_config(domain)
 
     return ChatAgent(
@@ -151,17 +157,19 @@ Focus on mechanistic plausibility and existing evidence.""",
 def create_report_agent(
     chat_client: BaseChatClient | None = None,
     domain: ResearchDomain | str | None = None,
+    api_key: str | None = None,
 ) -> ChatAgent:
     """Create a report synthesis agent.
 
     Args:
         chat_client: Optional custom chat client. If None, uses default.
         domain: Research domain for customization.
+        api_key: Optional BYOK key (auto-detects provider from prefix).
 
     Returns:
         ChatAgent configured for report generation
     """
-    client = chat_client or get_chat_client()
+    client = chat_client or get_chat_client(api_key=api_key)
     config = get_domain_config(domain)
 
     return ChatAgent(
src/clients/factory.py CHANGED
@@ -23,13 +23,14 @@ def get_chat_client(
 
     Auto-detection priority:
     1. Explicit provider parameter
-    2. OpenAI key (Best Function Calling)
-    3. Gemini key (Best Context/Cost)
-    4. HuggingFace (Free Fallback)
+    2. API key prefix detection (sk- → OpenAI, sk-ant- → Anthropic)
+    3. OpenAI key from env (Best Function Calling)
+    4. Gemini key from env (Best Context/Cost)
+    5. HuggingFace (Free Fallback)
 
     Args:
         provider: Force specific provider ("openai", "gemini", "huggingface")
-        api_key: Override API key for the provider
+        api_key: Override API key for the provider (auto-detects provider from prefix)
         model_id: Override default model ID
         **kwargs: Additional arguments for the client
 
@@ -38,13 +39,23 @@
 
     Raises:
         ValueError: If an unsupported provider is explicitly requested
-        NotImplementedError: If Gemini is explicitly requested (not yet implemented)
+        NotImplementedError: If Gemini or Anthropic is requested (not yet implemented)
     """
     # Normalize provider to lowercase for case-insensitive matching
     normalized = provider.lower() if provider is not None else None
 
+    # FIX: Auto-detect provider from API key prefix when not explicitly set
+    # This enables BYOK (Bring Your Own Key) from Gradio without explicit provider
+    # Order matters: "sk-ant-" must be checked before "sk-" (both start with "sk-")
+    if normalized is None and api_key:
+        if api_key.startswith("sk-ant-"):
+            normalized = "anthropic"
+        elif api_key.startswith("sk-"):
+            normalized = "openai"
+        # HF tokens start with "hf_" - no auto-detection needed (falls through to default)
+
     # Validate explicit provider requests early
-    valid_providers = (None, "openai", "gemini", "huggingface")
+    valid_providers = (None, "openai", "anthropic", "gemini", "huggingface")
     if normalized not in valid_providers:
         raise ValueError(f"Unsupported provider: {provider!r}")
 
@@ -57,7 +68,15 @@
         **kwargs,
     )
 
-    # 2. Gemini (High Performance / Alternative)
+    # 2. Anthropic (Detected from sk-ant- prefix or explicit)
+    if normalized == "anthropic":
+        # Anthropic key was detected or explicitly requested - fail loudly
+        raise NotImplementedError(
+            "Anthropic client not yet implemented. "
+            "Use OpenAI key (sk-...) or leave empty for free HuggingFace tier."
+        )
+
+    # 3. Gemini (High Performance / Alternative)
     if normalized == "gemini":
         # Explicit request for Gemini - fail loudly
         raise NotImplementedError("Gemini client not yet implemented (Planned Phase 4)")
@@ -66,7 +85,7 @@
     # Implicit (has key but not explicit) - log warning and fall through
     logger.warning("Gemini key detected but client not yet implemented; falling back")
 
-    # 3. HuggingFace (Free Fallback)
+    # 4. HuggingFace (Free Fallback)
     # This is the default if no other keys are present
     logger.info("Using HuggingFace Chat Client (Free Tier)")
     return HuggingFaceChatClient(
src/clients/huggingface.py CHANGED
@@ -51,12 +51,13 @@ class HuggingFaceChatClient(BaseChatClient):  # type: ignore[misc]
         """Initialize the HuggingFace chat client.
 
         Args:
-            model_id: The HuggingFace model ID (default: configured value or Qwen2.5-72B).
+            model_id: The HuggingFace model ID (default: configured value or Qwen2.5-7B).
             api_key: HF_TOKEN (optional, defaults to env var).
             **kwargs: Additional arguments passed to BaseChatClient.
         """
         super().__init__(**kwargs)
-        self.model_id = model_id or settings.huggingface_model or "Qwen/Qwen2.5-72B-Instruct"
+        # FIX: Use 7B model to stay on HuggingFace native infrastructure (avoid Novita 500s)
+        self.model_id = model_id or settings.huggingface_model or "Qwen/Qwen2.5-7B-Instruct"
         self.api_key = api_key or settings.hf_token
 
         # Initialize the HF Inference Client
src/orchestrators/advanced.py CHANGED
@@ -99,6 +99,9 @@ class AdvancedOrchestrator(OrchestratorProtocol):
             api_key=api_key,
         )
 
+        # Store API key for service initialization
+        self._api_key = api_key
+
         # Event stream for UI updates
         self._events: list[AgentEvent] = []
 
@@ -116,7 +119,7 @@ class AdvancedOrchestrator(OrchestratorProtocol):
 
     def _init_embedding_service(self) -> "EmbeddingServiceProtocol | None":
         """Initialize embedding service if available."""
-        return get_embedding_service_if_available()
+        return get_embedding_service_if_available(api_key=self._api_key)
 
     def _build_workflow(self) -> Any:
         """Build the workflow with ChatAgent participants."""
src/orchestrators/factory.py CHANGED
@@ -61,7 +61,7 @@ def create_orchestrator(
     if effective_mode == "hierarchical":
         from src.orchestrators.hierarchical import HierarchicalOrchestrator
 
-        return HierarchicalOrchestrator(config=effective_config, domain=domain)
+        return HierarchicalOrchestrator(config=effective_config, domain=domain, api_key=api_key)
 
     # Default: Advanced Mode (Unified)
     # Handles both Paid (OpenAI) and Free (HuggingFace) tiers
src/orchestrators/hierarchical.py CHANGED
@@ -38,8 +38,12 @@ class ResearchTeam(SubIterationTeam):
     sub-iteration middleware framework.
     """
 
-    def __init__(self, domain: ResearchDomain | str | None = None) -> None:
-        self.agent = create_search_agent(domain=domain)
+    def __init__(
+        self,
+        domain: ResearchDomain | str | None = None,
+        api_key: str | None = None,
+    ) -> None:
+        self.agent = create_search_agent(domain=domain, api_key=api_key)
 
     async def execute(self, task: str) -> str:
         """Execute a research task.
@@ -73,6 +77,7 @@ class HierarchicalOrchestrator(OrchestratorProtocol):
         config: OrchestratorConfig | None = None,
         timeout_seconds: float = DEFAULT_TIMEOUT_SECONDS,
         domain: ResearchDomain | str | None = None,
+        api_key: str | None = None,
     ) -> None:
         """Initialize the hierarchical orchestrator.
 
@@ -80,12 +85,14 @@ class HierarchicalOrchestrator(OrchestratorProtocol):
             config: Optional configuration (uses defaults if not provided)
             timeout_seconds: Maximum workflow duration (default: 5 minutes)
             domain: Research domain for customization
+            api_key: Optional BYOK key (auto-detects provider from prefix)
         """
         self.config = config or OrchestratorConfig()
         self._timeout_seconds = timeout_seconds
         self.domain = domain
-        self.team = ResearchTeam(domain=domain)
-        self.judge = LLMSubIterationJudge()
+        self._api_key = api_key
+        self.team = ResearchTeam(domain=domain, api_key=api_key)
+        self.judge = LLMSubIterationJudge(api_key=api_key)
         self.middleware = SubIterationMiddleware(
             self.team, self.judge, max_iterations=self.config.max_iterations
         )
@@ -101,7 +108,7 @@ class HierarchicalOrchestrator(OrchestratorProtocol):
         """
         logger.info("Starting hierarchical orchestrator", query=query)
 
-        service = get_embedding_service_if_available()
+        service = get_embedding_service_if_available(api_key=self._api_key)
         init_magentic_state(query, service)
 
         yield AgentEvent(type="started", message=f"Starting research: {query}")
src/orchestrators/langgraph_orchestrator.py CHANGED
@@ -32,18 +32,33 @@ class LangGraphOrchestrator(OrchestratorProtocol):
         self,
         max_iterations: int = 10,
         checkpoint_path: str | None = None,
+        api_key: str | None = None,
     ):
         self._max_iterations = max_iterations
         self._checkpoint_path = checkpoint_path
+        self._api_key = api_key
 
         # Initialize the LLM (Qwen 2.5 via HF Inference)
         # We use the serverless API by default
-        # NOTE: Llama-3.1-70B routes to Hyperbolic (unreliable staging mode)
-        repo_id = "Qwen/Qwen2.5-72B-Instruct"
-
-        # Ensure we have an API key
-        api_key = settings.hf_token
-        if not api_key:
+        # FIX: Use 7B model to stay on HuggingFace native infrastructure
+        # Large models (70B+) route to Novita/Hyperbolic providers (500/401 errors)
+        repo_id = settings.huggingface_model or "Qwen/Qwen2.5-7B-Instruct"
+
+        # Determine HF Token (BYOK > Env)
+        # Note: If api_key starts with 'sk-', it's likely OpenAI, which isn't supported here
+        # for the LLM, but we store it for the embedding service.
+        hf_token = settings.hf_token
+        if api_key and not api_key.startswith("sk-"):
+            hf_token = api_key
+
+        if not hf_token:
+            # If we have an OpenAI key but no HF token, we can't run the HF LLM
+            if api_key and api_key.startswith("sk-"):
+                raise ValueError(
+                    "LangGraphOrchestrator currently requires a Hugging Face token (HF_TOKEN) "
+                    "for the LLM, even if using OpenAI for embeddings. "
+                    "Please use Advanced Mode for OpenAI support."
+                )
             raise ValueError(
                 "HF_TOKEN (Hugging Face API Token) is required for LangGraph orchestrator."
             )
@@ -53,7 +68,7 @@
             task="text-generation",
             max_new_tokens=1024,
             temperature=0.1,
-            huggingfacehub_api_token=api_key,
+            huggingfacehub_api_token=hf_token,
         )
         self.chat_model = ChatHuggingFace(llm=self.llm_endpoint)
 
@@ -61,7 +76,7 @@
         """Execute research workflow with structured state."""
         # Initialize embedding service using tiered selection (service_loader)
         # Returns LlamaIndexRAGService if OpenAI key available, else local EmbeddingService
-        embedding_service = get_embedding_service()
+        embedding_service = get_embedding_service(api_key=self._api_key)
 
         # Setup checkpointer (SQLite for dev)
        if self._checkpoint_path:
src/services/llamaindex_rag.py CHANGED
@@ -42,16 +42,17 @@ class LlamaIndexRAGService:
         persist_dir: str | None = None,
         embedding_model: str | None = None,
         similarity_top_k: int = 5,
+        api_key: str | None = None,
     ) -> None:
         """
         Initialize LlamaIndex RAG service.
 
         Args:
-            collection_name: Name of the ChromaDB collection (default changed from
-                "deepcritical_evidence" to "deepboner_evidence" in v1.0 rebrand)
+            collection_name: Name of the ChromaDB collection
             persist_dir: Directory to persist ChromaDB data
             embedding_model: OpenAI embedding model (defaults to settings.openai_embedding_model)
             similarity_top_k: Number of top results to retrieve
+            api_key: Optional BYOK OpenAI key. Prioritized over env var.
         """
         # Lazy import - only when instantiated
         try:
@@ -80,18 +81,36 @@
         self.similarity_top_k = similarity_top_k
         self.embedding_model = embedding_model or settings.openai_embedding_model
 
+        # Determine API key (BYOK > Env Var)
+        self.api_key = api_key
+        if not self.api_key and settings.has_openai_key:
+            self.api_key = settings.openai_api_key
+
         # Validate API key before use
-        if not settings.openai_api_key:
+        if not self.api_key:
             raise ConfigurationError("OPENAI_API_KEY required for LlamaIndex RAG service")
 
+        # Defense-in-depth: Validate key prefix to prevent cryptic auth errors
+        # Note: Anthropic keys start with sk-ant-, which would pass startswith("sk-")
+        if self.api_key.startswith("sk-ant-"):
+            raise ConfigurationError(
+                "Anthropic keys (sk-ant-...) are not supported for embeddings. "
+                "LlamaIndex RAG requires an OpenAI API key (sk-...)."
+            )
+        if not self.api_key.startswith("sk-"):
+            raise ConfigurationError(
+                f"Invalid API key format. Expected OpenAI key starting with 'sk-', "
+                f"got key starting with '{self.api_key[:8]}...'."
+            )
+
         # Configure LlamaIndex settings (use centralized config)
         self._Settings.llm = OpenAI(
             model=settings.openai_model,
-            api_key=settings.openai_api_key,
+            api_key=self.api_key,
         )
         self._Settings.embed_model = OpenAIEmbedding(
             model=self.embedding_model,
-            api_key=settings.openai_api_key,
+            api_key=self.api_key,
         )
 
         # Initialize ChromaDB client
@@ -428,6 +447,7 @@
 
 def get_rag_service(
     collection_name: str = "deepboner_evidence",
+    api_key: str | None = None,
     **kwargs: Any,
 ) -> LlamaIndexRAGService:
     """
@@ -435,9 +455,10 @@ def get_rag_service(
 
     Args:
         collection_name: Name of the ChromaDB collection
+        api_key: Optional BYOK OpenAI key
         **kwargs: Additional arguments for LlamaIndexRAGService
 
     Returns:
         Configured LlamaIndexRAGService instance
     """
-    return LlamaIndexRAGService(collection_name=collection_name, **kwargs)
+    return LlamaIndexRAGService(collection_name=collection_name, api_key=api_key, **kwargs)
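The defense-in-depth prefix validation added above can be exercised in isolation. `validate_openai_key` is a hypothetical standalone version of the checks; the real code raises `ConfigurationError` rather than `ValueError`.

```python
def validate_openai_key(key: str) -> None:
    """Sketch of the key-prefix checks added to LlamaIndexRAGService.__init__.

    Anthropic keys start with "sk-ant-", so they would slip past a naive
    startswith("sk-") check; they must be rejected explicitly first.
    """
    if key.startswith("sk-ant-"):
        raise ValueError("Anthropic keys (sk-ant-...) are not supported for embeddings.")
    if not key.startswith("sk-"):
        raise ValueError(f"Invalid API key format: expected 'sk-', got '{key[:8]}...'.")
```

Failing fast here turns what would otherwise be a cryptic 401 from the OpenAI embeddings endpoint into an actionable configuration error.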
src/utils/llm_factory.py CHANGED
@@ -26,9 +26,7 @@ def get_pydantic_ai_model() -> Any:
     Get the appropriate model for pydantic-ai based on configuration.
     Used by legacy Simple Mode components.
     """
-    from pydantic_ai.models.anthropic import AnthropicModel
     from pydantic_ai.models.openai import OpenAIChatModel
-    from pydantic_ai.providers.anthropic import AnthropicProvider
     from pydantic_ai.providers.openai import OpenAIProvider
 
     # Normalize provider for case-insensitive matching
@@ -41,10 +39,7 @@ def get_pydantic_ai_model() -> Any:
         return OpenAIChatModel(settings.openai_model, provider=provider)
 
     if provider_lower == "anthropic":
-        if not settings.anthropic_api_key:
-            raise ConfigurationError("ANTHROPIC_API_KEY not set for pydantic-ai")
-        anthropic_provider = AnthropicProvider(api_key=settings.anthropic_api_key)
-        return AnthropicModel(settings.anthropic_model, provider=anthropic_provider)
+        raise ConfigurationError("Anthropic is not supported (no embeddings API). See P3 doc.")
 
     raise ConfigurationError(f"Unknown LLM provider for simple mode: {settings.llm_provider}")
src/utils/service_loader.py CHANGED
@@ -45,7 +45,7 @@ def warmup_services() -> None:
     thread.start()
 
 
-def get_embedding_service() -> "EmbeddingServiceProtocol":
+def get_embedding_service(api_key: str | None = None) -> "EmbeddingServiceProtocol":
     """Get the best available embedding service.
 
     Strategy selection (ordered by preference):
@@ -56,31 +56,41 @@ def get_embedding_service() -> "EmbeddingServiceProtocol":
     - Factory Method: Creates service instance
     - Strategy Pattern: Selects between implementations at runtime
 
+    Args:
+        api_key: Optional BYOK key. If starts with 'sk-', enables Premium tier.
+
     Returns:
         EmbeddingServiceProtocol: Either LlamaIndexRAGService or EmbeddingService
 
     Raises:
         ImportError: If no embedding service dependencies are available
-
-    Example:
-        ```python
-        service = get_embedding_service()
-        await service.add_evidence("id", "content", {"source": "pubmed"})
-        results = await service.search_similar("query", n_results=5)
-        unique = await service.deduplicate(evidence_list)
-        ```
     """
+    # Determine if we have a valid OpenAI key (BYOK or Env)
+    # Note: Must check sk-ant- BEFORE sk- since Anthropic keys start with sk-ant-
+    has_openai = False
+    if api_key:
+        if api_key.startswith("sk-ant-"):
+            # Anthropic key - not supported for embeddings
+            logger.warning("Anthropic keys don't support embeddings, falling back to free tier")
+        elif api_key.startswith("sk-"):
+            # OpenAI BYOK
+            has_openai = True
+    elif settings.has_openai_key:
+        has_openai = True
+
     # Try premium tier first (OpenAI + persistence)
-    if settings.has_openai_key:
+    if has_openai:
         try:
             from src.services.llamaindex_rag import get_rag_service
 
-            service = get_rag_service()
+            # Pass api_key to service (it handles precedence: api_key > env)
+            service = get_rag_service(api_key=api_key)
             logger.info(
                 "Using LlamaIndex RAG service",
                 tier="premium",
                 persistence="enabled",
                 embeddings="openai",
+                byok=bool(api_key),
             )
             return service
         except ImportError as e:
@@ -119,17 +129,22 @@ def get_embedding_service() -> "EmbeddingServiceProtocol":
     ) from e
 
 
-def get_embedding_service_if_available() -> "EmbeddingServiceProtocol | None":
+def get_embedding_service_if_available(
+    api_key: str | None = None,
+) -> "EmbeddingServiceProtocol | None":
     """Safely attempt to load and initialize an embedding service.
 
     Unlike get_embedding_service(), this function returns None instead of
     raising ImportError when no service is available.
 
+    Args:
+        api_key: Optional BYOK key to pass to service factory.
+
     Returns:
         EmbeddingServiceProtocol instance if dependencies are met, else None.
     """
     try:
-        return get_embedding_service()
+        return get_embedding_service(api_key=api_key)
     except ImportError as e:
         logger.info(
             "Embedding service not available (optional dependencies missing)",
tests/unit/agent_factory/test_get_model_auto_detect.py CHANGED
@@ -1,59 +1,73 @@
  import pytest
- from pydantic_ai.models.anthropic import AnthropicModel
  from pydantic_ai.models.huggingface import HuggingFaceModel
  from pydantic_ai.models.openai import OpenAIChatModel

  from src.agent_factory.judges import get_model
  from src.utils.config import settings
- from src.utils.exceptions import ConfigurationError


  class TestGetModelAutoDetect:
-     """Test that get_model() auto-detects available providers."""
+     """Test that get_model() auto-detects available providers.
+
+     NOTE: Anthropic is NOT supported (no embeddings API).
+     See P3_REMOVE_ANTHROPIC_PARTIAL_WIRING.md.
+     """

      def test_returns_openai_when_key_present(self, monkeypatch):
          """OpenAI key present → OpenAI model."""
          # Mock the settings properties (settings is a singleton)
          monkeypatch.setattr(settings, "openai_api_key", "sk-test")
-         monkeypatch.setattr(settings, "anthropic_api_key", None)
          monkeypatch.setattr(settings, "hf_token", None)

          model = get_model()
          assert isinstance(model, OpenAIChatModel)

-     def test_returns_anthropic_when_only_anthropic_key(self, monkeypatch):
-         """Only Anthropic key → Anthropic model."""
+     def test_byok_openai_key_returns_openai_model(self, monkeypatch):
+         """BYOK: api_key='sk-...' → OpenAI model (regardless of env vars)."""
          monkeypatch.setattr(settings, "openai_api_key", None)
-         monkeypatch.setattr(settings, "anthropic_api_key", "sk-ant-test")
          monkeypatch.setattr(settings, "hf_token", None)

-         model = get_model()
-         assert isinstance(model, AnthropicModel)
+         model = get_model(api_key="sk-byok-test-key")
+         assert isinstance(model, OpenAIChatModel)
+
+     def test_byok_anthropic_key_raises_not_implemented(self, monkeypatch):
+         """BYOK: api_key='sk-ant-...' → NotImplementedError (Anthropic not supported)."""
+         monkeypatch.setattr(settings, "openai_api_key", None)
+         monkeypatch.setattr(settings, "hf_token", None)
+
+         with pytest.raises(NotImplementedError) as exc_info:
+             get_model(api_key="sk-ant-test-key")
+
+         assert "Anthropic is not supported" in str(exc_info.value)

      def test_returns_huggingface_when_hf_token_present(self, monkeypatch):
          """HF_TOKEN present (no paid keys) → HuggingFace model."""
          monkeypatch.setattr(settings, "openai_api_key", None)
-         monkeypatch.setattr(settings, "anthropic_api_key", None)
          monkeypatch.setattr(settings, "hf_token", "hf_test_token")

          model = get_model()
          assert isinstance(model, HuggingFaceModel)

-     def test_raises_error_when_no_keys(self, monkeypatch):
-         """No keys at all → ConfigurationError."""
+     def test_raises_when_no_api_keys_available(self, monkeypatch):
+         """No keys at all → RuntimeError with helpful message."""
          monkeypatch.setattr(settings, "openai_api_key", None)
-         monkeypatch.setattr(settings, "anthropic_api_key", None)
          monkeypatch.setattr(settings, "hf_token", None)
+         monkeypatch.setattr(settings, "huggingface_model", "Qwen/Qwen2.5-7B-Instruct")
+         # Also ensure HF_TOKEN env var is not set
+         monkeypatch.delenv("HF_TOKEN", raising=False)

-         with pytest.raises(ConfigurationError) as exc_info:
+         # Should raise clear error when no tokens available
+         with pytest.raises(RuntimeError) as exc_info:
              get_model()
-
-         assert "No LLM API key configured" in str(exc_info.value)
+         assert "No LLM API key available" in str(exc_info.value)
+         assert "HF_TOKEN" in str(exc_info.value)

-     def test_openai_takes_priority_over_anthropic(self, monkeypatch):
-         """Both keys present → OpenAI wins."""
+     def test_openai_env_takes_priority_over_huggingface(self, monkeypatch):
+         """OpenAI env key present → OpenAI wins over HuggingFace."""
          monkeypatch.setattr(settings, "openai_api_key", "sk-test")
-         monkeypatch.setattr(settings, "anthropic_api_key", "sk-ant-test")
+         monkeypatch.setattr(settings, "hf_token", "hf_test_token")

          model = get_model()
          assert isinstance(model, OpenAIChatModel)
tests/unit/agent_factory/test_judges_factory.py CHANGED
@@ -1,14 +1,14 @@
- """Unit tests for Judge Factory and Model Selection."""
+ """Unit tests for Judge Factory and Model Selection.
+
+ NOTE: Anthropic is NOT supported (no embeddings API).
+ See P3_REMOVE_ANTHROPIC_PARTIAL_WIRING.md.
+ """

  from unittest.mock import patch

  import pytest

  pytestmark = pytest.mark.unit
- from pydantic_ai.models.anthropic import AnthropicModel
-
- # We expect this import to exist after we implement it, or we mock it if it's not there yet
- # For TDD, we assume we will use the library class
  from pydantic_ai.models.huggingface import HuggingFaceModel
  from pydantic_ai.models.openai import OpenAIChatModel

@@ -23,7 +23,6 @@ def mock_settings():

  def test_get_model_openai(mock_settings):
      """Test that OpenAI model is returned when provider is openai."""
-     mock_settings.llm_provider = "openai"
      mock_settings.has_openai_key = True
      mock_settings.openai_api_key = "sk-test"
      mock_settings.openai_model = "gpt-5"
@@ -33,39 +32,43 @@ def test_get_model_openai(mock_settings):
      assert model.model_name == "gpt-5"


- def test_get_model_anthropic(mock_settings):
-     """Test that Anthropic model is returned when provider is anthropic."""
-     mock_settings.llm_provider = "anthropic"
+ def test_get_model_byok_openai(mock_settings):
+     """Test that BYOK OpenAI key returns OpenAI model."""
      mock_settings.has_openai_key = False
-     mock_settings.has_anthropic_key = True
-     mock_settings.anthropic_api_key = "sk-ant-test"
-     mock_settings.anthropic_model = "claude-sonnet-4-5-20250929"
+     mock_settings.openai_model = "gpt-5"

-     model = get_model()
-     assert isinstance(model, AnthropicModel)
-     assert model.model_name == "claude-sonnet-4-5-20250929"
+     # BYOK takes priority over env vars
+     model = get_model(api_key="sk-byok-test")
+     assert isinstance(model, OpenAIChatModel)
+
+
+ def test_get_model_byok_anthropic_raises(mock_settings):
+     """Test that BYOK Anthropic key raises NotImplementedError."""
+     mock_settings.has_openai_key = False
+
+     with pytest.raises(NotImplementedError) as exc_info:
+         get_model(api_key="sk-ant-test")
+
+     assert "Anthropic is not supported" in str(exc_info.value)


  def test_get_model_huggingface(mock_settings):
-     """Test that HuggingFace model is returned when provider is huggingface."""
-     mock_settings.llm_provider = "huggingface"
+     """Test that HuggingFace model is returned when no paid keys."""
      mock_settings.has_openai_key = False
-     mock_settings.has_anthropic_key = False
-     mock_settings.has_huggingface_key = True  # CodeRabbit: explicitly set for auto-detect
      mock_settings.hf_token = "hf_test_token"
-     mock_settings.huggingface_model = "meta-llama/Llama-3.1-70B-Instruct"
+     mock_settings.huggingface_model = "Qwen/Qwen2.5-7B-Instruct"

      model = get_model()
      assert isinstance(model, HuggingFaceModel)
-     assert model.model_name == "meta-llama/Llama-3.1-70B-Instruct"
+     assert model.model_name == "Qwen/Qwen2.5-7B-Instruct"


- def test_get_model_default_fallback(mock_settings):
-     """Test fallback to OpenAI if provider is unknown."""
-     mock_settings.llm_provider = "unknown_provider"
+ def test_get_model_openai_priority(mock_settings):
+     """Test OpenAI takes priority when both keys present."""
      mock_settings.has_openai_key = True
      mock_settings.openai_api_key = "sk-test"
      mock_settings.openai_model = "gpt-5"
+     mock_settings.hf_token = "hf_test_token"

      model = get_model()
      assert isinstance(model, OpenAIChatModel)
tests/unit/clients/test_chat_client_factory.py CHANGED
@@ -91,8 +91,73 @@ class TestChatClientFactory:
          from src.clients.factory import get_chat_client

          with pytest.raises(ValueError, match="Unsupported provider"):
-             get_chat_client(provider="anthropic")
+             get_chat_client(provider="invalid_provider")
+
+     def test_anthropic_provider_raises_not_implemented(self) -> None:
+         """Anthropic provider should raise NotImplementedError (not yet implemented)."""
+         with patch("src.clients.factory.settings") as mock_settings:
+             mock_settings.has_openai_key = False
+             mock_settings.has_gemini_key = False
+
+             from src.clients.factory import get_chat_client
+
+             with pytest.raises(NotImplementedError, match="Anthropic client not yet implemented"):
+                 get_chat_client(provider="anthropic")
+
+     def test_byok_auto_detects_openai_from_key_prefix(self) -> None:
+         """BYOK: api_key starting with 'sk-' should auto-select OpenAI without explicit provider.
+
+         This is the critical BYOK (Bring Your Own Key) test case:
+         - User enters 'sk-...' key in Gradio
+         - No explicit provider parameter
+         - No OPENAI_API_KEY in env (settings.has_openai_key = False)
+         - Should auto-detect OpenAI from the key prefix
+         """
+         with patch("src.clients.factory.settings") as mock_settings:
+             mock_settings.has_openai_key = False  # No env key
+             mock_settings.has_gemini_key = False
+             mock_settings.openai_api_key = None
+             mock_settings.openai_model = "gpt-5"
+
+             from src.clients.factory import get_chat_client
+
+             # BYOK: Pass api_key without explicit provider
+             client = get_chat_client(api_key="sk-user-provided-key")
+
+             # Should auto-detect OpenAI from 'sk-' prefix
+             assert "OpenAI" in type(client).__name__
+
+     def test_byok_auto_detects_anthropic_from_key_prefix(self) -> None:
+         """BYOK: api_key starting with 'sk-ant-' should auto-detect Anthropic.
+
+         Anthropic keys start with 'sk-ant-' which is a superset of 'sk-'.
+         Detection must check 'sk-ant-' first to avoid misdetecting as OpenAI.
+         """
+         with patch("src.clients.factory.settings") as mock_settings:
+             mock_settings.has_openai_key = False
+             mock_settings.has_gemini_key = False
+
+             from src.clients.factory import get_chat_client
+
+             # BYOK: Anthropic key should raise NotImplementedError (not fall to HuggingFace!)
+             with pytest.raises(NotImplementedError, match="Anthropic client not yet implemented"):
+                 get_chat_client(api_key="sk-ant-user-anthropic-key")
+
+     def test_byok_hf_token_falls_through_to_huggingface(self) -> None:
+         """BYOK: HuggingFace tokens (hf_...) should use HuggingFace client."""
+         with patch("src.clients.factory.settings") as mock_settings:
+             mock_settings.has_openai_key = False
+             mock_settings.has_gemini_key = False
+             mock_settings.huggingface_model = "Qwen/Qwen2.5-7B-Instruct"
+             mock_settings.hf_token = None
+
+             from src.clients.factory import get_chat_client
+
+             # HF tokens don't trigger auto-detection, falls through to HuggingFace
+             client = get_chat_client(api_key="hf_user_provided_token")
+
+             assert "HuggingFace" in type(client).__name__
+
      def test_provider_is_case_insensitive(self) -> None:
          """Provider matching should be case-insensitive."""
          with patch("src.clients.factory.settings") as mock_settings:
tests/unit/services/test_service_loader.py CHANGED
@@ -25,13 +25,30 @@ class TestGetEmbeddingService:
              create=True,
          ):
              # Also need to prevent the actual import from failing
-             mock_module = MagicMock(get_rag_service=lambda: mock_rag_service)
+             # Update lambda to accept **kwargs (api_key)
+             mock_module = MagicMock(get_rag_service=lambda **kwargs: mock_rag_service)
              with patch.dict("sys.modules", {"src.services.llamaindex_rag": mock_module}):
                  from src.utils.service_loader import get_embedding_service

                  service = get_embedding_service()
                  assert service is mock_rag_service

+     def test_uses_llamaindex_when_byok_key_present(self):
+         """Should return LlamaIndexRAGService when valid BYOK key passed."""
+         mock_rag_service = MagicMock()
+
+         with patch("src.utils.service_loader.settings") as mock_settings:
+             mock_settings.has_openai_key = False  # Env key missing
+
+             # Update lambda to accept **kwargs
+             mock_module = MagicMock(get_rag_service=lambda **kwargs: mock_rag_service)
+             with patch.dict("sys.modules", {"src.services.llamaindex_rag": mock_module}):
+                 from src.utils.service_loader import get_embedding_service
+
+                 # Pass valid BYOK key
+                 service = get_embedding_service(api_key="sk-test-key")
+                 assert service is mock_rag_service
+
      def test_falls_back_to_local_when_no_openai_key(self):
          """Should return EmbeddingService when no OpenAI key."""
          mock_local_service = MagicMock()
uv.lock CHANGED
@@ -1130,6 +1130,7 @@ dependencies = [
      { name = "langgraph" },
      { name = "langgraph-checkpoint-sqlite" },
      { name = "limits" },
+     { name = "mcp" },
      { name = "openai" },
      { name = "pydantic" },
      { name = "pydantic-ai" },
@@ -1195,6 +1196,7 @@ requires-dist = [
      { name = "llama-index-embeddings-openai", marker = "extra == 'modal'" },
      { name = "llama-index-llms-openai", marker = "extra == 'modal'" },
      { name = "llama-index-vector-stores-chroma", marker = "extra == 'modal'" },
+     { name = "mcp", specifier = ">=1.23.0" },
      { name = "modal", marker = "extra == 'modal'", specifier = ">=0.63.0" },
      { name = "mypy", marker = "extra == 'dev'", specifier = ">=1.10" },
      { name = "openai", specifier = ">=1.0.0" },
@@ -3007,7 +3009,7 @@ wheels = [

  [[package]]
  name = "mcp"
- version = "1.22.0"
+ version = "1.23.1"
  source = { registry = "https://pypi.org/simple" }
  dependencies = [
      { name = "anyio" },
@@ -3025,9 +3027,9 @@ dependencies = [
      { name = "typing-inspection" },
      { name = "uvicorn", marker = "sys_platform != 'emscripten'" },
  ]
- sdist = { url = "https://files.pythonhosted.org/packages/a3/a2/c5ec0ab38b35ade2ae49a90fada718fbc76811dc5aa1760414c6aaa6b08a/mcp-1.22.0.tar.gz", hash = "sha256:769b9ac90ed42134375b19e777a2858ca300f95f2e800982b3e2be62dfc0ba01", size = 471788 }
+ sdist = { url = "https://files.pythonhosted.org/packages/12/42/10c0c09ca27aceacd8c428956cfabdd67e3d328fe55c4abc16589285d294/mcp-1.23.1.tar.gz", hash = "sha256:7403e053e8e2283b1e6ae631423cb54736933fea70b32422152e6064556cd298", size = 596519 }
  wheels = [
-     { url = "https://files.pythonhosted.org/packages/a9/bb/711099f9c6bb52770f56e56401cdfb10da5b67029f701e0df29362df4c8e/mcp-1.22.0-py3-none-any.whl", hash = "sha256:bed758e24df1ed6846989c909ba4e3df339a27b4f30f1b8b627862a4bade4e98", size = 175489 },
+     { url = "https://files.pythonhosted.org/packages/9f/9e/26e1d2d2c6afe15dfba5ca6799eeeea7656dce625c22766e4c57305e9cc2/mcp-1.23.1-py3-none-any.whl", hash = "sha256:3ce897fcc20a41bd50b4c58d3aa88085f11f505dcc0eaed48930012d34c731d8", size = 231433 },
  ]

  [package.optional-dependencies]
  [package.optional-dependencies]