Add UI-based LLM provider selection for cloud testing
- Config-based LLM selection: Added LLM_PROVIDER and ENABLE_LLM_FALLBACK environment variables
- Smart routing: _call_with_fallback() function routes to selected provider with optional fallback
- UI dropdowns: Added LLM provider selection to both Test & Debug and Full Evaluation tabs
- Cloud-friendly: UI selection overrides env vars, enabling instant provider switching without rebuilds
- Unified behavior: Same UI selection works identically in local and cloud environments
Modified files:
- src/agent/llm_client.py: Config-based provider routing (~150 lines)
- app.py: UI dropdowns and function parameter updates (~30 lines)
- CHANGELOG.md: Documented two problems solved
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- CHANGELOG.md +68 -0
- app.py +49 -6
- src/agent/llm_client.py +119 -109
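The routing idea behind this commit can be sketched in a few lines: the primary provider and the fallback toggle come from environment variables, and a failed primary call either fails fast or walks the remaining providers. A minimal, self-contained sketch — the stub providers and their failure mode are invented for the demo, not the repo's real functions:

```python
import os

# Hypothetical stubs standing in for the real plan_question_gemini /
# plan_question_groq provider calls.
def plan_gemini(question: str) -> str:
    raise RuntimeError("quota exceeded")

def plan_groq(question: str) -> str:
    return f"groq-plan: {question}"

PROVIDERS = {"gemini": plan_gemini, "groq": plan_groq}

def call_with_fallback(question: str) -> str:
    # Primary provider and fallback toggle come from the environment,
    # mirroring LLM_PROVIDER / ENABLE_LLM_FALLBACK in the diff below.
    primary = os.getenv("LLM_PROVIDER", "gemini").lower()
    fallback_on = os.getenv("ENABLE_LLM_FALLBACK", "false").lower() == "true"
    try:
        return PROVIDERS[primary](question)
    except Exception as err:
        if not fallback_on:
            raise RuntimeError(f"{primary} failed, fallback disabled") from err
        for name, func in PROVIDERS.items():
            if name == primary:
                continue
            try:
                return func(question)
            except Exception:
                continue
        raise RuntimeError("all providers failed") from err

os.environ["LLM_PROVIDER"] = "gemini"
os.environ["ENABLE_LLM_FALLBACK"] = "true"
print(call_with_fallback("capital of France?"))  # groq-plan: capital of France?
```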
CHANGELOG.md:

@@ -99,6 +99,74 @@
 - ✅ No regressions introduced by Stage 5 changes
 - ✅ Test suite run time: ~2min 40sec
 
+### [PROBLEM: LLM Provider Debugging - Config-Based Selection]
+
+**Problem:** Hard to debug which LLM provider handles each step with the 4-tier fallback chain. Cannot isolate provider performance for improvement.
+
+**Modified Files:**
+
+- **.env** (~5 lines added)
+  - Added `LLM_PROVIDER=gemini` - Select single provider: "gemini", "huggingface", "groq", or "claude"
+  - Added `ENABLE_LLM_FALLBACK=false` - Toggle fallback behavior (true/false)
+  - Removed deprecated `DEFAULT_LLM_MODEL` config
+- **src/agent/llm_client.py** (~150 lines added/modified)
+  - Added `LLM_PROVIDER` config variable (line 49) - Reads from environment
+  - Added `ENABLE_LLM_FALLBACK` config variable (line 50) - Reads from environment
+  - Added `_get_provider_function()` helper (lines 114-158) - Maps function names to provider implementations
+  - Added `_call_with_fallback()` routing function (lines 161-212)
+    - Primary provider: Uses LLM_PROVIDER config
+    - Fallback behavior: Controlled by ENABLE_LLM_FALLBACK
+    - Logging: Clear info logs showing which provider is used
+    - Error handling: Specific error messages when fallback is disabled
+  - Updated `plan_question()` - Now uses `_call_with_fallback()` (simplified from ~40 lines to 1 line)
+  - Updated `select_tools_with_function_calling()` - Now uses `_call_with_fallback()` (simplified from ~40 lines to 1 line)
+  - Updated `synthesize_answer()` - Now uses `_call_with_fallback()` (simplified from ~40 lines to 1 line)
+
+**Benefits:**
+- ✅ Easy debugging: Change `LLM_PROVIDER=groq` in .env to test a specific provider
+- ✅ Clear logs: Know exactly which LLM handled each step
+- ✅ Isolated testing: Disable fallback to measure single-provider performance
+- ✅ Production safety: Enable fallback for deployment reliability
+
+**Verification:**
+- ✅ Config-based selection tested with the Groq provider
+- ✅ Logs show "Using primary provider: groq"
+- ✅ Fallback-disabled error handling works correctly
+
+### [PROBLEM: Cloud Testing UX - UI-Based LLM Selection]
+
+**Problem:** Testing different LLM providers in HF Spaces requires manually changing environment variables in the Space settings, then waiting for a rebuild. Slow iteration, poor UX.
+
+**Modified Files:**
+
+- **app.py** (~30 lines added/modified)
+  - Updated `test_single_question()` signature - Added `llm_provider` and `enable_fallback` parameters
+    - Sets `os.environ["LLM_PROVIDER"]` from the UI selection (overrides .env and HF Space env vars)
+    - Sets `os.environ["ENABLE_LLM_FALLBACK"]` from the UI checkbox
+    - Adds provider info to the diagnostics output
+  - Updated `run_and_submit_all()` signature - Added `llm_provider` and `enable_fallback` parameters
+    - Reordered params: UI inputs first, profile last (optional)
+    - Sets environment variables before agent initialization
+  - Added UI components in the "Test & Debug" tab:
+    - `llm_provider_dropdown` - Select from: Gemini, HuggingFace, Groq, Claude (default: Groq)
+    - `enable_fallback_checkbox` - Toggle fallback behavior (default: false for testing)
+  - Added UI components in the "Full Evaluation" tab:
+    - `eval_llm_provider_dropdown` - Select the LLM for all questions (default: Groq)
+    - `eval_enable_fallback_checkbox` - Toggle fallback (default: true for production)
+  - Updated button click handlers to pass the new UI inputs to the functions
+
+**Benefits:**
+- ✅ **Cloud testing:** Test all 4 providers directly from the HF Space UI
+- ✅ **Instant switching:** No environment variable changes, no rebuild wait
+- ✅ **Clear visibility:** The UI shows which provider is selected
+- ✅ **A/B testing:** Easy comparison between providers on the same questions
+- ✅ **Production safety:** Fallback enabled by default for full evaluation
+
+**Verification:**
+- ✅ No syntax errors in app.py
+- ✅ UI components properly connected to function parameters
+
 ### Created Files
 
 ### Deleted Files
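The `ENABLE_LLM_FALLBACK=false` setting described in the changelog only works because the config compares the raw string against `"true"`. A small demo of why that explicit comparison matters (the helper name is hypothetical):

```python
import os

def env_flag(name: str, default: str = "false") -> bool:
    # Compare against the literal string "true": bool("false") is True in
    # Python (any non-empty string is truthy), so naive casting would
    # silently enable the flag.
    return os.getenv(name, default).strip().lower() == "true"

os.environ["ENABLE_LLM_FALLBACK"] = "False"
print(env_flag("ENABLE_LLM_FALLBACK"))  # False
print(bool("False"))                    # True - the trap being avoided
```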
app.py:

@@ -148,12 +148,18 @@ def format_diagnostics(final_state: dict) -> str:
     return "\n".join(diagnostics)
 
 
-def test_single_question(question: str):
+def test_single_question(question: str, llm_provider: str, enable_fallback: bool):
     """Test agent with a single question and return diagnostics."""
     if not question or not question.strip():
         return "Please enter a question.", "", check_api_keys()
 
     try:
+        # Set LLM provider from UI selection (overrides .env)
+        os.environ["LLM_PROVIDER"] = llm_provider.lower()
+        os.environ["ENABLE_LLM_FALLBACK"] = "true" if enable_fallback else "false"
+
+        logger.info(f"UI Config: LLM_PROVIDER={llm_provider}, ENABLE_LLM_FALLBACK={enable_fallback}")
+
         # Initialize agent
         agent = GAIAAgent()
 
@@ -163,8 +169,9 @@ def test_single_question(question: str):
         # Get final state from agent
         final_state = agent.last_state or {}
 
-        # Format diagnostics
-        diagnostics = format_diagnostics(final_state)
+        # Format diagnostics with LLM provider info
+        provider_info = f"**LLM Provider:** {llm_provider} (Fallback: {'Enabled' if enable_fallback else 'Disabled'})\n\n"
+        diagnostics = provider_info + format_diagnostics(final_state)
         api_status = check_api_keys()
 
         return answer, diagnostics, api_status
@@ -183,7 +190,7 @@ def test_single_question(question: str):
 # Stage 5: Performance optimization
 
 
-def run_and_submit_all(profile: gr.OAuthProfile | None):
+def run_and_submit_all(llm_provider: str, enable_fallback: bool, profile: gr.OAuthProfile | None = None):
     """
     Fetches all questions, runs the BasicAgent on them, submits all answers,
     and displays the results.
@@ -202,6 +209,11 @@ def run_and_submit_all(profile: gr.OAuthProfile | None):
     questions_url = f"{api_url}/questions"
     submit_url = f"{api_url}/submit"
 
+    # Set LLM provider from UI selection (overrides .env)
+    os.environ["LLM_PROVIDER"] = llm_provider.lower()
+    os.environ["ENABLE_LLM_FALLBACK"] = "true" if enable_fallback else "false"
+    logger.info(f"UI Config for Full Evaluation: LLM_PROVIDER={llm_provider}, ENABLE_LLM_FALLBACK={enable_fallback}")
+
     # 1. Instantiate Agent (Stage 1: GAIAAgent with LangGraph)
     try:
         logger.info("Initializing GAIAAgent...")
@@ -363,6 +375,20 @@ with gr.Blocks() as demo:
                 placeholder="e.g., What is the capital of France?",
                 lines=3
             )
+
+            with gr.Row():
+                llm_provider_dropdown = gr.Dropdown(
+                    label="LLM Provider",
+                    choices=["Gemini", "HuggingFace", "Groq", "Claude"],
+                    value="Groq",
+                    info="Select which LLM to use for this test"
+                )
+                enable_fallback_checkbox = gr.Checkbox(
+                    label="Enable Fallback",
+                    value=False,
+                    info="If enabled, falls back to other providers on failure"
+                )
+
             test_button = gr.Button("Run Test", variant="primary")
 
             with gr.Row():
@@ -386,7 +412,7 @@ with gr.Blocks() as demo:
 
     test_button.click(
         fn=test_single_question,
-        inputs=[test_question_input],
+        inputs=[test_question_input, llm_provider_dropdown, enable_fallback_checkbox],
        outputs=[test_answer_output, test_diagnostics_output, test_api_status]
     )
 
@@ -409,6 +435,19 @@ with gr.Blocks() as demo:
 
     gr.LoginButton()
 
+    with gr.Row():
+        eval_llm_provider_dropdown = gr.Dropdown(
+            label="LLM Provider for Evaluation",
+            choices=["Gemini", "HuggingFace", "Groq", "Claude"],
+            value="Groq",
+            info="Select which LLM to use for all questions"
+        )
+        eval_enable_fallback_checkbox = gr.Checkbox(
+            label="Enable Fallback",
+            value=True,
+            info="Recommended: Enable fallback for production evaluation"
+        )
+
     run_button = gr.Button("Run Evaluation & Submit All Answers")
 
     status_output = gr.Textbox(
@@ -422,7 +461,11 @@ with gr.Blocks() as demo:
         type="filepath"
     )
 
-    run_button.click(
+    run_button.click(
+        fn=run_and_submit_all,
+        inputs=[eval_llm_provider_dropdown, eval_enable_fallback_checkbox],
+        outputs=[status_output, results_table, export_output]
+    )
 
 if __name__ == "__main__":
     print("\n" + "-" * 30 + " App Starting " + "-" * 30)
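One subtlety worth double-checking in this pattern: writing `os.environ` from the UI only takes effect for consumers that re-read the variable after the write. A value captured into a module-level constant at import time will not see later overrides. A minimal illustration of the difference (names are hypothetical, not the repo's):

```python
import os

os.environ["LLM_PROVIDER"] = "gemini"

# Read once - the behavior of a module-level constant set at import time.
FROZEN_PROVIDER = os.getenv("LLM_PROVIDER", "gemini").lower()

def provider_now() -> str:
    # Re-reading inside the call picks up overrides made after import.
    return os.getenv("LLM_PROVIDER", "gemini").lower()

os.environ["LLM_PROVIDER"] = "groq"  # e.g. written from the UI dropdown
print(FROZEN_PROVIDER)  # gemini
print(provider_now())   # groq
```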
src/agent/llm_client.py:

@@ -45,6 +45,10 @@ GROQ_MODEL = "qwen/qwen3-32b"  # Free tier: 60 req/min, fast inference
 TEMPERATURE = 0  # Deterministic for factoid answers
 MAX_TOKENS = 4096
 
+# LLM Provider Selection
+LLM_PROVIDER = os.getenv("LLM_PROVIDER", "gemini").lower()  # "gemini", "huggingface", "groq", or "claude"
+ENABLE_LLM_FALLBACK = os.getenv("ENABLE_LLM_FALLBACK", "false").lower() == "true"
+
 # ============================================================================
 # Logging Setup
 # ============================================================================
@@ -102,6 +106,112 @@ def retry_with_backoff(func: Callable, max_retries: int = 3) -> Any:
         raise
 
 
+# ============================================================================
+# LLM Provider Routing
+# ============================================================================
+
+
+def _get_provider_function(function_name: str, provider: str) -> Callable:
+    """
+    Get the provider-specific function for a given operation.
+
+    Args:
+        function_name: Base function name ("plan_question", "select_tools", "synthesize_answer")
+        provider: Provider name ("gemini", "huggingface", "groq", "claude")
+
+    Returns:
+        Callable: Provider-specific function
+
+    Raises:
+        ValueError: If provider is invalid
+    """
+    # Map function names to provider-specific implementations
+    function_map = {
+        "plan_question": {
+            "gemini": plan_question_gemini,
+            "huggingface": plan_question_hf,
+            "groq": plan_question_groq,
+            "claude": plan_question_claude,
+        },
+        "select_tools": {
+            "gemini": select_tools_gemini,
+            "huggingface": select_tools_hf,
+            "groq": select_tools_groq,
+            "claude": select_tools_claude,
+        },
+        "synthesize_answer": {
+            "gemini": synthesize_answer_gemini,
+            "huggingface": synthesize_answer_hf,
+            "groq": synthesize_answer_groq,
+            "claude": synthesize_answer_claude,
+        },
+    }
+
+    if function_name not in function_map:
+        raise ValueError(f"Unknown function name: {function_name}")
+
+    if provider not in function_map[function_name]:
+        raise ValueError(
+            f"Unknown provider: {provider}. Valid options: gemini, huggingface, groq, claude"
+        )
+
+    return function_map[function_name][provider]
+
+
+def _call_with_fallback(function_name: str, *args, **kwargs) -> Any:
+    """
+    Call LLM function with configured provider and optional fallback.
+
+    Args:
+        function_name: Base function name ("plan_question", "select_tools", "synthesize_answer")
+        *args, **kwargs: Arguments to pass to the provider-specific function
+
+    Returns:
+        Result from LLM call
+
+    Raises:
+        Exception: If selected provider fails and fallback disabled, or all providers fail
+    """
+    primary_provider = LLM_PROVIDER
+
+    # Define fallback order (excluding primary provider)
+    all_providers = ["gemini", "huggingface", "groq", "claude"]
+    fallback_providers = [p for p in all_providers if p != primary_provider]
+
+    # Try primary provider first
+    try:
+        primary_func = _get_provider_function(function_name, primary_provider)
+        logger.info(f"[{function_name}] Using primary provider: {primary_provider}")
+        return retry_with_backoff(lambda: primary_func(*args, **kwargs))
+    except Exception as primary_error:
+        logger.warning(f"[{function_name}] Primary provider {primary_provider} failed: {primary_error}")
+
+        # If fallback disabled, raise immediately
+        if not ENABLE_LLM_FALLBACK:
+            logger.error(f"[{function_name}] Fallback disabled. Failing fast.")
+            raise Exception(
+                f"{function_name} failed with {primary_provider}: {primary_error}. "
+                f"Fallback disabled (ENABLE_LLM_FALLBACK=false)"
+            )
+
+        # Try fallback providers in order
+        errors = {primary_provider: primary_error}
+        for fallback_provider in fallback_providers:
+            try:
+                fallback_func = _get_provider_function(function_name, fallback_provider)
+                logger.info(f"[{function_name}] Trying fallback provider: {fallback_provider}")
+                return retry_with_backoff(lambda: fallback_func(*args, **kwargs))
+            except Exception as fallback_error:
+                errors[fallback_provider] = fallback_error
+                logger.warning(f"[{function_name}] Fallback provider {fallback_provider} failed: {fallback_error}")
+                continue
+
+        # All providers failed
+        error_summary = ", ".join([f"{k}: {v}" for k, v in errors.items()])
+        logger.error(f"[{function_name}] All providers failed. {error_summary}")
+        raise Exception(f"{function_name} failed with all providers. {error_summary}")
+
+
 # ============================================================================
 # Client Initialization
 # ============================================================================
@@ -423,8 +533,8 @@ def plan_question(
     """
     Analyze question and generate execution plan using LLM.
 
-
-
+    Uses LLM_PROVIDER config to select which provider to use.
+    If ENABLE_LLM_FALLBACK=true, falls back to other providers on failure.
     Each provider call wrapped with retry logic (3 attempts with exponential backoff).
 
     Args:
@@ -435,43 +545,7 @@ def plan_question(
     Returns:
         Execution plan as structured text
     """
-    try:
-        return retry_with_backoff(
-            lambda: plan_question_gemini(question, available_tools, file_paths)
-        )
-    except Exception as gemini_error:
-        logger.warning(
-            f"[plan_question] Gemini failed: {gemini_error}, trying HuggingFace fallback"
-        )
-        try:
-            return retry_with_backoff(
-                lambda: plan_question_hf(question, available_tools, file_paths)
-            )
-        except Exception as hf_error:
-            logger.warning(
-                f"[plan_question] HuggingFace failed: {hf_error}, trying Groq fallback"
-            )
-            try:
-                return retry_with_backoff(
-                    lambda: plan_question_groq(question, available_tools, file_paths)
-                )
-            except Exception as groq_error:
-                logger.warning(
-                    f"[plan_question] Groq failed: {groq_error}, trying Claude fallback"
-                )
-                try:
-                    return retry_with_backoff(
-                        lambda: plan_question_claude(
-                            question, available_tools, file_paths
-                        )
-                    )
-                except Exception as claude_error:
-                    logger.error(
-                        f"[plan_question] All LLMs failed. Gemini: {gemini_error}, HF: {hf_error}, Groq: {groq_error}, Claude: {claude_error}"
-                    )
-                    raise Exception(
-                        f"Planning failed with all LLMs. Gemini: {gemini_error}, HF: {hf_error}, Groq: {groq_error}, Claude: {claude_error}"
-                    )
+    return _call_with_fallback("plan_question", question, available_tools, file_paths)
 
 
 # ============================================================================
@@ -825,8 +899,8 @@ def select_tools_with_function_calling(
     """
     Use LLM function calling to dynamically select tools and extract parameters.
 
-
-
+    Uses LLM_PROVIDER config to select which provider to use.
+    If ENABLE_LLM_FALLBACK=true, falls back to other providers on failure.
     Each provider call wrapped with retry logic (3 attempts with exponential backoff).
 
     Args:
@@ -837,41 +911,7 @@ def select_tools_with_function_calling(
     Returns:
        List of tool calls with extracted parameters
     """
-    try:
-        return retry_with_backoff(
-            lambda: select_tools_gemini(question, plan, available_tools)
-        )
-    except Exception as gemini_error:
-        logger.warning(
-            f"[select_tools] Gemini failed: {gemini_error}, trying HuggingFace fallback"
-        )
-        try:
-            return retry_with_backoff(
-                lambda: select_tools_hf(question, plan, available_tools)
-            )
-        except Exception as hf_error:
-            logger.warning(
-                f"[select_tools] HuggingFace failed: {hf_error}, trying Groq fallback"
-            )
-            try:
-                return retry_with_backoff(
-                    lambda: select_tools_groq(question, plan, available_tools)
-                )
-            except Exception as groq_error:
-                logger.warning(
-                    f"[select_tools] Groq failed: {groq_error}, trying Claude fallback"
-                )
-                try:
-                    return retry_with_backoff(
-                        lambda: select_tools_claude(question, plan, available_tools)
-                    )
-                except Exception as claude_error:
-                    logger.error(
-                        f"[select_tools] All LLMs failed. Gemini: {gemini_error}, HF: {hf_error}, Groq: {groq_error}, Claude: {claude_error}"
-                    )
-                    raise Exception(
-                        f"Tool selection failed with all LLMs. Gemini: {gemini_error}, HF: {hf_error}, Groq: {groq_error}, Claude: {claude_error}"
-                    )
+    return _call_with_fallback("select_tools", question, plan, available_tools)
 
 
 # ============================================================================
@@ -1121,8 +1161,8 @@ def synthesize_answer(question: str, evidence: List[str]) -> str:
     """
     Synthesize factoid answer from collected evidence using LLM.
 
-
-
+    Uses LLM_PROVIDER config to select which provider to use.
+    If ENABLE_LLM_FALLBACK=true, falls back to other providers on failure.
     Each provider call wrapped with retry logic (3 attempts with exponential backoff).
 
     Args:
@@ -1132,37 +1172,7 @@ def synthesize_answer(question: str, evidence: List[str]) -> str:
     Returns:
        Factoid answer string
     """
-    try:
-        return retry_with_backoff(lambda: synthesize_answer_gemini(question, evidence))
-    except Exception as gemini_error:
-        logger.warning(
-            f"[synthesize_answer] Gemini failed: {gemini_error}, trying HuggingFace fallback"
-        )
-        try:
-            return retry_with_backoff(lambda: synthesize_answer_hf(question, evidence))
-        except Exception as hf_error:
-            logger.warning(
-                f"[synthesize_answer] HuggingFace failed: {hf_error}, trying Groq fallback"
-            )
-            try:
-                return retry_with_backoff(
-                    lambda: synthesize_answer_groq(question, evidence)
-                )
-            except Exception as groq_error:
-                logger.warning(
-                    f"[synthesize_answer] Groq failed: {groq_error}, trying Claude fallback"
-                )
-                try:
-                    return retry_with_backoff(
-                        lambda: synthesize_answer_claude(question, evidence)
-                    )
-                except Exception as claude_error:
-                    logger.error(
-                        f"[synthesize_answer] All LLMs failed. Gemini: {gemini_error}, HF: {hf_error}, Groq: {groq_error}, Claude: {claude_error}"
-                    )
-                    raise Exception(
-                        f"Answer synthesis failed with all LLMs. Gemini: {gemini_error}, HF: {hf_error}, Groq: {groq_error}, Claude: {claude_error}"
-                    )
+    return _call_with_fallback("synthesize_answer", question, evidence)
 
 
 # ============================================================================