
[dev_260104_09] Cloud Testing UX - UI-Based LLM Selection

Date: 2026-01-04
Type: Feature
Status: Resolved
Stage: [Stage 5: Performance Optimization]

Problem Description

Testing different LLM providers on HF Spaces requires manually changing environment variables in the Space settings and then waiting for a rebuild. This makes iteration slow and the testing experience poor.


Key Decisions

  • UI dropdowns: Add provider selection in both Test & Debug and Full Evaluation tabs
  • Environment override: Set os.environ directly from UI selection (overrides .env and HF Space env vars)
  • Toggle fallback: Checkbox to enable/disable fallback behavior
  • Default strategy: Groq as the default provider in both tabs; fallback disabled for testing (Test & Debug), enabled for production (Full Evaluation)
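
The environment override can be sketched as a small helper. This is a minimal illustration, not the code in app.py: the helper name `apply_llm_selection`, the `PROVIDERS` list, the lowercase provider value, and the "true"/"false" string format are all assumptions. Only the two environment variable names and the four provider names come from this note.

```python
import os

# Provider names as listed in the changelog; the exact strings are assumed.
PROVIDERS = ["Gemini", "HuggingFace", "Groq", "Claude"]


def apply_llm_selection(llm_provider: str, enable_fallback: bool) -> None:
    """Copy the UI selection into the process environment (hypothetical helper).

    Intended to run before the agent is initialized, so that code reading
    LLM_PROVIDER / ENABLE_LLM_FALLBACK sees the UI values instead of the
    ones loaded from .env or the Space settings.
    """
    if llm_provider not in PROVIDERS:
        raise ValueError(f"Unknown provider: {llm_provider!r}")
    # Assumption: the agent expects a lowercase name and "true"/"false" strings.
    os.environ["LLM_PROVIDER"] = llm_provider.lower()
    os.environ["ENABLE_LLM_FALLBACK"] = "true" if enable_fallback else "false"
```

Because `os.environ` is process-wide, an override written this way applies to every request served by the same process until the next selection replaces it. That is convenient for single-user testing, but worth keeping in mind if several people use the Space at once.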

Outcome

Cloud testing is now much faster: all 4 providers (Gemini, HuggingFace, Groq, Claude) can be tested directly from the HF Space UI without a rebuild.

Deliverables:

  • app.py - Added UI dropdowns and checkboxes for LLM provider selection in both tabs

Changelog

What was changed:

  • app.py (~30 lines added/modified)
    • Updated test_single_question() function signature - Added llm_provider and enable_fallback parameters
      • Sets os.environ["LLM_PROVIDER"] from UI selection (overrides .env and HF Space env vars)
      • Sets os.environ["ENABLE_LLM_FALLBACK"] from UI checkbox
      • Adds provider info to diagnostics output
    • Updated run_and_submit_all() function signature - Added llm_provider and enable_fallback parameters
      • Reordered params: UI inputs first, profile last (optional)
      • Sets environment variables before agent initialization
    • Added UI components in "Test & Debug" tab:
      • llm_provider_dropdown - Select from: Gemini, HuggingFace, Groq, Claude (default: Groq)
      • enable_fallback_checkbox - Toggle fallback behavior (default: false for testing)
    • Added UI components in "Full Evaluation" tab:
      • eval_llm_provider_dropdown - Select LLM for all questions (default: Groq)
      • eval_enable_fallback_checkbox - Toggle fallback (default: true for production)
    • Updated button click handlers to pass new UI inputs to functions
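
The wiring described above might look roughly like the following for the "Test & Debug" tab. This is a sketch under assumptions, not the actual app.py: `single_question_handler` is a hypothetical stand-in for the updated `test_single_question()`, `build_demo`, the labels, the question textbox, and the output component are invented for illustration, and the environment value formats are assumed. The component names `llm_provider_dropdown` and `enable_fallback_checkbox`, the provider list, and the defaults come from the changelog.

```python
import os


def single_question_handler(question: str, llm_provider: str, enable_fallback: bool) -> str:
    """Hypothetical stand-in for the updated test_single_question()."""
    # Set the overrides first, before any agent would be initialized.
    os.environ["LLM_PROVIDER"] = llm_provider.lower()  # value format assumed
    os.environ["ENABLE_LLM_FALLBACK"] = str(enable_fallback).lower()
    # The real function runs the agent; this sketch only returns diagnostics.
    return (
        f"provider={os.environ['LLM_PROVIDER']} "
        f"fallback={os.environ['ENABLE_LLM_FALLBACK']} "
        f"question={question!r}"
    )


def build_demo():
    """Build a Gradio UI with the provider dropdown and fallback checkbox."""
    # Imported here so the handler above can be exercised without Gradio.
    import gradio as gr

    with gr.Blocks() as demo:
        with gr.Tab("Test & Debug"):
            question_input = gr.Textbox(label="Question")
            llm_provider_dropdown = gr.Dropdown(
                choices=["Gemini", "HuggingFace", "Groq", "Claude"],
                value="Groq",  # default provider per the changelog
                label="LLM Provider",
            )
            enable_fallback_checkbox = gr.Checkbox(
                value=False,  # fallback off by default for testing
                label="Enable fallback",
            )
            test_button = gr.Button("Run test")
            diagnostics_output = gr.Textbox(label="Diagnostics")
            # Pass the new UI inputs to the handler in signature order.
            test_button.click(
                fn=single_question_handler,
                inputs=[question_input, llm_provider_dropdown, enable_fallback_checkbox],
                outputs=diagnostics_output,
            )
    return demo
```

With Gradio installed, `build_demo().launch()` would start the UI. The "Full Evaluation" tab would follow the same pattern with `eval_llm_provider_dropdown` and `eval_enable_fallback_checkbox`, the latter defaulting to `True`.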