# [dev_260104_09] Cloud Testing UX - UI-Based LLM Selection

**Date:** 2026-01-04
**Type:** Feature
**Status:** Resolved
**Stage:** [Stage 5: Performance Optimization]

## Problem Description

Testing different LLM providers in the HF Spaces cloud required manually changing environment variables in the Space settings and then waiting for a rebuild. This made iteration slow and the testing experience poor.

---

## Key Decisions

- **UI dropdowns:** Add provider selection in both the Test & Debug and Full Evaluation tabs
- **Environment override:** Set `os.environ` directly from the UI selection (overrides `.env` and HF Space env vars)
- **Toggle fallback:** Add a checkbox to enable or disable fallback behavior
- **Default strategy:** Groq as the default provider in both tabs; fallback disabled for testing and enabled for production (Full Evaluation)
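
The environment override can be sketched roughly as follows. The variable names `LLM_PROVIDER` and `ENABLE_LLM_FALLBACK` come from the changelog in this record. The helper name `apply_llm_selection`, the lowercased provider value, and the `"true"`/`"false"` string encoding are assumptions made for illustration and may differ from what `app.py` actually does.

```python
import os

# Provider names as listed for the UI dropdown in this record.
PROVIDERS = ("Gemini", "HuggingFace", "Groq", "Claude")


def apply_llm_selection(llm_provider: str, enable_fallback: bool) -> dict:
    """Hypothetical helper: copy the UI selection into process env vars.

    Assumption: downstream code reads LLM_PROVIDER as a lowercase name
    and ENABLE_LLM_FALLBACK as the string "true" or "false".
    """
    if llm_provider not in PROVIDERS:
        raise ValueError(f"Unknown provider: {llm_provider!r}")

    # Assigning to os.environ replaces whatever value the process
    # inherited from .env or the Space settings.
    os.environ["LLM_PROVIDER"] = llm_provider.lower()
    os.environ["ENABLE_LLM_FALLBACK"] = "true" if enable_fallback else "false"

    # Return the effective values so they can be shown in diagnostics.
    return {
        "LLM_PROVIDER": os.environ["LLM_PROVIDER"],
        "ENABLE_LLM_FALLBACK": os.environ["ENABLE_LLM_FALLBACK"],
    }
```

Because `os.environ` is process-wide, a selection made this way applies to every request handled by the same Space process until it is changed again.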

---

## Outcome

Cloud testing is now much faster: all four providers (Gemini, HuggingFace, Groq, Claude) can be tested directly from the HF Space UI without a rebuild.

**Deliverables:**
- `app.py` - Added UI dropdowns and checkboxes for LLM provider selection in both tabs

## Changelog

**What was changed:**
- **app.py** (~30 lines added/modified)
  - Updated `test_single_question()` function signature - Added `llm_provider` and `enable_fallback` parameters
    - Sets `os.environ["LLM_PROVIDER"]` from UI selection (overrides .env and HF Space env vars)
    - Sets `os.environ["ENABLE_LLM_FALLBACK"]` from UI checkbox
    - Adds provider info to diagnostics output
  - Updated `run_and_submit_all()` function signature - Added `llm_provider` and `enable_fallback` parameters
    - Reordered params: UI inputs first, profile last (optional)
    - Sets environment variables before agent initialization
  - Added UI components in "Test & Debug" tab:
    - `llm_provider_dropdown` - Select from: Gemini, HuggingFace, Groq, Claude (default: Groq)
    - `enable_fallback_checkbox` - Toggle fallback behavior (default: false for testing)
  - Added UI components in "Full Evaluation" tab:
    - `eval_llm_provider_dropdown` - Select LLM for all questions (default: Groq)
    - `eval_enable_fallback_checkbox` - Toggle fallback (default: true for production)
  - Updated button click handlers to pass new UI inputs to functions
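
A minimal sketch of how the "Test & Debug" wiring described above might look in Gradio. The component names `llm_provider_dropdown` and `enable_fallback_checkbox` and their defaults come from this changelog. Everything else is assumed for illustration: `handle_test_question` is a hypothetical stand-in for `test_single_question()`, `build_test_tab_demo`, `question_input`, `run_button`, and `diagnostics_output` are invented names, and the agent call is replaced by a placeholder string.

```python
import os


def handle_test_question(question: str, llm_provider: str, enable_fallback: bool) -> str:
    """Hypothetical stand-in for test_single_question()."""
    # Set env vars from the UI inputs before any agent would be created.
    # Assumption: lowercase provider name and "true"/"false" strings.
    os.environ["LLM_PROVIDER"] = llm_provider.lower()
    os.environ["ENABLE_LLM_FALLBACK"] = "true" if enable_fallback else "false"

    # Placeholder for the real agent run; only provider info is reported.
    return f"Provider: {llm_provider} | Fallback: {enable_fallback} | Question: {question}"


def build_test_tab_demo():
    """Build a Blocks app with one tab; gradio is imported lazily."""
    import gradio as gr

    with gr.Blocks() as demo:
        with gr.Tab("Test & Debug"):
            question_input = gr.Textbox(label="Question")
            llm_provider_dropdown = gr.Dropdown(
                choices=["Gemini", "HuggingFace", "Groq", "Claude"],
                value="Groq",  # default from the changelog
                label="LLM Provider",
            )
            enable_fallback_checkbox = gr.Checkbox(
                value=False,  # fallback off by default for testing
                label="Enable fallback",
            )
            run_button = gr.Button("Run")
            diagnostics_output = gr.Textbox(label="Diagnostics")

            # Pass the new UI inputs to the handler in positional order.
            run_button.click(
                fn=handle_test_question,
                inputs=[question_input, llm_provider_dropdown, enable_fallback_checkbox],
                outputs=diagnostics_output,
            )
    return demo
```

Calling `build_test_tab_demo().launch()` would start the app locally, assuming `gradio` is installed. The "Full Evaluation" tab would follow the same pattern with `eval_llm_provider_dropdown` and `eval_enable_fallback_checkbox`, with the checkbox defaulting to `True`.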