metadata
title: LLM Compare
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false

LLM Compare

A Gradio web app for side-by-side LLM comparison. Compare your Dify application against reference models from OpenAI, Anthropic, Google Gemini, Qwen, and Yi.

Features

  • Two-column layout: Your Dify app on the left, a selectable reference model on the right
  • Multiple providers: OpenAI (GPT-4o), Anthropic (Claude), Google Gemini, Qwen, Yi
  • Overridable defaults: Base URL and Model ID auto-fill from env vars but can be edited per-session
  • Evaluation workflow: Comment and grade (1–10) each model's response
  • Nickname tracking: All evaluations tagged with user nickname
  • Excel export: Download all evaluation data as .xlsx
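
The evaluation features above suggest a simple record shape. The following is a minimal sketch, not the app's actual data model: the `Evaluation` class, its field names, and the `to_rows` helper are all hypothetical. It only illustrates how a nickname-tagged, 1–10 graded entry could be validated and flattened into spreadsheet rows.

```python
from dataclasses import dataclass, asdict


@dataclass
class Evaluation:
    """One graded response. Field names are illustrative, not taken from app.py."""
    nickname: str
    model: str
    prompt: str
    response: str
    grade: int
    comment: str = ""

    def __post_init__(self):
        # Enforce the 1-10 grading scale described in the feature list.
        if not isinstance(self.grade, int) or not 1 <= self.grade <= 10:
            raise ValueError(f"grade must be an integer from 1 to 10, got {self.grade!r}")


def to_rows(evaluations):
    """Flatten evaluations into plain dicts, one per spreadsheet row."""
    return [asdict(e) for e in evaluations]
```

Rows in this form could be handed to a spreadsheet writer (for example `pandas.DataFrame(rows).to_excel(...)`); how the app itself builds the .xlsx file is not shown in this README.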

Setup

pip install -r requirements.txt
python app.py

Environment Variables

Set these as Hugging Face Spaces secrets (Settings → Repository secrets) to provide defaults. Users can override Base URL / Model ID in the UI at runtime.

API Keys (required for each provider you use)

| Variable | Provider |
| --- | --- |
| OPENAI_API_KEY | OpenAI |
| ANTHROPIC_API_KEY | Anthropic |
| GOOGLE_API_KEY | Google Gemini |
| DASHSCOPE_API_KEY | Qwen (DashScope / Alibaba) |
| YI_API_KEY | Yi (01.AI) |
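
Before launching, it can help to check which keys are missing for the providers you plan to use. This is a small sketch under assumptions: the variable names come from the table above, while the `API_KEY_VARS` mapping and the `missing_api_keys` helper are hypothetical and not part of the app.

```python
import os

# Variable names are from the table above; the provider labels are display names only.
API_KEY_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Google Gemini": "GOOGLE_API_KEY",
    "Qwen": "DASHSCOPE_API_KEY",
    "Yi": "YI_API_KEY",
}


def missing_api_keys(providers, env=None):
    """Return the env var names that are unset or empty for the given providers."""
    env = os.environ if env is None else env
    return [API_KEY_VARS[p] for p in providers if not env.get(API_KEY_VARS[p])]


if __name__ == "__main__":
    # Report every missing key across all known providers.
    for name in missing_api_keys(API_KEY_VARS):
        print(f"{name} is not set")
```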

Base URL overrides (optional)

Override the default API endpoint for each provider. Useful for proxies or custom deployments.

| Variable | Default |
| --- | --- |
| OPENAI_BASE_URL | (uses OpenAI SDK default) |
| ANTHROPIC_BASE_URL | (uses Anthropic SDK default) |
| GOOGLE_BASE_URL | (uses Google GenAI SDK default) |
| DASHSCOPE_BASE_URL | https://dashscope.aliyuncs.com/compatible-mode/v1 |
| YI_BASE_URL | https://api.01.ai/v1 |

Model ID overrides (optional)

Override the default model ID. Useful for switching to newer model versions without code changes.

| Variable | Default |
| --- | --- |
| OPENAI_MODEL_ID | gpt-4o |
| OPENAI_MINI_MODEL_ID | gpt-4o-mini |
| ANTHROPIC_MODEL_ID | claude-sonnet-4-20250514 |
| GOOGLE_MODEL_ID | gemini-2.0-flash |
| DASHSCOPE_MODEL_ID | qwen-plus |
| YI_MODEL_ID | yi-large |
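
The override behaviour described in this section can be pictured as a three-level lookup: a value edited in the UI wins, then the environment variable, then the built-in default. The sketch below assumes that precedence. The `REGISTRY` dict and the `resolve` function are hypothetical names, and the real registry in app.py may be structured differently. `OPENAI_MINI_MODEL_ID` is left out to keep the example short.

```python
import os

# Defaults copied from the tables above; None means "let the provider SDK decide".
REGISTRY = {
    "openai": {"base_url": None, "model_id": "gpt-4o"},
    "anthropic": {"base_url": None, "model_id": "claude-sonnet-4-20250514"},
    "google": {"base_url": None, "model_id": "gemini-2.0-flash"},
    "dashscope": {
        "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
        "model_id": "qwen-plus",
    },
    "yi": {"base_url": "https://api.01.ai/v1", "model_id": "yi-large"},
}


def resolve(provider, field, ui_value=None, env=None):
    """Pick a setting: UI edit first, then env var, then the registry default."""
    env = os.environ if env is None else env
    if ui_value:
        return ui_value
    # Builds names such as YI_MODEL_ID or DASHSCOPE_BASE_URL.
    env_name = f"{provider.upper()}_{field.upper()}"
    return env.get(env_name) or REGISTRY[provider][field]
```

For example, `resolve("yi", "model_id")` falls back to `yi-large` unless `YI_MODEL_ID` is set or the user typed a different value in the UI.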

How it works

  1. Select a reference model from the dropdown. Base URL and Model ID auto-fill from env vars (or registry defaults)
  2. Edit Base URL / Model ID if needed (changes apply to current session only)
  3. Enter your prompt and click Send to Both
  4. Grade and comment on each response, then Submit Evaluation
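
Step 3 above sends one prompt to two backends. One way to sketch that is to run both calls in parallel so the slower model does not delay the faster one. Whether the app actually calls the models concurrently is an assumption, and `send_to_both`, `call_dify`, and `call_reference` are hypothetical names standing in for the real client code.

```python
from concurrent.futures import ThreadPoolExecutor


def send_to_both(prompt, call_dify, call_reference):
    """Run both callables on the same prompt in parallel and return both replies."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(call_dify, prompt)
        right = pool.submit(call_reference, prompt)
        # result() blocks until each call finishes and re-raises any exception.
        return left.result(), right.result()


if __name__ == "__main__":
    # Stub callables stand in for the Dify app and the reference model.
    dify_reply, ref_reply = send_to_both(
        "Hello",
        lambda p: f"dify: {p}",
        lambda p: f"reference: {p}",
    )
    print(dify_reply)
    print(ref_reply)
```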

Deployment

Deploy on Hugging Face Spaces with the Gradio SDK. Set the API keys and any optional overrides as repository secrets in Settings.