Pull request #1, opened by crossingk
- .gitattributes +35 -2
- .github/prompts/plan-llmCompare.prompt.md +0 -69
- .gitignore +0 -5
- README.md +6 -75
- __pycache__/app.cpython-311.pyc +0 -0
- __pycache__/db.cpython-311.pyc +0 -0
- __pycache__/providers.cpython-311.pyc +0 -0
- app.py +0 -297
- db.py +0 -100
- evaluations.db +0 -0
- llm_compare +0 -1
- providers.py +0 -184
- requirements.txt +0 -5
.gitattributes
CHANGED
@@ -1,2 +1,35 @@
-
-*
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
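The patterns added to `.gitattributes` route matching files through Git LFS. As a rough illustration (not part of this change), the sketch below approximates that matching for the slash-free patterns by applying Python's `fnmatchcase` to a file's basename. The helper name `is_lfs_tracked` and the shortened pattern list are assumptions made for the example. Real gitattributes matching has more rules (for instance path patterns such as `saved_model/**/*`), so this is a sanity check rather than a reimplementation.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A few of the basename patterns from the .gitattributes diff above.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.tar.*", "*.zip", "*tfevents*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: does the file's basename match any slash-free pattern?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)
```

Under this approximation, a file such as `evaluations.db` matches none of the listed patterns, so it would have been stored as a regular Git blob.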
.github/prompts/plan-llmCompare.prompt.md
DELETED
@@ -1,69 +0,0 @@
-## Plan: LLM Comparison Web App (Gradio)
-
-Build a Gradio Blocks app with two-column side-by-side LLM comparison. Left: user's custom model via OpenAI-compatible API endpoint. Right: selectable provider models (OpenAI, Anthropic, Gemini, Qwen, Yi) with default API keys from HF Spaces secrets. Users enter a nickname, prompt both models, then comment and grade (1-10) each response. All evaluations persist to SQLite. Admin can download all data as Excel (.xlsx). Deploy on HuggingFace Spaces.
-
----
-
-### Phase 1: Project Setup
-1. Create `requirements.txt` with: `gradio`, `openai`, `anthropic`, `google-generativeai`, `openpyxl`
-2. Update `README.md` with project description and setup instructions
-
-### Phase 2: Database Layer — `db.py`
-3. Create SQLite helper with `init_db()` to create the `evaluations` table with columns: `id`, `timestamp`, `nickname`, `prompt`, `left_model_name`, `left_model_endpoint`, `left_response`, `left_comment`, `left_grade`, `right_model_name`, `right_provider`, `right_response`, `right_comment`, `right_grade`
-4. Add `save_evaluation(...)` function to insert a row
-5. Add `export_to_excel(filepath)` function using `openpyxl` to dump all rows to .xlsx
-
-### Phase 3: LLM Provider Abstraction — `providers.py`
-6. Define a model registry dict mapping display name → (provider, model_id, base_url, env_var_name):
-   - **OpenAI** (`gpt-4o`, `gpt-4o-mini`): `openai` SDK, default base
-   - **Anthropic** (`claude-sonnet-4-20250514`): `anthropic` SDK
-   - **Google Gemini** (`gemini-2.0-flash`): `google-generativeai` SDK
-   - **Qwen** (`qwen-plus`): `openai` SDK with DashScope base URL
-   - **Yi** (`yi-large`): `openai` SDK with 01.AI base URL
-7. Implement `call_model(provider, model_name, prompt, api_key)` — dispatches to the correct SDK, falls back to env var key if user key is empty
-8. Implement `call_custom_endpoint(base_url, model_name, prompt, api_key)` — uses `openai` SDK with user-supplied base_url for the left-side custom model
-
-### Phase 4: Gradio UI — `app.py`
-9. Build Gradio Blocks layout:
-   - **Top bar**: Nickname text input (required)
-   - **Prompt area**: Shared textbox + "Send to both" button
-   - **Two-column `gr.Row`**:
-     - **Left** ("Your Model"): API endpoint URL, model name, API key, response display, comment textbox, grade slider (1-10)
-     - **Right** ("Reference Model"): model dropdown (from registry), API key (optional, default provided), response display, comment textbox, grade slider (1-10)
-   - **Submit Evaluation** button → saves to SQLite
-   - **Download Report** button → exports .xlsx file
-10. Wire "Send to both" → calls both models, displays responses
-11. Wire "Submit Evaluation" → validates inputs, saves to DB, shows success notification
-12. Wire "Download Report" → exports SQLite to temp .xlsx, returns as `gr.File`
-
-### Phase 5: Security & Configuration
-13. Default API keys from env vars (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, `DASHSCOPE_API_KEY`, `YI_API_KEY`), set as HF Spaces secrets. User-provided keys override per-session only — never stored. All keys processed server-side only.
-14. Input sanitization: validate URL format for left endpoint, sanitize nickname (max 50 chars)
-
-### Phase 6: Deployment
-15. Create HuggingFace Space (Gradio SDK), push code
-16. Set repository secrets for default API keys
-17. End-to-end test on live Space
-
----
-
-**Relevant files**
-- `app.py` — Main Gradio Blocks UI, event wiring, layout (new)
-- `db.py` — SQLite init, save, export functions (new)
-- `providers.py` — Model registry, API call dispatch (new)
-- `requirements.txt` — Python dependencies (new)
-- `README.md` — Update with project info (existing)
-
-**Verification**
-1. Launch locally with `python app.py`, verify two-column layout renders
-2. Test left column with a local OpenAI-compatible endpoint (e.g. Ollama)
-3. Test right column with each provider using default keys
-4. Submit evaluation → verify row in SQLite
-5. Download report → verify .xlsx has all columns populated
-6. Test validation (missing nickname, missing grade → error)
-7. Deploy to HF Spaces, set secrets, run full end-to-end
-
-**Further Considerations**
-1. **SQLite persistence on HF Spaces**: Ephemeral storage resets on restart. Recommend enabling persistent storage and placing DB under `/data`. Alternative: periodic backup to HF Dataset.
-2. **Rate limiting**: Consider adding per-nickname rate limiting to prevent abuse of default API keys.
-3. **Streaming responses**: Initial version uses non-streaming calls; streaming can be added later for better UX.
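The deleted plan lists per-nickname rate limiting only as a further consideration, and no such limiter appears in this diff. A minimal sketch of one possible approach follows, assuming a sliding window kept in process memory. The class name `NicknameRateLimiter`, its parameters, and the default limits are hypothetical. Because nicknames are self-declared, a limiter like this would only slow casual abuse of the default API keys.

```python
import time
from collections import defaultdict, deque


class NicknameRateLimiter:
    """Sliding-window limiter: at most `max_calls` per `window_s` seconds per nickname."""

    def __init__(self, max_calls: int = 5, window_s: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._calls: dict[str, deque] = defaultdict(deque)

    def allow(self, nickname: str) -> bool:
        """Record a call and return True if the nickname is still under its limit."""
        now = self._clock()
        calls = self._calls[nickname.strip().lower()]
        # Drop timestamps that have fallen out of the window.
        while calls and now - calls[0] >= self.window_s:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True
```

A handler such as the app's "Send to both" callback could call `allow(nickname)` first and raise an error when it returns `False`. State is lost on restart and is not shared across worker processes.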
.gitignore
DELETED
@@ -1,5 +0,0 @@
-__pycache__/
-*.pyc
-evaluations.db
-*.xlsx
-.DS_Store
README.md
CHANGED
@@ -1,82 +1,13 @@
 ---
-title:
-emoji:
-colorFrom:
+title: Llm Compare
+emoji: 👀
+colorFrom: green
 colorTo: purple
 sdk: gradio
-sdk_version:
+sdk_version: 6.9.0
 app_file: app.py
 pinned: false
+short_description: compares anti DV agent with other public agents
 ---
 
-
-
-A Gradio web app for side-by-side LLM comparison. Compare your Dify application against reference models from OpenAI, Anthropic, Google Gemini, Qwen, and Yi.
-
-## Features
-
-- **Two-column layout**: Your Dify app on the left, a selectable reference model on the right
-- **Multiple providers**: OpenAI (GPT-4o), Anthropic (Claude), Google Gemini, Qwen, Yi
-- **Overridable defaults**: Base URL and Model ID auto-fill from env vars but can be edited per-session
-- **Evaluation workflow**: Comment and grade (1–10) each model's response
-- **Nickname tracking**: All evaluations tagged with user nickname
-- **Excel export**: Download all evaluation data as `.xlsx`
-
-## Setup
-
-```bash
-pip install -r requirements.txt
-python app.py
-```
-
-## Environment Variables
-
-Set these as **Hugging Face Spaces secrets** (Settings → Repository secrets) to provide defaults.
-Users can override Base URL / Model ID in the UI at runtime.
-
-### API Keys (required for each provider you use)
-
-| Variable | Provider |
-|---|---|
-| `OPENAI_API_KEY` | OpenAI |
-| `ANTHROPIC_API_KEY` | Anthropic |
-| `GOOGLE_API_KEY` | Google Gemini |
-| `DASHSCOPE_API_KEY` | Qwen (DashScope / Alibaba) |
-| `YI_API_KEY` | Yi (01.AI) |
-
-### Base URL overrides (optional)
-
-Override the default API endpoint for each provider. Useful for proxies or custom deployments.
-
-| Variable | Default |
-|---|---|
-| `OPENAI_BASE_URL` | *(uses OpenAI SDK default)* |
-| `ANTHROPIC_BASE_URL` | *(uses Anthropic SDK default)* |
-| `GOOGLE_BASE_URL` | *(uses Google GenAI SDK default)* |
-| `DASHSCOPE_BASE_URL` | `https://dashscope.aliyuncs.com/compatible-mode/v1` |
-| `YI_BASE_URL` | `https://api.01.ai/v1` |
-
-### Model ID overrides (optional)
-
-Override the default model ID. Useful for switching to newer model versions without code changes.
-
-| Variable | Default |
-|---|---|
-| `OPENAI_MODEL_ID` | `gpt-4o` |
-| `OPENAI_MINI_MODEL_ID` | `gpt-4o-mini` |
-| `ANTHROPIC_MODEL_ID` | `claude-sonnet-4-20250514` |
-| `GOOGLE_MODEL_ID` | `gemini-2.0-flash` |
-| `DASHSCOPE_MODEL_ID` | `qwen-plus` |
-| `YI_MODEL_ID` | `yi-large` |
-
-## How it works
-
-1. Select a reference model from the dropdown — **Base URL** and **Model ID** auto-fill from env vars (or registry defaults)
-2. Edit Base URL / Model ID if needed (changes apply to current session only)
-3. Enter your prompt and click **Send to Both**
-4. Grade and comment on each response, then **Submit Evaluation**
-
-## Deployment
-
-Deploy on HuggingFace Spaces with Gradio SDK. Set the API keys and optional overrides as repository secrets in Settings.
-
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
__pycache__/app.cpython-311.pyc
DELETED
Binary file (9.94 kB)
__pycache__/db.cpython-311.pyc
DELETED
Binary file (4.43 kB)
__pycache__/providers.cpython-311.pyc
DELETED
Binary file (5.29 kB)
app.py
DELETED
@@ -1,297 +0,0 @@
-import re
-import tempfile
-import gradio as gr
-
-from db import init_db, save_evaluation, export_to_excel
-from providers import (
-    MODEL_NAMES,
-    call_model,
-    call_custom_endpoint,
-    MODEL_REGISTRY,
-    get_model_defaults,
-)
-
-# ---------------------------------------------------------------------------
-# Initialise database on import
-# ---------------------------------------------------------------------------
-init_db()
-
-# ---------------------------------------------------------------------------
-# Helpers
-# ---------------------------------------------------------------------------
-URL_RE = re.compile(r"^https?://\S+$")
-
-
-def _sanitize_nickname(nick: str) -> str:
-    return nick.strip()[:50]
-
-
-def _validate_url(url: str) -> bool:
-    return bool(URL_RE.match(url.strip()))
-
-
-def on_model_select(display_name: str):
-    """When user picks a model from dropdown, populate base_url and model_id."""
-    base_url, model_id = get_model_defaults(display_name)
-    return base_url, model_id
-
-
-# ---------------------------------------------------------------------------
-# Event handlers
-# ---------------------------------------------------------------------------
-
-def send_to_both(
-    prompt: str,
-    left_url: str,
-    left_model: str,
-    left_key: str,
-    right_name: str,
-    right_base_url: str,
-    right_model_id: str,
-    right_key: str,
-):
-    """Call both models and return their responses."""
-    if not prompt or not prompt.strip():
-        raise gr.Error("Please enter a prompt.")
-
-    # Left — Dify endpoint
-    left_response = ""
-    left_err = ""
-    if left_url and left_url.strip():
-        if not _validate_url(left_url):
-            left_err = "⚠️ Invalid URL format. Use http:// or https://."
-        else:
-            try:
-                left_response = call_custom_endpoint(
-                    left_url.strip(), left_model.strip() or "default", prompt, left_key
-                )
-            except Exception as e:
-                left_err = f"⚠️ Left model error: {e}"
-
-    # Right — registry model (with optional user overrides)
-    right_response = ""
-    right_err = ""
-    try:
-        right_response = call_model(
-            right_name, prompt, right_key, right_base_url, right_model_id
-        )
-    except Exception as e:
-        right_err = f"⚠️ Right model error: {e}"
-
-    return (
-        left_response if not left_err else left_err,
-        right_response if not right_err else right_err,
-    )
-
-
-def submit_evaluation(
-    nickname: str,
-    prompt: str,
-    left_url: str,
-    left_model: str,
-    left_response: str,
-    left_comment: str,
-    left_grade: int,
-    right_name: str,
-    right_model_id: str,
-    right_response: str,
-    right_comment: str,
-    right_grade: int,
-):
-    """Validate and persist an evaluation."""
-    nickname = _sanitize_nickname(nickname)
-    if not nickname:
-        raise gr.Error("Nickname is required.")
-    if not prompt or not prompt.strip():
-        raise gr.Error("Prompt is empty — send a prompt first.")
-    if not left_response.strip() and not right_response.strip():
-        raise gr.Error("No responses to evaluate — send a prompt first.")
-    if left_grade < 1 or left_grade > 10:
-        raise gr.Error("Left grade must be between 1 and 10.")
-    if right_grade < 1 or right_grade > 10:
-        raise gr.Error("Right grade must be between 1 and 10.")
-
-    entry = MODEL_REGISTRY.get(right_name, {})
-    right_provider = entry.get("provider", "unknown")
-
-    save_evaluation(
-        nickname=nickname,
-        prompt=prompt,
-        left_model_name=left_model.strip() or "custom",
-        left_model_endpoint=left_url.strip(),
-        left_response=left_response,
-        left_comment=left_comment,
-        left_grade=int(left_grade),
-        right_model_name=right_model_id.strip() or right_name,
-        right_provider=right_provider,
-        right_response=right_response,
-        right_comment=right_comment,
-        right_grade=int(right_grade),
-    )
-    gr.Info("✅ Evaluation saved!")
-
-
-def download_report():
-    """Export all evaluations to a temp .xlsx and return as a downloadable file."""
-    tmp = tempfile.NamedTemporaryFile(suffix=".xlsx", delete=False)
-    export_to_excel(tmp.name)
-    return tmp.name
-
-
-# ---------------------------------------------------------------------------
-# Gradio Blocks UI
-# ---------------------------------------------------------------------------
-
-# Pre-compute initial defaults for first model
-_init_base_url, _init_model_id = get_model_defaults(MODEL_NAMES[0])
-
-with gr.Blocks(title="LLM Compare") as demo:
-    gr.Markdown("# 🔍 LLM Compare\nSide-by-side comparison: your Dify app vs reference models.")
-
-    # ---- Top bar: nickname ---------------------------------------------------
-    with gr.Row():
-        nickname = gr.Textbox(
-            label="Your Nickname",
-            placeholder="Enter a nickname (required)",
-            scale=2,
-        )
-
-    # ---- Prompt area ---------------------------------------------------------
-    with gr.Row():
-        prompt = gr.Textbox(
-            label="Prompt",
-            placeholder="Type your prompt here…",
-            lines=4,
-            scale=4,
-        )
-        send_btn = gr.Button("🚀 Send to Both", variant="primary", scale=1)
-
-    # ---- Two-column layout ---------------------------------------------------
-    with gr.Row(equal_height=True):
-        # ---- LEFT: Dify model ------------------------------------------------
-        with gr.Column():
-            gr.Markdown("### 🧪 Your Model (Dify Endpoint)")
-            left_url = gr.Textbox(
-                label="Dify API Base URL",
-                placeholder="https://api.dify.ai/v1",
-            )
-            left_model = gr.Textbox(
-                label="App Name (for display only)",
-                placeholder="e.g. my-dify-app",
-            )
-            left_key = gr.Textbox(
-                label="Dify Secret Key",
-                placeholder="app-xxxxxxxxxxxx",
-                type="password",
-            )
-            left_response = gr.Textbox(
-                label="Response",
-                lines=12,
-                interactive=False,
-            )
-            left_comment = gr.Textbox(
-                label="Comment",
-                placeholder="Your thoughts on this response…",
-                lines=2,
-            )
-            left_grade = gr.Slider(
-                minimum=1,
-                maximum=10,
-                step=1,
-                value=5,
-                label="Grade (1–10)",
-            )
-
-        # ---- RIGHT: reference model ------------------------------------------
-        with gr.Column():
-            gr.Markdown("### 📚 Reference Model")
-            right_name = gr.Dropdown(
-                choices=MODEL_NAMES,
-                value=MODEL_NAMES[0],
-                label="Select Model",
-            )
-            right_base_url = gr.Textbox(
-                label="Base URL (auto-filled, editable)",
-                value=_init_base_url,
-                placeholder="e.g. https://api.openai.com/v1",
-            )
-            right_model_id = gr.Textbox(
-                label="Model ID (auto-filled, editable)",
-                value=_init_model_id,
-                placeholder="e.g. gpt-4o",
-            )
-            right_key = gr.Textbox(
-                label="API Key (optional — uses env default)",
-                placeholder="Leave blank to use default key",
-                type="password",
-            )
-            right_response = gr.Textbox(
-                label="Response",
-                lines=12,
-                interactive=False,
-            )
-            right_comment = gr.Textbox(
-                label="Comment",
-                placeholder="Your thoughts on this response…",
-                lines=2,
-            )
-            right_grade = gr.Slider(
-                minimum=1,
-                maximum=10,
-                step=1,
-                value=5,
-                label="Grade (1–10)",
-            )
-
-    # ---- Action buttons ------------------------------------------------------
-    with gr.Row():
-        submit_btn = gr.Button("💾 Submit Evaluation", variant="primary")
-        download_btn = gr.Button("📥 Download Report (.xlsx)")
-    report_file = gr.File(label="Report", visible=False)
-
-    # ---- Wiring --------------------------------------------------------------
-
-    # Auto-fill base_url and model_id when dropdown changes
-    right_name.change(
-        fn=on_model_select,
-        inputs=[right_name],
-        outputs=[right_base_url, right_model_id],
-    )
-
-    send_btn.click(
-        fn=send_to_both,
-        inputs=[
-            prompt, left_url, left_model, left_key,
-            right_name, right_base_url, right_model_id, right_key,
-        ],
-        outputs=[left_response, right_response],
-    )
-
-    submit_btn.click(
-        fn=submit_evaluation,
-        inputs=[
-            nickname,
-            prompt,
-            left_url,
-            left_model,
-            left_response,
-            left_comment,
-            left_grade,
-            right_name,
-            right_model_id,
-            right_response,
-            right_comment,
-            right_grade,
-        ],
-        outputs=[],
-    )
-
-    download_btn.click(
-        fn=download_report,
-        inputs=[],
-        outputs=[report_file],
-    ).then(lambda: gr.update(visible=True), outputs=[report_file])
-
-
-if __name__ == "__main__":
-    demo.launch(theme=gr.themes.Soft())
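The deleted `_validate_url` helper accepts any string matching `^https?://\S+$`, which also lets through inputs with no host, such as `http:///v1`. As an illustration only (not code from this pull request), a stricter check could parse the URL with the standard library. The function name `is_http_url` is hypothetical.

```python
from urllib.parse import urlsplit


def is_http_url(url: str) -> bool:
    """Return True for http(s) URLs that have a host and contain no whitespace."""
    candidate = url.strip()
    if not candidate or any(ch.isspace() for ch in candidate):
        return False
    try:
        parts = urlsplit(candidate)
    except ValueError:  # e.g. a malformed IPv6 literal in the host part
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)
```

This still does not check that the endpoint is reachable or safe to call. It only rejects strings that are structurally not HTTP URLs.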
db.py
DELETED
@@ -1,100 +0,0 @@
-import sqlite3
-import os
-from datetime import datetime
-
-from openpyxl import Workbook
-
-DB_DIR = os.environ.get("DATA_DIR", ".")
-DB_PATH = os.path.join(DB_DIR, "evaluations.db")
-
-
-def _get_conn() -> sqlite3.Connection:
-    conn = sqlite3.connect(DB_PATH)
-    conn.execute("PRAGMA journal_mode=WAL")
-    return conn
-
-
-def init_db() -> None:
-    conn = _get_conn()
-    conn.execute(
-        """
-        CREATE TABLE IF NOT EXISTS evaluations (
-            id INTEGER PRIMARY KEY AUTOINCREMENT,
-            timestamp TEXT NOT NULL,
-            nickname TEXT NOT NULL,
-            prompt TEXT NOT NULL,
-            left_model_name TEXT NOT NULL,
-            left_model_endpoint TEXT NOT NULL,
-            left_response TEXT NOT NULL,
-            left_comment TEXT NOT NULL DEFAULT '',
-            left_grade INTEGER NOT NULL,
-            right_model_name TEXT NOT NULL,
-            right_provider TEXT NOT NULL,
-            right_response TEXT NOT NULL,
-            right_comment TEXT NOT NULL DEFAULT '',
-            right_grade INTEGER NOT NULL
-        )
-        """
-    )
-    conn.commit()
-    conn.close()
-
-
-def save_evaluation(
-    nickname: str,
-    prompt: str,
-    left_model_name: str,
-    left_model_endpoint: str,
-    left_response: str,
-    left_comment: str,
-    left_grade: int,
-    right_model_name: str,
-    right_provider: str,
-    right_response: str,
-    right_comment: str,
-    right_grade: int,
-) -> None:
-    conn = _get_conn()
-    conn.execute(
-        """
-        INSERT INTO evaluations (
-            timestamp, nickname, prompt,
-            left_model_name, left_model_endpoint, left_response, left_comment, left_grade,
-            right_model_name, right_provider, right_response, right_comment, right_grade
-        ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
-        """,
-        (
-            datetime.utcnow().isoformat(),
-            nickname,
-            prompt,
-            left_model_name,
-            left_model_endpoint,
-            left_response,
-            left_comment,
-            left_grade,
-            right_model_name,
-            right_provider,
-            right_response,
-            right_comment,
-            right_grade,
-        ),
-    )
-    conn.commit()
-    conn.close()
-
-
-def export_to_excel(filepath: str) -> str:
-    conn = _get_conn()
-    cursor = conn.execute("SELECT * FROM evaluations ORDER BY id")
-    columns = [desc[0] for desc in cursor.description]
-    rows = cursor.fetchall()
-    conn.close()
-
-    wb = Workbook()
-    ws = wb.active
-    ws.title = "Evaluations"
-    ws.append(columns)
-    for row in rows:
-        ws.append(list(row))
-    wb.save(filepath)
-    return filepath
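Two details of the deleted `db.py` are worth noting for anyone reusing it: the connection is not closed if a statement raises, and `datetime.utcnow()` returns a naive timestamp (it is deprecated as of Python 3.12). The sketch below shows one way both could be handled. It uses a reduced, hypothetical `demo` table rather than the full `evaluations` schema, and the names `utc_timestamp` and `insert_row` are assumptions for the example.

```python
import sqlite3
from contextlib import closing
from datetime import datetime, timezone


def utc_timestamp() -> str:
    """ISO-8601 timestamp with an explicit UTC offset."""
    return datetime.now(timezone.utc).isoformat()


def insert_row(db_path: str, nickname: str, grade: int) -> int:
    """Insert one row into a reduced, hypothetical `demo` table and return its id."""
    # closing() guarantees conn.close() even if a statement raises;
    # `with conn:` commits on success and rolls back on error.
    with closing(sqlite3.connect(db_path)) as conn, conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS demo ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "timestamp TEXT NOT NULL, nickname TEXT NOT NULL, grade INTEGER NOT NULL)"
        )
        cur = conn.execute(
            "INSERT INTO demo (timestamp, nickname, grade) VALUES (?, ?, ?)",
            (utc_timestamp(), nickname, grade),
        )
        return cur.lastrowid
```

The values are still passed as `?` parameters, as in the original `save_evaluation`, so user-supplied text never gets interpolated into the SQL string.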
evaluations.db
DELETED
Binary file (12.3 kB)
llm_compare
DELETED
@@ -1 +0,0 @@
-Subproject commit 75be778201f3a72ce3af88996ea7e433263a5f41
providers.py
DELETED
|
@@ -1,184 +0,0 @@
|
|
| 1 |
-
import os
|
| 2 |
-
import requests
|
| 3 |
-
from openai import OpenAI
|
| 4 |
-
import anthropic
|
| 5 |
-
from google import genai
|
| 6 |
-
|
| 7 |
-
# ---------------------------------------------------------------------------
|
| 8 |
-
# Model Registry
|
| 9 |
-
# Each entry: display_name -> {provider, model_id, base_url (None = default), env_var}
|
| 10 |
-
# ---------------------------------------------------------------------------
|
| 11 |
-
MODEL_REGISTRY: dict[str, dict] = {
|
| 12 |
-
"GPT-4o (OpenAI)": {
|
| 13 |
-
"provider": "openai",
|
| 14 |
-
"model_id": "gpt-4o",
|
| 15 |
-
"base_url": None,
|
| 16 |
-
"env_var": "OPENAI_API_KEY",
|
| 17 |
-
"env_base_url": "OPENAI_BASE_URL",
|
| 18 |
-
"env_model_id": "OPENAI_MODEL_ID",
|
| 19 |
-
},
|
| 20 |
-
"GPT-4o-mini (OpenAI)": {
|
| 21 |
-
"provider": "openai",
|
| 22 |
-
"model_id": "gpt-4o-mini",
|
| 23 |
-
"base_url": None,
|
| 24 |
-
"env_var": "OPENAI_API_KEY",
|
| 25 |
-
"env_base_url": "OPENAI_BASE_URL",
|
| 26 |
-
"env_model_id": "OPENAI_MINI_MODEL_ID",
|
| 27 |
-
},
|
| 28 |
-
"Claude Sonnet 4 (Anthropic)": {
|
| 29 |
-
"provider": "anthropic",
|
| 30 |
-
"model_id": "claude-sonnet-4-6",
|
| 31 |
-
"base_url": None,
|
| 32 |
-
"env_var": "ANTHROPIC_API_KEY",
|
| 33 |
-
"env_base_url": "ANTHROPIC_BASE_URL",
|
| 34 |
-
"env_model_id": "ANTHROPIC_MODEL_ID",
|
| 35 |
-
},
|
| 36 |
-
"Gemini 2.0 Flash (Google)": {
|
| 37 |
-
"provider": "gemini",
|
| 38 |
-
"model_id": "gemini-2.0-flash",
|
| 39 |
-
"base_url": None,
|
| 40 |
-
"env_var": "GOOGLE_API_KEY",
|
| 41 |
-
"env_base_url": "GOOGLE_BASE_URL",
|
| 42 |
-
"env_model_id": "GOOGLE_MODEL_ID",
|
| 43 |
-
},
|
| 44 |
-
"Qwen-Plus (Alibaba)": {
|
| 45 |
-
"provider": "openai_compat",
|
| 46 |
-
"model_id": "qwen-plus",
|
| 47 |
-
"base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
|
| 48 |
-
"env_var": "DASHSCOPE_API_KEY",
|
| 49 |
-
"env_base_url": "DASHSCOPE_BASE_URL",
|
| 50 |
-
"env_model_id": "DASHSCOPE_MODEL_ID",
|
| 51 |
-
},
|
| 52 |
-
"Yi-Large (01.AI)": {
|
| 53 |
-
"provider": "openai_compat",
|
| 54 |
-
"model_id": "yi-large",
|
| 55 |
-
"base_url": "https://api.01.ai/v1",
|
| 56 |
-
"env_var": "YI_API_KEY",
|
| 57 |
-
"env_base_url": "YI_BASE_URL",
|
| 58 |
-
"env_model_id": "YI_MODEL_ID",
|
| 59 |
-
},
|
| 60 |
-
}
-
-MODEL_NAMES = list(MODEL_REGISTRY.keys())
-
-
-def get_model_defaults(display_name: str) -> tuple[str, str]:
-    """Return (base_url, model_id) for a registry model, considering env overrides.
-
-    Priority: env var > registry hardcoded value.
-    """
-    entry = MODEL_REGISTRY.get(display_name, {})
-    base_url = os.environ.get(entry.get("env_base_url", ""), "") or entry.get("base_url") or ""
-    model_id = os.environ.get(entry.get("env_model_id", ""), "") or entry.get("model_id", "")
-    return base_url, model_id
-
-
-def _resolve_key(env_var: str, user_key: str | None) -> str:
-    """Return user-provided key if non-empty, else fall back to env var."""
-    if user_key and user_key.strip():
-        return user_key.strip()
-    key = os.environ.get(env_var, "")
-    if not key:
-        raise ValueError(
-            f"No API key provided and environment variable {env_var} is not set."
-        )
-    return key
-
-
-# ---------------------------------------------------------------------------
-# Provider dispatch
-# ---------------------------------------------------------------------------
-
-def _call_openai(model_id: str, prompt: str, api_key: str, base_url: str | None) -> str:
-    client = OpenAI(api_key=api_key, base_url=base_url)
-    resp = client.chat.completions.create(
-        model=model_id,
-        messages=[{"role": "user", "content": prompt}],
-    )
-    return resp.choices[0].message.content
-
-
-def _call_anthropic(model_id: str, prompt: str, api_key: str) -> str:
-    client = anthropic.Anthropic(api_key=api_key)
-    resp = client.messages.create(
-        model=model_id,
-        max_tokens=4096,
-        messages=[{"role": "user", "content": prompt}],
-    )
-    return resp.content[0].text
-
-
-def _call_gemini(model_id: str, prompt: str, api_key: str) -> str:
-    client = genai.Client(api_key=api_key)
-    resp = client.models.generate_content(model=model_id, contents=prompt)
-    return resp.text
-
-
-def call_model(
-    display_name: str,
-    prompt: str,
-    user_key: str | None = None,
-    user_base_url: str | None = None,
-    user_model_id: str | None = None,
-) -> str:
-    """Call a reference model from the registry.
-
-    User-supplied base_url / model_id override env-var defaults, which in turn
-    override the hardcoded registry values.
-    """
-    entry = MODEL_REGISTRY.get(display_name)
-    if entry is None:
-        raise ValueError(f"Unknown model: {display_name}")
-
-    api_key = _resolve_key(entry["env_var"], user_key)
-    provider = entry["provider"]
-
-    # Resolve: user input > env var > registry default
-    default_base_url, default_model_id = get_model_defaults(display_name)
-    model_id = (user_model_id.strip() if user_model_id and user_model_id.strip() else "") or default_model_id
-    base_url = (user_base_url.strip() if user_base_url and user_base_url.strip() else "") or default_base_url or None
-
-    if provider in ("openai", "openai_compat"):
-        return _call_openai(model_id, prompt, api_key, base_url)
-    elif provider == "anthropic":
-        return _call_anthropic(model_id, prompt, api_key)
-    elif provider == "gemini":
-        return _call_gemini(model_id, prompt, api_key)
-    else:
-        raise ValueError(f"Unknown provider: {provider}")
-
-
-def call_custom_endpoint(
-    base_url: str, model_name: str, prompt: str, api_key: str
-) -> str:
-    """Call a user-supplied Dify application endpoint (left column).
-
-    Dify API docs: https://docs.dify.ai/en/guides/application-publishing/developing-with-apis
-
-    base_url should be the Dify API base, e.g. https://api.dify.ai/v1
-    The endpoint called is {base_url}/chat-messages (for Chat apps).
-    """
-    if not base_url or not base_url.strip():
-        raise ValueError("API endpoint URL is required for your Dify model.")
-    if not api_key or not api_key.strip():
-        raise ValueError("API Key (Secret Key) is required for Dify.")
-
-    url = base_url.strip().rstrip("/") + "/chat-messages"
-    headers = {
-        "Authorization": f"Bearer {api_key.strip()}",
-        "Content-Type": "application/json",
-    }
-    payload = {
-        "inputs": {},
-        "query": prompt,
-        "response_mode": "blocking",
-        "user": "llm-compare-user",
-    }
-
-    resp = requests.post(url, json=payload, headers=headers, timeout=120)
-    resp.raise_for_status()
-    data = resp.json()
-    answer = data.get("answer", "")
-    if not answer:
-        raise ValueError(f"Dify returned no answer. Full response: {data}")
-    return answer
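
The `call_model` docstring in the removed file describes a three-level precedence: user input, then environment variable, then the registry default. As a rough illustration of that rule on its own, and not a copy of the deleted code, the sketch below uses a hypothetical helper `resolve_setting` and a hypothetical variable name `DEMO_MODEL_ID`; neither appears in the PR.

```python
import os


def resolve_setting(user_value, env_name, registry_default):
    """Return the first non-empty value: user input, env var, then default.

    Hypothetical helper for illustration; not part of the removed module.
    """
    if user_value and user_value.strip():
        return user_value.strip()
    # An unset or empty environment variable falls through to the default.
    return os.environ.get(env_name, "") or registry_default


# DEMO_MODEL_ID is an assumed name used only for this demonstration.
os.environ["DEMO_MODEL_ID"] = "env-model"

from_user = resolve_setting("  user-model ", "DEMO_MODEL_ID", "registry-model")
from_env = resolve_setting(None, "DEMO_MODEL_ID", "registry-model")

del os.environ["DEMO_MODEL_ID"]
from_default = resolve_setting("", "DEMO_MODEL_ID", "registry-model")

print(from_user, from_env, from_default)
```

Whitespace-only user input is treated as absent here, which mirrors the `.strip()` checks visible in the diff above.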
requirements.txt
DELETED
@@ -1,5 +0,0 @@
-gradio
-openai
-anthropic
-google-genai
-openpyxl