---
title: Multi Llm Compare
emoji: 🐠
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: Compare model outputs with price estimates
---

# 🤖 Multi-LLM Comparison Tool

Compare responses from multiple Large Language Models (LLMs) side-by-side with custom parameters, timing metrics, and cost estimates.

## ✨ Features

- **Multi-Provider Support**: OpenAI, Anthropic, Google (Gemini), Cohere, and Mistral
- **Dynamic Model Selection**: Add multiple models with a simple ➕ button interface
- **Custom Parameters**: Configure temperature, top_p, max_tokens, and provider-specific parameters
- **Performance Metrics**: Track response time for each model
- **Cost Estimation**: Calculate the estimated cost per 1000 API calls
- **Parallel Execution**: All models are queried simultaneously for faster results
- **CSV Export**: Export comparison results for further analysis
- **Unique Model Selection**: Prevents duplicate model configurations

## 🚀 Getting Started

### Installation

```bash
pip install -r requirements.txt
```

### Run Locally

```bash
python app.py
```

The application will launch in your browser at `http://localhost:7860`.

## 📖 How to Use

### 1. Select Models

- Choose a **Provider** (OpenAI, Anthropic, Google, etc.)
- Select a **Model** from the dropdown
- Configure **Parameters**:
  - **Temperature** (0-2): Controls randomness
  - **Top P** (0-1): Nucleus sampling parameter
  - **Max Tokens**: Maximum response length
  - **Top K** (model-specific): Number of top tokens to consider
  - **Frequency/Presence Penalty** (model-specific): Controls token repetition
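As a rough sketch of how these UI fields might be collected into a request payload (the function name and defaults here are illustrative, not the app's actual code), the shared parameters always apply while the provider-specific ones are included only when set:

```python
def build_params(temperature=0.7, top_p=1.0, max_tokens=512,
                 top_k=None, frequency_penalty=None, presence_penalty=None):
    """Assemble a payload from the UI fields; optional knobs only when set."""
    params = {
        "temperature": temperature,  # 0-2: higher = more random output
        "top_p": top_p,              # 0-1: nucleus sampling cutoff
        "max_tokens": max_tokens,    # cap on response length
    }
    # Provider-specific parameters are added only when the user sets them
    if top_k is not None:
        params["top_k"] = top_k
    if frequency_penalty is not None:
        params["frequency_penalty"] = frequency_penalty
    if presence_penalty is not None:
        params["presence_penalty"] = presence_penalty
    return params

payload = build_params(temperature=0.2, top_k=40)
```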

### 2. Add Models

- Click the **➕ Add Model** button to add the configured model to your comparison
- Repeat to add multiple models with different configurations
- Duplicate configurations are blocked; the same model can be added again only with different parameters

### 3. Enter API Keys

- Provide API keys for the providers you want to use
- Keys are only required for the providers you've selected
- API keys are not stored and are only used for the current session

### 4. Run Comparison

- Enter your prompt in the text area
- Click **🚀 Run Comparison** to query all selected models
- Results appear in a table showing:
  - Model name and parameters
  - Response time
  - Estimated cost per 1000 calls
  - Model output

### 5. Export Results

- Click **📥 Export to CSV** to download the results
- The CSV file includes all comparison data with timestamps
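The export step can be sketched with the standard library alone. The column names below are hypothetical (the app's actual CSV schema may differ); the point is that one timestamp is stamped per export and prepended to every row:

```python
import csv
import io
from datetime import datetime, timezone

def results_to_csv(rows):
    """Serialize comparison rows (list of dicts) to CSV text with a timestamp column."""
    buf = io.StringIO()
    fields = ["timestamp", "model", "response_time_s", "cost_per_1k_usd", "output"]
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    stamp = datetime.now(timezone.utc).isoformat()  # one timestamp per export
    for row in rows:
        writer.writerow({"timestamp": stamp, **row})
    return buf.getvalue()

csv_text = results_to_csv([
    {"model": "gpt-4o-mini", "response_time_s": 1.8,
     "cost_per_1k_usd": 0.26, "output": "Hello!"},
])
```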

## 💰 Pricing Information

The tool uses the following rates (USD per 1M tokens) for cost estimation:

### OpenAI

- GPT-4o: $2.50 input / $10.00 output
- GPT-4o-mini: $0.15 input / $0.60 output

### Anthropic

- Claude 3.5 Sonnet: $3.00 input / $15.00 output
- Claude 3.5 Haiku: $0.80 input / $4.00 output

### Google

- Gemini 1.5 Pro: $1.25 input / $5.00 output
- Gemini 1.5 Flash: $0.075 input / $0.30 output

Note: Prices are estimates and may vary. Check provider documentation for current rates.
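The "cost per 1000 calls" figure follows directly from these per-1M-token rates. A minimal sketch (the token counts in the example are arbitrary, not measured):

```python
def cost_per_1000_calls(input_tokens, output_tokens, input_price, output_price):
    """Estimate USD cost of 1000 calls; prices are USD per 1M tokens."""
    per_call = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_call * 1000

# GPT-4o-mini at $0.15 in / $0.60 out, assuming 500 input + 300 output tokens per call
estimate = cost_per_1000_calls(500, 300, 0.15, 0.60)
```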

## 🔑 API Keys

You need an API key for each provider you want to query: OpenAI, Anthropic, Google (Gemini), Cohere, and/or Mistral.

## 📋 Supported Parameters by Provider

| Parameter         | OpenAI | Anthropic | Google | Cohere | Mistral |
|-------------------|--------|-----------|--------|--------|---------|
| Temperature       | ✅     | ✅        | ✅     | ✅     | ✅      |
| Top P             | ✅     | ✅        | ✅     | ✅     | ✅      |
| Max Tokens        | ✅     | ✅        | ✅     | ✅     | ✅      |
| Top K             | ❌     | ✅        | ✅     | ✅     | ❌      |
| Frequency Penalty | ✅     | ❌        | ❌     | ✅     | ❌      |
| Presence Penalty  | ✅     | ❌        | ❌     | ✅     | ❌      |
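One plausible way to enforce this matrix (a sketch, not necessarily how app.py does it) is to filter the user's parameter dict against a per-provider support set derived from the table above:

```python
# Support matrix transcribed from the table above
SUPPORTED = {
    "openai":    {"temperature", "top_p", "max_tokens", "frequency_penalty", "presence_penalty"},
    "anthropic": {"temperature", "top_p", "max_tokens", "top_k"},
    "google":    {"temperature", "top_p", "max_tokens", "top_k"},
    "cohere":    {"temperature", "top_p", "max_tokens", "top_k", "frequency_penalty", "presence_penalty"},
    "mistral":   {"temperature", "top_p", "max_tokens"},
}

def filter_params(provider, params):
    """Drop any parameter the given provider does not accept."""
    return {k: v for k, v in params.items() if k in SUPPORTED[provider]}

anthropic_params = filter_params(
    "anthropic",
    {"temperature": 0.5, "top_k": 40, "frequency_penalty": 0.1},
)
```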

πŸ› οΈ Technical Details

  • Framework: Gradio 4.0+
  • Async Support: All API calls are executed in parallel using asyncio
  • Error Handling: Graceful error messages for API failures
  • Token Estimation: Accurate for OpenAI/Anthropic, estimated for others
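The parallel-execution pattern can be illustrated with `asyncio.gather`, here with a stand-in coroutine that sleeps instead of calling a real provider API:

```python
import asyncio
import time

async def query_model(name, delay):
    """Stand-in for a provider call: sleeps instead of hitting an API."""
    start = time.perf_counter()
    await asyncio.sleep(delay)
    return name, time.perf_counter() - start  # (model name, elapsed seconds)

async def run_all():
    # gather() awaits all coroutines concurrently, so total wall time
    # tracks the slowest model rather than the sum of all calls.
    return await asyncio.gather(
        query_model("model-a", 0.05),
        query_model("model-b", 0.10),
    )

results = asyncio.run(run_all())
```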

πŸ“ License

MIT License - See LICENSE file for details

## 🤝 Contributing

Contributions are welcome! Feel free to submit issues or pull requests.


Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference