---
title: Multi Llm Compare
emoji: 🐠
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: Compare model outputs with price estimates
---

# 🤖 Multi-LLM Comparison Tool

Compare responses from multiple Large Language Models (LLMs) side-by-side with custom parameters, timing metrics, and cost estimates.

## ✨ Features

- **Multi-Provider Support**: OpenAI, Anthropic, Google (Gemini), Cohere, and Mistral
- **Dynamic Model Selection**: Add multiple models with a simple ➕ button interface
- **Custom Parameters**: Configure temperature, top_p, max_tokens, and provider-specific parameters
- **Performance Metrics**: Track response time for each model
- **Cost Estimation**: Calculate the estimated cost per 1000 API calls
- **Parallel Execution**: All models are queried simultaneously for faster results
- **CSV Export**: Export comparison results for further analysis
- **Unique Model Selection**: Prevents duplicate model configurations

## 🚀 Getting Started

### Installation

```bash
pip install -r requirements.txt
```

### Run Locally

```bash
python app.py
```

The application will launch in your browser at `http://localhost:7860`.

## 📖 How to Use

### 1. Select Models

- Choose a **Provider** (OpenAI, Anthropic, Google, etc.)
- Select a **Model** from the dropdown
- Configure **Parameters**:
  - **Temperature** (0-2): Controls randomness
  - **Top P** (0-1): Nucleus sampling parameter
  - **Max Tokens**: Maximum response length
  - **Top K** (model-specific): Number of top tokens to consider
  - **Frequency/Presence Penalty** (model-specific): Controls token repetition
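As a rough sketch of how these UI fields might be collected into a request payload (the function name and defaults here are illustrative, not the app's actual code), the shared parameters always apply while the provider-specific ones are included only when set:

```python
def build_params(temperature=0.7, top_p=1.0, max_tokens=512,
                 top_k=None, frequency_penalty=None, presence_penalty=None):
    """Assemble a payload from the UI fields; optional knobs only when set."""
    params = {
        "temperature": temperature,  # 0-2: higher = more random output
        "top_p": top_p,              # 0-1: nucleus sampling cutoff
        "max_tokens": max_tokens,    # cap on response length
    }
    # Provider-specific parameters are added only when the user sets them
    if top_k is not None:
        params["top_k"] = top_k
    if frequency_penalty is not None:
        params["frequency_penalty"] = frequency_penalty
    if presence_penalty is not None:
        params["presence_penalty"] = presence_penalty
    return params

payload = build_params(temperature=0.2, top_k=40)
```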

### 2. Add Models

- Click the **➕ Add Model** button to add the configured model to your comparison
- Repeat to add multiple models with different configurations
- Duplicate configurations are blocked; the same model can be added again only with different parameters

### 3. Enter API Keys

- Provide API keys for the providers you want to use
- Keys are only required for the providers you've selected
- API keys are not stored and are only used for the current session

### 4. Run Comparison

- Enter your prompt in the text area
- Click **🚀 Run Comparison** to query all selected models
- Results appear in a table showing:
  - Model name and parameters
  - Response time
  - Estimated cost per 1000 calls
  - Model output

### 5. Export Results

- Click **📥 Export to CSV** to download the results
- The CSV file includes all comparison data with timestamps
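The export step can be sketched with the standard library alone. The column names below are hypothetical (the app's actual CSV schema may differ); the point is that one timestamp is stamped per export and prepended to every row:

```python
import csv
import io
from datetime import datetime, timezone

def results_to_csv(rows):
    """Serialize comparison rows (list of dicts) to CSV text with a timestamp column."""
    buf = io.StringIO()
    fields = ["timestamp", "model", "response_time_s", "cost_per_1k_usd", "output"]
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    stamp = datetime.now(timezone.utc).isoformat()  # one timestamp per export
    for row in rows:
        writer.writerow({"timestamp": stamp, **row})
    return buf.getvalue()

csv_text = results_to_csv([
    {"model": "gpt-4o-mini", "response_time_s": 1.8,
     "cost_per_1k_usd": 0.26, "output": "Hello!"},
])
```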

## 💰 Pricing Information

The tool uses the following rates (USD per 1M tokens) for cost estimation:

### OpenAI

- GPT-4o: $2.50 input / $10.00 output
- GPT-4o-mini: $0.15 input / $0.60 output

### Anthropic

- Claude 3.5 Sonnet: $3.00 input / $15.00 output
- Claude 3.5 Haiku: $0.80 input / $4.00 output

### Google

- Gemini 1.5 Pro: $1.25 input / $5.00 output
- Gemini 1.5 Flash: $0.075 input / $0.30 output

Note: Prices are estimates and may vary. Check provider documentation for current rates.
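The "cost per 1000 calls" figure follows directly from these per-1M-token rates. A minimal sketch (the token counts in the example are arbitrary, not measured):

```python
def cost_per_1000_calls(input_tokens, output_tokens, input_price, output_price):
    """Estimate USD cost of 1000 calls; prices are USD per 1M tokens."""
    per_call = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_call * 1000

# GPT-4o-mini at $0.15 in / $0.60 out, assuming 500 input + 300 output tokens per call
estimate = cost_per_1000_calls(500, 300, 0.15, 0.60)
```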

## 🔑 API Keys

You need an API key for each provider you want to query: OpenAI, Anthropic, Google (Gemini), Cohere, and/or Mistral.

## 📋 Supported Parameters by Provider

| Parameter         | OpenAI | Anthropic | Google | Cohere | Mistral |
|-------------------|--------|-----------|--------|--------|---------|
| Temperature       | ✅     | ✅        | ✅     | ✅     | ✅      |
| Top P             | ✅     | ✅        | ✅     | ✅     | ✅      |
| Max Tokens        | ✅     | ✅        | ✅     | ✅     | ✅      |
| Top K             | ❌     | ✅        | ✅     | ✅     | ❌      |
| Frequency Penalty | ✅     | ❌        | ❌     | ✅     | ❌      |
| Presence Penalty  | ✅     | ❌        | ❌     | ✅     | ❌      |
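One plausible way to enforce this matrix (a sketch, not necessarily how app.py does it) is to filter the user's parameter dict against a per-provider support set derived from the table above:

```python
# Support matrix transcribed from the table above
SUPPORTED = {
    "openai":    {"temperature", "top_p", "max_tokens", "frequency_penalty", "presence_penalty"},
    "anthropic": {"temperature", "top_p", "max_tokens", "top_k"},
    "google":    {"temperature", "top_p", "max_tokens", "top_k"},
    "cohere":    {"temperature", "top_p", "max_tokens", "top_k", "frequency_penalty", "presence_penalty"},
    "mistral":   {"temperature", "top_p", "max_tokens"},
}

def filter_params(provider, params):
    """Drop any parameter the given provider does not accept."""
    return {k: v for k, v in params.items() if k in SUPPORTED[provider]}

anthropic_params = filter_params(
    "anthropic",
    {"temperature": 0.5, "top_k": 40, "frequency_penalty": 0.1},
)
```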

πŸ› οΈ Technical Details

  • Framework: Gradio 4.0+
  • Async Support: All API calls are executed in parallel using asyncio
  • Error Handling: Graceful error messages for API failures
  • Token Estimation: Accurate for OpenAI/Anthropic, estimated for others
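The parallel-execution pattern can be illustrated with `asyncio.gather`, here with a stand-in coroutine that sleeps instead of calling a real provider API:

```python
import asyncio
import time

async def query_model(name, delay):
    """Stand-in for a provider call: sleeps instead of hitting an API."""
    start = time.perf_counter()
    await asyncio.sleep(delay)
    return name, time.perf_counter() - start  # (model name, elapsed seconds)

async def run_all():
    # gather() awaits all coroutines concurrently, so total wall time
    # tracks the slowest model rather than the sum of all calls.
    return await asyncio.gather(
        query_model("model-a", 0.05),
        query_model("model-b", 0.10),
    )

results = asyncio.run(run_all())
```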

πŸ“ License

MIT License - See LICENSE file for details

## 🤝 Contributing

Contributions are welcome! Feel free to submit issues or pull requests.


Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference