Spaces:

gr8monk3ys
/

model-arena

Sleeping

App Files Files Community

model-arena / README.md

gr8monk3ys

Upload folder using huggingface_hub

252cc7d verified 3 months ago

preview code

raw

history blame contribute delete

2.28 kB

A newer version of the Gradio SDK is available: 6.13.0

Upgrade

metadata

title: AI Model Arena
emoji: ⚔️
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 5.9.1
python_version: '3.10'
app_file: app.py
pinned: false
license: mit
short_description: Compare AI models head-to-head and vote for the best

AI Model Arena

Compare AI models head-to-head! Test the same prompt across different models and vote for the best response.

Features

6 Top Open Models

Mistral-7B - Fast and efficient, great at reasoning and code
Llama-3.1-8B - Meta's latest with strong general capabilities
Qwen2.5-7B - Excellent at multilingual tasks, math, and coding
Phi-3-mini - Microsoft's compact powerhouse
Gemma-2-9B - Google's quality-focused instruction model
Zephyr-7B - Aligned for helpfulness and safety

Battle System

Run any two models against each other
See response times for each model
Vote for the better response
Track wins on the leaderboard

5 Test Categories

Creative Writing - Poetry, stories, creative prompts
Coding - Programming challenges and algorithms
Reasoning - Logic puzzles and math problems
Knowledge - Explanations and factual queries
Summarization - Condensing complex topics

How to Use

Enter a prompt or use an example from a category
Select two models to compare
Click "Start Battle" to generate responses
Read both responses and compare quality, accuracy, and style
Vote for the better response
Check the leaderboard to see which models are winning!

Example Battles

Category	Sample Prompt
Creative	Write a haiku about AI
Coding	Implement a prime number checker
Reasoning	Solve: Bat + Ball = $1.10, Bat costs $1 more...
Knowledge	Explain quantum entanglement simply

Why This Matters

Different models have different strengths:

Some are faster, some more accurate
Some excel at code, others at creative tasks
Testing helps you choose the right model for your needs

Technical Details

All models accessed via HuggingFace Inference API
Response times measured for comparison
Leaderboard persists during Space session

License

MIT

Author

Built by Lorenzo Scaturchio