File size: 5,879 Bytes

---
license: mit
tags:
  - tutorial
  - crazyrouter
  - model-comparison
  - benchmark
  - llm
  - evaluation
language:
  - en
  - zh
---

# ⚖️ AI Model Comparison with Crazyrouter

> Compare GPT-4o vs Claude vs Gemini vs DeepSeek — same prompt, same API, side by side.

One of the biggest advantages of [Crazyrouter](https://crazyrouter.com/?utm_source=huggingface&utm_medium=tutorial&utm_campaign=dev_community) is the ability to test multiple models instantly. No separate accounts, no different SDKs. Just change the model name.

---

## Quick Comparison Script

```python
from openai import OpenAI
import time

client = OpenAI(
    base_url="https://crazyrouter.com/v1",
    api_key="sk-your-crazyrouter-key"
)

MODELS = [
    "gpt-4o",
    "gpt-4o-mini",
    "claude-sonnet-4-20250514",
    "claude-haiku-3.5",
    "gemini-2.0-flash",
    "deepseek-chat",
    "deepseek-reasoner",
]

PROMPT = "Explain the difference between TCP and UDP in exactly 3 sentences."

print(f"Prompt: {PROMPT}\n")
print("=" * 60)

for model in MODELS:
    try:
        start = time.time()
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
            max_tokens=200
        )
        elapsed = time.time() - start
        content = response.choices[0].message.content
        tokens = response.usage.total_tokens

        print(f"\n🤖 {model}")
        print(f"⏱️  {elapsed:.2f}s | 📊 {tokens} tokens")
        print(f"💬 {content}")
        print("-" * 60)
    except Exception as e:
        print(f"\n❌ {model}: {e}")
        print("-" * 60)
```

---

## Benchmark: Speed Test

```python
import time
from openai import OpenAI

client = OpenAI(
    base_url="https://crazyrouter.com/v1",
    api_key="sk-your-crazyrouter-key"
)

def benchmark(model, prompt, runs=3):
    times = []
    for _ in range(runs):
        start = time.time()
        client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=100
        )
        times.append(time.time() - start)
    avg = sum(times) / len(times)
    return avg

models = ["gpt-4o-mini", "claude-haiku-3.5", "gemini-2.0-flash", "deepseek-chat"]
prompt = "What is 2+2? Reply with just the number."

print("Speed Benchmark (avg of 3 runs)")
print("=" * 40)
for m in models:
    avg = benchmark(m, prompt)
    print(f"{m:30s} {avg:.2f}s")
```

---

## Coding Comparison

```python
CODING_PROMPT = """Write a Python function that:
1. Takes a list of integers
2. Returns the longest increasing subsequence
3. Include type hints and a docstring
"""

CODING_MODELS = [
    "gpt-4o",
    "claude-sonnet-4-20250514",
    "deepseek-chat",
    "gemini-2.0-flash",
]

for model in CODING_MODELS:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": CODING_PROMPT}],
        max_tokens=500
    )
    print(f"\n{'='*60}")
    print(f"🤖 {model}")
    print(f"{'='*60}")
    print(response.choices[0].message.content)
```

---

## Reasoning Comparison

Test models that support chain-of-thought reasoning:

```python
REASONING_PROMPT = """A farmer has 17 sheep. All but 9 die. How many sheep are left?
Think step by step."""

REASONING_MODELS = [
    "gpt-4o",
    "o3-mini",
    "deepseek-reasoner",
    "claude-sonnet-4-20250514",
]

for model in REASONING_MODELS:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": REASONING_PROMPT}],
        max_tokens=300
    )
    print(f"\n🤖 {model}: {response.choices[0].message.content[:200]}")
```

---

## Cost Comparison

```python
# Approximate pricing per 1M tokens (input/output)
PRICING = {
    "gpt-4o":           {"input": 2.50, "output": 10.00},
    "gpt-4o-mini":      {"input": 0.15, "output": 0.60},
    "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
    "claude-haiku-3.5": {"input": 0.80, "output": 4.00},
    "gemini-2.0-flash": {"input": 0.10, "output": 0.40},
    "deepseek-chat":    {"input": 0.14, "output": 0.28},
}

def estimate_cost(model, input_tokens, output_tokens):
    p = PRICING.get(model, {"input": 0, "output": 0})
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: 1000 requests, avg 500 input + 200 output tokens each
requests = 1000
input_tok = 500
output_tok = 200

print(f"Cost estimate for {requests} requests ({input_tok} in / {output_tok} out tokens each):\n")
for model, price in PRICING.items():
    cost = requests * estimate_cost(model, input_tok, output_tok)
    print(f"  {model:30s} ${cost:.4f}")
```

---

## When to Use Which Model

| Use Case | Recommended Model | Why |
|----------|------------------|-----|
| General chat | `gpt-4o-mini` | Fast, cheap, good quality |
| Complex analysis | `gpt-4o` or `claude-sonnet-4-20250514` | Best reasoning |
| Coding | `deepseek-chat` or `claude-sonnet-4-20250514` | Strong code generation |
| Long documents | `gemini-2.0-flash` | 1M token context |
| Math/Logic | `deepseek-reasoner` or `o3-mini` | Chain-of-thought |
| Budget tasks | `deepseek-chat` | $0.14/1M input |
| Speed critical | `gemini-2.0-flash` | Fastest response |

---

## Try It Live

👉 [Crazyrouter Demo on Hugging Face](https://huggingface.co/spaces/xujfcn/Crazyrouter-Demo) — switch models in real-time

---

## Links

- 🌐 [Crazyrouter](https://crazyrouter.com/?utm_source=huggingface&utm_medium=tutorial&utm_campaign=dev_community)
- 📖 [Getting Started](https://huggingface.co/xujfcn/Crazyrouter-Getting-Started)
- 🔗 [LangChain Guide](https://huggingface.co/xujfcn/Crazyrouter-LangChain-Guide)
- 💰 [Pricing](https://huggingface.co/spaces/xujfcn/Crazyrouter-Pricing)
- 💬 [Telegram](https://t.me/crazyrouter)
- 🐦 [Twitter @metaviiii](https://twitter.com/metaviiii)