---
license: mit
tags:
- tutorial
- crazyrouter
- model-comparison
- benchmark
- llm
- evaluation
language:
- en
- zh
---

# ⚖️ AI Model Comparison with Crazyrouter

> Compare GPT-4o vs Claude vs Gemini vs DeepSeek — same prompt, same API, side by side.

One of the biggest advantages of [Crazyrouter](https://crazyrouter.com) is the ability to test multiple models instantly. No separate accounts, no different SDKs. Just change the model name.

---

## Quick Comparison Script

```python
from openai import OpenAI
import time

client = OpenAI(
    base_url="https://crazyrouter.com/v1",
    api_key="sk-your-crazyrouter-key"
)

MODELS = [
    "gpt-4o",
    "gpt-4o-mini",
    "claude-sonnet-4-20250514",
    "claude-haiku-3.5",
    "gemini-2.0-flash",
    "deepseek-chat",
    "deepseek-reasoner",
]

PROMPT = "Explain the difference between TCP and UDP in exactly 3 sentences."

print(f"Prompt: {PROMPT}\n")
print("=" * 60)

for model in MODELS:
    try:
        start = time.time()
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
            max_tokens=200
        )
        elapsed = time.time() - start
        content = response.choices[0].message.content
        tokens = response.usage.total_tokens

        print(f"\n🤖 {model}")
        print(f"⏱️ {elapsed:.2f}s | 📊 {tokens} tokens")
        print(f"💬 {content}")
        print("-" * 60)
    except Exception as e:
        print(f"\n❌ {model}: {e}")
        print("-" * 60)
```
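If you would rather see the results side by side than scroll through console output, a small formatter can tabulate them. This is a sketch of my own, not part of the script above; the sample numbers are made up:

```python
def format_results(rows):
    """Render (model, seconds, tokens) tuples as a fixed-width comparison table."""
    lines = [f"{'Model':30s} {'Time':>7s} {'Tokens':>7s}"]
    for model, secs, tokens in rows:
        lines.append(f"{model:30s} {secs:6.2f}s {tokens:7d}")
    return "\n".join(lines)

# Feed it tuples collected inside the loop above, e.g.:
print(format_results([("gpt-4o", 1.92, 143), ("deepseek-chat", 2.40, 151)]))
```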

---

## Benchmark: Speed Test

```python
import time
from openai import OpenAI

client = OpenAI(
    base_url="https://crazyrouter.com/v1",
    api_key="sk-your-crazyrouter-key"
)

def benchmark(model, prompt, runs=3):
    times = []
    for _ in range(runs):
        start = time.time()
        client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=100
        )
        times.append(time.time() - start)
    avg = sum(times) / len(times)
    return avg

models = ["gpt-4o-mini", "claude-haiku-3.5", "gemini-2.0-flash", "deepseek-chat"]
prompt = "What is 2+2? Reply with just the number."

print("Speed Benchmark (avg of 3 runs)")
print("=" * 40)
for m in models:
    avg = benchmark(m, prompt)
    print(f"{m:30s} {avg:.2f}s")
```
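A mean over three runs can be skewed by a single cold start or network hiccup. A median-based variant is more robust; this is a sketch that works with any zero-argument callable, so it is testable without hitting the API:

```python
import statistics
import time

def time_call(fn, runs: int = 3) -> float:
    """Median wall-clock seconds over several runs.

    The median is less sensitive to one slow outlier (cold start,
    network jitter) than the mean used in the script above.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Wrap the API call in a lambda to benchmark a model, e.g.:
# time_call(lambda: client.chat.completions.create(model=m, messages=msgs, max_tokens=100))
print(f"{time_call(lambda: time.sleep(0.01)):.3f}s")
```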

---

## Coding Comparison

```python
# Reuses the `client` configured in the Quick Comparison Script above
CODING_PROMPT = """Write a Python function that:
1. Takes a list of integers
2. Returns the longest increasing subsequence
3. Include type hints and a docstring
"""

CODING_MODELS = [
    "gpt-4o",
    "claude-sonnet-4-20250514",
    "deepseek-chat",
    "gemini-2.0-flash",
]

for model in CODING_MODELS:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": CODING_PROMPT}],
        max_tokens=500
    )
    print(f"\n{'='*60}")
    print(f"🤖 {model}")
    print(f"{'='*60}")
    print(response.choices[0].message.content)
```
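When comparing the generated functions, it helps to have a known-good reference to check them against. Here is a minimal sketch of the standard O(n log n) approach; note it returns only the length, while the prompt asks for the subsequence itself:

```python
from bisect import bisect_left

def lis_length(nums: list[int]) -> int:
    """Length of the longest strictly increasing subsequence in O(n log n).

    tails[i] holds the smallest possible tail value of an increasing
    subsequence of length i + 1 seen so far.
    """
    tails: list[int] = []
    for x in nums:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

print(lis_length([10, 9, 2, 5, 3, 7, 101, 18]))  # 4  (e.g. 2, 3, 7, 18)
```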

---

## Reasoning Comparison

Test models that support chain-of-thought reasoning:

```python
# Reuses the `client` configured in the Quick Comparison Script above
REASONING_PROMPT = """A farmer has 17 sheep. All but 9 die. How many sheep are left?
Think step by step."""

REASONING_MODELS = [
    "gpt-4o",
    "o3-mini",
    "deepseek-reasoner",
    "claude-sonnet-4-20250514",
]

for model in REASONING_MODELS:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": REASONING_PROMPT}],
        max_tokens=300
    )
    print(f"\n🤖 {model}: {response.choices[0].message.content[:200]}")
```
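The riddle's answer is 9: "all but 9 die" means 9 survive. To score the replies automatically, a crude heuristic (my own, and easily fooled by replies that end on an unrelated number) is to take the last number mentioned as the model's final answer:

```python
import re

def final_answer_is_nine(reply: str) -> bool:
    """Heuristic scorer: treat the last number in the reply as the final answer."""
    numbers = re.findall(r"\d+", reply)
    return bool(numbers) and numbers[-1] == "9"

print(final_answer_is_nine("All but 9 die, so 9 sheep are left."))  # True
print(final_answer_is_nine("17 - 9 = 8 sheep are left."))           # False
```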

---

## Cost Comparison

```python
# Approximate pricing per 1M tokens (input/output)
PRICING = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
    "claude-haiku-3.5": {"input": 0.80, "output": 4.00},
    "gemini-2.0-flash": {"input": 0.10, "output": 0.40},
    "deepseek-chat": {"input": 0.14, "output": 0.28},
}

def estimate_cost(model, input_tokens, output_tokens):
    p = PRICING.get(model, {"input": 0, "output": 0})
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: 1000 requests, avg 500 input + 200 output tokens each
requests = 1000
input_tok = 500
output_tok = 200

print(f"Cost estimate for {requests} requests ({input_tok} in / {output_tok} out tokens each):\n")
for model in PRICING:
    cost = requests * estimate_cost(model, input_tok, output_tok)
    print(f"  {model:30s} ${cost:.4f}")
```
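As a sanity check on the estimator, the `gpt-4o-mini` row can be worked by hand with the prices listed above:

```python
# 500 input tokens at $0.15/1M plus 200 output tokens at $0.60/1M:
# (500 * 0.15 + 200 * 0.60) / 1_000_000 = (75 + 120) / 1_000_000 = $0.000195 per request
per_request = (500 * 0.15 + 200 * 0.60) / 1_000_000
print(f"${1000 * per_request:.4f} for 1000 requests")  # $0.1950
```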

---

## When to Use Which Model

| Use Case | Recommended Model | Why |
|----------|------------------|-----|
| General chat | `gpt-4o-mini` | Fast, cheap, good quality |
| Complex analysis | `gpt-4o` or `claude-sonnet-4-20250514` | Best reasoning |
| Coding | `deepseek-chat` or `claude-sonnet-4-20250514` | Strong code generation |
| Long documents | `gemini-2.0-flash` | 1M token context |
| Math/Logic | `deepseek-reasoner` or `o3-mini` | Chain-of-thought |
| Budget tasks | `deepseek-chat` | $0.14/1M input |
| Speed critical | `gemini-2.0-flash` | Fastest response |

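The table can be encoded as a small routing helper. This is a hypothetical sketch; the category names and the `gpt-4o-mini` fallback are my own choices, not part of any Crazyrouter API:

```python
# Hypothetical mapping from task category to the table's recommended model
RECOMMENDED = {
    "chat": "gpt-4o-mini",
    "analysis": "gpt-4o",
    "coding": "deepseek-chat",
    "long-documents": "gemini-2.0-flash",
    "math": "deepseek-reasoner",
    "budget": "deepseek-chat",
    "speed": "gemini-2.0-flash",
}

def pick_model(task: str) -> str:
    """Map a task category to a recommended model; fall back to gpt-4o-mini."""
    return RECOMMENDED.get(task, "gpt-4o-mini")

print(pick_model("coding"))  # deepseek-chat
print(pick_model("poetry"))  # gpt-4o-mini (fallback)
```

Because everything goes through one API, the returned string can be passed straight into `client.chat.completions.create(model=...)`.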
---

## Try It Live

👉 [Crazyrouter Demo on Hugging Face](https://huggingface.co/spaces/xujfcn/Crazyrouter-Demo) — switch models in real-time

---

## Links

- 🌐 [Crazyrouter](https://crazyrouter.com)
- 📖 [Getting Started](https://huggingface.co/xujfcn/Crazyrouter-Getting-Started)
- 🔗 [LangChain Guide](https://huggingface.co/xujfcn/Crazyrouter-LangChain-Guide)
- 💰 [Pricing](https://huggingface.co/spaces/xujfcn/Crazyrouter-Pricing)
- 💬 [Telegram](https://t.me/crazyrouter)
- 🐦 [Twitter @metaviiii](https://twitter.com/metaviiii)