Commit c2ed3cb · HassanB4 · verified · 1 parent: b2e646c

Add HTML landing page with UI, stats, leaderboard table

Files changed (1): index.html (added, +776 -0)
@@ -0,0 +1,776 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+   <meta charset="UTF-8" />
+   <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+   <title>Haakkim: Arabic LLM Arena</title>
+   <style>
+     :root {
+       --bg: #0f1117;
+       --surface: #1a1d27;
+       --surface2: #222536;
+       --border: #2e3149;
+       --accent: #6366f1;
+       --accent2: #8b5cf6;
+       --gold: #f59e0b;
+       --text: #e2e8f0;
+       --muted: #94a3b8;
+       --green: #10b981;
+       --radius: 12px;
+     }
+
+     * { box-sizing: border-box; margin: 0; padding: 0; }
+
+     body {
+       font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
+       background: var(--bg);
+       color: var(--text);
+       line-height: 1.6;
+       min-height: 100vh;
+     }
+
+     /* ── Hero ────────────────────────────────── */
+     .hero {
+       background: linear-gradient(135deg, #0f1117 0%, #1a1037 50%, #0f1117 100%);
+       border-bottom: 1px solid var(--border);
+       padding: 64px 24px 48px;
+       text-align: center;
+       position: relative;
+       overflow: hidden;
+     }
+     .hero::before {
+       content: '';
+       position: absolute;
+       inset: 0;
+       background: radial-gradient(ellipse 80% 50% at 50% 0%, rgba(99,102,241,0.15) 0%, transparent 70%);
+       pointer-events: none;
+     }
+     .hero-logo {
+       display: flex;
+       align-items: center;
+       justify-content: center;
+       gap: 16px;
+       margin-bottom: 16px;
+     }
+     .hero-logo img {
+       height: 60px;
+       object-fit: contain;
+     }
+     .hero h1 {
+       font-size: clamp(2rem, 5vw, 3.2rem);
+       font-weight: 800;
+       background: linear-gradient(135deg, #e2e8f0 0%, #a5b4fc 60%, #818cf8 100%);
+       -webkit-background-clip: text;
+       -webkit-text-fill-color: transparent;
+       background-clip: text;
+       margin-bottom: 8px;
+     }
+     .hero-ar {
+       font-size: 2.2rem;
+       font-weight: 700;
+       color: var(--accent);
+       direction: rtl;
+       margin-bottom: 16px;
+     }
+     .hero p {
+       font-size: 1.15rem;
+       color: var(--muted);
+       max-width: 600px;
+       margin: 0 auto 28px;
+     }
+     .hero-links {
+       display: flex;
+       gap: 12px;
+       justify-content: center;
+       flex-wrap: wrap;
+     }
+     .btn {
+       display: inline-flex;
+       align-items: center;
+       gap: 8px;
+       padding: 10px 22px;
+       border-radius: 8px;
+       font-size: 0.95rem;
+       font-weight: 600;
+       text-decoration: none;
+       transition: all 0.2s;
+       cursor: pointer;
+     }
+     .btn-primary {
+       background: var(--accent);
+       color: #fff;
+     }
+     .btn-primary:hover { background: #5457e0; transform: translateY(-1px); }
+     .btn-outline {
+       background: transparent;
+       color: var(--text);
+       border: 1px solid var(--border);
+     }
+     .btn-outline:hover { border-color: var(--accent); color: var(--accent); transform: translateY(-1px); }
+     .btn-dataset {
+       background: #1c7a4d;
+       color: #fff;
+     }
+     .btn-dataset:hover { background: #155f3d; transform: translateY(-1px); }
+
+     /* ── Container ───────────────────────────── */
+     .container { max-width: 1100px; margin: 0 auto; padding: 0 24px; }
+     section { padding: 56px 0; }
+     section + section { border-top: 1px solid var(--border); }
+
+     .section-title {
+       font-size: 1.5rem;
+       font-weight: 700;
+       margin-bottom: 8px;
+       color: var(--text);
+     }
+     .section-sub {
+       font-size: 0.95rem;
+       color: var(--muted);
+       margin-bottom: 32px;
+     }
+
+     /* ── Stats grid ──────────────────────────── */
+     .stats-grid {
+       display: grid;
+       grid-template-columns: repeat(auto-fit, minmax(160px, 1fr));
+       gap: 16px;
+     }
+     .stat-card {
+       background: var(--surface);
+       border: 1px solid var(--border);
+       border-radius: var(--radius);
+       padding: 24px 20px;
+       text-align: center;
+       transition: border-color 0.2s;
+     }
+     .stat-card:hover { border-color: var(--accent); }
+     .stat-value {
+       font-size: 2.2rem;
+       font-weight: 800;
+       color: var(--accent);
+       line-height: 1;
+       margin-bottom: 6px;
+     }
+     .stat-label {
+       font-size: 0.82rem;
+       color: var(--muted);
+       text-transform: uppercase;
+       letter-spacing: 0.05em;
+     }
+
+     /* ── Leaderboard table ───────────────────── */
+     .table-wrap {
+       overflow-x: auto;
+       border-radius: var(--radius);
+       border: 1px solid var(--border);
+     }
+     table {
+       width: 100%;
+       border-collapse: collapse;
+       font-size: 0.9rem;
+     }
+     thead th {
+       background: var(--surface2);
+       color: var(--muted);
+       font-size: 0.78rem;
+       text-transform: uppercase;
+       letter-spacing: 0.06em;
+       padding: 12px 16px;
+       text-align: left;
+       white-space: nowrap;
+     }
+     tbody tr {
+       border-top: 1px solid var(--border);
+       transition: background 0.15s;
+     }
+     tbody tr:hover { background: var(--surface2); }
+     td {
+       padding: 12px 16px;
+       vertical-align: middle;
+     }
+     .rank-badge {
+       display: inline-flex;
+       align-items: center;
+       justify-content: center;
+       width: 28px;
+       height: 28px;
+       border-radius: 50%;
+       font-weight: 700;
+       font-size: 0.85rem;
+     }
+     .rank-1 { background: linear-gradient(135deg, #f59e0b, #fbbf24); color: #000; }
+     .rank-2 { background: linear-gradient(135deg, #94a3b8, #cbd5e1); color: #000; }
+     .rank-3 { background: linear-gradient(135deg, #cd7c2f, #e09050); color: #000; }
+     .rank-other { background: var(--surface2); color: var(--muted); }
+     .model-name {
+       font-family: 'SFMono-Regular', Consolas, monospace;
+       font-size: 0.83rem;
+       color: var(--text);
+     }
+     .model-org {
+       color: var(--muted);
+       font-size: 0.78rem;
+       margin-right: 4px;
+     }
+     .score-val {
+       font-weight: 700;
+       color: var(--accent);
+       font-variant-numeric: tabular-nums;
+     }
+     .ci-val {
+       color: var(--muted);
+       font-size: 0.78rem;
+       font-variant-numeric: tabular-nums;
+     }
+     .battles-badge {
+       display: inline-flex;
+       align-items: center;
+       background: var(--surface2);
+       border-radius: 99px;
+       padding: 2px 10px;
+       font-size: 0.78rem;
+       color: var(--muted);
+     }
+
+     /* ── Info cards ──────────────────────────── */
+     .cards-grid {
+       display: grid;
+       grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
+       gap: 20px;
+     }
+     .card {
+       background: var(--surface);
+       border: 1px solid var(--border);
+       border-radius: var(--radius);
+       padding: 24px;
+       transition: border-color 0.2s, transform 0.2s;
+     }
+     .card:hover { border-color: var(--accent); transform: translateY(-2px); }
+     .card-icon {
+       font-size: 1.8rem;
+       margin-bottom: 12px;
+     }
+     .card h3 {
+       font-size: 1rem;
+       font-weight: 700;
+       margin-bottom: 8px;
+     }
+     .card p {
+       font-size: 0.88rem;
+       color: var(--muted);
+     }
+     .card-tag {
+       display: inline-block;
+       margin-top: 10px;
+       padding: 3px 10px;
+       border-radius: 99px;
+       font-size: 0.75rem;
+       font-weight: 600;
+       background: rgba(16,185,129,0.15);
+       color: var(--green);
+     }
+     .card-tag-no {
+       background: rgba(100,116,139,0.15);
+       color: var(--muted);
+     }
+
+     /* ── Figure images ───────────────────────── */
+     .figures-grid {
+       display: grid;
+       grid-template-columns: repeat(auto-fit, minmax(420px, 1fr));
+       gap: 20px;
+     }
+     .figure-card {
+       background: var(--surface);
+       border: 1px solid var(--border);
+       border-radius: var(--radius);
+       overflow: hidden;
+     }
+     .figure-card img {
+       width: 100%;
+       display: block;
+       background: #fff;
+     }
+     .figure-caption {
+       padding: 12px 16px;
+       font-size: 0.82rem;
+       color: var(--muted);
+       text-align: center;
+     }
+
+     /* ── Dialect pills ───────────────────────── */
+     .dialect-list {
+       display: flex;
+       flex-wrap: wrap;
+       gap: 10px;
+       margin-top: 8px;
+     }
+     .dialect-pill {
+       background: var(--surface2);
+       border: 1px solid var(--border);
+       border-radius: 99px;
+       padding: 6px 14px;
+       font-size: 0.82rem;
+       display: flex;
+       align-items: center;
+       gap: 6px;
+     }
+     .dialect-pill .bar {
+       height: 4px;
+       border-radius: 99px;
+       background: var(--accent);
+       min-width: 20px;
+     }
+
+     /* ── Scoring methodology ─────────────────── */
+     .method-steps {
+       display: grid;
+       grid-template-columns: repeat(auto-fit, minmax(220px, 1fr));
+       gap: 16px;
+     }
+     .step {
+       background: var(--surface);
+       border: 1px solid var(--border);
+       border-radius: var(--radius);
+       padding: 20px;
+     }
+     .step-num {
+       width: 32px;
+       height: 32px;
+       border-radius: 50%;
+       background: var(--accent);
+       color: #fff;
+       font-weight: 800;
+       font-size: 0.9rem;
+       display: flex;
+       align-items: center;
+       justify-content: center;
+       margin-bottom: 12px;
+     }
+     .step h4 {
+       font-size: 0.92rem;
+       font-weight: 700;
+       margin-bottom: 6px;
+     }
+     .step p {
+       font-size: 0.82rem;
+       color: var(--muted);
+     }
+
+     /* ── Team ────────────────────────────────── */
+     .team-grid {
+       display: grid;
+       grid-template-columns: repeat(auto-fit, minmax(220px, 1fr));
+       gap: 20px;
+     }
+     .team-card {
+       background: var(--surface);
+       border: 1px solid var(--border);
+       border-radius: var(--radius);
+       padding: 24px;
+       text-align: center;
+       transition: border-color 0.2s;
+     }
+     .team-card:hover { border-color: var(--accent); }
+     .team-avatar {
+       width: 56px;
+       height: 56px;
+       border-radius: 50%;
+       background: linear-gradient(135deg, var(--accent), var(--accent2));
+       display: flex;
+       align-items: center;
+       justify-content: center;
+       font-size: 1.4rem;
+       font-weight: 800;
+       color: #fff;
+       margin: 0 auto 14px;
+     }
+     .team-name { font-weight: 700; font-size: 1rem; margin-bottom: 4px; }
+     .team-role { font-size: 0.82rem; color: var(--muted); margin-bottom: 8px; }
+     .team-hf {
+       display: inline-flex;
+       align-items: center;
+       gap: 5px;
+       font-size: 0.78rem;
+       color: var(--accent);
+       text-decoration: none;
+     }
+     .team-hf:hover { text-decoration: underline; }
+
+     /* ── Dataset callout ─────────────────────── */
+     .dataset-callout {
+       background: linear-gradient(135deg, rgba(16,185,129,0.08), rgba(99,102,241,0.08));
+       border: 1px solid rgba(16,185,129,0.3);
+       border-radius: var(--radius);
+       padding: 28px 32px;
+       display: flex;
+       align-items: center;
+       gap: 24px;
+       flex-wrap: wrap;
+     }
+     .dataset-callout .ds-icon { font-size: 2.5rem; }
+     .dataset-callout h3 { font-size: 1.1rem; font-weight: 700; margin-bottom: 4px; }
+     .dataset-callout p { font-size: 0.88rem; color: var(--muted); }
+
+     /* ── Score note ──────────────────────────── */
+     .note-box {
+       background: rgba(99,102,241,0.08);
+       border: 1px solid rgba(99,102,241,0.25);
+       border-radius: 8px;
+       padding: 14px 18px;
+       font-size: 0.85rem;
+       color: var(--muted);
+       margin-top: 20px;
+     }
+     .note-box strong { color: var(--text); }
+
+     /* ── Footer ──────────────────────────────── */
+     footer {
+       border-top: 1px solid var(--border);
+       padding: 32px 24px;
+       text-align: center;
+       font-size: 0.82rem;
+       color: var(--muted);
+     }
+     footer a { color: var(--accent); text-decoration: none; }
+     footer a:hover { text-decoration: underline; }
+
+     @media (max-width: 600px) {
+       .hero { padding: 40px 16px 32px; }
+       .figures-grid { grid-template-columns: 1fr; }
+       .dataset-callout { flex-direction: column; gap: 12px; }
+     }
+   </style>
+ </head>
+ <body>
+
+   <!-- ═══════════════════════════════════════════════════
+        HERO
+        ═══════════════════════════════════════════════════ -->
+   <div class="hero">
+     <div class="hero-logo">
+       <img src="haakkim-logo-withname.png" alt="Haakkim" onerror="this.style.display='none'" />
+     </div>
+     <div class="hero-ar">حَكِّم</div>
+     <p>An open, arena-style human-preference evaluation platform for Arabic large language models, built from the ground up for Arabic.</p>
+     <div class="hero-links">
+       <a href="https://haakkim.tech" class="btn btn-primary">
+         🌐 Live Platform
+       </a>
+       <a href="https://haakkim.tech/#leaderboard" class="btn btn-outline">
+         🏆 Leaderboard
+       </a>
+       <a href="https://huggingface.co/datasets/Haakkim/Haakkim-1.0v" class="btn btn-dataset">
+         📦 Dataset v1.0
+       </a>
+     </div>
+   </div>
+
+   <div class="container">
+
+     <!-- ═══════════════════════════════════════════════════
+          SNAPSHOT STATS
+          ═══════════════════════════════════════════════════ -->
+     <section>
+       <div class="section-title">Current Snapshot: v1.0</div>
+       <div class="section-sub">Statistics from the first public release of Haakkim battle data</div>
+       <div class="stats-grid">
+         <div class="stat-card">
+           <div class="stat-value">1,273</div>
+           <div class="stat-label">Total Battles</div>
+         </div>
+         <div class="stat-card">
+           <div class="stat-value">831</div>
+           <div class="stat-label">BT-Ranked Battles</div>
+         </div>
+         <div class="stat-card">
+           <div class="stat-value">67</div>
+           <div class="stat-label">Models Ranked</div>
+         </div>
+         <div class="stat-card">
+           <div class="stat-value">11</div>
+           <div class="stat-label">Arabic Dialects</div>
+         </div>
+         <div class="stat-card">
+           <div class="stat-value">465</div>
+           <div class="stat-label">ESS (Clamped)</div>
+         </div>
+         <div class="stat-card">
+           <div class="stat-value">0.35</div>
+           <div class="stat-label">Graph Density</div>
+         </div>
+       </div>
+     </section>
+
+     <!-- ═══════════════════════════════════════════════════
+          MSA LEADERBOARD TOP 10
+          ═══════════════════════════════════════════════════ -->
+     <section>
+       <div class="section-title">MSA Leaderboard: Top 10</div>
+       <div class="section-sub">Bradley–Terry scores (1000-centered log-odds). The full 67-model leaderboard is at <a href="https://haakkim.tech/#leaderboard" style="color:var(--accent)">haakkim.tech</a>.</div>
+       <div class="table-wrap">
+         <table>
+           <thead>
+             <tr>
+               <th>Rank</th>
+               <th>Model</th>
+               <th>BT Score</th>
+               <th>95% CI</th>
+               <th>Battles</th>
+             </tr>
+           </thead>
+           <tbody>
+             <tr>
+               <td><span class="rank-badge rank-1">1</span></td>
+               <td><span class="model-org">mistralai/</span><span class="model-name">ministral-3b-2512</span></td>
+               <td><span class="score-val">1001.75</span></td>
+               <td><span class="ci-val">[1001.20, 1002.93]</span></td>
+               <td><span class="battles-badge">40</span></td>
+             </tr>
+             <tr>
+               <td><span class="rank-badge rank-2">2</span></td>
+               <td><span class="model-org">mistralai/</span><span class="model-name">ministral-8b-2512</span></td>
+               <td><span class="score-val">1001.61</span></td>
+               <td><span class="ci-val">[1000.72, 1002.97]</span></td>
+               <td><span class="battles-badge">43</span></td>
+             </tr>
+             <tr>
+               <td><span class="rank-badge rank-3">3</span></td>
+               <td><span class="model-org">Qwen/</span><span class="model-name">Qwen3-235B-A22B-Thinking-2507</span></td>
+               <td><span class="score-val">1001.21</span></td>
+               <td><span class="ci-val">[1000.47, 1002.00]</span></td>
+               <td><span class="battles-badge">38</span></td>
+             </tr>
+             <tr>
+               <td><span class="rank-badge rank-other">4</span></td>
+               <td><span class="model-org">Qwen/</span><span class="model-name">Qwen3-30B-A3B-Instruct-2507</span></td>
+               <td><span class="score-val">1001.14</span></td>
+               <td><span class="ci-val">[999.96, 1002.83]</span></td>
+               <td><span class="battles-badge">31</span></td>
+             </tr>
+             <tr>
+               <td><span class="rank-badge rank-other">5</span></td>
+               <td><span class="model-org">deepseek/</span><span class="model-name">deepseek-v3.2-exp</span></td>
+               <td><span class="score-val">1001.13</span></td>
+               <td><span class="ci-val">[1000.27, 1002.16]</span></td>
+               <td><span class="battles-badge">38</span></td>
+             </tr>
+             <tr>
+               <td><span class="rank-badge rank-other">6</span></td>
+               <td><span class="model-org">deepseek/</span><span class="model-name">deepseek-v3.1</span></td>
+               <td><span class="score-val">1000.99</span></td>
+               <td><span class="ci-val">[999.81, 1002.07]</span></td>
+               <td><span class="battles-badge">29</span></td>
+             </tr>
+             <tr>
+               <td><span class="rank-badge rank-other">7</span></td>
+               <td><span class="model-org">Qwen/</span><span class="model-name">Qwen3-235B-A22B-Instruct-2507</span></td>
+               <td><span class="score-val">1000.98</span></td>
+               <td><span class="ci-val">[1000.12, 1002.08]</span></td>
+               <td><span class="battles-badge">39</span></td>
+             </tr>
+             <tr>
+               <td><span class="rank-badge rank-other">8</span></td>
+               <td><span class="model-org">deepseek/</span><span class="model-name">deepseek-r1-0528</span></td>
+               <td><span class="score-val">1000.93</span></td>
+               <td><span class="ci-val">[1000.10, 1002.14]</span></td>
+               <td><span class="battles-badge">38</span></td>
+             </tr>
+             <tr>
+               <td><span class="rank-badge rank-other">9</span></td>
+               <td><span class="model-org">openai/</span><span class="model-name">gpt-oss-120b</span></td>
+               <td><span class="score-val">1000.93</span></td>
+               <td><span class="ci-val">[1000.04, 1002.58]</span></td>
+               <td><span class="battles-badge">25</span></td>
+             </tr>
+             <tr>
+               <td><span class="rank-badge rank-other">10</span></td>
+               <td><span class="model-org">deepseek/</span><span class="model-name">deepseek-v3.2</span></td>
+               <td><span class="score-val">1000.89</span></td>
+               <td><span class="ci-val">[999.86, 1002.25]</span></td>
+               <td><span class="battles-badge">31</span></td>
+             </tr>
+           </tbody>
+         </table>
+       </div>
+       <div class="note-box">
+         <strong>Score scale:</strong> Haakkim uses unscaled log-odds units centered at 1000, so a 1-point gap corresponds to win odds of e¹ ≈ 2.7:1. This produces a spread of about 4 points across 67 models. Chatbot Arena-style Elo scaling (×173.7) would show the same ranking with a spread of several hundred points; both conventions encode identical win probabilities.
+       </div>
+     </section>
+
+     <!-- ═══════════════════════════════════════════════════
+          FIGURES
+          ═══════════════════════════════════════════════════ -->
+     <section>
+       <div class="section-title">Analysis Figures</div>
+       <div class="section-sub">Regenerated from the live database (v1.0 snapshot)</div>
+       <div class="figures-grid">
+         <div class="figure-card">
+           <img src="bt_leaderboard_ci_top20.png" alt="BT Leaderboard Top 20 with Confidence Intervals" />
+           <div class="figure-caption">BT leaderboard: top 20 models with 95% bootstrap confidence intervals</div>
+         </div>
+         <div class="figure-card">
+           <img src="dialect_distribution.png" alt="Dialect Distribution" />
+           <div class="figure-caption">Battle distribution across 11 Arabic dialect varieties</div>
+         </div>
+         <div class="figure-card">
+           <img src="vote_outcome_distribution_ranked.png" alt="Vote Outcome Distribution" />
+           <div class="figure-caption">Vote outcome distribution for ranked arena battles</div>
+         </div>
+         <div class="figure-card">
+           <img src="haakkim_overview.png" alt="Platform Overview" />
+           <div class="figure-caption">Haakkim platform overview and evaluation pipeline</div>
+         </div>
+       </div>
+     </section>
+
+     <!-- ═══════════════════════════════════════════════════
+          DIALECT COVERAGE
+          ═══════════════════════════════════════════════════ -->
+     <section>
+       <div class="section-title">Arabic Dialect Coverage</div>
+       <div class="section-sub">11 varieties, from Modern Standard Arabic to regional dialects across the Arab world</div>
+       <div class="dialect-list">
+         <div class="dialect-pill">MSA <div class="bar" style="width:80px;background:#6366f1"></div> 77.5%</div>
+         <div class="dialect-pill">Tunisian <div class="bar" style="width:22px;background:#8b5cf6"></div> 9.0%</div>
+         <div class="dialect-pill">Saudi <div class="bar" style="width:16px;background:#a78bfa"></div> 6.5%</div>
+         <div class="dialect-pill">Egyptian <div class="bar" style="width:9px;background:#c4b5fd"></div> 3.5%</div>
+         <div class="dialect-pill">Levantine <div class="bar" style="width:6px"></div> 1.7%</div>
+         <div class="dialect-pill">Sudanese <div class="bar" style="width:4px"></div> 0.9%</div>
+         <div class="dialect-pill">Omani <div class="bar" style="width:3px"></div> 0.4%</div>
+         <div class="dialect-pill">Iraqi <div class="bar" style="width:2px"></div> 0.2%</div>
+         <div class="dialect-pill">Moroccan <div class="bar" style="width:2px"></div> &lt;0.1%</div>
+         <div class="dialect-pill">Libyan <div class="bar" style="width:2px"></div> &lt;0.1%</div>
+         <div class="dialect-pill">Algerian <div class="bar" style="width:2px"></div> &lt;0.1%</div>
+       </div>
+     </section>
+
+     <!-- ═══════════════════════════════════════════════════
+          EVALUATION MODES
+          ═══════════════════════════════════════════════════ -->
+     <section>
+       <div class="section-title">Evaluation Modes</div>
+       <div class="section-sub">Three ways to compare Arabic LLMs; only Ranked Arena feeds the official leaderboard</div>
+       <div class="cards-grid">
+         <div class="card">
+           <div class="card-icon">⚔️</div>
+           <h3>Ranked Arena</h3>
+           <p>Random model pairing, single-turn MSA, matched system instruction. Results feed the official Bradley–Terry leaderboard.</p>
+           <span class="card-tag">✓ BT Leaderboard</span>
+         </div>
+         <div class="card">
+           <div class="card-icon">↔️</div>
+           <h3>Side-by-Side</h3>
+           <p>User-selected model pair, any dialect. Useful for targeted comparisons but excluded from ranked scoring to prevent selection bias.</p>
+           <span class="card-tag card-tag-no">Win-rate only</span>
+         </div>
+         <div class="card">
+           <div class="card-icon">❓</div>
+           <h3>10 Questions</h3>
+           <p>Fixed Arabic prompt pool, any dialect. Provides consistent benchmarking within a curated set of questions.</p>
+           <span class="card-tag card-tag-no">Win-rate only</span>
+         </div>
+       </div>
+     </section>
+
+     <!-- ═══════════════════════════════════════════════════
+          SCORING METHODOLOGY
+          ═══════════════════════════════════════════════════ -->
+     <section>
+       <div class="section-title">Scoring Methodology</div>
+       <div class="section-sub">A Bradley–Terry model with four key components</div>
+       <div class="method-steps">
+         <div class="step">
+           <div class="step-num">1</div>
+           <h4>Inverse-Probability Weighting</h4>
+           <p>Corrects for non-uniform model exposure under ε-greedy adaptive sampling; weights are clamped to [P1, P99].</p>
+         </div>
+         <div class="step">
+           <div class="step-num">2</div>
+           <h4>Bootstrap Confidence Intervals</h4>
+           <p>200 vote-level resamples per run produce 95% CIs on every model's BT score.</p>
+         </div>
+         <div class="step">
+           <div class="step-num">3</div>
+           <h4>Rankability Gate</h4>
+           <p>BT scores are published only when the comparison graph is fully connected and ESS is sufficient; otherwise a win-rate fallback is shown.</p>
+         </div>
+         <div class="step">
+           <div class="step-num">4</div>
+           <h4>Log-odds Scale</h4>
+           <p>1000-centered unscaled log-odds. A 1-point gap ≈ 2.7:1 win odds. Full reproducibility: pipeline and dataset are open.</p>
+         </div>
+       </div>
+     </section>
+
+     <!-- ═══════════════════════════════════════════════════
+          DATASET CALLOUT
+          ═══════════════════════════════════════════════════ -->
+     <section>
+       <div class="dataset-callout">
+         <div class="ds-icon">📦</div>
+         <div style="flex:1">
+           <h3>Haakkim/Haakkim-1.0v: Battle Dataset</h3>
+           <p>1,273 battle records (Parquet, PII-scrubbed). Includes voted comparisons and skipped battles across all 11 dialects and 3 evaluation modes, with full conversation transcripts, sampling weights, and category annotations.</p>
+         </div>
+         <a href="https://huggingface.co/datasets/Haakkim/Haakkim-1.0v" class="btn btn-dataset">View Dataset →</a>
+       </div>
+     </section>
+
+     <!-- ═══════════════════════════════════════════════════
+          TEAM
+          ═══════════════════════════════════════════════════ -->
+     <section>
+       <div class="section-title">Team</div>
+       <div class="section-sub">College of Computing, Umm Al-Qura University, Mecca, Saudi Arabia</div>
+       <div class="team-grid">
+         <div class="team-card">
+           <div class="team-avatar">MM</div>
+           <div class="team-name">Mourad Mars</div>
+           <div class="team-role">Principal Investigator</div>
+           <a href="https://huggingface.co/mouradmars" class="team-hf">🤗 mouradmars</a>
+         </div>
+         <div class="team-card">
+           <div class="team-avatar">HB</div>
+           <div class="team-name">Hassan Barmandah</div>
+           <div class="team-role">AI Researcher</div>
+           <a href="https://huggingface.co/HassanB4" class="team-hf">🤗 HassanB4</a>
+         </div>
+         <div class="team-card">
+           <div class="team-avatar">AA</div>
+           <div class="team-name">Abdulrhman Alassaf</div>
+           <div class="team-role">Software Engineer</div>
+           <span style="font-size:0.78rem;color:var(--muted)">Umm Al-Qura University</span>
+         </div>
+       </div>
+     </section>
+
+     <!-- ═══════════════════════════════════════════════════
+          CITATION
+          ═══════════════════════════════════════════════════ -->
+     <section>
+       <div class="section-title">Citation</div>
+       <div class="section-sub">If you use Haakkim or this dataset in your research, please cite:</div>
+       <pre style="background:var(--surface);border:1px solid var(--border);border-radius:var(--radius);padding:20px 24px;font-size:0.82rem;line-height:1.7;overflow-x:auto;color:var(--muted)"><code>@misc{mars2026haakkim,
+   title = {Haakkim: An Arena-Style Human Preference Evaluation Platform for Arabic {LLMs}},
+   author = {Mars, Mourad and Barmandah, Hassan and Alassaf, Abdulrhman},
+   year = {2026},
+   howpublished = {\url{https://huggingface.co/datasets/Haakkim/Haakkim-1.0v}},
+   note = {College of Computing, Umm Al-Qura University, Mecca, Saudi Arabia}
+ }</code></pre>
+     </section>
+
+   </div><!-- /container -->
+
+   <footer>
+     <div>
+       <a href="https://haakkim.tech">haakkim.tech</a> &nbsp;·&nbsp;
+       <a href="https://haakkim.tech/#leaderboard">Leaderboard</a> &nbsp;·&nbsp;
+       <a href="https://huggingface.co/datasets/Haakkim/Haakkim-1.0v">Dataset</a>
+     </div>
+     <div style="margin-top:8px">College of Computing, Umm Al-Qura University · Mecca, Saudi Arabia · CC BY 4.0</div>
+   </footer>
+
+ </body>
+ </html>
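
Note on the score scale used in the page: the "Score scale" box says scores are log-odds centered at 1000, and that an Elo-style rescaling by about 173.7 encodes the same win probabilities. The Python sketch below checks that arithmetic. It is an illustration under those stated assumptions only, not the Haakkim pipeline: the function names `win_probability`, `to_elo_scale`, and `elo_win_probability` are hypothetical, and the factor is taken to be 400 / ln(10), the standard Elo constant, which rounds to 173.7.

```python
import math

# Assumptions (labeled as such): scores are natural-log odds centered at 1000,
# as the page's score-scale note states; ELO_FACTOR is the usual 400 / ln(10).
CENTER = 1000.0
ELO_FACTOR = 400.0 / math.log(10)  # about 173.7


def win_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that A beats B, with scores in log-odds units."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def to_elo_scale(score: float, center: float = CENTER) -> float:
    """Rescale a centered log-odds score to an Elo-style scale with the same center."""
    return center + ELO_FACTOR * (score - center)


def elo_win_probability(elo_a: float, elo_b: float) -> float:
    """Standard Elo expected score for A against B (base 10, 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** (-(elo_a - elo_b) / 400.0))


if __name__ == "__main__":
    # Top two rows of the leaderboard table in the page.
    first, second = 1001.75, 1001.61
    p_log_odds = win_probability(first, second)
    p_elo = elo_win_probability(to_elo_scale(first), to_elo_scale(second))
    print(f"odds for a 1-point gap: {math.exp(1.0):.2f}:1")
    print(f"P(rank 1 beats rank 2), log-odds scale: {p_log_odds:.3f}")
    print(f"P(rank 1 beats rank 2), Elo-style scale: {p_elo:.3f}")
```

Under these assumptions the two scales give the same probability for any pair, and the 0.14-point gap between the top two models corresponds to a win probability only slightly above one half, which is consistent with their overlapping confidence intervals in the table.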