Improve org README: clean layout, remove stale snapshot/leaderboard/methodology sections
README.md
CHANGED
|
@@ -11,8 +11,6 @@ pinned: true
|
|
| 11 |
|
| 12 |
**An open arena-style human preference evaluation platform for Arabic LLMs.**
|
| 13 |
|
| 14 |
-
Haakkim collects blind pairwise judgments between Arabic language models and ranks them using a statistically grounded Bradley–Terry model with inverse-probability weighting and bootstrap confidence intervals.
|
| 15 |
-
|
| 16 |
🌐 [haakkim.tech](https://haakkim.tech) · 🏆 [Live Leaderboard](https://haakkim.tech/#leaderboard) · 📦 [Dataset](https://huggingface.co/datasets/Haakkim/Haakkim-1.0v)
|
| 17 |
|
| 18 |
---
|
|
@@ -30,35 +28,13 @@ Most Arabic LLM benchmarks rely on fixed tasks and MSA-only evaluation. Haakkim
|
|
| 30 |
|
| 31 |
---
|
| 32 |
|
| 33 |
-
##
|
| 34 |
-
|
| 35 |
-
| | |
|
| 36 |
-
|---|---|
|
| 37 |
-
| Total battles collected | 1,273 |
|
| 38 |
-
| Ranked-eligible (BT) | 831 |
|
| 39 |
-
| Models on leaderboard | 67 |
|
| 40 |
-
| Dialects covered | 11 |
|
| 41 |
-
| Graph | Fully connected · 774 edges · density 0.35 |
|
| 42 |
-
| ESS (clamped) | 465 |
|
| 43 |
-
|
| 44 |
-
---
|
| 45 |
-
|
| 46 |
-
## MSA Leaderboard — Top 10
|
| 47 |
|
| 48 |
-
|
|
| 49 |
-
|---|---|---|
|
| 50 |
-
|
|
| 51 |
-
|
|
| 52 |
-
|
|
| 53 |
-
| 4 | Qwen/Qwen3-30B-A3B-Instruct-2507 | 1001.14 |
|
| 54 |
-
| 5 | deepseek/deepseek-v3.2-exp | 1001.13 |
|
| 55 |
-
| 6 | deepseek/deepseek-v3.1 | 1000.99 |
|
| 56 |
-
| 7 | Qwen/Qwen3-235B-A22B-Instruct-2507 | 1000.98 |
|
| 57 |
-
| 8 | deepseek/deepseek-r1-0528 | 1000.93 |
|
| 58 |
-
| 9 | openai/gpt-oss-120b | 1000.93 |
|
| 59 |
-
| 10 | deepseek/deepseek-v3.2 | 1000.89 |
|
| 60 |
-
|
| 61 |
-
Scores are 1000-centered log-odds units. Full leaderboard → [haakkim.tech/#leaderboard](https://haakkim.tech/#leaderboard)
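The Bradley–Terry ranking and the "1000-centered log-odds" score scale mentioned in the removed text can be illustrated with a minimal sketch. This is not Haakkim's actual pipeline: the function name `fit_bradley_terry`, the `(winner, loser, weight)` tuple layout, the use of a per-battle weight as a stand-in for inverse-probability weighting, and centering on the mean log-strength are all assumptions made for illustration. Bootstrap confidence intervals are not shown.

```python
import math


def fit_bradley_terry(battles, n_iter=500, tol=1e-10):
    """Fit a weighted Bradley-Terry model with MM (minorize-maximize) updates.

    battles: list of (winner, loser, weight) tuples; weight > 0 is a
             hypothetical per-battle sampling weight (assumption).
    Returns {model: score}, where score = 1000 + centered log-strength.
    """
    models = sorted({m for w, l, _ in battles for m in (w, l)})
    wins = dict.fromkeys(models, 0.0)
    pair_weight = {}
    for winner, loser, weight in battles:
        wins[winner] += weight
        key = tuple(sorted((winner, loser)))
        pair_weight[key] = pair_weight.get(key, 0.0) + weight

    log_p = dict.fromkeys(models, 0.0)  # start with equal strengths
    for _ in range(n_iter):
        p = {m: math.exp(v) for m, v in log_p.items()}
        denom = dict.fromkeys(models, 0.0)
        for (a, b), n in pair_weight.items():
            share = n / (p[a] + p[b])
            denom[a] += share
            denom[b] += share
        # MM update: p_i = W_i / sum_j n_ij / (p_i + p_j).
        # Wins are clamped so a model with zero wins does not hit log(0).
        new_log = {m: math.log(max(wins[m], 1e-12) / denom[m]) for m in models}
        # Center so that the mean log-strength is zero.
        mean = sum(new_log.values()) / len(models)
        new_log = {m: v - mean for m, v in new_log.items()}
        delta = max(abs(new_log[m] - log_p[m]) for m in models)
        log_p = new_log
        if delta < tol:
            break
    return {m: 1000.0 + v for m, v in log_p.items()}


# Toy data: model "A" beats model "B" three times and loses once.
scores = fit_bradley_terry([("A", "B", 1.0)] * 3 + [("B", "A", 1.0)])
```

On this scale the difference between two scores is the estimated log-odds that one model is preferred over the other, which is why closely matched models sit within a point or two of 1000.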
|
| 62 |
|
| 63 |
---
|
| 64 |
|
|
@@ -68,19 +44,28 @@ The first public release of Haakkim battle data is available on Hugging Face:
|
|
| 68 |
|
| 69 |
📦 **[Haakkim/Haakkim-1.0v](https://huggingface.co/datasets/Haakkim/Haakkim-1.0v)**
|
| 70 |
|
| 71 |
-
- 1,273 battle records (JSONL → Parquet, PII-scrubbed)
|
| 72 |
-
- Includes voted comparisons and skipped battles
|
| 73 |
-
- All 11 dialect varieties and all 3 evaluation modes
|
| 74 |
-
- Full conversation transcripts, sampling weights, category annotations
|
| 75 |
-
|
| 76 |
---
|
| 77 |
|
| 78 |
## Team
|
| 79 |
|
| 80 |
**College of Computing, Umm Al-Qura University — Mecca, Saudi Arabia**
|
| 81 |
|
| 82 |
-
| | |
|
| 83 |
-
|---|---|
|
| 84 |
-
|
|
| 85 |
-
|
|
| 86 |
-
| Abdulrhman Alassaf | Software Engineer |
|
| 11 |
|
| 12 |
**An open arena-style human preference evaluation platform for Arabic LLMs.**
|
| 13 |
|
| 14 |
🌐 [haakkim.tech](https://haakkim.tech) · 🏆 [Live Leaderboard](https://haakkim.tech/#leaderboard) · 📦 [Dataset](https://huggingface.co/datasets/Haakkim/Haakkim-1.0v)
|
| 15 |
|
| 16 |
---
|
|
|
|
| 28 |
|
| 29 |
---
|
| 30 |
|
| 31 |
+
## Evaluation Modes
|
| 32 |
|
| 33 |
+
| Mode | Description | Used for Ranking |
|
| 34 |
+
|---|---|---|
|
| 35 |
+
| **Ranked Arena** | Random model pairing, single-turn MSA, matched system instruction | ✅ Official BT leaderboard |
|
| 36 |
+
| **Side-by-Side** | User-selected model pair, any dialect | Win-rate only |
|
| 37 |
+
| **10 Questions** | Fixed Arabic prompt pool, any dialect | Win-rate only |
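The distinction the table draws between modes that feed the official BT leaderboard and modes that only produce win rates can be expressed as a small lookup. This is a sketch under assumptions: the mode keys (`ranked_arena`, `side_by_side`, `ten_questions`), the `mode` field on a battle record, and the helper `ranked_eligible` are hypothetical names, not the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalMode:
    name: str
    description: str
    official_ranking: bool  # True only for modes used in the BT leaderboard


# Keys are hypothetical identifiers; descriptions follow the table above.
MODES = {
    "ranked_arena": EvalMode(
        "Ranked Arena",
        "Random model pairing, single-turn MSA, matched system instruction",
        True,
    ),
    "side_by_side": EvalMode(
        "Side-by-Side", "User-selected model pair, any dialect", False
    ),
    "ten_questions": EvalMode(
        "10 Questions", "Fixed Arabic prompt pool, any dialect", False
    ),
}


def ranked_eligible(battles):
    """Keep only battles whose mode counts toward the official ranking."""
    return [b for b in battles if MODES[b["mode"]].official_ranking]


# Toy records with a hypothetical "mode" field.
sample = [
    {"id": 1, "mode": "ranked_arena"},
    {"id": 2, "mode": "side_by_side"},
    {"id": 3, "mode": "ten_questions"},
]
eligible = ranked_eligible(sample)
```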
|
| 38 |
|
| 39 |
---
|
| 40 |
|
|
|
|
| 44 |
|
| 45 |
📦 **[Haakkim/Haakkim-1.0v](https://huggingface.co/datasets/Haakkim/Haakkim-1.0v)**
|
| 46 |
|
| 47 |
---
|
| 48 |
|
| 49 |
## Team
|
| 50 |
|
| 51 |
**College of Computing, Umm Al-Qura University — Mecca, Saudi Arabia**
|
| 52 |
|
| 53 |
+
| Name | Role | Email |
|
| 54 |
+
|---|---|---|
|
| 55 |
+
| Dr. Mourad Mars | Principal Investigator | msmars@uqu.edu.sa |
|
| 56 |
+
| Hassan Barmandah | AI Researcher | hassanhbarmandah@gmail.com |
|
| 57 |
+
| Abdulrhman Alassaf | Software Engineer | aaalassaf@outlook.com |
|
| 58 |
+
|
| 59 |
+
---
|
| 60 |
+
|
| 61 |
+
## Citation
|
| 62 |
+
|
| 63 |
+
```bibtex
|
| 64 |
+
@misc{mars2026haakkim,
|
| 65 |
+
title = {Haakkim: An Arena-Style Human Preference Evaluation Platform for Arabic {LLMs}},
|
| 66 |
+
author = {Mars, Mourad and Barmandah, Hassan and Alassaf, Abdulrhman},
|
| 67 |
+
year = {2026},
|
| 68 |
+
howpublished = {\url{https://huggingface.co/datasets/Haakkim/Haakkim-1.0v}},
|
| 69 |
+
note = {College of Computing, Umm Al-Qura University, Mecca, Saudi Arabia}
|
| 70 |
+
}
|
| 71 |
+
```
|