HassanB4 committed on
Commit 905ffb8 · verified · 1 Parent(s): 37f7eae

Improve org README: clean layout, remove stale snapshot/leaderboard/methodology sections

Files changed (1)
  1. README.md +24 -39
README.md CHANGED
@@ -11,8 +11,6 @@ pinned: true
 
  **An open arena-style human preference evaluation platform for Arabic LLMs.**
 
- Haakkim collects blind pairwise judgments between Arabic language models and ranks them using a statistically grounded Bradley–Terry model with inverse-probability weighting and bootstrap confidence intervals.
-
  🌐 [haakkim.tech](https://haakkim.tech)  ·  🏆 [Live Leaderboard](https://haakkim.tech/#leaderboard)  ·  📦 [Dataset](https://huggingface.co/datasets/Haakkim/Haakkim-1.0v)
 
  ---
@@ -30,35 +28,13 @@ Most Arabic LLM benchmarks rely on fixed tasks and MSA-only evaluation. Haakkim
 
  ---
 
- ## Current Snapshot (v1.0)
-
- | | |
- |---|---|
- | Total battles collected | 1,273 |
- | Ranked-eligible (BT) | 831 |
- | Models on leaderboard | 67 |
- | Dialects covered | 11 |
- | Graph | Fully connected · 774 edges · density 0.35 |
- | ESS (clamped) | 465 |
-
- ---
-
- ## MSA Leaderboard — Top 10
+ ## Evaluation Modes
 
- | Rank | Model | BT Score |
+ | Mode | Description | Used for Ranking |
  |---|---|---|
- | 1 | mistralai/ministral-3b-2512 | 1001.75 |
- | 2 | mistralai/ministral-8b-2512 | 1001.61 |
- | 3 | Qwen/Qwen3-235B-A22B-Thinking-2507 | 1001.21 |
- | 4 | Qwen/Qwen3-30B-A3B-Instruct-2507 | 1001.14 |
- | 5 | deepseek/deepseek-v3.2-exp | 1001.13 |
- | 6 | deepseek/deepseek-v3.1 | 1000.99 |
- | 7 | Qwen/Qwen3-235B-A22B-Instruct-2507 | 1000.98 |
- | 8 | deepseek/deepseek-r1-0528 | 1000.93 |
- | 9 | openai/gpt-oss-120b | 1000.93 |
- | 10 | deepseek/deepseek-v3.2 | 1000.89 |
-
- Scores are 1000-centered log-odds units. Full leaderboard → [haakkim.tech/#leaderboard](https://haakkim.tech/#leaderboard)
+ | **Ranked Arena** | Random model pairing, single-turn MSA, matched system instruction | Official BT leaderboard |
+ | **Side-by-Side** | User-selected model pair, any dialect | Win-rate only |
+ | **10 Questions** | Fixed Arabic prompt pool, any dialect | Win-rate only |
 
  ---
 
@@ -68,19 +44,28 @@ The first public release of Haakkim battle data is available on Hugging Face:
 
  📦 **[Haakkim/Haakkim-1.0v](https://huggingface.co/datasets/Haakkim/Haakkim-1.0v)**
 
- - 1,273 battle records (JSONL → Parquet, PII-scrubbed)
- - Includes voted comparisons and skipped battles
- - All 11 dialect varieties and all 3 evaluation modes
- - Full conversation transcripts, sampling weights, category annotations
-
  ---
 
  ## Team
 
  **College of Computing, Umm Al-Qura University — Mecca, Saudi Arabia**
 
- | | |
- |---|---|
- | [Mourad Mars](https://huggingface.co/mouradmars) | Principal Investigator |
- | [Hassan Barmandah](https://huggingface.co/HassanB4) | AI Researcher |
- | Abdulrhman Alassaf | Software Engineer |
+ | Name | Role | Email |
+ |---|---|---|
+ | Dr. Mourad Mars | Principal Investigator | msmars@uqu.edu.sa |
+ | Hassan Barmandah | AI Researcher | hassanhbarmandah@gmail.com |
+ | Abdulrhman Alassaf | Software Engineer | aaalassaf@outlook.com |
+
+ ---
+
+ ## Citation
+
+ ```bibtex
+ @misc{mars2026haakkim,
+ title = {Haakkim: An Arena-Style Human Preference Evaluation Platform for Arabic {LLMs}},
+ author = {Mars, Mourad and Barmandah, Hassan and Alassaf, Abdulrhman},
+ year = {2026},
+ howpublished = {\url{https://huggingface.co/datasets/Haakkim/Haakkim-1.0v}},
+ note = {College of Computing, Umm Al-Qura University, Mecca, Saudi Arabia}
+ }
+ ```
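
For readers of this diff: the description removed in this commit says the leaderboard is ranked with a weighted Bradley-Terry model, and the removed leaderboard footnote says scores are 1000-centered log-odds units. The sketch below is only an illustration of that general idea, not the project's actual pipeline. The function name `fit_bradley_terry`, the `(winner, loser, weight)` tuple layout, the `prior` pseudo-count against a virtual anchor opponent, and the toy model names are all assumptions made for this example. Treating the per-battle weight as the inverse-probability weight is likewise an assumption, and bootstrap confidence intervals are not covered.

```python
import math
from collections import defaultdict


def fit_bradley_terry(battles, iters=500, prior=0.1):
    """Fit a weighted Bradley-Terry model with MM (minorize-maximize) updates.

    battles: iterable of (winner, loser, weight) tuples (hypothetical layout).
    prior:   pseudo-count (> 0) of one win and one loss against a virtual
             anchor of strength 1, so models with zero wins stay finite.
    Returns a dict of model -> score in log-odds units centered at 1000.
    """
    wins = defaultdict(float)   # weighted wins per model
    games = defaultdict(float)  # weighted comparisons per ordered pair
    for winner, loser, weight in battles:
        wins[winner] += weight
        wins[loser] += 0.0      # make sure winless models are registered
        games[(winner, loser)] += weight
        games[(loser, winner)] += weight

    models = sorted(wins)
    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            # Contribution of the virtual anchor (one win, one loss, weight `prior`).
            denom = 2.0 * prior / (strength[i] + 1.0)
            for j in models:
                n_ij = games.get((i, j), 0.0)
                if j != i and n_ij > 0.0:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = (wins[i] + prior) / denom
        strength = updated

    # Convert strengths to log-odds and shift so the mean score is 1000.
    log_strength = {m: math.log(s) for m, s in strength.items()}
    center = sum(log_strength.values()) / len(log_strength)
    return {m: 1000.0 + value - center for m, value in log_strength.items()}


# Toy data with made-up model names and weights.
toy_battles = [
    ("model-a", "model-b", 1.0),
    ("model-a", "model-b", 1.0),
    ("model-b", "model-a", 1.0),
    ("model-b", "model-c", 1.5),
    ("model-c", "model-a", 0.5),
]
for name, score in sorted(fit_bradley_terry(toy_battles).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

Because the scores are differences of log-strengths around a fixed center, a gap of `d` between two models corresponds to a predicted win probability of `1 / (1 + exp(-d))` for the higher-rated one under this model.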