Haakkim — Arabic LLM Arena

How It Works

Evaluation Modes

Three ways to compare Arabic LLMs — only Ranked Arena feeds the official leaderboard.

⚔️

Ranked Arena

Models are paired at random and answer a single-turn prompt in Modern Standard Arabic (MSA) under a matched system instruction. The only mode that feeds the official Bradley–Terry leaderboard.

✓ BT Leaderboard
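
For readers unfamiliar with Bradley–Terry scoring, here is a minimal sketch of how pairwise battle outcomes can be turned into model strengths. It uses the standard minorization-maximization (MM) update and is not necessarily how the Haakkim leaderboard is computed. The function name, the `(winner, loser)` input shape, and the choice to leave ties out are all assumptions made for illustration.

```python
from collections import defaultdict

def fit_bradley_terry(battles, iters=200, tol=1e-9):
    """Fit Bradley-Terry strengths from (winner, loser) pairs with MM updates.

    Hypothetical helper: ties are assumed to be filtered out beforehand.
    Returns {model: strength}, normalized so the strengths sum to 1.
    A model with zero wins ends up with strength 0 (its MLE does not exist).
    """
    wins = defaultdict(float)        # total wins per model
    pair_games = defaultdict(float)  # games played per unordered model pair
    models = set()
    for winner, loser in battles:
        wins[winner] += 1.0
        pair_games[frozenset((winner, loser))] += 1.0
        models.update((winner, loser))
    models = sorted(models)

    p = {m: 1.0 / len(models) for m in models}  # uniform starting point
    for _ in range(iters):
        new_p = {}
        for i in models:
            # MM update: p_i <- wins_i / sum_j n_ij / (p_i + p_j)
            denom = 0.0
            for j in models:
                if i == j:
                    continue
                n_ij = pair_games.get(frozenset((i, j)), 0.0)
                if n_ij:
                    denom += n_ij / (p[i] + p[j])
            new_p[i] = wins[i] / denom if denom else p[i]
        total = sum(new_p.values())
        new_p = {m: v / total for m, v in new_p.items()}
        delta = max(abs(new_p[m] - p[m]) for m in models)
        p = new_p
        if delta < tol:
            break
    return p

# Toy data with placeholder model names (not real Haakkim battles).
battles = (
    [("A", "B")] * 3 + [("B", "A")] * 1
    + [("B", "C")] * 3 + [("C", "B")] * 1
    + [("A", "C")] * 2 + [("C", "A")] * 1
)
strengths = fit_bradley_terry(battles)
ranking = sorted(strengths, key=strengths.get, reverse=True)
```

Under the model, the probability that model i beats model j is p_i / (p_i + p_j), so the fitted strengths account for who each model faced, which a raw win rate does not.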

↔️

Side-by-Side

User-selected model pair, any dialect. Useful for targeted comparisons — excluded from ranked scoring to prevent selection bias.

Win-rate only

❓

10 Questions

Fixed Arabic prompt pool, any dialect. Provides consistent benchmarking within a curated question set.

Win-rate only
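
As a rough illustration of what "win-rate only" reporting could look like, the sketch below tallies per-model win rates from a list of battles. The tuple layout, the outcome labels, and the choice to count a tie as a game played but not a win are assumptions. The page does not say how ties are handled, or whether they occur.

```python
from collections import Counter

def win_rates(battles):
    """Per-model win rate from (model_a, model_b, outcome) tuples.

    Hypothetical input shape: outcome is "a", "b", or "tie".
    Assumption: a tie counts as a game played for both models, not as a win.
    """
    wins, games = Counter(), Counter()
    for model_a, model_b, outcome in battles:
        games[model_a] += 1
        games[model_b] += 1
        if outcome == "a":
            wins[model_a] += 1
        elif outcome == "b":
            wins[model_b] += 1
    return {model: wins[model] / games[model] for model in games}

# Toy data with placeholder model names.
sample = [
    ("X", "Y", "a"),
    ("X", "Y", "b"),
    ("X", "Z", "a"),
    ("Y", "Z", "tie"),
]
rates = win_rates(sample)
```

A plain win rate does not adjust for which opponents a model faced. That is one reason it suits modes where users pick the pair themselves and is kept separate from the ranked leaderboard.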

Open Data

Battle Dataset

First public release of Haakkim battle data — fully open, PII-scrubbed, ready to use.

📦

Haakkim / Haakkim-1.0v

Battle records covering all 11 dialect varieties and all 3 evaluation modes. Includes full conversation transcripts, sampling weights, and category annotations.

1,273 battles · 67 models · 11 dialects · CC BY 4.0 · Parquet · PII-scrubbed
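
Because the release mixes all three evaluation modes, anyone recomputing a leaderboard would likely want to separate Ranked Arena battles from the rest first. The sketch below shows one way to do that on rows held as plain dictionaries. The field names (`mode`, `dialect`, `winner`) and their values are hypothetical placeholders, not the dataset's documented schema, so check the dataset card for the actual column names.

```python
from collections import Counter

def split_for_leaderboard(rows):
    """Split rows into (ranked, other) using a hypothetical 'mode' field."""
    ranked = [r for r in rows if r["mode"] == "ranked_arena"]
    other = [r for r in rows if r["mode"] != "ranked_arena"]
    return ranked, other

# Placeholder rows; field names and values are illustrative only.
rows = [
    {"mode": "ranked_arena", "dialect": "MSA", "winner": "model_a"},
    {"mode": "side_by_side", "dialect": "Hijazi", "winner": "model_b"},
    {"mode": "ten_questions", "dialect": "Egyptian", "winner": "tie"},
    {"mode": "ranked_arena", "dialect": "MSA", "winner": "model_b"},
]

ranked, other = split_for_leaderboard(rows)
battles_per_dialect = Counter(r["dialect"] for r in rows)
```

With the real Parquet files, rows in this shape could be obtained with, for example, `pandas.read_parquet(path).to_dict("records")`.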

People

Team

College of Computing, Umm Al-Qura University — Mecca, Saudi Arabia

Dr. Mourad Mars

Principal Investigator

Umm Al-Qura University

✉ msmars@uqu.edu.sa

Hassan Barmandah

AI Researcher

Umm Al-Qura University

✉ hassanhbarmandah@gmail.com

Abdulrhman Alassaf

Software Engineer

Umm Al-Qura University

✉ aaalassaf@outlook.com

Research

Citation

If you use Haakkim or this dataset in your research, please cite:

@misc{mars2026haakkim,
  title        = {Haakkim: An Arena-Style Human Preference Evaluation Platform for Arabic {LLMs}},
  author       = {Mars, Mourad and Barmandah, Hassan and Alassaf, Abdulrhman},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/datasets/Haakkim/Haakkim-1.0v}},
  note         = {College of Computing, Umm Al-Qura University, Mecca, Saudi Arabia}
}