Spaces: Haakkim / README

HassanB4 committed 11 days ago

Commit 905ffb8 (verified) · 1 parent: 37f7eae

Improve org README: clean layout, remove stale snapshot/leaderboard/methodology sections

Files changed (1)

README.md +24 -39


@@ -11,8 +11,6 @@ pinned: true
 **An open arena-style human preference evaluation platform for Arabic LLMs.**
-Haakkim collects blind pairwise judgments between Arabic language models and ranks them using a statistically grounded Bradley–Terry model with inverse-probability weighting and bootstrap confidence intervals.
 🌐 [haakkim.tech](https://haakkim.tech) &nbsp;·&nbsp; 🏆 [Live Leaderboard](https://haakkim.tech/#leaderboard) &nbsp;·&nbsp; 📦 [Dataset](https://huggingface.co/datasets/Haakkim/Haakkim-1.0v)
 ---
@@ -30,35 +28,13 @@ Most Arabic LLM benchmarks rely on fixed tasks and MSA-only evaluation. Haakkim
 ---
-## Current Snapshot (v1.0)
-| | |
-|---|---|
-| Total battles collected | 1,273 |
-| Ranked-eligible (BT) | 831 |
-| Models on leaderboard | 67 |
-| Dialects covered | 11 |
-| Graph | Fully connected · 774 edges · density 0.35 |
-| ESS (clamped) | 465 |
----
-## MSA Leaderboard — Top 10
-| Rank | Model | BT Score |
 |---|---|---|
-| 1 | mistralai/ministral-3b-2512 | 1001.75 |
-| 2 | mistralai/ministral-8b-2512 | 1001.61 |
-| 3 | Qwen/Qwen3-235B-A22B-Thinking-2507 | 1001.21 |
-| 4 | Qwen/Qwen3-30B-A3B-Instruct-2507 | 1001.14 |
-| 5 | deepseek/deepseek-v3.2-exp | 1001.13 |
-| 6 | deepseek/deepseek-v3.1 | 1000.99 |
-| 7 | Qwen/Qwen3-235B-A22B-Instruct-2507 | 1000.98 |
-| 8 | deepseek/deepseek-r1-0528 | 1000.93 |
-| 9 | openai/gpt-oss-120b | 1000.93 |
-| 10 | deepseek/deepseek-v3.2 | 1000.89 |
-Scores are 1000-centered log-odds units. Full leaderboard → [haakkim.tech/#leaderboard](https://haakkim.tech/#leaderboard)
 ---
@@ -68,19 +44,28 @@ The first public release of Haakkim battle data is available on Hugging Face:
 📦 **[Haakkim/Haakkim-1.0v](https://huggingface.co/datasets/Haakkim/Haakkim-1.0v)**
-- 1,273 battle records (JSONL → Parquet, PII-scrubbed)
-- Includes voted comparisons and skipped battles
-- All 11 dialect varieties and all 3 evaluation modes
-- Full conversation transcripts, sampling weights, category annotations
 ---
 ## Team
 **College of Computing, Umm Al-Qura University — Mecca, Saudi Arabia**
-| | |
-|---|---|
-| [Mourad Mars](https://huggingface.co/mouradmars) | Principal Investigator |
-| [Hassan Barmandah](https://huggingface.co/HassanB4) | AI Researcher |
-| Abdulrhman Alassaf | Software Engineer |

 **An open arena-style human preference evaluation platform for Arabic LLMs.**
 🌐 [haakkim.tech](https://haakkim.tech) &nbsp;·&nbsp; 🏆 [Live Leaderboard](https://haakkim.tech/#leaderboard) &nbsp;·&nbsp; 📦 [Dataset](https://huggingface.co/datasets/Haakkim/Haakkim-1.0v)
 ---
 ---
+## Evaluation Modes
+| Mode | Description | Used for Ranking |
 |---|---|---|
+| **Ranked Arena** | Random model pairing, single-turn MSA, matched system instruction | ✅ Official BT leaderboard |
+| **Side-by-Side** | User-selected model pair, any dialect | Win-rate only |
+| **10 Questions** | Fixed Arabic prompt pool, any dialect | Win-rate only |
 ---
 📦 **[Haakkim/Haakkim-1.0v](https://huggingface.co/datasets/Haakkim/Haakkim-1.0v)**
 ---
 ## Team
 **College of Computing, Umm Al-Qura University — Mecca, Saudi Arabia**
+| Name | Role | Email |
+|---|---|---|
+| Dr. Mourad Mars | Principal Investigator | msmars@uqu.edu.sa |
+| Hassan Barmandah | AI Researcher | hassanhbarmandah@gmail.com |
+| Abdulrhman Alassaf | Software Engineer | aaalassaf@outlook.com |
+---
+## Citation
+```bibtex
+@misc{mars2026haakkim,
+  title        = {Haakkim: An Arena-Style Human Preference Evaluation Platform for Arabic {LLMs}},
+  author       = {Mars, Mourad and Barmandah, Hassan and Alassaf, Abdulrhman},
+  year         = {2026},
+  howpublished = {\url{https://huggingface.co/datasets/Haakkim/Haakkim-1.0v}},
+  note         = {College of Computing, Umm Al-Qura University, Mecca, Saudi Arabia}
+}
+```
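
For readers of this diff: the description removed in the first hunk refers to a Bradley–Terry (BT) model, and the removed leaderboard note says scores are "1000-centered log-odds units". The sketch below illustrates that idea only. It is not Haakkim's implementation: it omits the inverse-probability weighting and bootstrap confidence intervals mentioned in the removed text, and the function name `bt_scores`, the `(winner, loser)` input format, the `scale` parameter, and the toy model names are all assumptions made for illustration.

```python
import math


def bt_scores(battles, n_iter=200, center=1000.0, scale=1.0):
    """Fit unweighted Bradley-Terry strengths and return centered log-odds scores.

    battles: list of (winner, loser) pairs with winner != loser (hypothetical format).
    scale:   multiplier on the log-strengths; 1.0 is an assumption, not Haakkim's value.
    """
    models = sorted({m for pair in battles for m in pair})
    wins = {m: 0 for m in models}
    losses = {m: 0 for m in models}
    games = {}  # unordered model pair -> number of battles between them
    for winner, loser in battles:
        wins[winner] += 1
        losses[loser] += 1
        key = frozenset((winner, loser))
        games[key] = games.get(key, 0) + 1

    # Simple sanity check (necessary, not sufficient, for a finite BT estimate).
    for m in models:
        if wins[m] == 0 or losses[m] == 0:
            raise ValueError(f"{m} needs at least one win and one loss")

    strength = {m: 1.0 for m in models}
    for _ in range(n_iter):
        updated = {}
        for m in models:
            # Classic minorization-maximization (Zermelo) update for BT.
            denom = 0.0
            for key, n in games.items():
                if m in key:
                    (other,) = key - {m}
                    denom += n / (strength[m] + strength[other])
            updated[m] = wins[m] / denom
        # Re-center so the mean log-strength is zero (BT is scale-invariant).
        mean_log = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: math.exp(math.log(s) - mean_log) for m, s in updated.items()}

    return {m: center + scale * math.log(strength[m]) for m in models}


# Toy data: model-a beats model-b in 3 of 4 battles.
battles = [("model-a", "model-b")] * 3 + [("model-b", "model-a")]
scores = bt_scores(battles)
```

With this toy input, the gap `scores["model-a"] - scores["model-b"]` equals ln 3 (about 1.10), the log-odds of a 3:1 win ratio, and the two scores average to 1000. That is consistent with the closely bunched values in the removed Top 10 table, where small score gaps correspond to near-even win odds.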