---
model-index:
- name: BEDAI-2B
  results:
  - task:
      type: multiple-choice
      name: Exams (TR)
    dataset:
      name: exams_tr
      type: exams_tr
      args: {split: validation}
    metrics:
    - name: accuracy_norm
      type: accuracy
      value: 25.70
  - task:
      type: question-answering-extractive
      name: TQuAD (TR)
    dataset:
      name: tquad
      type: tquad
      args: {split: validation}
    metrics:
    - name: exact_match
      type: exact_match
      value: 9.9807
    - name: f1
      type: f1
      value: 22.9314
  - task:
      type: question-answering-extractive
      name: XQuAD (TR)
    dataset:
      name: xquad_tr
      type: xquad_tr
      args: {split: validation}
    metrics:
    - name: exact_match
      type: exact_match
      value: 6.4706
    - name: f1
      type: f1
      value: 13.0114
  - task:
      type: text-classification
      name: Turkish PLU (overall)
    dataset:
      name: turkish_plu
      type: turkish_plu
      args: {split: test}
    metrics:
    - name: accuracy_norm
      type: accuracy
      value: 51.58
---

## Evaluation (CETVEL – Turkish subsets)

Raw artifacts: **[nurcunal/BEDAI-2B-cetvel-2025-10-31](https://huggingface.co/datasets/nurcunal/BEDAI-2B-cetvel-2025-10-31)**

This quick sweep covers **MCQA** (`exams_tr`, acc_norm), **QA** (mean F1 of `tquad` and `xquad_tr`), and **TC** (`turkish_plu`, acc_norm).

**BEDAI-2B (this run):** MCQA **25.70**, QA **17.97**, TC **51.58**
| Model | MCQA | QA | TC |
|---|---|---|---|
| BEDAI-2B (this work) | 25.70 | 17.97 | 51.58 |
| CohereLabs/aya-expanse-32b | 52.47 | 20.48 | 50.67 |
| CohereLabs/aya-expanse-8b | 44.09 | 0.19 | 50.03 |
| google/gemma-2-9b-it | 48.20 | 4.46 | 45.38 |
| google/gemma-3-12b-it | 52.66 | 10.26 | 54.38 |
| google/gemma-3-27b-it | 55.40 | 10.56 | 53.65 |
| google/gemma-3-4b-it | 42.33 | 8.22 | 46.15 |
| Kumru-2B | 39.69 | 6.50 | 47.57 |
| Llama-3.1-8B-Instruct | 45.77 | 38.99 | 46.51 |
| Llama-3.3-70B-Instruct | 60.70 | 23.97 | 63.73 |
| meta-llama/Llama-3.2-11B-Vision-Instruct | 45.66 | 4.37 | 47.88 |
| meta-llama/Llama-3.2-3B-Instruct | 37.00 | 7.52 | 39.00 |
| Qwen/Qwen2-72B-Instruct | 61.27 | 0.83 | 60.47 |
| Qwen/Qwen2-7B-Instruct | 49.66 | 1.53 | 52.52 |
| Trendyol/Llama-3-Trendyol-LLM-8b-chat-v2.0 | 53.28 | 0.17 | 54.06 |
| Trendyol/Trendyol-LLM-7B-chat-v4.1.0 | 54.94 | 0.34 | 52.12 |
| ytu-ce-cosmos/Turkish-Gemma-9b-v0.1 | 51.85 | 11.11 | 46.97 |
| ytu-ce-cosmos/turkish-gpt2-large-750m-instruct-v0.1 | 35.20 | 0.28 | 52.77 |
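
The QA column is an aggregate, not a raw metric: it averages the per-dataset F1 scores of `tquad` and `xquad_tr`. A minimal sketch of that aggregation, using the BEDAI-2B numbers from the model-index above:

```python
# QA column = unweighted mean of per-dataset F1 (tquad, xquad_tr).
# F1 values are taken from the BEDAI-2B model-index entries above.
f1_scores = {"tquad": 22.9314, "xquad_tr": 13.0114}

qa = sum(f1_scores.values()) / len(f1_scores)
print(f"QA = {qa:.2f}")  # 17.97, matching the table row for BEDAI-2B
```

The same unweighted mean applies to any model in the table; exact-match scores are reported in the model-index but do not enter the QA aggregate.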