
---
model-index:
- name: BEDAI-2B
  results:
  - task:
      type: multiple-choice
      name: Exams (TR)
    dataset:
      name: exams_tr
      type: exams_tr
      args:
        split: validation
    metrics:
    - name: accuracy_norm
      type: accuracy
      value: 25.70
  - task:
      type: question-answering-extractive
      name: TQuAD (TR)
    dataset:
      name: tquad
      type: tquad
      args:
        split: validation
    metrics:
    - name: exact_match
      type: exact_match
      value: 9.9807
    - name: f1
      type: f1
      value: 22.9314
  - task:
      type: question-answering-extractive
      name: XQuAD (TR)
    dataset:
      name: xquad_tr
      type: xquad_tr
      args:
        split: validation
    metrics:
    - name: exact_match
      type: exact_match
      value: 6.4706
    - name: f1
      type: f1
      value: 13.0114
  - task:
      type: text-classification
      name: Turkish PLU (overall)
    dataset:
      name: turkish_plu
      type: turkish_plu
      args:
        split: test
    metrics:
    - name: accuracy_norm
      type: accuracy
      value: 51.58
---

Evaluation (CETVEL – Turkish subsets)

Raw artifacts: nurcunal/BEDAI-2B-cetvel-2025-10-31. This quick sweep covers MCQA (exams_tr), QA (mean F1 over tquad and xquad_tr), and TC (turkish_plu acc_norm).

BEDAI-2B (this run): MCQA 25.70, QA 17.97, TC 51.58

| Model | MCQA | QA | TC |
|---|---|---|---|
| BEDAI-2B (this work) | 25.70 | 17.97 | 51.58 |
| CohereLabs__aya-expanse-32b | 52.47 | 20.48 | 50.67 |
| CohereLabs__aya-expanse-8b | 44.09 | 0.19 | 50.03 |
| google__gemma-2-9b-it | 48.20 | 4.46 | 45.38 |
| google__gemma-3-12b-it | 52.66 | 10.26 | 54.38 |
| google__gemma-3-27b-it | 55.40 | 10.56 | 53.65 |
| google__gemma-3-4b-it | 42.33 | 8.22 | 46.15 |
| Kumru-2B | 39.69 | 6.50 | 47.57 |
| Llama-3.1-8B-Instruct | 45.77 | 38.99 | 46.51 |
| Llama-3.3-70B-Instruct | 60.70 | 23.97 | 63.73 |
| meta-llama__Llama-3.2-11B-Vision-Instruct | 45.66 | 4.37 | 47.88 |
| meta-llama__Llama-3.2-3B-Instruct | 37.00 | 7.52 | 39.00 |
| Qwen__Qwen2-72B-Instruct | 61.27 | 0.83 | 60.47 |
| Qwen__Qwen2-7B-Instruct | 49.66 | 1.53 | 52.52 |
| Trendyol__Llama-3-Trendyol-LLM-8b-chat-v2.0 | 53.28 | 0.17 | 54.06 |
| Trendyol__Trendyol-LLM-7B-chat-v4.1.0 | 54.94 | 0.34 | 52.12 |
| ytu-ce-cosmos__Turkish-Gemma-9b-v0.1 | 51.85 | 11.11 | 46.97 |
| ytu-ce-cosmos__turkish-gpt2-large-750m-instruct-v0.1 | 35.20 | 0.28 | 52.77 |

Notes

• QA = mean F1 over TQuAD and XQuAD-TR for this run.
• CETVEL includes more tasks (GEC/MT/NLI/SUM); this table compares only the shared Turkish subsets.
• For reproducibility, see the dataset repo above and the exact command used.

Model weights: Safetensors, 2B parameters, BF16.