model-index:
- name: BEDAI-2B
  results:
  - task:
      type: multiple-choice
      name: Exams (TR)
    dataset:
      name: exams_tr
      type: exams_tr
      args:
        split: validation
    metrics:
    - name: accuracy_norm
      type: accuracy
      value: 25.70
  - task:
      type: question-answering-extractive
      name: TQuAD (TR)
    dataset:
      name: tquad
      type: tquad
      args:
        split: validation
    metrics:
    - name: exact_match
      type: exact_match
      value: 9.9807
    - name: f1
      type: f1
      value: 22.9314
  - task:
      type: question-answering-extractive
      name: XQuAD (TR)
    dataset:
      name: xquad_tr
      type: xquad_tr
      args:
        split: validation
    metrics:
    - name: exact_match
      type: exact_match
      value: 6.4706
    - name: f1
      type: f1
      value: 13.0114
  - task:
      type: text-classification
      name: Turkish PLU (overall)
    dataset:
      name: turkish_plu
      type: turkish_plu
      args:
        split: test
    metrics:
    - name: accuracy_norm
      type: accuracy
      value: 51.58
## Evaluation (CETVEL – Turkish subsets)
Raw artifacts: `nurcunal/BEDAI-2B-cetvel-2025-10-31`

This quick sweep covers three CETVEL Turkish subsets: multiple-choice QA (MCQA: `exams_tr`, accuracy_norm), extractive QA (QA: mean F1 over `tquad` and `xquad_tr`), and text classification (TC: `turkish_plu`, accuracy_norm).

BEDAI-2B (this run): MCQA 25.70, QA 17.97, TC 51.58.
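The summary line above is simple arithmetic over the per-task metrics in the model-index block: MCQA and TC are reported directly, and QA is the unweighted mean of the two extractive-QA F1 scores. A minimal sketch of that computation (the `scores` dictionary merely restates the values reported above):

```python
# Per-task scores copied from the model-index metadata above.
scores = {
    "exams_tr": {"accuracy_norm": 25.70},
    "tquad": {"f1": 22.9314},
    "xquad_tr": {"f1": 13.0114},
    "turkish_plu": {"accuracy_norm": 51.58},
}

mcqa = scores["exams_tr"]["accuracy_norm"]                    # reported directly
qa = (scores["tquad"]["f1"] + scores["xquad_tr"]["f1"]) / 2   # unweighted mean F1
tc = scores["turkish_plu"]["accuracy_norm"]                   # reported directly

print(f"MCQA {mcqa:.2f} | QA {qa:.2f} | TC {tc:.2f}")
# MCQA 25.70 | QA 17.97 | TC 51.58
```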
| Model | MCQA (acc_norm) | QA (mean F1) | TC (acc_norm) |
|---|---|---|---|
| BEDAI-2B (this work) | 25.70 | 17.97 | 51.58 |
| CohereLabs/aya-expanse-32b | 52.47 | 20.48 | 50.67 |
| CohereLabs/aya-expanse-8b | 44.09 | 0.19 | 50.03 |
| google/gemma-2-9b-it | 48.20 | 4.46 | 45.38 |
| google/gemma-3-12b-it | 52.66 | 10.26 | 54.38 |
| google/gemma-3-27b-it | 55.40 | 10.56 | 53.65 |
| google/gemma-3-4b-it | 42.33 | 8.22 | 46.15 |
| Kumru-2B | 39.69 | 6.50 | 47.57 |
| Llama-3.1-8B-Instruct | 45.77 | 38.99 | 46.51 |
| Llama-3.3-70B-Instruct | 60.70 | 23.97 | 63.73 |
| meta-llama/Llama-3.2-11B-Vision-Instruct | 45.66 | 4.37 | 47.88 |
| meta-llama/Llama-3.2-3B-Instruct | 37.00 | 7.52 | 39.00 |
| Qwen/Qwen2-72B-Instruct | 61.27 | 0.83 | 60.47 |
| Qwen/Qwen2-7B-Instruct | 49.66 | 1.53 | 52.52 |
| Trendyol/Llama-3-Trendyol-LLM-8b-chat-v2.0 | 53.28 | 0.17 | 54.06 |
| Trendyol/Trendyol-LLM-7B-chat-v4.1.0 | 54.94 | 0.34 | 52.12 |
| ytu-ce-cosmos/Turkish-Gemma-9b-v0.1 | 51.85 | 11.11 | 46.97 |
| ytu-ce-cosmos/turkish-gpt2-large-750m-instruct-v0.1 | 35.20 | 0.28 | 52.77 |
### Notes
- QA = mean F1 over TQuAD and XQuAD-TR for this run.
- CETVEL includes more tasks (GEC, MT, NLI, summarization); this table compares only the shared Turkish subsets.
- For reproducibility, see the artifacts repo above, which records the exact command used; an illustrative sketch follows below.
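The authoritative command lives in the artifacts repo; purely as an illustrative sketch, a run over these four subsets with lm-evaluation-harness's Python API might look like the snippet below. The model repo id and task names are assumptions inferred from this card and may not match CETVEL's actual task registry.

```python
# Hypothetical reproduction sketch, not the exact command used for this run;
# see the artifacts repo above for the authoritative invocation.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=nurcunal/BEDAI-2B",  # assumed model repo id
    # Task names inferred from the dataset names in this card (assumption).
    tasks=["exams_tr", "tquad", "xquad_tr", "turkish_plu"],
    batch_size=8,
)
print(results["results"])  # per-task metrics, e.g. acc_norm, exact_match, f1
```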