# nurcunal/BEDAI-2.4B

Fine-tuned Turkish instruct model (law domain) based on `nurcunal/BEDAI-2B`, with merged QLoRA adapters.

## Usage (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

m = "nurcunal/BEDAI-2.4B"
tok = AutoTokenizer.from_pretrained(m, use_fast=True, trust_remote_code=True)
mdl = AutoModelForCausalLM.from_pretrained(
    m, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# Some checkpoints ship without a pad token; fall back to EOS.
if tok.pad_token_id is None and tok.eos_token_id is not None:
    tok.pad_token_id = tok.eos_token_id

# Prompt (Turkish): "[SYSTEM]: Give a short, clear answer about Turkish law.
# [USER]: What is a stay of execution in administrative jurisdiction? [ASSISTANT]:"
p = (
    "[SİSTEM]: Türk hukuku hakkında kısa ve net yanıt ver.\n"
    "[KULLANICI]: İdari yargıda yürütmenin durdurulması nedir?\n"
    "[ASİSTAN]:"
)
x = tok(p, return_tensors="pt").to(mdl.device)
# do_sample=True is required for temperature/top_p to take effect.
y = mdl.generate(**x, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)
print(tok.decode(y[0], skip_special_tokens=True))
```

Model-index metadata (belongs in the card's YAML frontmatter):

```yaml
model-index:
- name: BEDAI-2.4B
  results:
  - task:
      type: multiple-choice
      name: Exams (TR)
    dataset:
      name: exams_tr
      type: exams_tr
      args: {split: validation}
    metrics:
    - name: accuracy_norm
      type: accuracy
      value: 32.31
  - task:
      type: question-answering-extractive
      name: TQuAD (TR)
    dataset:
      name: tquad
      type: tquad
      args: {split: validation}
    metrics:
    - name: f1
      type: f1
      value: 23.5035
  - task:
      type: question-answering-extractive
      name: XQuAD (TR)
    dataset:
      name: xquad_tr
      type: xquad_tr
      args: {split: validation}
    metrics:
    - name: f1
      type: f1
      value: 16.4439
  - task:
      type: text-classification
      name: Turkish PLU (overall)
    dataset:
      name: turkish_plu
      type: turkish_plu
      args: {split: test}
    metrics:
    - name: accuracy_norm
      type: accuracy
      value: 51.26
```

## Evaluation (CETVEL – Turkish subsets)

**BEDAI-2B:** MCQA **25.70**, QA **17.97**, TC **51.58**

**BEDAI-2.4B (this run, full):** MCQA **32.31**, QA **19.97** (mean of TQuAD/XQuAD-TR F1), TC **51.26**
| Model | MCQA | QA | TC |
|---|---:|---:|---:|
| BEDAI-2B | 25.70 | 17.97 | 51.58 |
| BEDAI-2.4B (this work) | 32.31 | 19.97 | 51.26 |
Setup: `lm-evaluation-harness` (CETVEL tasks), H100 80GB, bf16, SDPA attention, batch size 128, full dataset (no `--limit`).
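The MCQA figures above are `accuracy_norm`-style scores: `lm-evaluation-harness` divides each answer choice's log-likelihood by the choice's byte length before taking the argmax, so longer choices are not penalized for accumulating more negative log-probability. A minimal sketch of that scoring rule (the helper name and example values are illustrative, not taken from the harness source):

```python
def pick_choice_norm(loglikelihoods, choice_texts):
    """Length-normalized multiple-choice scoring: argmax of log-prob per byte."""
    scores = [
        ll / len(text.encode("utf-8"))
        for ll, text in zip(loglikelihoods, choice_texts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example: the longer choice has a lower total log-likelihood (-12 < -8),
# but wins after per-byte normalization (-12/25 = -0.48 > -8/4 = -2.0).
choices = ["evet", "hayır, kesinlikle değil"]
loglikelihoods = [-8.0, -12.0]
print(pick_choice_norm(loglikelihoods, choices))  # -> 1
```

Unnormalized accuracy would pick index 0 here; the normalized variant is why short-answer and long-answer choices can be compared fairly.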
| Model | MCQA | QA | TC |
|---|---:|---:|---:|
| CohereLabs__aya-expanse-32b | 52.47 | 20.48 | 50.67 |
| CohereLabs__aya-expanse-8b | 44.09 | 0.19 | 50.03 |
| google__gemma-2-9b-it | 48.20 | 4.46 | 45.38 |
| google__gemma-3-12b-it | 52.66 | 10.26 | 54.38 |
| google__gemma-3-27b-it | 55.40 | 10.56 | 53.65 |
| google__gemma-3-4b-it | 42.33 | 8.22 | 46.15 |
| Kumru-2B (full) | 19.59 | 10.00 | 31.62 |
| Llama-3.1-8B-Instruct | 45.77 | 38.99 | 46.51 |
| Llama-3.3-70B-Instruct | 60.70 | 23.97 | 63.73 |
| meta-llama__Llama-3.2-11B-Vision-Instruct | 45.66 | 4.37 | 47.88 |
| meta-llama__Llama-3.2-3B-Instruct | 37.00 | 7.52 | 39.00 |
| Qwen__Qwen2-72B-Instruct | 61.27 | 0.83 | 60.47 |
| Qwen__Qwen2-7B-Instruct | 49.66 | 1.53 | 52.52 |
| Trendyol__Llama-3-Trendyol-LLM-8b-chat-v2.0 | 53.28 | 0.17 | 54.06 |
| Trendyol__Trendyol-LLM-7B-chat-v4.1.0 | 54.94 | 0.34 | 52.12 |
| ytu-ce-cosmos__Turkish-Gemma-9b-v0.1 | 51.85 | 11.11 | 46.97 |
| ytu-ce-cosmos__turkish-gpt2-large-750m-instruct-v0.1 | 35.20 | 0.28 | 52.77 |
> **Notes**
> • QA = mean F1 over **TQuAD (TR)** and **XQuAD (TR)** for this run.
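The reported QA score follows directly from the per-dataset F1 values in the model-index (TQuAD 23.5035, XQuAD-TR 16.4439):

```python
tquad_f1 = 23.5035
xquad_tr_f1 = 16.4439

# Unweighted mean of the two extractive-QA F1 scores, rounded to 2 decimals.
qa = round((tquad_f1 + xquad_tr_f1) / 2, 2)
print(qa)  # -> 19.97
```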