# nurcunal/BEDAI-2.4B

Fine-tuned Turkish instruct model (law domain) based on `nurcunal/BEDAI-2B`, with merged QLoRA adapters.

## Usage (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

m = "nurcunal/BEDAI-2.4B"
tok = AutoTokenizer.from_pretrained(m, use_fast=True, trust_remote_code=True)
mdl = AutoModelForCausalLM.from_pretrained(
    m, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# Some checkpoints ship without a pad token; fall back to EOS.
if tok.pad_token_id is None and tok.eos_token_id is not None:
    tok.pad_token_id = tok.eos_token_id

# Prompt (Turkish): "[SYSTEM]: Give a short, clear answer about Turkish law.
# [USER]: What is a stay of execution in administrative jurisdiction? [ASSISTANT]:"
p = (
    "[SİSTEM]: Türk hukuku hakkında kısa ve net yanıt ver.\n"
    "[KULLANICI]: İdari yargıda yürütmenin durdurulması nedir?\n"
    "[ASİSTAN]:"
)
x = tok(p, return_tensors="pt").to(mdl.device)
# do_sample=True is required for temperature/top_p to take effect.
y = mdl.generate(**x, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)
print(tok.decode(y[0], skip_special_tokens=True))
```

Model-index metadata (belongs in the card's YAML frontmatter):

```yaml
model-index:
- name: BEDAI-2.4B
  results:
  - task:
      type: multiple-choice
      name: Exams (TR)
    dataset:
      name: exams_tr
      type: exams_tr
      args: {split: validation}
    metrics:
    - name: accuracy_norm
      type: accuracy
      value: 32.31
  - task:
      type: question-answering-extractive
      name: TQuAD (TR)
    dataset:
      name: tquad
      type: tquad
      args: {split: validation}
    metrics:
    - name: f1
      type: f1
      value: 23.5035
  - task:
      type: question-answering-extractive
      name: XQuAD (TR)
    dataset:
      name: xquad_tr
      type: xquad_tr
      args: {split: validation}
    metrics:
    - name: f1
      type: f1
      value: 16.4439
  - task:
      type: text-classification
      name: Turkish PLU (overall)
    dataset:
      name: turkish_plu
      type: turkish_plu
      args: {split: test}
    metrics:
    - name: accuracy_norm
      type: accuracy
      value: 51.26
```

## Evaluation (CETVEL – Turkish subsets)

**BEDAI-2B:** MCQA **25.70**, QA **17.97**, TC **51.58**

**BEDAI-2.4B (this run, full):** MCQA **32.31**, QA **19.97** (mean of TQuAD/XQuAD-TR F1), TC **51.26**
| Model | MCQA | QA | TC |
|---|---:|---:|---:|
| BEDAI-2B | 25.70 | 17.97 | 51.58 |
| BEDAI-2.4B (this work) | 32.31 | 19.97 | 51.26 |
Setup: `lm-evaluation-harness` (CETVEL tasks), H100 80GB, bf16, SDPA attention, batch size 128, full dataset (no `--limit`).
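The MCQA figures above are `accuracy_norm`-style scores: `lm-evaluation-harness` divides each answer choice's log-likelihood by the choice's byte length before taking the argmax, so longer choices are not penalized for accumulating more negative log-probability. A minimal sketch of that scoring rule (the helper name and example values are illustrative, not taken from the harness source):

```python
def pick_choice_norm(loglikelihoods, choice_texts):
    """Length-normalized multiple-choice scoring: argmax of log-prob per byte."""
    scores = [
        ll / len(text.encode("utf-8"))
        for ll, text in zip(loglikelihoods, choice_texts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example: the longer choice has a lower total log-likelihood (-12 < -8),
# but wins after per-byte normalization (-12/25 = -0.48 > -8/4 = -2.0).
choices = ["evet", "hayır, kesinlikle değil"]
loglikelihoods = [-8.0, -12.0]
print(pick_choice_norm(loglikelihoods, choices))  # -> 1
```

Unnormalized accuracy would pick index 0 here; the normalized variant is why short-answer and long-answer choices can be compared fairly.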
| Model | MCQA | QA | TC |
|---|---:|---:|---:|
| CohereLabs__aya-expanse-32b | 52.47 | 20.48 | 50.67 |
| CohereLabs__aya-expanse-8b | 44.09 | 0.19 | 50.03 |
| google__gemma-2-9b-it | 48.20 | 4.46 | 45.38 |
| google__gemma-3-12b-it | 52.66 | 10.26 | 54.38 |
| google__gemma-3-27b-it | 55.40 | 10.56 | 53.65 |
| google__gemma-3-4b-it | 42.33 | 8.22 | 46.15 |
| Kumru-2B (full) | 19.59 | 10.00 | 31.62 |
| Llama-3.1-8B-Instruct | 45.77 | 38.99 | 46.51 |
| Llama-3.3-70B-Instruct | 60.70 | 23.97 | 63.73 |
| meta-llama__Llama-3.2-11B-Vision-Instruct | 45.66 | 4.37 | 47.88 |
| meta-llama__Llama-3.2-3B-Instruct | 37.00 | 7.52 | 39.00 |
| Qwen__Qwen2-72B-Instruct | 61.27 | 0.83 | 60.47 |
| Qwen__Qwen2-7B-Instruct | 49.66 | 1.53 | 52.52 |
| Trendyol__Llama-3-Trendyol-LLM-8b-chat-v2.0 | 53.28 | 0.17 | 54.06 |
| Trendyol__Trendyol-LLM-7B-chat-v4.1.0 | 54.94 | 0.34 | 52.12 |
| ytu-ce-cosmos__Turkish-Gemma-9b-v0.1 | 51.85 | 11.11 | 46.97 |
| ytu-ce-cosmos__turkish-gpt2-large-750m-instruct-v0.1 | 35.20 | 0.28 | 52.77 |
> **Notes**
> • QA = mean F1 over **TQuAD (TR)** and **XQuAD (TR)** for this run.
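The reported QA score follows directly from the per-dataset F1 values in the model-index (TQuAD 23.5035, XQuAD-TR 16.4439):

```python
tquad_f1 = 23.5035
xquad_tr_f1 = 16.4439

# Unweighted mean of the two extractive-QA F1 scores, rounded to 2 decimals.
qa = round((tquad_f1 + xquad_tr_f1) / 2, 2)
print(qa)  # -> 19.97
```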