	# nurcunal/BEDAI-2.4B

	Turkish instruct model for the law domain, fine-tuned from `nurcunal/BEDAI-2B` with QLoRA; the adapters are merged into the released weights.

	## Usage (Transformers)
	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForCausalLM

	m = "nurcunal/BEDAI-2.4B"
	tok = AutoTokenizer.from_pretrained(m, use_fast=True, trust_remote_code=True)
	mdl = AutoModelForCausalLM.from_pretrained(m, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True)

	# Fall back to EOS as the padding token if the tokenizer defines none.
	if tok.pad_token_id is None and tok.eos_token_id is not None:
	    tok.pad_token_id = tok.eos_token_id

	# Prompt (Turkish). System: "Answer briefly and clearly about Turkish law."
	# User: "What is a stay of execution in administrative proceedings?"
	p = "<s>[SİSTEM]: Türk hukuku hakkında kısa ve net yanıt ver.\n[KULLANICI]: İdari yargıda yürütmenin durdurulması nedir?\n[ASİSTAN]:"
	x = tok(p, return_tensors="pt").to(mdl.device)
	# do_sample=True is needed for temperature/top_p to take effect.
	y = mdl.generate(**x, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)
	print(tok.decode(y[0], skip_special_tokens=True))
	```
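
The prompt string in the usage example follows a `[SİSTEM]` / `[KULLANICI]` / `[ASİSTAN]` layout. Below is a minimal sketch of a helper that assembles a prompt in that layout. `build_prompt` is a hypothetical name, and the layout is inferred only from the single example string above, so treat it as an assumption rather than a documented chat template.

```python
# Default system instruction, copied from the usage example.
DEFAULT_SYSTEM = "Türk hukuku hakkında kısa ve net yanıt ver."


def build_prompt(user: str, system: str = DEFAULT_SYSTEM) -> str:
    """Assemble a single-turn prompt in the layout used by the usage example.

    Assumption: the model expects exactly this tag layout; only one example
    string is available to infer it from.
    """
    return f"<s>[SİSTEM]: {system}\n[KULLANICI]: {user}\n[ASİSTAN]:"


prompt = build_prompt("İdari yargıda yürütmenin durdurulması nedir?")
print(prompt)
```

The returned string can be passed to the tokenizer in place of `p` in the usage example.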


	model-index:
	- name: BEDAI-2.4B
	  results:
	  - task:
	      type: multiple-choice
	      name: Exams (TR)
	    dataset:
	      name: exams_tr
	      type: exams_tr
	      args: {split: validation}
	    metrics:
	    - name: accuracy_norm
	      type: accuracy
	      value: 32.31

	  - task:
	      type: question-answering-extractive
	      name: TQuAD (TR)
	    dataset:
	      name: tquad
	      type: tquad
	      args: {split: validation}
	    metrics:
	    - name: f1
	      type: f1
	      value: 23.5035

	  - task:
	      type: question-answering-extractive
	      name: XQuAD (TR)
	    dataset:
	      name: xquad_tr
	      type: xquad_tr
	      args: {split: validation}
	    metrics:
	    - name: f1
	      type: f1
	      value: 16.4439

	  - task:
	      type: text-classification
	      name: Turkish PLU (overall)
	    dataset:
	      name: turkish_plu
	      type: turkish_plu
	      args: {split: test}
	    metrics:
	    - name: accuracy_norm
	      type: accuracy
	      value: 51.26


	## Evaluation (CETVEL – Turkish subsets)

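	Scores are grouped as MCQA (multiple choice: Exams TR), QA (extractive question answering: TQuAD and XQuAD-TR) and TC (text classification: Turkish PLU).

	- BEDAI-2B: MCQA 25.70, QA 17.97, TC 51.58
	- BEDAI-2.4B (this run, full): MCQA 32.31, QA 19.97 (mean of TQuAD/XQuAD-TR F1), TC 51.26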

	<table>
	<thead>
	<tr><th style="text-align:left">Model</th><th>MCQA</th><th>QA</th><th>TC</th></tr>
	</thead>
	<tbody>
	<tr><th style="text-align:left">BEDAI-2B</th>
	<td style="background:#f4cccc">25.70</td>
	<td style="background:#f8cbad">17.97</td>
	<td style="background:#ffeb9c">51.58</td></tr>

	<tr><th style="text-align:left">BEDAI-2.4B (this work)</th>
	<td style="background:#c6efce">32.31</td>
	<td style="background:#c6efce">19.97</td>
	<td style="background:#c6efce">51.26</td></tr>
	</tbody>
	</table>
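
To make the comparison between the two BEDAI rows concrete, the sketch below computes the per-column difference from the reported scores. The numbers are copied from the table; the variable names are illustrative only.

```python
# Scores copied from the table above.
base = {"MCQA": 25.70, "QA": 17.97, "TC": 51.58}  # BEDAI-2B
new = {"MCQA": 32.31, "QA": 19.97, "TC": 51.26}   # BEDAI-2.4B

# Difference in points (BEDAI-2.4B minus BEDAI-2B), rounded to two decimals.
delta = {name: round(new[name] - base[name], 2) for name in base}
print(delta)  # {'MCQA': 6.61, 'QA': 2.0, 'TC': -0.32}
```

MCQA and QA improve, while TC is slightly lower than the base model.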

	<sub>Setup: `lm-evaluation-harness` (CETVEL tasks), H100 80GB, bf16, SDPA attention, batch size 128, full dataset (no `--limit`).</sub>
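
The setup line can be sketched as an `lm-evaluation-harness` invocation. The helper below only assembles a command line and does not run anything. It rests on several assumptions: that the harness exposes the `lm_eval` entry point with `--model`, `--model_args`, `--tasks` and `--batch_size`, that the CETVEL task names match the dataset names in the metadata above (`exams_tr`, `tquad`, `xquad_tr`, `turkish_plu`), and that `attn_implementation=sdpa` is forwarded to the model loader. `build_lm_eval_args` is a hypothetical name, and the exact command used for this run is not given in this card.

```python
import shlex


def build_lm_eval_args(model_id, tasks, batch_size=128):
    """Assemble an lm_eval command line as a list of arguments (not executed)."""
    # bf16 weights and SDPA attention, as described in the setup line.
    model_args = f"pretrained={model_id},dtype=bfloat16,attn_implementation=sdpa"
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", ",".join(tasks),
        "--batch_size", str(batch_size),
        # No --limit flag: the full dataset is evaluated.
    ]


# Task names are assumed from the dataset names in the model-index metadata.
args = build_lm_eval_args(
    "nurcunal/BEDAI-2.4B",
    ["exams_tr", "tquad", "xquad_tr", "turkish_plu"],
)
print(shlex.join(args))
```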


	Scores reported for other models on the same MCQA / QA / TC groups, for comparison:

	<table>
	<thead>
	<tr><th style="text-align:left">Model</th><th>MCQA</th><th>QA</th><th>TC</th></tr>
	</thead>
	<tbody>

	<tr><th style="text-align:left">CohereLabs__aya-expanse-32b</th>
	<td style="background:#ffeb9c">52.47</td>
	<td style="background:#f8cbad">20.48</td>
	<td style="background:#ffeb9c">50.67</td></tr>

	<tr><th style="text-align:left">CohereLabs__aya-expanse-8b</th>
	<td style="background:#f8cbad">44.09</td>
	<td style="background:#f4cccc">0.19</td>
	<td style="background:#ffeb9c">50.03</td></tr>

	<tr><th style="text-align:left">google__gemma-2-9b-it</th>
	<td style="background:#ffeb9c">48.20</td>
	<td style="background:#f4cccc">4.46</td>
	<td style="background:#f8cbad">45.38</td></tr>

	<tr><th style="text-align:left">google__gemma-3-12b-it</th>
	<td style="background:#ffeb9c">52.66</td>
	<td style="background:#f4cccc">10.26</td>
	<td style="background:#ffeb9c">54.38</td></tr>

	<tr><th style="text-align:left">google__gemma-3-27b-it</th>
	<td style="background:#c6efce">55.40</td>
	<td style="background:#f4cccc">10.56</td>
	<td style="background:#ffeb9c">53.65</td></tr>

	<tr><th style="text-align:left">google__gemma-3-4b-it</th>
	<td style="background:#f8cbad">42.33</td>
	<td style="background:#f4cccc">8.22</td>
	<td style="background:#f8cbad">46.15</td></tr>

	<tr><th style="text-align:left">Kumru-2B (full)</th>
	<td style="background:#f4cccc">19.59</td>
	<td style="background:#f4cccc">10.00</td>
	<td style="background:#f4cccc">31.62</td></tr>

	<tr><th style="text-align:left">Llama-3.1-8B-Instruct</th>
	<td style="background:#ffeb9c">45.77</td>
	<td style="background:#c6efce">38.99</td>
	<td style="background:#f8cbad">46.51</td></tr>

	<tr><th style="text-align:left">Llama-3.3-70B-Instruct</th>
	<td style="background:#c6efce">60.70</td>
	<td style="background:#ffeb9c">23.97</td>
	<td style="background:#c6efce">63.73</td></tr>

	<tr><th style="text-align:left">meta-llama__Llama-3.2-11B-Vision-Instruct</th>
	<td style="background:#ffeb9c">45.66</td>
	<td style="background:#f4cccc">4.37</td>
	<td style="background:#f8cbad">47.88</td></tr>

	<tr><th style="text-align:left">meta-llama__Llama-3.2-3B-Instruct</th>
	<td style="background:#f8cbad">37.00</td>
	<td style="background:#f4cccc">7.52</td>
	<td style="background:#f4cccc">39.00</td></tr>

	<tr><th style="text-align:left">Qwen__Qwen2-72B-Instruct</th>
	<td style="background:#c6efce">61.27</td>
	<td style="background:#f4cccc">0.83</td>
	<td style="background:#c6efce">60.47</td></tr>

	<tr><th style="text-align:left">Qwen__Qwen2-7B-Instruct</th>
	<td style="background:#ffeb9c">49.66</td>
	<td style="background:#f4cccc">1.53</td>
	<td style="background:#ffeb9c">52.52</td></tr>

	<tr><th style="text-align:left">Trendyol__Llama-3-Trendyol-LLM-8b-chat-v2.0</th>
	<td style="background:#c6efce">53.28</td>
	<td style="background:#f4cccc">0.17</td>
	<td style="background:#c6efce">54.06</td></tr>

	<tr><th style="text-align:left">Trendyol__Trendyol-LLM-7B-chat-v4.1.0</th>
	<td style="background:#c6efce">54.94</td>
	<td style="background:#f4cccc">0.34</td>
	<td style="background:#ffeb9c">52.12</td></tr>

	<tr><th style="text-align:left">ytu-ce-cosmos__Turkish-Gemma-9b-v0.1</th>
	<td style="background:#ffeb9c">51.85</td>
	<td style="background:#f4cccc">11.11</td>
	<td style="background:#f8cbad">46.97</td></tr>

	<tr><th style="text-align:left">ytu-ce-cosmos__turkish-gpt2-large-750m-instruct-v0.1</th>
	<td style="background:#f8cbad">35.20</td>
	<td style="background:#f4cccc">0.28</td>
	<td style="background:#ffeb9c">52.77</td></tr>

	</tbody>
	</table>


	> Notes
	> • QA = mean F1 over TQuAD (TR) and XQuAD (TR) for this run.
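
As a worked check of the note above, the QA score for BEDAI-2.4B is the unweighted mean of the two F1 values listed in the metadata. The variable names are illustrative.

```python
# F1 scores for BEDAI-2.4B, copied from the model-index metadata.
tquad_f1 = 23.5035
xquad_tr_f1 = 16.4439

# QA is the unweighted mean of the two F1 scores.
qa = (tquad_f1 + xquad_tr_f1) / 2
print(f"{qa:.2f}")  # 19.97
```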