Update README.md
README.md
CHANGED
```diff
@@ -31,7 +31,7 @@ model-index:
     metrics:
     - name: accuracy_norm
       type: accuracy
-      value:
+      value: 32.31

 - task:
     type: question-answering-extractive
@@ -43,7 +43,7 @@ model-index:
     metrics:
     - name: f1
       type: f1
-      value:
+      value: 23.5035

 - task:
     type: question-answering-extractive
@@ -55,7 +55,7 @@ model-index:
     metrics:
     - name: f1
       type: f1
-      value:
+      value: 16.4439

 - task:
     type: text-classification
@@ -67,28 +67,33 @@ model-index:
     metrics:
     - name: accuracy_norm
       type: accuracy
-      value:
+      value: 51.26
+

 ## Evaluation (CETVEL – Turkish subsets)

 **BEDAI-2B:** MCQA **25.70**, QA **17.97**, TC **51.58**
-**BEDAI-2.4B (this run,
+**BEDAI-2.4B (this run, full):** MCQA **32.31**, QA **19.97** (mean of TQuAD/XQuAD-TR F1), TC **51.26**

 <table>
 <thead>
 <tr><th style="text-align:left">Model</th><th>MCQA</th><th>QA</th><th>TC</th></tr>
 </thead>
 <tbody>
-
 <tr><th style="text-align:left">BEDAI-2B</th>
 <td style="background:#f4cccc">25.70</td>
 <td style="background:#f8cbad">17.97</td>
 <td style="background:#ffeb9c">51.58</td></tr>

-<tr><th style="text-align:left">BEDAI-2.4B (this work)
-<td style="background:#c6efce">
-<td style="background:#c6efce">
-<td style="background:#c6efce">
+<tr><th style="text-align:left">BEDAI-2.4B (this work)</th>
+<td style="background:#c6efce">32.31</td>
+<td style="background:#c6efce">19.97</td>
+<td style="background:#c6efce">51.26</td></tr>
+</tbody>
+</table>
+
+<sub>Setup: `lm-evaluation-harness` (CETVEL tasks), H100 80GB, bf16, SDPA attention, batch size 128, full dataset (no `--limit`).</sub>
+

 <tr><th style="text-align:left">CohereLabs__aya-expanse-32b</th>
 <td style="background:#ffeb9c">52.47</td>
@@ -120,10 +125,10 @@ model-index:
 <td style="background:#f4cccc">8.22</td>
 <td style="background:#f8cbad">46.15</td></tr>

-<tr><th style="text-align:left">Kumru-2B</th>
-<td style="background:#
-<td style="background:#f4cccc">
-<td style="background:#
+<tr><th style="text-align:left">Kumru-2B (full)</th>
+<td style="background:#f4cccc">19.59</td>
+<td style="background:#f4cccc">10.00</td>
+<td style="background:#f4cccc">31.62</td></tr>

 <tr><th style="text-align:left">Llama-3.1-8B-Instruct</th>
 <td style="background:#ffeb9c">45.77</td>
```