Update README.md
README.md
CHANGED
```diff
@@ -31,7 +31,7 @@ model-index:
     metrics:
     - name: accuracy_norm
       type: accuracy
-      value:
+      value: 32.31

 - task:
     type: question-answering-extractive
@@ -43,7 +43,7 @@ model-index:
     metrics:
     - name: f1
       type: f1
-      value:
+      value: 23.5035

 - task:
     type: question-answering-extractive
@@ -55,7 +55,7 @@ model-index:
     metrics:
     - name: f1
       type: f1
-      value:
+      value: 16.4439

 - task:
     type: text-classification
@@ -67,28 +67,33 @@ model-index:
     metrics:
     - name: accuracy_norm
       type: accuracy
-      value:
+      value: 51.26
+

 ## Evaluation (CETVEL – Turkish subsets)

 **BEDAI-2B:** MCQA **25.70**, QA **17.97**, TC **51.58**
-**BEDAI-2.4B (this run,
+**BEDAI-2.4B (this run, full):** MCQA **32.31**, QA **19.97** (mean of TQuAD/XQuAD-TR F1), TC **51.26**

 <table>
 <thead>
 <tr><th style="text-align:left">Model</th><th>MCQA</th><th>QA</th><th>TC</th></tr>
 </thead>
 <tbody>
-
 <tr><th style="text-align:left">BEDAI-2B</th>
 <td style="background:#f4cccc">25.70</td>
 <td style="background:#f8cbad">17.97</td>
 <td style="background:#ffeb9c">51.58</td></tr>

-<tr><th style="text-align:left">BEDAI-2.4B (this work)
-<td style="background:#c6efce">
-<td style="background:#c6efce">
-<td style="background:#c6efce">
+<tr><th style="text-align:left">BEDAI-2.4B (this work)</th>
+<td style="background:#c6efce">32.31</td>
+<td style="background:#c6efce">19.97</td>
+<td style="background:#c6efce">51.26</td></tr>
+</tbody>
+</table>
+
+<sub>Setup: `lm-evaluation-harness` (CETVEL tasks), H100 80GB, bf16, SDPA attention, batch size 128, full dataset (no `--limit`).</sub>
+

 <tr><th style="text-align:left">CohereLabs__aya-expanse-32b</th>
 <td style="background:#ffeb9c">52.47</td>
@@ -120,10 +125,10 @@ model-index:
 <td style="background:#f4cccc">8.22</td>
 <td style="background:#f8cbad">46.15</td></tr>

-<tr><th style="text-align:left">Kumru-2B</th>
-<td style="background:#
-<td style="background:#f4cccc">
-<td style="background:#
+<tr><th style="text-align:left">Kumru-2B (full)</th>
+<td style="background:#f4cccc">19.59</td>
+<td style="background:#f4cccc">10.00</td>
+<td style="background:#f4cccc">31.62</td></tr>

 <tr><th style="text-align:left">Llama-3.1-8B-Instruct</th>
 <td style="background:#ffeb9c">45.77</td>
```