nurcunal committed on
Commit aa32592 · verified · 1 Parent(s): 5707f35

Update README.md

  1. README.md +19 -14
README.md CHANGED
@@ -31,7 +31,7 @@ model-index:
31
  metrics:
32
  - name: accuracy_norm
33
  type: accuracy
34
- value: 34.00
35
 
36
  - task:
37
  type: question-answering-extractive
@@ -43,7 +43,7 @@ model-index:
43
  metrics:
44
  - name: f1
45
  type: f1
46
- value: 19.5659
47
 
48
  - task:
49
  type: question-answering-extractive
@@ -55,7 +55,7 @@ model-index:
55
  metrics:
56
  - name: f1
57
  type: f1
58
- value: 38.2748
59
 
60
  - task:
61
  type: text-classification
@@ -67,28 +67,33 @@ model-index:
67
  metrics:
68
  - name: accuracy_norm
69
  type: accuracy
70
- value: 52.00
 
71
 
72
  ## Evaluation (CETVEL – Turkish subsets)
73
 
74
  **BEDAI-2B:** MCQA **25.70**, QA **17.97**, TC **51.58**
75
- **BEDAI-2.4B (this run, limit=50):** MCQA **34.00**, QA **28.92** (mean of TQuAD/XQuAD-TR F1), TC **52.00**
76
 
77
  <table>
78
  <thead>
79
  <tr><th style="text-align:left">Model</th><th>MCQA</th><th>QA</th><th>TC</th></tr>
80
  </thead>
81
  <tbody>
82
-
83
  <tr><th style="text-align:left">BEDAI-2B</th>
84
  <td style="background:#f4cccc">25.70</td>
85
  <td style="background:#f8cbad">17.97</td>
86
  <td style="background:#ffeb9c">51.58</td></tr>
87
 
88
- <tr><th style="text-align:left">BEDAI-2.4B (this work) </th>
89
- <td style="background:#c6efce">34.00</td>
90
- <td style="background:#c6efce">28.92</td>
91
- <td style="background:#c6efce">52.00</td></tr>
 
 
 
 
 
92
 
93
  <tr><th style="text-align:left">CohereLabs__aya-expanse-32b</th>
94
  <td style="background:#ffeb9c">52.47</td>
@@ -120,10 +125,10 @@ model-index:
120
  <td style="background:#f4cccc">8.22</td>
121
  <td style="background:#f8cbad">46.15</td></tr>
122
 
123
- <tr><th style="text-align:left">Kumru-2B</th>
124
- <td style="background:#f8cbad">39.69</td>
125
- <td style="background:#f4cccc">6.50</td>
126
- <td style="background:#ffeb9c">47.57</td></tr>
127
 
128
  <tr><th style="text-align:left">Llama-3.1-8B-Instruct</th>
129
  <td style="background:#ffeb9c">45.77</td>
 
31
  metrics:
32
  - name: accuracy_norm
33
  type: accuracy
34
+ value: 32.31
35
 
36
  - task:
37
  type: question-answering-extractive
 
43
  metrics:
44
  - name: f1
45
  type: f1
46
+ value: 23.5035
47
 
48
  - task:
49
  type: question-answering-extractive
 
55
  metrics:
56
  - name: f1
57
  type: f1
58
+ value: 16.4439
59
 
60
  - task:
61
  type: text-classification
 
67
  metrics:
68
  - name: accuracy_norm
69
  type: accuracy
70
+ value: 51.26
71
+
72
 
73
  ## Evaluation (CETVEL – Turkish subsets)
74
 
75
  **BEDAI-2B:** MCQA **25.70**, QA **17.97**, TC **51.58**
76
+ **BEDAI-2.4B (this run, full):** MCQA **32.31**, QA **19.97** (mean of TQuAD/XQuAD-TR F1), TC **51.26**
77
 
78
  <table>
79
  <thead>
80
  <tr><th style="text-align:left">Model</th><th>MCQA</th><th>QA</th><th>TC</th></tr>
81
  </thead>
82
  <tbody>
 
83
  <tr><th style="text-align:left">BEDAI-2B</th>
84
  <td style="background:#f4cccc">25.70</td>
85
  <td style="background:#f8cbad">17.97</td>
86
  <td style="background:#ffeb9c">51.58</td></tr>
87
 
88
+ <tr><th style="text-align:left">BEDAI-2.4B (this work)</th>
89
+ <td style="background:#c6efce">32.31</td>
90
+ <td style="background:#c6efce">19.97</td>
91
+ <td style="background:#c6efce">51.26</td></tr>
92
+ </tbody>
93
+ </table>
94
+
95
+ <sub>Setup: `lm-evaluation-harness` (CETVEL tasks), H100 80GB, bf16, SDPA attention, batch size 128, full dataset (no `--limit`).</sub>
96
+
97
 
98
  <tr><th style="text-align:left">CohereLabs__aya-expanse-32b</th>
99
  <td style="background:#ffeb9c">52.47</td>
 
125
  <td style="background:#f4cccc">8.22</td>
126
  <td style="background:#f8cbad">46.15</td></tr>
127
 
128
+ <tr><th style="text-align:left">Kumru-2B (full)</th>
129
+ <td style="background:#f4cccc">19.59</td>
130
+ <td style="background:#f4cccc">10.00</td>
131
+ <td style="background:#f4cccc">31.62</td></tr>
132
 
133
  <tr><th style="text-align:left">Llama-3.1-8B-Instruct</th>
134
  <td style="background:#ffeb9c">45.77</td>
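
The QA column in this diff is described as the "mean of TQuAD/XQuAD-TR F1". A minimal sketch of that arithmetic follows, assuming the two `question-answering-extractive` F1 values in the `model-index` metadata are the TQuAD and XQuAD-TR scores (the diff does not show which value belongs to which dataset). The helper name `mean_f1` is hypothetical and used only for illustration.

```python
def mean_f1(scores):
    """Return the arithmetic mean of a list of F1 scores."""
    return sum(scores) / len(scores)


# F1 values copied from the model-index metadata lines in this diff.
# Assumption: each pair is (TQuAD, XQuAD-TR) in some order; the diff
# does not label them, and the mean does not depend on the order.
limit50_f1 = [19.5659, 38.2748]  # removed lines (limit=50 run)
full_f1 = [23.5035, 16.4439]     # added lines (full run)

qa_limit50 = round(mean_f1(limit50_f1), 2)
qa_full = round(mean_f1(full_f1), 2)

print(qa_limit50, qa_full)
```

Under that assumption the sketch reproduces the QA figures in the README text: 28.92 for the removed `limit=50` line and 19.97 for the added full-dataset line.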