Add model card for swe_Latn classifier
README.md
---
language:
- sv

## Training

The classifier was trained on 343680 pairs of web samples and their scores from 0 to 5, generated by Qwen3-235B-A22B-Instruct-2507. The samples were annotated for educational quality, with 0 meaning not educational and 5 meaning highly educational.

Below is the prompt used for the Qwen3-235B-A22B-Instruct-2507 annotations:
```
...
- Conclude with the score using the format: "Educational score: <total points>"\
```

We added a classification head with a single regression output to [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base), unfroze the last 4 layers, and trained the model for 5000 steps with a learning rate of 3e-4.

**Training Details:**

- Model: [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base) with a classification head
- Dataset: 343680 samples from Qwen3-235B-A22B-Instruct-2507 annotations
- Steps: 5000
- Learning Rate: 3e-4
- Class Distribution: {0: 143200, 1: 143200, 2: 14320, 3: 14320, 4: 14320, 5: 14320}
- Evaluation Metric: F1 score
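As a minimal sketch of this setup, not the actual training code: the 12-block stand-in encoder, the layer sizes, and the helper name below are all hypothetical; only the last-4-layers unfreezing, the single regression output, and the 3e-4 learning rate come from the details above.

```python
import torch
from torch import nn

# Sanity check from the bullets above: the class distribution
# sums to the stated 343680 training samples.
dist = {0: 143200, 1: 143200, 2: 14320, 3: 14320, 4: 14320, 5: 14320}
assert sum(dist.values()) == 343680

def freeze_all_but_last_n(layers: nn.ModuleList, n: int) -> None:
    """Disable gradients in every layer except the last n."""
    cutoff = len(layers) - n
    for i, layer in enumerate(layers):
        for p in layer.parameters():
            p.requires_grad = i >= cutoff

# Hypothetical stand-in for the encoder: 12 toy-sized blocks.
encoder = nn.ModuleList(nn.Linear(16, 16) for _ in range(12))
freeze_all_but_last_n(encoder, 4)  # keep only the last 4 layers trainable

# Classification head with a single regression output.
head = nn.Linear(16, 1)

# Optimize only the trainable parameters, at the stated learning rate.
trainable = [p for p in (*encoder.parameters(), *head.parameters()) if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=3e-4)
```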

**Classification report**

We treat the regression model's predictions as discrete classes to calculate the metrics on a hold-out set of 14402 Qwen3-235B-A22B-Instruct-2507-annotated samples.
```
Validation Report:

| class | precision | recall | f1-score | support |
|------:|----------:|-------:|---------:|--------:|
|     0 |      0.76 |   0.79 |     0.77 |    6308 |
|     1 |      0.79 |   0.75 |     0.77 |    7539 |
|     2 |      0.34 |   0.43 |     0.38 |     349 |
|     3 |      0.26 |   0.40 |     0.32 |     104 |
|     4 |      0.51 |   0.47 |     0.49 |      88 |
|     5 |      0.27 |   0.57 |     0.36 |      14 |
```
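To turn a continuous regression output into one of the six discrete classes, a natural rule is round-and-clip; the card does not spell out its exact binning, so treat this as an assumed sketch:

```python
def to_class(score: float, lo: int = 0, hi: int = 5) -> int:
    """Round a continuous score to the nearest class, clipped to [lo, hi]."""
    return int(max(lo, min(hi, round(score))))

# Hypothetical raw regression outputs and their discrete classes.
raw = [0.12, 1.44, 2.61, 5.73, -0.21]
classes = [to_class(s) for s in raw]  # [0, 1, 3, 5, 0]
```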

**Confusion matrix**

We verify that the predicted educational scores are close to their ground truth, and that the errors are mostly driven by the noisy annotations.
```
Confusion Matrix (rows: true class, columns: predicted class):

| class |    0 |    1 |   2 |  3 |  4 |  5 |
|------:|-----:|-----:|----:|---:|---:|---:|
|     0 | 4986 | 1318 |   3 |  1 |  0 |  0 |
|     1 | 1594 | 5634 | 261 | 45 |  5 |  0 |
|     2 |    1 |  137 | 151 | 51 |  8 |  1 |
|     3 |    0 |   12 |  29 | 42 | 21 |  0 |
|     4 |    0 |    1 |   5 | 20 | 41 | 21 |
|     5 |    0 |    0 |   0 |  1 |  5 |  8 |
```
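A matrix like this can be reproduced from the discrete labels with a short helper; this is a generic sketch rather than the evaluation code used here:

```python
def confusion_matrix(y_true, y_pred, n_classes=6):
    """m[t][p] counts samples with true class t predicted as class p."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

# Tiny hypothetical example: two class-0 samples, one misread as class 1.
m = confusion_matrix([0, 0, 1], [0, 1, 1])
```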