Add model card for ron_Latn classifier
README.md CHANGED
@@ -1,3 +1,4 @@
---
language:
- ro
@@ -82,7 +83,7 @@ print(max(scores))
```

## Training

The classifier was trained on 144,960 pairs of web samples and their scores from 0 to 5, generated by Qwen3-235B-A22B-Instruct-2507. The samples were annotated for their educational quality, with 0 being not educational and 5 being highly educational.

Below is the prompt used for the Qwen3-235B-A22B-Instruct-2507 annotations:
```
@@ -117,29 +118,45 @@ After examining the extract:
- Conclude with the score using the format: "Educational score: <total points>"\
```

We added a classification head with a single regression output to [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base), unfroze the last 4 layers, and trained the model for 5000 steps with a learning rate of 3e-4.

**Training Details:**

- Model: [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base) with a classification head
- Dataset: 144,960 samples from Qwen3-235B-A22B-Instruct-2507 annotations
- Steps: 5000
- Learning Rate: 3e-4
- Class distribution: {0: 60400, 1: 60400, 2: 6040, 3: 6040, 4: 6040, 5: 6040}
- Evaluation Metric: F1 score
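The freezing setup described above can be sketched in PyTorch. This is a minimal, illustrative stand-in: the real model is jhu-clsp/mmBERT-base loaded via `transformers`, while the layer count and hidden size below are toy values chosen so the sketch runs without downloading weights.

```python
import torch
from torch import nn

# Toy stand-in encoder (assumption: sizes are illustrative, not the real
# mmBERT-base architecture).
N_LAYERS, HIDDEN = 8, 64

encoder = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=4, batch_first=True)
    for _ in range(N_LAYERS)
)
head = nn.Linear(HIDDEN, 1)  # classification head with a single regression output

# Freeze the whole encoder, then unfreeze only the last 4 layers;
# the regression head stays trainable.
for p in encoder.parameters():
    p.requires_grad = False
for layer in encoder[-4:]:
    for p in layer.parameters():
        p.requires_grad = True

trainable = [p for p in (*encoder.parameters(), *head.parameters()) if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=3e-4)  # learning rate from the details above
```

Only the unfrozen parameters are handed to the optimizer, so gradient updates never touch the frozen lower layers.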

**Classification report**

We treat the regression model's predictions as discrete classes to calculate the metrics on a hold-out set of 13,929 Qwen3-235B-A22B-Instruct-2507-annotated samples.
```
Validation Report:

| class | precision | recall | f1-score | support |
|------:|----------:|-------:|---------:|--------:|
|     0 |      0.90 |   0.74 |     0.81 |    8966 |
|     1 |      0.61 |   0.81 |     0.69 |    4662 |
|     2 |      0.28 |   0.39 |     0.32 |     183 |
|     3 |      0.34 |   0.43 |     0.38 |      58 |
|     4 |      0.60 |   0.61 |     0.61 |      54 |
|     5 |      0.20 |   0.17 |     0.18 |       6 |
```
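Treating a continuous regression output as discrete classes requires a binning rule. The card does not spell out the exact rule, so the sketch below assumes the simplest one, clip-and-round into the 0-5 range, together with a plain-Python confusion matrix:

```python
def to_class(score: float, lo: int = 0, hi: int = 5) -> int:
    # Assumption: predictions are clipped to [lo, hi] and rounded to the
    # nearest integer class.
    return min(hi, max(lo, round(score)))

def confusion_matrix(y_true, y_pred, n_classes=6):
    # m[t][p] counts samples with true class t predicted as class p.
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

preds = [0.2, 1.7, 4.9, 5.6, -0.3, 2.4]   # raw regression outputs (made up)
labels = [0, 2, 5, 5, 0, 2]                # hypothetical annotations
classes = [to_class(s) for s in preds]
print(classes)  # [0, 2, 5, 5, 0, 2]
cm = confusion_matrix(labels, classes)
```

Note that out-of-range outputs (5.6, -0.3) are clamped to the extreme classes rather than dropped.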

**Confusion matrix**

We verify that the predicted educational scores are close to their ground truth and that the remaining errors are mostly due to the noisy annotations.
```
Confusion Matrix:

| class |    0 |    1 |   2 |  3 |  4 | 5 |
|------:|-----:|-----:|----:|---:|---:|--:|
|     0 | 6625 | 2339 |   2 |  0 |  0 | 0 |
|     1 |  732 | 3754 | 160 | 13 |  3 | 0 |
|     2 |    0 |   82 |  71 | 25 |  5 | 0 |
|     3 |    0 |    5 |  17 | 25 | 11 | 0 |
|     4 |    0 |    3 |   4 | 10 | 33 | 4 |
|     5 |    0 |    0 |   1 |  1 |  3 | 1 |
```
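The classification report can be cross-checked against this matrix: the row sums match the per-class support, so rows are true classes and columns are predicted classes, and precision/recall fall out of the columns and rows directly.

```python
# Confusion matrix from the card: rows = true class, columns = predicted class.
cm = [
    [6625, 2339,   2,  0,  0, 0],
    [ 732, 3754, 160, 13,  3, 0],
    [   0,   82,  71, 25,  5, 0],
    [   0,    5,  17, 25, 11, 0],
    [   0,    3,   4, 10, 33, 4],
    [   0,    0,   1,  1,  3, 1],
]

def precision(cm, k):
    # Fraction of class-k predictions (column k) that are correct.
    col = sum(row[k] for row in cm)
    return cm[k][k] / col if col else 0.0

def recall(cm, k):
    # Fraction of true class-k samples (row k) recovered.
    return cm[k][k] / sum(cm[k])

print(round(precision(cm, 0), 2), round(recall(cm, 0), 2))  # 0.9 0.74
```

Both values agree with the class-0 row of the validation report above.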