hynky committed · Commit 24f36a1 · verified · 1 Parent(s): 793f5c9

Add model card for unknown classifier

Files changed (1)
  1. README.md +26 -9
README.md CHANGED
@@ -1,3 +1,4 @@
  ---
  language:
  - un
@@ -82,7 +83,7 @@ print(max(scores))
  ```
 
  ## Training
- The classifier was trained on 0 pairs of web samples and their scores from 0 to 5, generated by Qwen3-235B-A22B-Instruct-2507. The samples were annotated based on their educational quality with 0 being not educational and 5 being highly educational.
 
  Below is the prompt used for Qwen3-235B-A22B-Instruct-2507 annotations:
  ```
@@ -117,29 +118,45 @@ After examining the extract:
  - Conclude with the score using the format: "Educational score: <total points>"\
  ```
 
- We added a classification head with a single regression output to mmbert-colab/mmBERT-base, unfroze the last 4 layers and trained the model for 5000 epochs with a learning rate of 3e-4.
 
  **Training Details:**
 
- - Model: mmbert-colab/mmBERT-base with a classification head
- - Dataset: 0 samples from Llama3 annotations
- - Epochs: 1
  - Learning Rate: 3e-4
- - class distribution:
  - Evaluation Metric: F1 score
 
  **Classification report**
 
- We treat the regression model's predictions as discrete classes to calculate the metrics on a hold-out set of 0 Llama3-annotated samples.
  ```
-
  ```
 
  **Confusion matrix**
 
  We verify that the predicted educational scores are indeed close to their ground truth, and are mostly impacted by the noisy annotation.
  ```
-
  ```
 
 
 
+
  ---
  language:
  - un
 
  ```
 
  ## Training
+ The classifier was trained on 49996 pairs of web samples and their scores from 0 to 5, generated by Qwen3-235B-A22B-Instruct-2507. The samples were annotated based on their educational quality with 0 being not educational and 5 being highly educational.
 
  Below is the prompt used for Qwen3-235B-A22B-Instruct-2507 annotations:
  ```
 
  - Conclude with the score using the format: "Educational score: <total points>"\
  ```
 
+ We added a classification head with a single regression output to [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base), unfroze the last 4 layers and trained the model for 5000 steps with a learning rate of 3e-4.
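
The freezing scheme described above can be sketched roughly as follows. This is not the authors' training code: the helper name `unfreeze_last_n_layers` is hypothetical, and it only assumes objects with a `parameters()` method yielding items that carry a `requires_grad` flag, as PyTorch modules do. With Hugging Face `transformers`, the backbone and its layer list would presumably be reached through attributes such as `model.model` and `model.model.layers`, but that is an assumption about the model class, not something the model card states.

```python
def unfreeze_last_n_layers(backbone, layers, n=4):
    """Freeze every backbone parameter, then re-enable the last `n` layers.

    `backbone` and each item of `layers` only need a `parameters()` method
    yielding objects with a `requires_grad` attribute (torch.nn.Module
    satisfies this). The regression head is not touched here, so it keeps
    whatever trainable state it already has.

    Returns the number of parameter objects re-enabled in the backbone.
    """
    # Step 1: freeze the whole backbone (embeddings and all encoder layers).
    for param in backbone.parameters():
        param.requires_grad = False

    # Step 2: unfreeze only the last `n` layers; n=0 unfreezes nothing.
    layers = list(layers)
    start = max(len(layers) - n, 0)
    unfrozen = 0
    for layer in layers[start:]:
        for param in layer.parameters():
            param.requires_grad = True
            unfrozen += 1
    return unfrozen
```

Passing only the parameters that still have `requires_grad` set to the optimizer would then restrict updates to the head and the last 4 layers.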
 
  **Training Details:**
 
+ - Model: [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base) with a classification head
+ - Dataset: 49996 samples from Qwen3-235B-A22B-Instruct-2507 annotations
+ - Steps: 5000
  - Learning Rate: 3e-4
+ - class distribution: {0: 20400, 1: 20400, 2: 3076, 3: 2040, 4: 2040, 5: 2040}
  - Evaluation Metric: F1 score
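
As a quick consistency check on the figures in the list above, the per-class counts can be summed and converted into shares. The counts are copied from the class distribution; the variable names are illustrative only.

```python
# Per-class training counts, copied from the class distribution above.
class_distribution = {0: 20400, 1: 20400, 2: 3076, 3: 2040, 4: 2040, 5: 2040}

# Total number of training samples implied by the distribution.
total = sum(class_distribution.values())

# Share of each score in the training set, as a fraction of all samples.
class_share = {score: count / total for score, count in class_distribution.items()}

for score, share in class_share.items():
    print(f"score {score}: {class_distribution[score]:>6} samples ({share:.1%})")
print(f"total: {total}")
```

The counts add up to the 49996 training samples, and scores 0 and 1 together make up roughly 82% of them.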
 
  **Classification report**
 
+ We treat the regression model's predictions as discrete classes to calculate the metrics on a hold-out set of 11783 Qwen3-235B-A22B-Instruct-2507-annotated samples.
  ```
+ Validation Report:
+ | class | precision | recall | f1-score | support |
+ |--------:|------------:|---------:|-----------:|----------:|
+ | 0 | 0.78 | 0.9 | 0.83 | 7410 |
+ | 1 | 0.72 | 0.5 | 0.6 | 4137 |
+ | 2 | 0.21 | 0.37 | 0.26 | 123 |
+ | 3 | 0.25 | 0.36 | 0.3 | 58 |
+ | 4 | 0.66 | 0.62 | 0.64 | 53 |
+ | 5 | 0 | 0 | 0 | 2 |
  ```
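
Treating regression outputs as discrete classes, as the report above does, needs a rule for mapping a raw score to a label. The model card does not spell that rule out here, so the sketch below assumes the common choice of clipping to the 0 to 5 range and rounding to the nearest integer; `to_class` is a hypothetical helper name.

```python
def to_class(score: float, low: int = 0, high: int = 5) -> int:
    """Map a raw regression score to an integer class in [low, high].

    Assumed rule: clip to the valid range, then round to the nearest integer.
    Note that Python's round() sends exact halves to the nearest even integer.
    """
    clipped = max(low, min(score, high))
    return int(round(clipped))


# Illustrative raw scores, including values outside the 0-5 range.
raw_scores = [-0.3, 0.4, 1.7, 2.2, 4.6, 6.1]
print([to_class(s) for s in raw_scores])
```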
 
  **Confusion matrix**
 
  We verify that the predicted educational scores are indeed close to their ground truth, and are mostly impacted by the noisy annotation.
  ```
+ Confusion Matrix:
+ | class | 0 | 1 | 2 | 3 | 4 | 5 |
+ |---------:|-----:|-----:|----:|----:|----:|----:|
+ | 0 | 6651 | 728 | 29 | 2 | 0 | 0 |
+ | 1 | 1892 | 2088 | 121 | 35 | 1 | 0 |
+ | 2 | 6 | 54 | 45 | 15 | 3 | 0 |
+ | 3 | 2 | 10 | 14 | 21 | 11 | 0 |
+ | 4 | 0 | 1 | 8 | 11 | 33 | 0 |
+ | 5 | 0 | 0 | 0 | 0 | 2 | 0 |
  ```
 
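
The classification report can be cross-checked against the confusion matrix. In the sketch below the matrix is copied from the table above; because its row sums equal the support column of the report, rows are read as annotated scores and columns as predicted classes. The function name `per_class_metrics` is illustrative.

```python
# Confusion matrix copied from the table above:
# rows = annotated score, columns = predicted class.
confusion = [
    [6651,  728,  29,  2,  0, 0],
    [1892, 2088, 121, 35,  1, 0],
    [   6,   54,  45, 15,  3, 0],
    [   2,   10,  14, 21, 11, 0],
    [   0,    1,   8, 11, 33, 0],
    [   0,    0,   0,  0,  2, 0],
]


def per_class_metrics(matrix):
    """Return (class, precision, recall, f1, support) tuples, rounded to 2 places."""
    n = len(matrix)
    report = []
    for k in range(n):
        true_positive = matrix[k][k]
        support = sum(matrix[k])                          # row sum: samples annotated k
        predicted = sum(matrix[i][k] for i in range(n))   # column sum: samples predicted k
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report.append((k, round(precision, 2), round(recall, 2), round(f1, 2), support))
    return report


for row in per_class_metrics(confusion):
    print(row)
```

The rounded values match the validation report, and the supports sum to the 11783 hold-out samples.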