HuggingFaceFW/finepdfs_edu_classifier_unknown

hynky committed on Oct 6, 2025

Commit 24f36a1 (verified) · 1 Parent(s): 793f5c9

Add model card for unknown classifier

Files changed (1)

README.md +26 -9

@@ -1,3 +1,4 @@
 ---
 language:
 - un
@@ -82,7 +83,7 @@ print(max(scores))
 ```
 ## Training
-The classifier was trained on 0 pairs of web samples and their scores from 0 to 5, generated by Qwen3-235B-A22B-Instruct-2507. The samples were annotated based on their educational quality with 0 being not educational and 5 being highly educational.
 Below is the prompt used for Qwen3-235B-A22B-Instruct-2507 annotations:
 ```
@@ -117,29 +118,45 @@ After examining the extract:
 - Conclude with the score using the format: "Educational score: <total points>"\
 ```
-We added a classification head with a single regression output to mmbert-colab/mmBERT-base, unroze the last 4 layers and trained the model for 5000 epochs with a learning rate of 3e-4.
 **Training Details:**
-- Model: mmbert-colab/mmBERT-base with a classification head
-- Dataset: 0 samples from Llama3 annotations
-- Epochs: 1
 - Learning Rate: 3e-4
-- class distribution:
 - Evaluation Metric: F1 score
 **Classification report**
-We treat the regression model's predictions as discrete classes to calculate the metrics on a hold-out set of 0 Llama3-annotated samples.
 ```
 ```
 **Confusion matrix**
 We verify that the predicted educational scores are indeed close to their ground truth, and are mostly impacted by the noisy annotation.
 ```
 ```

 ---
 language:
 - un
 ```
 ## Training
+The classifier was trained on 49996 pairs of web samples and their scores from 0 to 5, generated by Qwen3-235B-A22B-Instruct-2507. The samples were annotated based on their educational quality with 0 being not educational and 5 being highly educational.
 Below is the prompt used for Qwen3-235B-A22B-Instruct-2507 annotations:
 ```
 - Conclude with the score using the format: "Educational score: <total points>"\
 ```
+We added a classification head with a single regression output to [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base), unfroze the last 4 layers, and trained the model for 5000 steps with a learning rate of 3e-4.
 **Training Details:**
+- Model: [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base) with a classification head
+- Dataset: 49996 samples from Qwen3-235B-A22B-Instruct-2507 annotations
+- Steps: 5000
 - Learning Rate: 3e-4
+- Class distribution: {0: 20400, 1: 20400, 2: 3076, 3: 2040, 4: 2040, 5: 2040}
 - Evaluation Metric: F1 score
 **Classification report**
+We treat the regression model's predictions as discrete classes to calculate the metrics on a hold-out set of 11783 Qwen3-235B-A22B-Instruct-2507-annotated samples.
 ```
+Validation Report:
+|   class |   precision |   recall |   f1-score |   support |
+|--------:|------------:|---------:|-----------:|----------:|
+|       0 |        0.78 |     0.9  |       0.83 |      7410 |
+|       1 |        0.72 |     0.5  |       0.6  |      4137 |
+|       2 |        0.21 |     0.37 |       0.26 |       123 |
+|       3 |        0.25 |     0.36 |       0.3  |        58 |
+|       4 |        0.66 |     0.62 |       0.64 |        53 |
+|       5 |        0    |     0    |       0    |         2 |
 ```
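The card says regression outputs are treated as discrete classes but does not spell out the mapping. A minimal sketch of one common choice, clamping to the 0 to 5 range and rounding to the nearest integer, is shown here. The helper name `to_class` and the sample scores are hypothetical, and the exact rule used for the report above is an assumption.

```python
def to_class(score: float, low: int = 0, high: int = 5) -> int:
    """Clamp a raw regression output to [low, high], then round to an integer.

    Assumption: this clamp-and-round rule is illustrative; the card does not
    state the exact discretization used for its metrics.
    """
    return int(round(max(low, min(score, high))))


# Made-up regression outputs, including values outside the 0-5 range.
raw_scores = [-0.3, 0.4, 1.6, 2.2, 4.8, 6.1]
predicted_classes = [to_class(s) for s in raw_scores]
print(predicted_classes)
```

Once predictions are integers, they can be compared against the integer annotation scores with any standard per-class precision/recall routine.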
 **Confusion matrix**
 We verify that the predicted educational scores are indeed close to their ground truth, and are mostly impacted by the noisy annotation.
 ```
+Confusion Matrix:
+|   class  |    0 |    1 |   2 |   3 |   4 |   5 |
+|---------:|-----:|-----:|----:|----:|----:|----:|
+|        0 | 6651 |  728 |  29 |   2 |   0 |   0 |
+|        1 | 1892 | 2088 | 121 |  35 |   1 |   0 |
+|        2 |    6 |   54 |  45 |  15 |   3 |   0 |
+|        3 |    2 |   10 |  14 |  21 |  11 |   0 |
+|        4 |    0 |    1 |   8 |  11 |  33 |   0 |
+|        5 |    0 |    0 |   0 |   0 |   2 |   0 |
 ```
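As a cross-check, the per-class precision, recall, and F1 can be recomputed from the confusion matrix itself. The sketch below assumes rows are annotated (true) scores and columns are predicted scores, which matches the row sums and the support column of the validation report. The function name `per_class_metrics` is hypothetical.

```python
# Confusion matrix copied from the table above.
# Assumption: rows = annotated (true) class, columns = predicted class.
confusion = [
    [6651, 728, 29, 2, 0, 0],
    [1892, 2088, 121, 35, 1, 0],
    [6, 54, 45, 15, 3, 0],
    [2, 10, 14, 21, 11, 0],
    [0, 1, 8, 11, 33, 0],
    [0, 0, 0, 0, 2, 0],
]


def per_class_metrics(matrix):
    """Return precision, recall, F1, and support for each class."""
    n = len(matrix)
    metrics = []
    for k in range(n):
        true_positive = matrix[k][k]
        support = sum(matrix[k])                            # row sum: true count
        predicted = sum(matrix[i][k] for i in range(n))     # column sum: predicted count
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append(
            {"precision": precision, "recall": recall, "f1": f1, "support": support}
        )
    return metrics


metrics = per_class_metrics(confusion)
for label, m in enumerate(metrics):
    print(label, round(m["precision"], 2), round(m["recall"], 2),
          round(m["f1"], 2), m["support"])
```

Class 5 has only 2 hold-out samples and is never predicted, so its precision, recall, and F1 are all 0 by the zero-division convention used here.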