Add model card for ces_Latn classifier
README.md
CHANGED
---
language:
- cs
```

## Training

The classifier was trained on 285120 pairs of web samples and their scores from 0 to 5, generated by Qwen3-235B-A22B-Instruct-2507. The samples were annotated for educational quality, with 0 being not educational and 5 being highly educational.

Below is the prompt used for Qwen3-235B-A22B-Instruct-2507 annotations:
```
- Conclude with the score using the format: "Educational score: <total points>"\
```
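Responses in this format can be reduced to a numeric label with a small parser. The sketch below is a hypothetical helper (`parse_score` is not from the card, which does not show the extraction code actually used):

```python
import re
from typing import Optional

def parse_score(response: str) -> Optional[int]:
    """Pull the 'Educational score: N' line out of an annotator response.
    Hypothetical helper; the card does not show the parser actually used
    to build the dataset."""
    match = re.search(r"Educational score:\s*(\d+)", response)
    return int(match.group(1)) if match else None

print(parse_score("The extract is basic but coherent.\nEducational score: 3"))  # 3
print(parse_score("no score emitted"))  # None
```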

We added a classification head with a single regression output to [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base), unfroze the last 4 layers, and trained the model for 5000 steps with a learning rate of 3e-4.
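The head-plus-partial-freezing setup can be sketched in plain PyTorch. `ToyEncoder` below is a schematic stand-in for the mmBERT-base encoder (in practice it would be loaded via `transformers`), and the sizes and class names are illustrative assumptions, not the card's actual training code:

```python
import torch
import torch.nn as nn

# Schematic stand-in for the mmBERT-base encoder; hidden size and layer
# count here are illustrative only.
class ToyEncoder(nn.Module):
    def __init__(self, hidden=64, num_layers=12):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
             for _ in range(num_layers)]
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

class EduScoreRegressor(nn.Module):
    """Encoder plus a classification head with a single regression output."""
    def __init__(self, encoder, hidden=64, trainable_layers=4):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(hidden, 1)  # one output: the 0-5 score
        # Freeze the whole encoder, then unfreeze only the last blocks.
        for p in self.encoder.parameters():
            p.requires_grad = False
        for block in self.encoder.layers[-trainable_layers:]:
            for p in block.parameters():
                p.requires_grad = True

    def forward(self, x):
        h = self.encoder(x)        # (batch, seq, hidden)
        return self.head(h[:, 0])  # pool the first token -> (batch, 1)

model = EduScoreRegressor(ToyEncoder())
scores = model(torch.zeros(2, 7, 64))
print(scores.shape)  # torch.Size([2, 1])
```

Only the head and the last four blocks contribute trainable parameters, which keeps the fine-tuning cheap relative to full-model training.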

**Training Details:**

- Model: [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base) with a classification head
- Dataset: 285120 samples from Qwen3-235B-A22B-Instruct-2507 annotations
- Steps: 5000
- Learning Rate: 3e-4
- Class distribution: {0: 118800, 1: 118800, 2: 11880, 3: 11880, 4: 11880, 5: 11880}
- Evaluation Metric: F1 score
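A quick arithmetic check on the class distribution listed above: the per-class counts sum exactly to the 285120 training pairs, with five sixths of the data in classes 0 and 1:

```python
# Class distribution from the training details above.
dist = {0: 118800, 1: 118800, 2: 11880, 3: 11880, 4: 11880, 5: 11880}

total = sum(dist.values())
low_share = (dist[0] + dist[1]) / total  # weight of classes 0 and 1

print(total)                # 285120
print(round(low_share, 3))  # 0.833
```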

**Classification report**

We treat the regression model's predictions as discrete classes to calculate the metrics on a hold-out set of 13955 Qwen3-235B-A22B-Instruct-2507-annotated samples.
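The card does not spell out the exact discretization rule, but a natural choice is to round the continuous output and clip it to the 0-5 range. A minimal sketch, assuming that rule:

```python
def to_class(score: float, lo: int = 0, hi: int = 5) -> int:
    """Round a continuous regression output and clip it into [lo, hi].
    An assumed rule for illustration; the card does not specify the
    exact mapping used for its metrics."""
    return max(lo, min(hi, round(score)))

print(to_class(2.4))   # 2
print(to_class(5.7))   # 5 (clipped from 6)
print(to_class(-0.3))  # 0
```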

| class | precision | recall | f1-score | support |
|------:|----------:|-------:|---------:|--------:|
| 0 | 0.80 | 0.80 | 0.80 | 6818 |
| 1 | 0.76 | 0.77 | 0.77 | 6526 |
| 2 | 0.37 | 0.33 | 0.35 | 369 |
| 3 | 0.31 | 0.41 | 0.35 | 126 |
| 4 | 0.61 | 0.53 | 0.57 | 104 |
| 5 | 0.50 | 0.50 | 0.50 | 12 |

**Confusion matrix**

We verify that the predicted educational scores are indeed close to their ground truth and that the remaining errors are mostly driven by the noisy annotations.

| true \ predicted | 0 | 1 | 2 | 3 | 4 | 5 |
|---:|---:|---:|---:|---:|---:|---:|
| 0 | 5461 | 1355 | 2 | 0 | 0 | 0 |
| 1 | 1323 | 5017 | 155 | 29 | 2 | 0 |
| 2 | 0 | 184 | 120 | 57 | 8 | 0 |
| 3 | 0 | 16 | 38 | 52 | 20 | 0 |
| 4 | 0 | 3 | 9 | 31 | 55 | 6 |
| 5 | 0 | 0 | 0 | 1 | 5 | 6 |
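As a sanity check, the per-class precision and recall can be recomputed directly from the confusion matrix (rows are true classes, columns are predicted classes; the row sums reproduce the 13955 hold-out samples), and they match the classification report:

```python
# Confusion matrix from above: rows = true class, columns = predicted class.
cm = [
    [5461, 1355,   2,  0,  0, 0],
    [1323, 5017, 155, 29,  2, 0],
    [   0,  184, 120, 57,  8, 0],
    [   0,   16,  38, 52, 20, 0],
    [   0,    3,   9, 31, 55, 6],
    [   0,    0,   0,  1,  5, 6],
]

def precision(cm, c):
    predicted = sum(row[c] for row in cm)  # everything predicted as c
    return cm[c][c] / predicted if predicted else 0.0

def recall(cm, c):
    return cm[c][c] / sum(cm[c])           # correct / all truly c

for c in range(6):
    # Reproduces the report, e.g. class 0 -> 0.80 / 0.80, class 3 -> 0.31 / 0.41.
    print(c, round(precision(cm, c), 2), round(recall(cm, c), 2))
```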