Add model card for swe_Latn classifier
README.md
---
language:
- sv

## Training

The classifier was trained on 343680 pairs of web samples and their scores from 0 to 5, generated by Qwen3-235B-A22B-Instruct-2507. The samples were annotated for educational quality, with 0 meaning not educational and 5 meaning highly educational.

Below is the prompt used for the Qwen3-235B-A22B-Instruct-2507 annotations:
```
...
- Conclude with the score using the format: "Educational score: <total points>"\
```

We added a classification head with a single regression output to [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base), unfroze the last 4 layers, and trained the model for 5000 steps with a learning rate of 3e-4.

**Training Details:**

- Model: [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base) with a classification head
- Dataset: 343680 samples from Qwen3-235B-A22B-Instruct-2507 annotations
- Steps: 5000
- Learning Rate: 3e-4
- Class Distribution: {0: 143200, 1: 143200, 2: 14320, 3: 14320, 4: 14320, 5: 14320}
- Evaluation Metric: F1 score
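As a minimal sketch of this setup, not the actual training code: the 12-block stand-in encoder, the layer sizes, and the helper name below are all hypothetical; only the last-4-layers unfreezing, the single regression output, and the 3e-4 learning rate come from the details above.

```python
import torch
from torch import nn

# Sanity check from the bullets above: the class distribution
# sums to the stated 343680 training samples.
dist = {0: 143200, 1: 143200, 2: 14320, 3: 14320, 4: 14320, 5: 14320}
assert sum(dist.values()) == 343680

def freeze_all_but_last_n(layers: nn.ModuleList, n: int) -> None:
    """Disable gradients in every layer except the last n."""
    cutoff = len(layers) - n
    for i, layer in enumerate(layers):
        for p in layer.parameters():
            p.requires_grad = i >= cutoff

# Hypothetical stand-in for the encoder: 12 toy-sized blocks.
encoder = nn.ModuleList(nn.Linear(16, 16) for _ in range(12))
freeze_all_but_last_n(encoder, 4)  # keep only the last 4 layers trainable

# Classification head with a single regression output.
head = nn.Linear(16, 1)

# Optimize only the trainable parameters, at the stated learning rate.
trainable = [p for p in (*encoder.parameters(), *head.parameters()) if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=3e-4)
```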

**Classification report**

We treat the regression model's predictions as discrete classes to calculate the metrics on a hold-out set of 14402 Qwen3-235B-A22B-Instruct-2507-annotated samples.
```
Validation Report:

| class | precision | recall | f1-score | support |
|------:|----------:|-------:|---------:|--------:|
|     0 |      0.76 |   0.79 |     0.77 |    6308 |
|     1 |      0.79 |   0.75 |     0.77 |    7539 |
|     2 |      0.34 |   0.43 |     0.38 |     349 |
|     3 |      0.26 |   0.40 |     0.32 |     104 |
|     4 |      0.51 |   0.47 |     0.49 |      88 |
|     5 |      0.27 |   0.57 |     0.36 |      14 |
```
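To turn a continuous regression output into one of the six discrete classes, a natural rule is round-and-clip; the card does not spell out its exact binning, so treat this as an assumed sketch:

```python
def to_class(score: float, lo: int = 0, hi: int = 5) -> int:
    """Round a continuous score to the nearest class, clipped to [lo, hi]."""
    return int(max(lo, min(hi, round(score))))

# Hypothetical raw regression outputs and their discrete classes.
raw = [0.12, 1.44, 2.61, 5.73, -0.21]
classes = [to_class(s) for s in raw]  # [0, 1, 3, 5, 0]
```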

**Confusion matrix**

We verify that the predicted educational scores are close to their ground truth, and that the errors are mostly driven by the noisy annotations.
```
Confusion Matrix (rows: true class, columns: predicted class):

| class |    0 |    1 |   2 |  3 |  4 |  5 |
|------:|-----:|-----:|----:|---:|---:|---:|
|     0 | 4986 | 1318 |   3 |  1 |  0 |  0 |
|     1 | 1594 | 5634 | 261 | 45 |  5 |  0 |
|     2 |    1 |  137 | 151 | 51 |  8 |  1 |
|     3 |    0 |   12 |  29 | 42 | 21 |  0 |
|     4 |    0 |    1 |   5 | 20 | 41 | 21 |
|     5 |    0 |    0 |   0 |  1 |  5 |  8 |
```
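A matrix like this can be reproduced from the discrete labels with a short helper; this is a generic sketch rather than the evaluation code used here:

```python
def confusion_matrix(y_true, y_pred, n_classes=6):
    """m[t][p] counts samples with true class t predicted as class p."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

# Tiny hypothetical example: two class-0 samples, one misread as class 1.
m = confusion_matrix([0, 0, 1], [0, 1, 1])
```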