---
license: mit
language:
- 'no'
- nn
- da
- sv
- en
---
Scandinavian Education Classifier Snowflake
!!! We recommend using our BERT-based model instead for production use.
Trained using code from [CosmoPedia](https://github.com/huggingface/cosmopedia/tree/main/classification), with nb-bert-base as the starting point. The data used for classification comes from GlotCC and has been annotated using Gemini 1.5 Flash.
The following command was used for training:
python train_edu_bert.py --base_model_name="NbAiLab/nb-bert-base" --dataset_name="north/scandinavian-educational-annotations" --target_column="score" --checkpoint_dir="/home/pere/checkpoints/scandinavian_bert/"
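The command above trains on a numeric `score` column, while the reports below use the integer classes 0 to 5. A minimal sketch of how a raw score could be mapped to a class, assuming the model emits a single regression value that is rounded and clamped, as the upstream CosmoPedia script appears to do. The helper name `score_to_class` is hypothetical and not part of this repository:

```python
def score_to_class(score: float, low: int = 0, high: int = 5) -> int:
    """Round a raw regression score and clamp it to the label range.

    Assumption: the classifier has a single regression output and the
    educational classes are the integers low..high (0..5 in this card).
    """
    return max(low, min(high, round(score)))


# Illustrative raw scores (made up for this sketch, not model output)
for raw in (-0.4, 1.3, 2.6, 7.2):
    print(raw, "->", score_to_class(raw))
```

Clamping keeps out-of-range predictions inside the annotated scale, so a score above 5 is still reported as class 5.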
Classification Report
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| 0 | 0.76 | 0.64 | 0.70 | 18274 |
| 1 | 0.63 | 0.76 | 0.69 | 23348 |
| 2 | 0.48 | 0.40 | 0.43 | 6621 |
| 3 | 0.57 | 0.28 | 0.38 | 1314 |
| 4 | 0.56 | 0.06 | 0.12 | 433 |
| 5 | 0.00 | 0.00 | 0.00 | 10 |
| Metric | Value |
|---|---|
| Accuracy | 0.65 |
| Macro Avg | |
| - Precision | 0.50 |
| - Recall | 0.36 |
| - F1-Score | 0.38 |
| Weighted Avg | |
| - Precision | 0.65 |
| - Recall | 0.65 |
| - F1-Score | 0.64 |
| Total Support | 50000 |
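The gap between the macro and weighted averages comes from class imbalance: the macro average treats all six classes equally, while the weighted average is dominated by classes 0 and 1. A small sketch that recomputes both from the per-class rows above. Because the inputs are already rounded to two decimals, the results may differ from the reported values by about 0.01:

```python
# (precision, recall, f1, support) per class, copied from the report above
report = {
    0: (0.76, 0.64, 0.70, 18274),
    1: (0.63, 0.76, 0.69, 23348),
    2: (0.48, 0.40, 0.43, 6621),
    3: (0.57, 0.28, 0.38, 1314),
    4: (0.56, 0.06, 0.12, 433),
    5: (0.00, 0.00, 0.00, 10),
}


def macro_avg(column: int) -> float:
    """Unweighted mean of one metric column over all classes."""
    return sum(row[column] for row in report.values()) / len(report)


def weighted_avg(column: int) -> float:
    """Mean of one metric column, weighted by class support."""
    total = sum(row[3] for row in report.values())
    return sum(row[column] * row[3] for row in report.values()) / total


for name, col in (("precision", 0), ("recall", 1), ("f1", 2)):
    print(f"{name:9s} macro={macro_avg(col):.3f} weighted={weighted_avg(col):.3f}")
```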
Confusion Matrix
| | Class 0 | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 |
|---|---|---|---|---|---|---|
| Class 0 | 11725 | 6460 | 88 | 1 | 0 | 0 |
| Class 1 | 3598 | 17758 | 1978 | 14 | 0 | 0 |
| Class 2 | 128 | 3733 | 2618 | 142 | 0 | 0 |
| Class 3 | 6 | 272 | 645 | 369 | 22 | 0 |
| Class 4 | 2 | 121 | 161 | 121 | 28 | 0 |
| Class 5 | 0 | 2 | 8 | 0 | 0 | 0 |
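Each row of the matrix sums to the support of that class in the classification report, which suggests that rows are the annotated classes and columns are the predicted classes. Under that reading, the per-class precision and recall and the overall accuracy can be recomputed directly from the counts. This is a plain-Python sketch and not the evaluation code used for training:

```python
# Confusion matrix copied from the table above.
# Assumption: rows are annotated (true) classes, columns are predictions.
cm = [
    [11725, 6460, 88, 1, 0, 0],
    [3598, 17758, 1978, 14, 0, 0],
    [128, 3733, 2618, 142, 0, 0],
    [6, 272, 645, 369, 22, 0],
    [2, 121, 161, 121, 28, 0],
    [0, 2, 8, 0, 0, 0],
]

n = len(cm)
row_sums = [sum(cm[i]) for i in range(n)]                       # support per class
col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predictions per class


def safe_div(a: float, b: float) -> float:
    """Return a / b, or 0.0 when the denominator is zero."""
    return a / b if b else 0.0


precision = [safe_div(cm[k][k], col_sums[k]) for k in range(n)]
recall = [safe_div(cm[k][k], row_sums[k]) for k in range(n)]
f1 = [safe_div(2 * p * r, p + r) for p, r in zip(precision, recall)]
accuracy = sum(cm[k][k] for k in range(n)) / sum(row_sums)

for k in range(n):
    print(f"class {k}: P={precision[k]:.2f} R={recall[k]:.2f} F1={f1[k]:.2f} n={row_sums[k]}")
print(f"accuracy={accuracy:.5f}")
```

Most errors fall in a neighbouring class, and class 5 is never predicted, which is why its precision, recall, and F1 are all zero.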
Evaluation Metrics
| Metric | Value |
|---|---|
| Eval Loss | 0.3311704695224762 |
| Eval Precision | 0.49857140934204414 |
| Eval Recall | 0.35718277242555724 |
| Eval F1 Macro | 0.38442290605864393 |
| Eval Accuracy | 0.64996 |
| Eval Runtime | 86.1773 |
| Eval Samples Per Second | 580.199 |
| Eval Steps Per Second | 4.537 |
| Epoch | 19.91 |
Training Metrics
| Metric | Value |
|---|---|
| Loss | 0.318 |
| Grad Norm | 0.6617229580879211 |
| Learning Rate | 5.119453924914675e-07 |
| Epoch | 19.97 |
Training Runtime
| Metric | Value |
|---|---|
| Train Runtime | 19583.1034 |
| Train Samples Per Second | 459.58 |
| Train Steps Per Second | 1.795 |
| Train Loss | 0.341879387194793 |
| Epoch | 20.0 |
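The throughput figures allow a rough sanity check of the run. The sketch below assumes the usual Hugging Face Trainer convention that samples per second is the total number of processed samples divided by the runtime. The derived batch sizes and dataset size are estimates from the reported numbers, not values stated by the authors:

```python
# Values copied from the Training Runtime and Evaluation Metrics tables
train_runtime_s = 19583.1034
train_samples_per_s = 459.58
train_steps_per_s = 1.795
epochs = 20.0

eval_runtime_s = 86.1773
eval_samples_per_s = 580.199
eval_steps_per_s = 4.537

# Estimated effective batch size: samples processed per optimizer step
train_batch = train_samples_per_s / train_steps_per_s
eval_batch = eval_samples_per_s / eval_steps_per_s

# Estimated number of training samples seen per epoch
train_samples_per_epoch = train_runtime_s * train_samples_per_s / epochs

# Estimated number of evaluation samples (compare with Total Support)
eval_samples = eval_runtime_s * eval_samples_per_s

print(f"train batch size   ~ {train_batch:.0f}")
print(f"eval batch size    ~ {eval_batch:.0f}")
print(f"train samples/epoch ~ {train_samples_per_epoch:,.0f}")
print(f"eval samples        ~ {eval_samples:,.0f}")
print(f"train time          ~ {train_runtime_s / 3600:.1f} h")
```

The evaluation estimate agrees with the total support of 50000 in the classification report, which gives some confidence in the same reading of the training figures.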