---
license: apache-2.0
base_model: answerdotai/ModernBERT-base
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceFW/fineweb
language:
- en
tags:
- text-classification
- multi-label-classification
- modernbert
- fineweb
- education
pipeline_tag: text-classification
---

## Summary
A fine-tuned [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) model for **multi-label subject classification** of educational web text. Given a passage of text, it predicts which of 17 academic/professional subject categories apply.
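A minimal inference sketch with the `transformers` library is shown below. The repo id is a placeholder to substitute with this model's actual id, the 0.5 decision threshold is an assumption (raise or lower it to trade precision for recall), and the label names are read from the checkpoint's `id2label` config, which is assumed to match the table in the Labels section.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder: replace with this repo's actual model id.
model_id = "<this-repo-id>"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "The derivative of a polynomial can be computed term by term."
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label head: apply a per-label sigmoid and keep labels above the threshold.
probs = torch.sigmoid(logits)[0]
predicted = [
    model.config.id2label[i] for i, p in enumerate(probs.tolist()) if p >= 0.5
]
print(predicted)
```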
## Model Details

| Property | Value |
|---|---|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | `ModernBertForSequenceClassification` |
| Task | Multi-label classification |
| Number of labels | 17 |
| Max input length | 512 tokens |
| Hidden size | 768 |
| Attention heads | 12 |
| Transformer layers | 22 (alternating full and sliding-window attention) |
| Pooling | Mean pooling |
## Labels

| Index | Field | Display Name |
|---|---|---|
| 0 | `mathematics_statistics` | Mathematics Statistics |
| 1 | `computer_science_software_engineering` | Computer Science Software Engineering |
| 2 | `machine_learning_ai` | Machine Learning AI |
| 3 | `physical_sciences` | Physical Sciences |
| 4 | `life_sciences_biology` | Life Sciences Biology |
| 5 | `medicine_health` | Medicine Health |
| 6 | `engineering_technology` | Engineering Technology |
| 7 | `business_economics` | Business Economics |
| 8 | `law_government` | Law Government |
| 9 | `social_sciences` | Social Sciences |
| 10 | `history_geography` | History Geography |
| 11 | `philosophy_ethics` | Philosophy Ethics |
| 12 | `education_pedagogy` | Education Pedagogy |
| 13 | `language_writing` | Language Writing |
| 14 | `arts_humanities` | Arts Humanities |
| 15 | `environmental_science_energy` | Environmental Science Energy |
| 16 | `personal_finance_practical_life` | Personal Finance Practical Life |
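If you prefer to work with field names directly, the index order above can be written out as a plain list (a convenience sketch; the checkpoint's `id2label` config remains the canonical mapping):

```python
LABELS = [
    "mathematics_statistics",
    "computer_science_software_engineering",
    "machine_learning_ai",
    "physical_sciences",
    "life_sciences_biology",
    "medicine_health",
    "engineering_technology",
    "business_economics",
    "law_government",
    "social_sciences",
    "history_geography",
    "philosophy_ethics",
    "education_pedagogy",
    "language_writing",
    "arts_humanities",
    "environmental_science_energy",
    "personal_finance_practical_life",
]

def decode(multi_hot):
    """Map a length-17 multi-hot prediction vector back to field names."""
    return [LABELS[i] for i, v in enumerate(multi_hot) if v]
```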
## Training Data

- Source: [HuggingFaceFW/fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) (CC-MAIN-2021-04 shard) plus ~50K rows from [HuggingFaceFW/fineweb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) (10BT sample)
- Labels were generated by gpt-5-nano via the OpenAI Batch API (~$80 in batch credits)
- Data was split 80% train / 10% validation / 10% test with random seed 42 (a reproduction sketch follows this list)
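One way to reproduce an 80/10/10 split with seed 42 using the `datasets` library is sketched below; `rows` stands in for the labeled examples described above, and the original splitting code may differ in detail.

```python
from datasets import Dataset

# Hypothetical: `rows` is a list of dicts holding the labeled examples.
ds = Dataset.from_list(rows)

# 80% train / 20% holdout, then split the holdout in half -> 10% val / 10% test.
split = ds.train_test_split(test_size=0.2, seed=42)
holdout = split["test"].train_test_split(test_size=0.5, seed=42)

train_ds, val_ds, test_ds = split["train"], holdout["train"], holdout["test"]
```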
## Training Configuration

| Hyperparameter | Value |
|---|---|
| Epochs | 3 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Max token length | 512 |
| Optimizer | AdamW |
| Scheduler | Linear with warmup |
| AMP | bf16 (on CUDA) |
| Gradient clipping | max norm 1.0 |

The model checkpoint was saved at the epoch with the best validation micro-F1 (epoch 2).
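The optimization setup implied by the table can be sketched as follows. This is not the exact training script: `model`, `train_loader`, and `num_training_steps` are assumed to exist, and a standard `BCEWithLogitsLoss` multi-label objective is assumed.

```python
import torch
from torch.nn.utils import clip_grad_norm_
from transformers import get_linear_schedule_with_warmup

# AdamW with lr 2e-5 and weight decay 0.01, linear schedule with 10% warmup.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),
    num_training_steps=num_training_steps,
)
criterion = torch.nn.BCEWithLogitsLoss()  # per-label sigmoid objective

for batch in train_loader:
    optimizer.zero_grad()
    # bf16 autocast on CUDA, as listed under AMP above.
    with torch.autocast("cuda", dtype=torch.bfloat16):
        logits = model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
        ).logits
        loss = criterion(logits, batch["labels"].float())
    loss.backward()
    clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
    optimizer.step()
    scheduler.step()
```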
## Test Set Performance

| Metric | Score |
|---|---|
| Micro F1 | **0.8545** |
| Macro F1 | **0.8264** |
| Precision (micro) | **0.8799** |
| Recall (micro) | **0.8304** |
| Loss | 0.1222 |
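These metrics can be recomputed with scikit-learn as in the sketch below, assuming `probs` is an (N, 17) array of sigmoid outputs on the test set, `y_true` is the matching multi-hot ground truth, and a 0.5 decision threshold.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Binarize the sigmoid probabilities at the assumed 0.5 threshold.
y_pred = (probs >= 0.5).astype(int)

print("micro F1:       ", f1_score(y_true, y_pred, average="micro"))
print("macro F1:       ", f1_score(y_true, y_pred, average="macro"))
print("micro precision:", precision_score(y_true, y_pred, average="micro"))
print("micro recall:   ", recall_score(y_true, y_pred, average="micro"))
```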