---
license: apache-2.0
base_model: answerdotai/ModernBERT-base
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceFW/fineweb
language:
- en
tags:
- text-classification
- multi-label-classification
- modernbert
- fineweb
- education
pipeline_tag: text-classification
---

## Summary
A fine-tuned ModernBERT-base model for multi-label subject classification of educational web text. Given a passage of text, it predicts which of 17 academic/professional subject categories apply.
## Model Details
| Property | Value |
|---|---|
| Base model | answerdotai/ModernBERT-base |
| Architecture | ModernBertForSequenceClassification |
| Task | Multi-label classification |
| Number of labels | 17 |
| Max input length | 512 tokens |
| Hidden size | 768 |
| Attention heads | 12 |
| Transformer layers | 22 (alternating full + sliding window attention) |
| Pooling | Mean pooling |
## Labels

| Index | Field | Display Name |
|---|---|---|
| 0 | mathematics_statistics | Mathematics Statistics |
| 1 | computer_science_software_engineering | Computer Science Software Engineering |
| 2 | machine_learning_ai | Machine Learning AI |
| 3 | physical_sciences | Physical Sciences |
| 4 | life_sciences_biology | Life Sciences Biology |
| 5 | medicine_health | Medicine Health |
| 6 | engineering_technology | Engineering Technology |
| 7 | business_economics | Business Economics |
| 8 | law_government | Law Government |
| 9 | social_sciences | Social Sciences |
| 10 | history_geography | History Geography |
| 11 | philosophy_ethics | Philosophy Ethics |
| 12 | education_pedagogy | Education Pedagogy |
| 13 | language_writing | Language Writing |
| 14 | arts_humanities | Arts Humanities |
| 15 | environmental_science_energy | Environmental Science Energy |
| 16 | personal_finance_practical_life | Personal Finance Practical Life |
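At inference time, a multi-label head applies a sigmoid to each of the 17 logits independently and keeps every label whose probability clears a threshold. A minimal decoding sketch in plain Python, assuming the conventional 0.5 threshold (the exact threshold is not specified in this card, so tune it for your precision/recall trade-off):

```python
import math

# Index-to-field mapping, copied from the Labels table above.
ID2FIELD = {
    0: "mathematics_statistics",
    1: "computer_science_software_engineering",
    2: "machine_learning_ai",
    3: "physical_sciences",
    4: "life_sciences_biology",
    5: "medicine_health",
    6: "engineering_technology",
    7: "business_economics",
    8: "law_government",
    9: "social_sciences",
    10: "history_geography",
    11: "philosophy_ethics",
    12: "education_pedagogy",
    13: "language_writing",
    14: "arts_humanities",
    15: "environmental_science_energy",
    16: "personal_finance_practical_life",
}

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def decode_labels(logits, threshold: float = 0.5):
    """Turn 17 raw logits into the list of predicted field names."""
    assert len(logits) == len(ID2FIELD)
    return [ID2FIELD[i] for i, z in enumerate(logits) if sigmoid(z) >= threshold]

# Hypothetical logits: strongly positive at indices 1 and 2, negative elsewhere.
logits = [-3.0] * 17
logits[1], logits[2] = 2.1, 4.0
print(decode_labels(logits))
# ['computer_science_software_engineering', 'machine_learning_ai']
```

Because each label is thresholded independently, a passage can receive zero, one, or several subjects, unlike softmax-based single-label classification.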
## Training Data
- Source: HuggingFaceFW/fineweb-edu (CC-MAIN-2021-04 shard) plus ~50K rows from HuggingFaceFW/fineweb (10BT sample)
- Labels were generated by gpt-5-nano via the OpenAI Batch API (~$80 in batch credits)
- Data was split 80% train / 10% val / 10% test (random seed 42)
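The splitting code is not included in this card; one way to reproduce a deterministic 80/10/10 split with seed 42 is sketched below (illustrative only, the actual implementation may differ):

```python
import random

def train_val_test_split(rows, seed: int = 42):
    """Shuffle deterministically with the given seed, then slice 80% / 10% / 10%."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```

Seeding a local `random.Random` instance (rather than the module-level RNG) keeps the split reproducible even if other code touches the global random state.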
## Training Configuration
| Hyperparameter | Value |
|---|---|
| Epochs | 3 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Max token length | 512 |
| Optimizer | AdamW |
| Scheduler | Linear with warmup |
| AMP | bf16 (on CUDA) |
| Gradient clipping | max norm 1.0 |
The model checkpoint was saved at the epoch with the best validation micro-F1 (epoch 2).
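The linear-with-warmup schedule ramps the learning rate from 0 to the peak over the first 10% of steps, then decays it linearly back to 0. A plain-Python sketch of how such a schedule behaves (the 1,000-step total below is an arbitrary illustration, not the actual training length):

```python
def linear_warmup_lr(step: int, total_steps: int, peak_lr: float = 2e-5,
                     warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given optimizer step under linear warmup + linear decay."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    # Linear decay from peak_lr down to 0 over the remaining steps.
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total = 1000
print(linear_warmup_lr(0, total))     # 0.0
print(linear_warmup_lr(100, total))   # 2e-05 (peak, at the end of warmup)
print(linear_warmup_lr(1000, total))  # 0.0
```

In practice this is what `transformers.get_linear_schedule_with_warmup` computes as a multiplier on the optimizer's base learning rate.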
## Test Set Performance
| Metric | Score |
|---|---|
| Micro F1 | 0.8545 |
| Macro F1 | 0.8264 |
| Precision (micro) | 0.8799 |
| Recall (micro) | 0.8304 |
| Loss | 0.1222 |
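Micro-F1 pools true positives, false positives, and false negatives across all 17 labels before computing F1, while macro-F1 averages the per-label F1 scores; the gap between the two (0.8545 vs 0.8264) suggests somewhat weaker performance on the rarer labels. A minimal sketch of both metrics on toy 0/1 indicator matrices (matching scikit-learn's `f1_score` with `average="micro"` / `average="macro"`):

```python
def f1_scores(y_true, y_pred):
    """Micro and macro F1 for multi-label 0/1 indicator matrices (rows = samples)."""
    n_labels = len(y_true[0])
    per_label_f1 = []
    tp_sum = fp_sum = fn_sum = 0
    for j in range(n_labels):
        tp = sum(t[j] and p[j] for t, p in zip(y_true, y_pred))
        fp = sum((not t[j]) and p[j] for t, p in zip(y_true, y_pred))
        fn = sum(t[j] and (not p[j]) for t, p in zip(y_true, y_pred))
        tp_sum += tp
        fp_sum += fp
        fn_sum += fn
        denom = 2 * tp + fp + fn
        per_label_f1.append(2 * tp / denom if denom else 0.0)
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / micro_denom if micro_denom else 0.0
    macro = sum(per_label_f1) / n_labels
    return micro, macro

# Toy example with 3 samples and 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
micro, macro = f1_scores(y_true, y_pred)
print(round(micro, 4), round(macro, 4))  # 0.75 0.5556
```

Note how the label that is never predicted drags macro-F1 down much harder than micro-F1, which is exactly the pattern visible in the table above.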