---
license: apache-2.0
base_model: answerdotai/ModernBERT-base
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceFW/fineweb
language:
- en
tags:
- text-classification
- multi-label-classification
- modernbert
- fineweb
- education
pipeline_tag: text-classification
---

## Summary

A fine-tuned [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) model for **multi-label subject classification** of educational web text. Given a passage of text, it predicts which of 17 academic/professional subject categories apply.

## Model Details

| Property | Value |
|---|---|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | `ModernBertForSequenceClassification` |
| Task | Multi-label classification |
| Number of labels | 17 |
| Max input length | 512 tokens |
| Hidden size | 768 |
| Attention heads | 12 |
| Transformer layers | 22 (alternating full + sliding-window attention) |
| Pooling | Mean pooling |

## Labels

| Index | Field | Display Name |
|---|---|---|
| 0 | `mathematics_statistics` | Mathematics Statistics |
| 1 | `computer_science_software_engineering` | Computer Science Software Engineering |
| 2 | `machine_learning_ai` | Machine Learning AI |
| 3 | `physical_sciences` | Physical Sciences |
| 4 | `life_sciences_biology` | Life Sciences Biology |
| 5 | `medicine_health` | Medicine Health |
| 6 | `engineering_technology` | Engineering Technology |
| 7 | `business_economics` | Business Economics |
| 8 | `law_government` | Law Government |
| 9 | `social_sciences` | Social Sciences |
| 10 | `history_geography` | History Geography |
| 11 | `philosophy_ethics` | Philosophy Ethics |
| 12 | `education_pedagogy` | Education Pedagogy |
| 13 | `language_writing` | Language Writing |
| 14 | `arts_humanities` | Arts Humanities |
| 15 | `environmental_science_energy` | Environmental Science Energy |
| 16 | `personal_finance_practical_life` | Personal Finance Practical Life |

## Training Data

- Source: [HuggingFaceFW/fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) (CC-MAIN-2021-04 shard) plus ~50K rows from [HuggingFaceFW/fineweb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) (10BT sample)
- Labels were generated by gpt-5-nano via the OpenAI Batch API (~$80 in batch credits)
- Data was split 80% train / 10% val / 10% test (random seed 42)

## Training Configuration

| Hyperparameter | Value |
|---|---|
| Epochs | 3 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Max token length | 512 |
| Optimizer | AdamW |
| Scheduler | Linear with warmup |
| AMP | bf16 (on CUDA) |
| Gradient clipping | max norm 1.0 |

The model checkpoint was saved at the epoch with the best validation micro-F1 (epoch 2).

## Test Set Performance

| Metric | Score |
|---|---|
| Micro F1 | **0.8545** |
| Macro F1 | **0.8264** |
| Precision (micro) | **0.8799** |
| Recall (micro) | **0.8304** |
| Loss | 0.1222 |
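## Usage Sketch

The card does not include an inference snippet, so here is a minimal sketch of the multi-label decoding step: apply a sigmoid to each of the 17 logits and keep every label whose probability clears a threshold. The 0.5 threshold, the `sigmoid` helper, and the dummy logits are illustrative assumptions, not part of the released training code; with `transformers`, the logits would come from the model head as shown in the trailing comment.

```python
import math

# Label names in index order, as listed in the Labels table above.
LABELS = [
    "mathematics_statistics", "computer_science_software_engineering",
    "machine_learning_ai", "physical_sciences", "life_sciences_biology",
    "medicine_health", "engineering_technology", "business_economics",
    "law_government", "social_sciences", "history_geography",
    "philosophy_ethics", "education_pedagogy", "language_writing",
    "arts_humanities", "environmental_science_energy",
    "personal_finance_practical_life",
]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def decode(logits: list[float], threshold: float = 0.5) -> list[str]:
    """Map 17 raw logits to the subject labels whose sigmoid
    probability clears the threshold (multi-label: zero or more)."""
    return [name for name, z in zip(LABELS, logits)
            if sigmoid(z) >= threshold]

# Illustrative logits (made up): strongly positive at indices 1 and 2.
logits = [-3.0] * len(LABELS)
logits[1] = 2.5   # computer_science_software_engineering
logits[2] = 4.0   # machine_learning_ai
print(decode(logits))
# → ['computer_science_software_engineering', 'machine_learning_ai']

# With transformers, real logits would come from something like:
#   enc = tokenizer(text, truncation=True, max_length=512,
#                   return_tensors="pt")
#   logits = model(**enc).logits[0].tolist()
```

Because this is multi-label (independent sigmoids, not a softmax), a passage can legitimately match several subjects or none; the threshold can be tuned per label against the validation split if precision/recall trade-offs matter.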