---
license: apache-2.0
base_model: answerdotai/ModernBERT-base
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceFW/fineweb
language:
- en
tags:
- text-classification
- multi-label-classification
- modernbert
- fineweb
- education
pipeline_tag: text-classification
---

## Summary

A fine-tuned [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) model for **multi-label subject classification** of educational web text. Given a passage of text, it predicts which of 17 academic/professional subject categories apply.

## Model Details

| Property | Value |
|---|---|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | `ModernBertForSequenceClassification` |
| Task | Multi-label classification |
| Number of labels | 17 |
| Max input length | 512 tokens |
| Hidden size | 768 |
| Attention heads | 12 |
| Transformer layers | 22 (alternating full + sliding-window attention) |
| Pooling | Mean pooling |

## Labels

| Index | Field | Display Name |
|---|---|---|
| 0 | `mathematics_statistics` | Mathematics Statistics |
| 1 | `computer_science_software_engineering` | Computer Science Software Engineering |
| 2 | `machine_learning_ai` | Machine Learning AI |
| 3 | `physical_sciences` | Physical Sciences |
| 4 | `life_sciences_biology` | Life Sciences Biology |
| 5 | `medicine_health` | Medicine Health |
| 6 | `engineering_technology` | Engineering Technology |
| 7 | `business_economics` | Business Economics |
| 8 | `law_government` | Law Government |
| 9 | `social_sciences` | Social Sciences |
| 10 | `history_geography` | History Geography |
| 11 | `philosophy_ethics` | Philosophy Ethics |
| 12 | `education_pedagogy` | Education Pedagogy |
| 13 | `language_writing` | Language Writing |
| 14 | `arts_humanities` | Arts Humanities |
| 15 | `environmental_science_energy` | Environmental Science Energy |
| 16 | `personal_finance_practical_life` | Personal Finance Practical Life |

## Training Data

- Source: [HuggingFaceFW/fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) (CC-MAIN-2021-04 shard) plus ~50K rows from [HuggingFaceFW/fineweb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) (10BT sample)
- Labels were generated by gpt-5-nano via the OpenAI Batch API (~$80 in batch credits)
- Data was split 80% train / 10% val / 10% test (random seed 42)

## Training Configuration

| Hyperparameter | Value |
|---|---|
| Epochs | 3 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Max token length | 512 |
| Optimizer | AdamW |
| Scheduler | Linear with warmup |
| AMP | bf16 (on CUDA) |
| Gradient clipping | max norm 1.0 |

The model checkpoint was saved at the epoch with the best validation micro-F1 (epoch 2).

## Test Set Performance

| Metric | Score |
|---|---|
| Micro F1 | **0.8545** |
| Macro F1 | **0.8264** |
| Precision (micro) | **0.8799** |
| Recall (micro) | **0.8304** |
| Loss | 0.1222 |
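## Usage Sketch

The card does not include an inference snippet, so here is a minimal sketch of the multi-label decoding step: apply a sigmoid to each of the 17 logits and keep every label whose probability clears a threshold. The 0.5 threshold, the `sigmoid` helper, and the dummy logits are illustrative assumptions, not part of the released training code; with `transformers`, the logits would come from the model head as shown in the trailing comment.

```python
import math

# Label names in index order, as listed in the Labels table above.
LABELS = [
    "mathematics_statistics", "computer_science_software_engineering",
    "machine_learning_ai", "physical_sciences", "life_sciences_biology",
    "medicine_health", "engineering_technology", "business_economics",
    "law_government", "social_sciences", "history_geography",
    "philosophy_ethics", "education_pedagogy", "language_writing",
    "arts_humanities", "environmental_science_energy",
    "personal_finance_practical_life",
]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def decode(logits: list[float], threshold: float = 0.5) -> list[str]:
    """Map 17 raw logits to the subject labels whose sigmoid
    probability clears the threshold (multi-label: zero or more)."""
    return [name for name, z in zip(LABELS, logits)
            if sigmoid(z) >= threshold]

# Illustrative logits (made up): strongly positive at indices 1 and 2.
logits = [-3.0] * len(LABELS)
logits[1] = 2.5   # computer_science_software_engineering
logits[2] = 4.0   # machine_learning_ai
print(decode(logits))
# → ['computer_science_software_engineering', 'machine_learning_ai']

# With transformers, real logits would come from something like:
#   enc = tokenizer(text, truncation=True, max_length=512,
#                   return_tensors="pt")
#   logits = model(**enc).logits[0].tolist()
```

Because this is multi-label (independent sigmoids, not a softmax), a passage can legitimately match several subjects or none; the threshold can be tuned per label against the validation split if precision/recall trade-offs matter.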