---
license: apache-2.0
base_model: answerdotai/ModernBERT-base
datasets:
  - HuggingFaceFW/fineweb-edu
  - HuggingFaceFW/fineweb
language:
  - en
tags:
  - text-classification
  - multi-label-classification
  - modernbert
  - fineweb
  - education
pipeline_tag: text-classification
---

## Summary

A fine-tuned ModernBERT-base model for multi-label subject classification of educational web text. Given a passage of text, it predicts which of 17 academic/professional subject categories apply.

## Model Details

| Property | Value |
|---|---|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | `ModernBertForSequenceClassification` |
| Task | Multi-label classification |
| Number of labels | 17 |
| Max input length | 512 tokens |
| Hidden size | 768 |
| Attention heads | 12 |
| Transformer layers | 22 (alternating full + sliding-window attention) |
| Pooling | Mean pooling |

## Labels

| Index | Field | Display Name |
|---|---|---|
| 0 | `mathematics_statistics` | Mathematics Statistics |
| 1 | `computer_science_software_engineering` | Computer Science Software Engineering |
| 2 | `machine_learning_ai` | Machine Learning AI |
| 3 | `physical_sciences` | Physical Sciences |
| 4 | `life_sciences_biology` | Life Sciences Biology |
| 5 | `medicine_health` | Medicine Health |
| 6 | `engineering_technology` | Engineering Technology |
| 7 | `business_economics` | Business Economics |
| 8 | `law_government` | Law Government |
| 9 | `social_sciences` | Social Sciences |
| 10 | `history_geography` | History Geography |
| 11 | `philosophy_ethics` | Philosophy Ethics |
| 12 | `education_pedagogy` | Education Pedagogy |
| 13 | `language_writing` | Language Writing |
| 14 | `arts_humanities` | Arts Humanities |
| 15 | `environmental_science_energy` | Environmental Science Energy |
| 16 | `personal_finance_practical_life` | Personal Finance Practical Life |
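Because this is a multi-label head, the 17 logits are decoded independently with a sigmoid rather than jointly with a softmax. A minimal decoding sketch in plain Python (the 0.5 threshold is a common default, not something stated in this card):

```python
import math

# Label order must match the index column in the table above.
LABELS = [
    "mathematics_statistics", "computer_science_software_engineering",
    "machine_learning_ai", "physical_sciences", "life_sciences_biology",
    "medicine_health", "engineering_technology", "business_economics",
    "law_government", "social_sciences", "history_geography",
    "philosophy_ethics", "education_pedagogy", "language_writing",
    "arts_humanities", "environmental_science_energy",
    "personal_finance_practical_life",
]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def decode(logits: list[float], threshold: float = 0.5) -> list[str]:
    """Return every label whose sigmoid probability clears the threshold."""
    assert len(logits) == len(LABELS)
    return [name for name, z in zip(LABELS, logits) if sigmoid(z) >= threshold]
```

In practice these logits come from running the checkpoint through `transformers` and reading `outputs.logits`; a passage can legitimately trigger zero, one, or several labels.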

## Training Data

- Source: `HuggingFaceFW/fineweb-edu` (CC-MAIN-2021-04 shard) plus ~50K rows from `HuggingFaceFW/fineweb` (10BT sample)
- Labels were generated by gpt-5-nano via the OpenAI Batch API (~$80 in batch credits)
- Data was split 80% train / 10% val / 10% test (random seed 42)
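The 80/10/10 split can be reproduced along these lines; this is a sketch under the assumption of a single seeded shuffle, since the exact splitting code is not published in this card:

```python
import random

def split_80_10_10(rows: list, seed: int = 42) -> tuple[list, list, list]:
    """Shuffle deterministically, then cut into 80% train / 10% val / 10% test."""
    shuffled = rows[:]  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```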

## Training Configuration

| Hyperparameter | Value |
|---|---|
| Epochs | 3 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Max token length | 512 |
| Optimizer | AdamW |
| Scheduler | Linear with warmup |
| AMP | bf16 (on CUDA) |
| Gradient clipping | max norm 1.0 |

The checkpoint was saved from the epoch with the best validation micro F1 (epoch 2).
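The linear-with-warmup schedule ramps the learning rate from 0 to 2e-5 over the first 10% of steps, then decays it linearly back to 0. A sketch of the per-step rate (the trainer's exact scheduler implementation may differ in off-by-one details):

```python
def lr_at(step: int, total_steps: int, base_lr: float = 2e-5,
          warmup_ratio: float = 0.1) -> float:
    """Linear warmup followed by linear decay, per the config above."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # ramp 0 -> base_lr
    # decay base_lr -> 0 over the remaining steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)
```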

## Test Set Performance

| Metric | Score |
|---|---|
| Micro F1 | 0.8545 |
| Macro F1 | 0.8264 |
| Precision (micro) | 0.8799 |
| Recall (micro) | 0.8304 |
| Loss | 0.1222 |
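As a sanity check, micro F1 is the harmonic mean of micro precision and micro recall, so the three micro numbers above are mutually consistent once you account for the inputs being rounded to four decimals:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Recomputing from the rounded precision/recall lands within ~1e-4
# of the reported micro F1 of 0.8545.
micro_f1 = f1(0.8799, 0.8304)
```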