---
license: apache-2.0
base_model: answerdotai/ModernBERT-base
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceFW/fineweb
language:
- en
tags:
- text-classification
- multi-label-classification
- modernbert
- fineweb
- education
pipeline_tag: text-classification
---

## Summary
A fine-tuned ModernBERT-base model for multi-label subject classification of educational web text. Given a passage of text, it predicts which of 17 academic/professional subject categories apply.
## Model Details
| Property | Value |
|---|---|
| Base model | answerdotai/ModernBERT-base |
| Architecture | ModernBertForSequenceClassification |
| Task | Multi-label classification |
| Number of labels | 17 |
| Max input length | 512 tokens |
| Hidden size | 768 |
| Attention heads | 12 |
| Transformer layers | 22 (alternating full + sliding window attention) |
| Pooling | Mean pooling |
## Labels

| Index | Field | Display Name |
|---|---|---|
| 0 | mathematics_statistics | Mathematics Statistics |
| 1 | computer_science_software_engineering | Computer Science Software Engineering |
| 2 | machine_learning_ai | Machine Learning AI |
| 3 | physical_sciences | Physical Sciences |
| 4 | life_sciences_biology | Life Sciences Biology |
| 5 | medicine_health | Medicine Health |
| 6 | engineering_technology | Engineering Technology |
| 7 | business_economics | Business Economics |
| 8 | law_government | Law Government |
| 9 | social_sciences | Social Sciences |
| 10 | history_geography | History Geography |
| 11 | philosophy_ethics | Philosophy Ethics |
| 12 | education_pedagogy | Education Pedagogy |
| 13 | language_writing | Language Writing |
| 14 | arts_humanities | Arts Humanities |
| 15 | environmental_science_energy | Environmental Science Energy |
| 16 | personal_finance_practical_life | Personal Finance Practical Life |
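At inference time, a multi-label head applies a sigmoid to each of the 17 logits independently and keeps every label whose probability clears a threshold. A minimal decoding sketch in plain Python, assuming the conventional 0.5 threshold (the exact threshold is not specified in this card, so tune it for your precision/recall trade-off):

```python
import math

# Index-to-field mapping, copied from the Labels table above.
ID2FIELD = {
    0: "mathematics_statistics",
    1: "computer_science_software_engineering",
    2: "machine_learning_ai",
    3: "physical_sciences",
    4: "life_sciences_biology",
    5: "medicine_health",
    6: "engineering_technology",
    7: "business_economics",
    8: "law_government",
    9: "social_sciences",
    10: "history_geography",
    11: "philosophy_ethics",
    12: "education_pedagogy",
    13: "language_writing",
    14: "arts_humanities",
    15: "environmental_science_energy",
    16: "personal_finance_practical_life",
}

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def decode_labels(logits, threshold: float = 0.5):
    """Turn 17 raw logits into the list of predicted field names."""
    assert len(logits) == len(ID2FIELD)
    return [ID2FIELD[i] for i, z in enumerate(logits) if sigmoid(z) >= threshold]

# Hypothetical logits: strongly positive at indices 1 and 2, negative elsewhere.
logits = [-3.0] * 17
logits[1], logits[2] = 2.1, 4.0
print(decode_labels(logits))
# ['computer_science_software_engineering', 'machine_learning_ai']
```

Because each label is thresholded independently, a passage can receive zero, one, or several subjects, unlike softmax-based single-label classification.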
## Training Data
- Source: HuggingFaceFW/fineweb-edu (CC-MAIN-2021-04 shard) plus ~50K rows from HuggingFaceFW/fineweb (10BT sample)
- Labels were generated by gpt-5-nano via the OpenAI Batch API (~$80 in batch credits)
- Data was split 80% train / 10% val / 10% test (random seed 42)
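The splitting code is not included in this card; one way to reproduce a deterministic 80/10/10 split with seed 42 is sketched below (illustrative only, the actual implementation may differ):

```python
import random

def train_val_test_split(rows, seed: int = 42):
    """Shuffle deterministically with the given seed, then slice 80% / 10% / 10%."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```

Seeding a local `random.Random` instance (rather than the module-level RNG) keeps the split reproducible even if other code touches the global random state.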
## Training Configuration
| Hyperparameter | Value |
|---|---|
| Epochs | 3 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Max token length | 512 |
| Optimizer | AdamW |
| Scheduler | Linear with warmup |
| AMP | bf16 (on CUDA) |
| Gradient clipping | max norm 1.0 |
The model checkpoint was saved at the epoch with the best validation micro-F1 (epoch 2).
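The linear-with-warmup schedule ramps the learning rate from 0 to the peak over the first 10% of steps, then decays it linearly back to 0. A plain-Python sketch of how such a schedule behaves (the 1,000-step total below is an arbitrary illustration, not the actual training length):

```python
def linear_warmup_lr(step: int, total_steps: int, peak_lr: float = 2e-5,
                     warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given optimizer step under linear warmup + linear decay."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    # Linear decay from peak_lr down to 0 over the remaining steps.
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total = 1000
print(linear_warmup_lr(0, total))     # 0.0
print(linear_warmup_lr(100, total))   # 2e-05 (peak, at the end of warmup)
print(linear_warmup_lr(1000, total))  # 0.0
```

In practice this is what `transformers.get_linear_schedule_with_warmup` computes as a multiplier on the optimizer's base learning rate.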
## Test Set Performance
| Metric | Score |
|---|---|
| Micro F1 | 0.8545 |
| Macro F1 | 0.8264 |
| Precision (micro) | 0.8799 |
| Recall (micro) | 0.8304 |
| Loss | 0.1222 |
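Micro-F1 pools true positives, false positives, and false negatives across all 17 labels before computing F1, while macro-F1 averages the per-label F1 scores; the gap between the two (0.8545 vs 0.8264) suggests somewhat weaker performance on the rarer labels. A minimal sketch of both metrics on toy 0/1 indicator matrices (matching scikit-learn's `f1_score` with `average="micro"` / `average="macro"`):

```python
def f1_scores(y_true, y_pred):
    """Micro and macro F1 for multi-label 0/1 indicator matrices (rows = samples)."""
    n_labels = len(y_true[0])
    per_label_f1 = []
    tp_sum = fp_sum = fn_sum = 0
    for j in range(n_labels):
        tp = sum(t[j] and p[j] for t, p in zip(y_true, y_pred))
        fp = sum((not t[j]) and p[j] for t, p in zip(y_true, y_pred))
        fn = sum(t[j] and (not p[j]) for t, p in zip(y_true, y_pred))
        tp_sum += tp
        fp_sum += fp
        fn_sum += fn
        denom = 2 * tp + fp + fn
        per_label_f1.append(2 * tp / denom if denom else 0.0)
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / micro_denom if micro_denom else 0.0
    macro = sum(per_label_f1) / n_labels
    return micro, macro

# Toy example with 3 samples and 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
micro, macro = f1_scores(y_true, y_pred)
print(round(micro, 4), round(macro, 4))  # 0.75 0.5556
```

Note how the label that is never predicted drags macro-F1 down much harder than micro-F1, which is exactly the pattern visible in the table above.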