---
license: apache-2.0
base_model: answerdotai/ModernBERT-base
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceFW/fineweb
language:
- en
tags:
- text-classification
- multi-label-classification
- modernbert
- fineweb
- education
pipeline_tag: text-classification
---
## Summary
A fine-tuned [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) model for **multi-label subject classification** of educational web text. Given a passage of text, it predicts which of 17 academic/professional subject categories apply.
## Model Details
| Property | Value |
|---|---|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | `ModernBertForSequenceClassification` |
| Task | Multi-label classification |
| Number of labels | 17 |
| Max input length | 512 tokens |
| Hidden size | 768 |
| Attention heads | 12 |
| Transformer layers | 22 (alternating full + sliding window attention) |
| Pooling | Mean pooling |
## Labels
| Index | Field | Display Name |
|---|---|---|
| 0 | `mathematics_statistics` | Mathematics & Statistics |
| 1 | `computer_science_software_engineering` | Computer Science & Software Engineering |
| 2 | `machine_learning_ai` | Machine Learning & AI |
| 3 | `physical_sciences` | Physical Sciences |
| 4 | `life_sciences_biology` | Life Sciences & Biology |
| 5 | `medicine_health` | Medicine & Health |
| 6 | `engineering_technology` | Engineering & Technology |
| 7 | `business_economics` | Business & Economics |
| 8 | `law_government` | Law & Government |
| 9 | `social_sciences` | Social Sciences |
| 10 | `history_geography` | History & Geography |
| 11 | `philosophy_ethics` | Philosophy & Ethics |
| 12 | `education_pedagogy` | Education & Pedagogy |
| 13 | `language_writing` | Language & Writing |
| 14 | `arts_humanities` | Arts & Humanities |
| 15 | `environmental_science_energy` | Environmental Science & Energy |
| 16 | `personal_finance_practical_life` | Personal Finance & Practical Life |
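Because this is a multi-label model, the output head emits one independent logit per label; each is passed through a sigmoid and thresholded rather than softmaxed. A minimal decoding sketch (the 0.5 threshold is a common default and an assumption here, not something the card specifies):

```python
import math

# The 17 field names from the table above, in index order.
FIELDS = [
    "mathematics_statistics", "computer_science_software_engineering",
    "machine_learning_ai", "physical_sciences", "life_sciences_biology",
    "medicine_health", "engineering_technology", "business_economics",
    "law_government", "social_sciences", "history_geography",
    "philosophy_ethics", "education_pedagogy", "language_writing",
    "arts_humanities", "environmental_science_energy",
    "personal_finance_practical_life",
]

def decode(logits, threshold=0.5):
    """Map raw per-label logits to field names via sigmoid + threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [FIELDS[i] for i, p in enumerate(probs) if p >= threshold]

# Example: strong positive logits at indices 0 and 2, negative elsewhere.
logits = [-4.0] * 17
logits[0], logits[2] = 3.1, 2.4
print(decode(logits))  # ['mathematics_statistics', 'machine_learning_ai']
```

The same `decode` step applies to the logits returned by the model's sequence-classification head; lowering the threshold trades precision for recall.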
## Training Data
- Source: [HuggingFaceFW/fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) (CC-MAIN-2021-04 shard) plus ~50K rows from [HuggingFaceFW/fineweb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) (10BT sample)
- Labels were generated by gpt-5-nano via the OpenAI Batch API (~$80 in batch credits)
- Data was split 80% train / 10% val / 10% test (random seed 42)
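The card does not show the splitting code, but an 80/10/10 shuffle-based split with seed 42 can be sketched as follows (one possible implementation, not necessarily the exact one used):

```python
import random

def split_indices(n, seed=42):
    """Shuffle row indices, then carve out 80/10/10 train/val/test splits."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_indices(50_000)
print(len(train), len(val), len(test))  # 40000 5000 5000
```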
## Training Configuration
| Hyperparameter | Value |
|---|---|
| Epochs | 3 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Max token length | 512 |
| Optimizer | AdamW |
| Scheduler | Linear with warmup |
| AMP | bf16 (on CUDA) |
| Gradient clipping | max norm 1.0 |
The saved checkpoint is from the epoch with the best validation micro-F1 (epoch 2 of 3).
## Test Set Performance
| Metric | Score |
|---|---|
| Micro F1 | **0.8545** |
| Macro F1 | **0.8264** |
| Precision (micro) | **0.8799** |
| Recall (micro) | **0.8304** |
| Loss | 0.1222 |
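Micro-F1 pools true/false positives and negatives across all 17 labels before computing F1 (so frequent labels dominate), while macro-F1 averages the per-label F1 scores (so rare labels count equally); the gap between 0.8545 and 0.8264 above suggests somewhat weaker performance on the less common labels. A pure-Python sketch of both metrics on toy multi-label data (equivalent to scikit-learn's `f1_score` with `average="micro"` / `average="macro"`):

```python
def f1(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def micro_macro_f1(y_true, y_pred):
    """y_true / y_pred: lists of binary label vectors, one row per example."""
    n_labels = len(y_true[0])
    per_label, TP, FP, FN = [], 0, 0, 0
    for j in range(n_labels):
        tp = sum(t[j] and p[j] for t, p in zip(y_true, y_pred))
        fp = sum((not t[j]) and p[j] for t, p in zip(y_true, y_pred))
        fn = sum(t[j] and (not p[j]) for t, p in zip(y_true, y_pred))
        TP, FP, FN = TP + tp, FP + fp, FN + fn
        per_label.append(f1(tp, fp, fn))
    # micro: pool counts first; macro: average per-label scores
    return f1(TP, FP, FN), sum(per_label) / n_labels

y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
micro, macro = micro_macro_f1(y_true, y_pred)
print(round(micro, 4), round(macro, 4))  # 0.75 0.5556
```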