mrehank209/bk-classification-bart-two-stage

Text Classification

bart_with_classifier

bibliographic-classification

german-libraries

multi-label-classification


mrehank209 committed on Aug 29, 2025

Commit f96b15a · verified · 1 Parent(s): 787bf92

Upload README.md with huggingface_hub

Files changed (1)

README.md +92 -0

README.md ADDED

	@@ -0,0 +1,92 @@

+---
+language: de
+license: mit
+library_name: transformers
+pipeline_tag: text-classification
+tags:
+- bibliographic-classification
+- bk-codes
+- german-libraries
+- multi-label-classification
+- bart
+datasets:
+- k10plus-catalog
+metrics:
+- accuracy
+- f1
+- precision
+- recall
+- matthews_correlation
+---
+# BK Classification - Two-Stage BART
+This model automatically assigns BK (Basisklassifikation) codes to German bibliographic records.
+## Model Description
+This is a **two-stage fine-tuned BART-large model** for multi-label classification of bibliographic metadata into BK classification codes. On the K10plus library catalog dataset it outperformed both standard fine-tuning and hierarchical joint training (see Training Approach).
+### Performance
+- **Subset Accuracy**: 25.7%
+- **Matthews Correlation Coefficient (MCC)**: 0.498
+- **F1-Score (Micro)**: 47.9%
+- **F1-Score (Macro)**: 21.4%
+- **Precision (Micro)**: 66.1%
+- **Recall (Micro)**: 37.6%
+### Training Approach
+The model uses a **two-stage fine-tuning approach**:
+1. **Stage 1**: Train on parent BK categories (48 labels)
+2. **Stage 2**: Fine-tune on all BK codes (1,884 labels) using Stage 1 as initialization
+This approach outperformed both standard fine-tuning and hierarchical joint training.
+### Dataset
+- **Source**: K10plus German library catalog (2010-2020)
+- **Total Records**: 250,831 bibliographic entries
+- **Labels**: 1,884 unique BK classification codes
+- **Input Fields**: Title, Summary, Keywords, LOC Keywords, RVK codes
+### Usage
+```python
+import torch
+from transformers import AutoTokenizer
+from huggingface_hub import hf_hub_download
+# Load tokenizer
+tokenizer = AutoTokenizer.from_pretrained("mrehank209/bk-classification-bart-two-stage")
+# Load model components (see repository for full inference code)
+# This model requires custom loading due to the classifier head
+```
+### Citation
+If you use this model, please cite:
+```bibtex
+@misc{bk-classification-bart,
+  title={Automatic BK Classification using Two-Stage BART Fine-tuning},
+  author={Khalid, M. Rehan},
+  year={2025},
+  howpublished={Hugging Face Model Hub},
+  url={https://huggingface.co/mrehank209/bk-classification-bart-two-stage}
+}
+```
+### Contact
+- **Author**: M. Rehan Khalid
+- **Email**: m.khalid@stud.uni-goettingen.de
+- **Affiliation**: University of Göttingen
+### License
+MIT License
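
The figures under Performance are standard multi-label metrics. As an illustration only, and not the author's evaluation code, the sketch below computes subset accuracy and micro/macro precision, recall and F1 from binary indicator rows in plain Python. The function names `f1_from_pr` and `multilabel_metrics` and the toy labels are hypothetical. MCC is left out for brevity.

```python
from typing import Dict, Sequence


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def multilabel_metrics(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> Dict[str, float]:
    """Compute subset accuracy and micro/macro scores for 0/1 indicator rows."""
    n_samples = len(y_true)
    n_labels = len(y_true[0])

    # Subset accuracy: a record counts only if its full label set matches.
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))

    # Per-label true positives, false positives and false negatives.
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    for t, p in zip(y_true, y_pred):
        for j in range(n_labels):
            if t[j] and p[j]:
                tp[j] += 1
            elif p[j]:
                fp[j] += 1
            elif t[j]:
                fn[j] += 1

    def prf(tp_: int, fp_: int, fn_: int):
        precision = tp_ / (tp_ + fp_) if tp_ + fp_ else 0.0
        recall = tp_ / (tp_ + fn_) if tp_ + fn_ else 0.0
        return precision, recall, f1_from_pr(precision, recall)

    # Micro: pool the counts over all labels, then score once.
    micro_p, micro_r, micro_f1 = prf(sum(tp), sum(fp), sum(fn))
    # Macro: score each label separately, then average with equal weight.
    macro_f1 = sum(prf(tp[j], fp[j], fn[j])[2] for j in range(n_labels)) / n_labels

    return {
        "subset_accuracy": exact / n_samples,
        "precision_micro": micro_p,
        "recall_micro": micro_r,
        "f1_micro": micro_f1,
        "f1_macro": macro_f1,
    }


# Toy example with 4 records and 3 labels (made-up data, not from K10plus).
y_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]
scores = multilabel_metrics(y_true, y_pred)
```

Because micro F1 is the harmonic mean of micro precision and micro recall, the reported numbers can be cross-checked: `f1_from_pr(0.661, 0.376)` gives about 0.479, which agrees with the reported micro F1 of 47.9%. Macro F1 weights every label equally, so rare BK codes with few or no correct predictions pull it down, which is one plausible reading of the gap between the macro (21.4%) and micro scores.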