---
language: de
license: mit
library_name: transformers
pipeline_tag: text-classification
tags:
- bibliographic-classification
- bk-codes
- german-libraries
- multi-label-classification
- bart
datasets:
- k10plus-catalog
metrics:
- accuracy
- f1
- precision
- recall
- matthews_correlation
---

# BK Classification - Two-Stage BART

This model performs automatic classification of German bibliographic records using BK (Basisklassifikation) codes.

## Model Description

This is a **two-stage fine-tuned BART-large model** for multi-label classification of bibliographic metadata into BK classification codes. Of the approaches evaluated in this work, the two-stage model achieved the best performance on the K10plus library catalog dataset.

### Performance

- **Subset Accuracy**: 25.7%
- **Matthews Correlation Coefficient (MCC)**: 0.498
- **F1-Score (Micro)**: 47.9%
- **F1-Score (Macro)**: 21.4%
- **Precision (Micro)**: 66.1%
- **Recall (Micro)**: 37.6%
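
These are standard multi-label metrics computed over binary indicator matrices. For reference, a sketch of how they can be reproduced with scikit-learn, assuming (as is common) that MCC is computed over the flattened label matrix:

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, f1_score, matthews_corrcoef, precision_score, recall_score,
)

def multilabel_report(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """y_true, y_pred: binary indicator arrays of shape (n_samples, n_labels)."""
    return {
        "subset_accuracy": accuracy_score(y_true, y_pred),  # exact-match ratio
        "mcc": matthews_corrcoef(y_true.ravel(), y_pred.ravel()),  # flattened labels
        "f1_micro": f1_score(y_true, y_pred, average="micro", zero_division=0),
        "f1_macro": f1_score(y_true, y_pred, average="macro", zero_division=0),
        "precision_micro": precision_score(y_true, y_pred, average="micro", zero_division=0),
        "recall_micro": recall_score(y_true, y_pred, average="micro", zero_division=0),
    }
```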

### Training Approach

The model uses a **two-stage fine-tuning approach**:

1. **Stage 1**: Train on parent BK categories (48 labels)
2. **Stage 2**: Fine-tune on all BK codes (1,884 labels) using Stage 1 as initialization

This approach outperformed both standard fine-tuning and hierarchical joint training.
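
This card does not ship training code; the following is a minimal sketch of the two-stage idea using the stock `BartForSequenceClassification` (the released model's classifier head is custom, per the Usage section below):

```python
from transformers import BartForSequenceClassification

# Stage 1: fine-tune on the 48 parent BK categories (training loop omitted)
stage1 = BartForSequenceClassification.from_pretrained(
    "facebook/bart-large",
    num_labels=48,
    problem_type="multi_label_classification",
)
# ... fine-tune stage1 on parent-level labels ...

# Stage 2: fresh classification head for the full 1,884-code label space,
# initialized with the Stage 1 encoder/decoder weights
stage2 = BartForSequenceClassification.from_pretrained(
    "facebook/bart-large",
    num_labels=1884,
    problem_type="multi_label_classification",
)
stage2.model.load_state_dict(stage1.model.state_dict())  # transfer shared weights
# ... fine-tune stage2 on all BK codes ...
```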

### Dataset

- **Source**: K10plus German library catalog (2010-2020)
- **Total Records**: 250,831 bibliographic entries
- **Labels**: 1,884 unique BK classification codes
- **Input Fields**: Title, Summary, Keywords, Library of Congress (LOC) Keywords, RVK (Regensburger Verbundklassifikation) codes
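
The card does not specify how these fields are combined; a hypothetical sketch of assembling the model input from a record (the field names and the `" | "` separator are illustrative, not the repository's actual preprocessing):

```python
def build_input_text(record: dict) -> str:
    """Join the metadata fields listed above into one input string.

    The keys and separator are assumptions; the exact preprocessing
    used in training lives in the repository.
    """
    parts = [
        record.get("title", ""),
        record.get("summary", ""),
        " ".join(record.get("keywords", [])),
        " ".join(record.get("loc_keywords", [])),
        " ".join(record.get("rvk_codes", [])),
    ]
    return " | ".join(part for part in parts if part)
```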

### Usage

```python
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("mrehank209/bk-classification-bart-two-stage")

# The model uses a custom classifier head, so it cannot be loaded with a
# stock AutoModel* class alone. Download the weight files with
# hf_hub_download (check the repository file list for the exact filenames)
# and follow the full inference code in the repository.
```
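
Once the model is assembled, multi-label prediction typically applies a sigmoid to the logits and keeps the codes above a threshold. A minimal sketch, assuming the loaded `model` returns an output with `.logits` of shape `(batch, num_labels)` and that an `id2label` mapping is available (the 0.5 threshold is a placeholder default):

```python
import torch

def predict_bk_codes(model, tokenizer, text, id2label, threshold=0.5):
    """Return the BK codes whose sigmoid probability clears `threshold`."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_labels)
    probs = torch.sigmoid(logits).squeeze(0)
    return [id2label[i] for i, p in enumerate(probs.tolist()) if p >= threshold]
```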

### Citation

If you use this model, please cite:

```bibtex
@misc{bk-classification-bart,
  title={Automatic BK Classification using Two-Stage BART Fine-tuning},
  author={Khalid, M. Rehan},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/mrehank209/bk-classification-bart-two-stage}
}
```

### Contact

- **Author**: M. Rehan Khalid
- **Email**: m.khalid@stud.uni-goettingen.de
- **Affiliation**: University of Göttingen

### License

MIT License