---
language: de
license: mit
library_name: transformers
pipeline_tag: text-classification
tags:
- bibliographic-classification
- bk-codes
- german-libraries
- multi-label-classification
- bart
datasets:
- k10plus-catalog
metrics:
- accuracy
- f1
- precision
- recall
- matthews_correlation
---
# BK Classification - Two-Stage BART
This model automatically assigns BK (Basisklassifikation) codes to German bibliographic records.
## Model Description
This is a **two-stage fine-tuned BART-large model** for multi-label classification of bibliographic metadata into BK classification codes. The model achieved **state-of-the-art performance** on the K10plus library catalog dataset.
### Performance
- **Subset Accuracy**: 25.7%
- **Matthews Correlation Coefficient (MCC)**: 0.498
- **F1-Score (Micro)**: 47.9%
- **F1-Score (Macro)**: 21.4%
- **Precision (Micro)**: 66.1%
- **Recall (Micro)**: 37.6%
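These are standard multi-label metrics computed over binary label indicator matrices. The snippet below is a hedged sketch of how they can be reproduced with scikit-learn; it is not the project's evaluation script, and flattening the label matrix for the MCC is an assumption.
```python
# Hedged sketch: reproducing the reported multi-label metrics with scikit-learn.
# y_true / y_pred are binary indicator matrices of shape (n_samples, n_labels);
# flattening for the MCC is an assumption, not necessarily the project's choice.
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, matthews_corrcoef)

y_true = np.array([[1, 0, 1], [0, 1, 1]])  # toy example with 3 labels
y_pred = np.array([[1, 0, 1], [0, 1, 0]])

subset_acc = accuracy_score(y_true, y_pred)               # exact-match ratio
f1_micro   = f1_score(y_true, y_pred, average="micro")
f1_macro   = f1_score(y_true, y_pred, average="macro")
prec_micro = precision_score(y_true, y_pred, average="micro")
rec_micro  = recall_score(y_true, y_pred, average="micro")
mcc        = matthews_corrcoef(y_true.ravel(), y_pred.ravel())
```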
### Training Approach
The model uses a **two-stage fine-tuning approach**:
1. **Stage 1**: Train on parent BK categories (48 labels)
2. **Stage 2**: Fine-tune on all BK codes (1,884 labels) using Stage 1 as initialization
This approach outperformed both standard fine-tuning and hierarchical joint training.
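A minimal sketch of the idea is shown below, using the stock `BartForSequenceClassification` head for illustration; the released model uses a custom classifier head (see Usage), and `parent_of()`, the base checkpoint, and the checkpoint paths are illustrative assumptions.
```python
# Hypothetical sketch of the two-stage setup; not the project's training code.
from transformers import BartForSequenceClassification

def parent_of(bk_code: str) -> str:
    # BK codes such as "54.72" roll up to their parent category "54"
    return bk_code.split(".")[0]

# Stage 1: multi-label head over the 48 parent categories
stage1 = BartForSequenceClassification.from_pretrained(
    "facebook/bart-large",                      # assumed base checkpoint
    num_labels=48,
    problem_type="multi_label_classification",
)
# ... fine-tune stage1 on parent-level labels, then save it ...
stage1.save_pretrained("stage1-checkpoint")

# Stage 2: keep the fine-tuned BART weights, swap in a head covering all
# 1,884 BK codes, and fine-tune again on the full label set
stage2 = BartForSequenceClassification.from_pretrained(
    "stage1-checkpoint",
    num_labels=1884,
    problem_type="multi_label_classification",
    ignore_mismatched_sizes=True,               # reinitializes only the head
)
```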
### Dataset
- **Source**: K10plus German library catalog (2010-2020)
- **Total Records**: 250,831 bibliographic entries
- **Labels**: 1,884 unique BK classification codes
- **Input Fields**: Title, Summary, Keywords, LOC Keywords, RVK codes
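The input fields are concatenated into a single text before tokenization. The sketch below illustrates one way to do this; the record values, field ordering, and `</s>` separator are assumptions rather than the project's exact preprocessing.
```python
# Hypothetical record; field names mirror the list above, the separator and
# ordering are assumptions, not the project's exact preprocessing.
record = {
    "title": "Einführung in die Informationswissenschaft",
    "summary": "Grundlagen der bibliothekarischen Erschließung ...",
    "keywords": ["Informationswissenschaft", "Bibliothek"],
    "loc_keywords": ["Information science"],
    "rvk_codes": ["AN 93000"],
}

def build_input(rec: dict) -> str:
    parts = [
        rec["title"],
        rec.get("summary", ""),
        " ".join(rec.get("keywords", [])),
        " ".join(rec.get("loc_keywords", [])),
        " ".join(rec.get("rvk_codes", [])),
    ]
    # join non-empty fields with BART's </s> separator (an assumption)
    return " </s> ".join(p for p in parts if p)
```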
### Usage
```python
import torch
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Load the tokenizer directly from the Hub
tokenizer = AutoTokenizer.from_pretrained("mrehank209/bk-classification-bart-two-stage")

# Load the model weights (see the repository for the full inference code).
# Because the model uses a custom classifier head, it cannot be loaded with a
# standard Auto* class; download the checkpoint with hf_hub_download and
# rebuild the model as shown in the repository.
```
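Once the model is loaded via the repository's custom code, multi-label predictions are typically obtained by applying a sigmoid to the logits and thresholding. This is a hedged sketch: `model`, `id2label`, the HF-style `.logits` output, and the 0.5 threshold are all assumptions.
```python
# Hypothetical inference step; `model` and `id2label` are assumed to come from
# the repository's custom loading code, and the 0.5 threshold is an assumption.
import torch

text = "Einführung in die Künstliche Intelligenz"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, 1884)

probs = torch.sigmoid(logits)[0]
predicted = [id2label[i] for i, p in enumerate(probs.tolist()) if p > 0.5]
print(predicted)                             # e.g. ['54.72', ...]
```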
### Citation
If you use this model, please cite:
```bibtex
@misc{bk-classification-bart,
  title        = {Automatic BK Classification using Two-Stage BART Fine-tuning},
  author       = {Khalid, M. Rehan},
  year         = {2025},
  howpublished = {Hugging Face Model Hub},
  url          = {https://huggingface.co/mrehank209/bk-classification-bart-two-stage}
}
```
### Contact
- **Author**: M. Rehan Khalid
- **Email**: m.khalid@stud.uni-goettingen.de
- **Affiliation**: University of Göttingen
### License
MIT License