---
language: de
license: mit
library_name: transformers
pipeline_tag: text-classification
tags:
- bibliographic-classification
- bk-codes
- german-libraries
- multi-label-classification
- bart
datasets:
- k10plus-catalog
metrics:
- accuracy
- f1
- precision
- recall
- matthews_correlation
---

# BK Classification - Two-Stage BART

This model performs automatic classification of German bibliographic records using BK (Basisklassifikation) codes.

## Model Description

This is a **two-stage fine-tuned BART-large model** for multi-label classification of bibliographic metadata into BK classification codes. Of the approaches evaluated in this work, the two-stage model achieved the best performance on the K10plus library catalog dataset.

### Performance

- **Subset Accuracy**: 25.7%
- **Matthews Correlation Coefficient (MCC)**: 0.498
- **F1-Score (Micro)**: 47.9%
- **F1-Score (Macro)**: 21.4%
- **Precision (Micro)**: 66.1%
- **Recall (Micro)**: 37.6%
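
These are standard multi-label metrics computed over binary indicator matrices. For reference, a sketch of how they can be reproduced with scikit-learn, assuming (as is common) that MCC is computed over the flattened label matrix:

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, f1_score, matthews_corrcoef, precision_score, recall_score,
)

def multilabel_report(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """y_true, y_pred: binary indicator arrays of shape (n_samples, n_labels)."""
    return {
        "subset_accuracy": accuracy_score(y_true, y_pred),  # exact-match ratio
        "mcc": matthews_corrcoef(y_true.ravel(), y_pred.ravel()),  # flattened labels
        "f1_micro": f1_score(y_true, y_pred, average="micro", zero_division=0),
        "f1_macro": f1_score(y_true, y_pred, average="macro", zero_division=0),
        "precision_micro": precision_score(y_true, y_pred, average="micro", zero_division=0),
        "recall_micro": recall_score(y_true, y_pred, average="micro", zero_division=0),
    }
```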

### Training Approach

The model uses a **two-stage fine-tuning approach**:

1. **Stage 1**: Train on parent BK categories (48 labels)
2. **Stage 2**: Fine-tune on all BK codes (1,884 labels) using Stage 1 as initialization

This approach outperformed both standard fine-tuning and hierarchical joint training.
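
This card does not ship training code; the following is a minimal sketch of the two-stage idea using the stock `BartForSequenceClassification` (the released model's classifier head is custom, per the Usage section below):

```python
from transformers import BartForSequenceClassification

# Stage 1: fine-tune on the 48 parent BK categories (training loop omitted)
stage1 = BartForSequenceClassification.from_pretrained(
    "facebook/bart-large",
    num_labels=48,
    problem_type="multi_label_classification",
)
# ... fine-tune stage1 on parent-level labels ...

# Stage 2: fresh classification head for the full 1,884-code label space,
# initialized with the Stage 1 encoder/decoder weights
stage2 = BartForSequenceClassification.from_pretrained(
    "facebook/bart-large",
    num_labels=1884,
    problem_type="multi_label_classification",
)
stage2.model.load_state_dict(stage1.model.state_dict())  # transfer shared weights
# ... fine-tune stage2 on all BK codes ...
```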

### Dataset

- **Source**: K10plus German library catalog (2010-2020)
- **Total Records**: 250,831 bibliographic entries
- **Labels**: 1,884 unique BK classification codes
- **Input Fields**: Title, Summary, Keywords, Library of Congress (LOC) Keywords, RVK (Regensburger Verbundklassifikation) codes
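
The card does not specify how these fields are combined; a hypothetical sketch of assembling the model input from a record (the field names and the `" | "` separator are illustrative, not the repository's actual preprocessing):

```python
def build_input_text(record: dict) -> str:
    """Join the metadata fields listed above into one input string.

    The keys and separator are assumptions; the exact preprocessing
    used in training lives in the repository.
    """
    parts = [
        record.get("title", ""),
        record.get("summary", ""),
        " ".join(record.get("keywords", [])),
        " ".join(record.get("loc_keywords", [])),
        " ".join(record.get("rvk_codes", [])),
    ]
    return " | ".join(part for part in parts if part)
```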

### Usage

```python
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("mrehank209/bk-classification-bart-two-stage")

# The model uses a custom classifier head, so it cannot be loaded with a
# stock AutoModel* class alone. Download the weight files with
# hf_hub_download (check the repository file list for the exact filenames)
# and follow the full inference code in the repository.
```
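
Once the model is assembled, multi-label prediction typically applies a sigmoid to the logits and keeps the codes above a threshold. A minimal sketch, assuming the loaded `model` returns an output with `.logits` of shape `(batch, num_labels)` and that an `id2label` mapping is available (the 0.5 threshold is a placeholder default):

```python
import torch

def predict_bk_codes(model, tokenizer, text, id2label, threshold=0.5):
    """Return the BK codes whose sigmoid probability clears `threshold`."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_labels)
    probs = torch.sigmoid(logits).squeeze(0)
    return [id2label[i] for i, p in enumerate(probs.tolist()) if p >= threshold]
```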

### Citation

If you use this model, please cite:

```bibtex
@misc{bk-classification-bart,
  title={Automatic BK Classification using Two-Stage BART Fine-tuning},
  author={Khalid, M. Rehan},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/mrehank209/bk-classification-bart-two-stage}
}
```

### Contact

- **Author**: M. Rehan Khalid
- **Email**: m.khalid@stud.uni-goettingen.de
- **Affiliation**: University of Göttingen

### License

MIT License