---
language: de
license: mit
library_name: transformers
pipeline_tag: text-classification
tags:
- bibliographic-classification
- bk-codes
- german-libraries
- multi-label-classification
- bart
datasets:
- k10plus-catalog
metrics:
- accuracy
- f1
- precision
- recall
- matthews_correlation
---

# BK Classification - Two-Stage BART

This model performs automatic classification of German bibliographic records using BK (Basisklassifikation) codes.

## Model Description

This is a **two-stage fine-tuned BART-large model** for multi-label classification of bibliographic metadata into BK classification codes. The model achieved **state-of-the-art performance** on the K10plus library catalog dataset.

### Performance

- **Subset Accuracy**: 25.7%
- **Matthews Correlation Coefficient (MCC)**: 0.498
- **F1-Score (Micro)**: 47.9%
- **F1-Score (Macro)**: 21.4%
- **Precision (Micro)**: 66.1%
- **Recall (Micro)**: 37.6%
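
For reference, these multi-label metrics can be reproduced with scikit-learn. The sketch below uses tiny toy matrices, not the model's real predictions, and averaging MCC per label is only one possible convention (the README does not say how its MCC was aggregated):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score)

# Toy binary indicator matrices: 3 records x 4 labels (placeholders, not real data)
y_true = np.array([[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]])

# Subset accuracy: a record counts only if ALL of its labels match exactly
subset_acc = accuracy_score(y_true, y_pred)

# Micro scores pool label decisions across all records before averaging
f1_micro = f1_score(y_true, y_pred, average="micro")
prec_micro = precision_score(y_true, y_pred, average="micro")
rec_micro = recall_score(y_true, y_pred, average="micro")

# One common convention (an assumption here): MCC per label, averaged over labels
mcc = np.mean([matthews_corrcoef(y_true[:, i], y_pred[:, i])
               for i in range(y_true.shape[1])])
```

Note that subset accuracy is strict: with 1,884 possible codes per record, 25.7% exact-set matches is a demanding target, which is why the micro-averaged scores are also reported.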

### Training Approach

The model uses a **two-stage fine-tuning approach**:

1. **Stage 1**: Train on parent BK categories (48 labels)
2. **Stage 2**: Fine-tune on all BK codes (1,884 labels) using Stage 1 as initialization

This approach outperformed both standard fine-tuning and hierarchical joint training.
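
The two-stage recipe above can be sketched with `transformers`. This is a minimal illustration, not the actual training code: the tiny `BartConfig` stands in for BART-large so it runs quickly, and carrying Stage 1 weights over while swapping in a fresh 1,884-way head via `ignore_mismatched_sizes` is an assumption about how the initialization works:

```python
import tempfile

from transformers import BartConfig, BartForSequenceClassification

# Tiny config as a stand-in for BART-large (assumption: real training used bart-large)
config = BartConfig(
    d_model=64, encoder_layers=1, decoder_layers=1,
    encoder_attention_heads=2, decoder_attention_heads=2,
    encoder_ffn_dim=128, decoder_ffn_dim=128,
    num_labels=48, problem_type="multi_label_classification",
)

# Stage 1: multi-label model over the 48 parent BK categories
stage1 = BartForSequenceClassification(config)
# ... train on parent-category targets, then persist the checkpoint ...

with tempfile.TemporaryDirectory() as ckpt:
    stage1.save_pretrained(ckpt)

    # Stage 2: reuse the Stage 1 weights but attach a fresh 1,884-way head
    stage2 = BartForSequenceClassification.from_pretrained(
        ckpt,
        num_labels=1884,
        problem_type="multi_label_classification",
        ignore_mismatched_sizes=True,  # 48-way head discarded; encoder/decoder kept
    )
# ... continue fine-tuning stage2 on the full BK code set ...
```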

### Dataset

- **Source**: K10plus German library catalog (2010-2020)
- **Total Records**: 250,831 bibliographic entries
- **Labels**: 1,884 unique BK classification codes
- **Input Fields**: Title, Summary, Keywords, LOC Keywords, RVK codes
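
The README does not specify how these input fields are combined for the model. One plausible sketch is a simple tagged concatenation; the field keys, separator, and example record below are all assumptions for illustration:

```python
def build_input(record: dict) -> str:
    """Concatenate the available metadata fields into one classifier input string.

    Field order follows the list above; missing fields are simply skipped.
    """
    fields = ["title", "summary", "keywords", "loc_keywords", "rvk_codes"]
    parts = [f"{name}: {record[name]}" for name in fields if record.get(name)]
    return " | ".join(parts)

# Hypothetical record with a missing summary and LOC keywords
record = {
    "title": "Einführung in die Bibliothekswissenschaft",
    "keywords": "Bibliothek; Klassifikation",
    "rvk_codes": "AN 73000",
}
text = build_input(record)
```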

### Usage

```python
import torch
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("mrehank209/bk-classification-bart-two-stage")

# Load the model components (see the repository for the full inference code);
# the model requires custom loading because of its classifier head, which is
# why torch and hf_hub_download are imported above.
```
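
Since the exact loading code lives in the repository, only the downstream decision step is illustrated here: multi-label inference typically applies an independent sigmoid per code and keeps everything above a threshold. The toy logits, the five placeholder BK codes, and the 0.5 threshold are assumptions, not the model's real outputs or tuned settings:

```python
import torch

# Suppose `logits` came from the model for one record, one score per BK code;
# a toy 5-code example stands in for the real 1,884-way output.
logits = torch.tensor([2.1, -1.3, 0.4, -0.2, 1.7])
bk_codes = ["06.00", "06.35", "06.50", "06.60", "06.70"]  # placeholder codes

# Multi-label: independent sigmoid per code, keep every code above the threshold
probs = torch.sigmoid(logits)
threshold = 0.5
predicted = [code for code, p in zip(bk_codes, probs.tolist()) if p > threshold]
```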

### Citation

If you use this model, please cite:

```bibtex
@misc{bk-classification-bart,
  title={Automatic BK Classification using Two-Stage BART Fine-tuning},
  author={Khalid, M. Rehan},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/mrehank209/bk-classification-bart-two-stage}
}
```

### Contact

- **Author**: M. Rehan Khalid
- **Email**: m.khalid@stud.uni-goettingen.de
- **Affiliation**: University of Göttingen

### License

MIT License