---
title: README
emoji: 🚀
colorFrom: pink
colorTo: purple
sdk: static
pinned: false
license: mit
---

# 🧠 Open Multi-Label ASJC Classification

We present the first **multi-label classification model** built on the ASJC (All Science Journal Classification) taxonomy that reliably assigns subject categories to individual documents, including those published in general-science or interdisciplinary journals, using Title, Container Title, and Abstract metadata.

## 👥 Team
- **Michael Gusenbauer** – Johannes Kepler University Linz | ORCID: [https://orcid.org/0000-0001-7768-2351](https://orcid.org/0000-0001-7768-2351)
- **Jochen Endermann** – University of Applied Sciences Kufstein
- **Harald Huber** – University of Applied Sciences Kufstein
- **Simon Strasser** – University of Applied Sciences Kufstein
- **Andreas-Nizar Granitzer** – Norwegian Geotechnical Institute | ORCID: [https://orcid.org/0000-0002-5839-4300](https://orcid.org/0000-0002-5839-4300)
- **Thomas Ströhle** – Universität Innsbruck | ORCID: [https://orcid.org/0000-0002-1954-6412](https://orcid.org/0000-0002-1954-6412)

## 🎯 Purpose
Traditional ASJC classification approaches are limited by incomplete sources, journal-level labels, or single-label assignments. This project provides:  
- **Multi-label classification across 307 subjects** (see the [Google Sheet](https://docs.google.com/spreadsheets/d/1kqmGk2x0msodbaKDYt2RixyyB3MqOGrWS2azRGNsodw) for all labels)
- Fine-tuned **SciBERT model** trained on Crossref metadata  
- Methods for **collection-level analysis** (researcher portfolios, institutions, datasets)
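
To make the difference between multi-label assignment and collection-level analysis concrete, here is a minimal sketch in plain Python. It is not the code shipped with this repository: the function names, the 0.5 threshold, and the example logits are all assumptions chosen for illustration. It assumes the model emits one logit per subject, scores each subject independently with a sigmoid, and summarizes a collection as the share of documents carrying each label.

```python
import math
from collections import Counter
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def assign_labels(logits: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep every label whose independent sigmoid score reaches the threshold.

    The threshold of 0.5 is an assumption, not a value taken from the paper.
    """
    return [label for label, z in logits.items() if sigmoid(z) >= threshold]


def collection_profile(
    documents: List[Dict[str, float]], threshold: float = 0.5
) -> Dict[str, float]:
    """Share of documents in a collection that carry each assigned label."""
    counts: Counter = Counter()
    for logits in documents:
        counts.update(assign_labels(logits, threshold))
    n = len(documents)
    return {label: c / n for label, c in counts.items()} if n else {}


# Hypothetical logits for three documents; labels and values are illustrative.
docs = [
    {"Artificial Intelligence": 2.0, "Library and Information Sciences": 0.3, "Oncology": -3.0},
    {"Artificial Intelligence": -1.0, "Library and Information Sciences": 1.5, "Oncology": -2.0},
    {"Artificial Intelligence": 0.8, "Library and Information Sciences": -0.2, "Oncology": -4.0},
]
profile = collection_profile(docs)
```

Because each subject is scored independently, a document can receive several labels or none, which is what separates this setup from single-label (softmax) classification. Other aggregation choices, such as averaging probabilities instead of counting thresholded labels, are equally plausible.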

## ✨ Features
- High performance 
- Works with or without source title metadata  
- Open, reproducible, and ready for research use
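
One way to support records with or without a source title is to build the model input from whichever fields are present. The helper below is a hypothetical sketch: the name `build_input`, the field order, and the ` [SEP] ` separator are assumptions, so check the sample inference code in this repository for the format the model was actually trained on.

```python
from typing import Optional


def build_input(
    title: str,
    abstract: str,
    container_title: Optional[str] = None,
    sep: str = " [SEP] ",
) -> str:
    """Join the available metadata fields into a single input string.

    Missing or blank fields are skipped, so the same helper covers records
    with and without a container (journal) title. The separator and the
    field order are assumptions made for this sketch.
    """
    parts = [title, container_title, abstract]
    return sep.join(p.strip() for p in parts if p and p.strip())


with_source = build_input(
    "Fine-tuning SciBERT", "We classify documents.", "Scientometrics"
)
without_source = build_input("Fine-tuning SciBERT", "We classify documents.")
```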

## 🗂 Content
- Fine-tuned model
- Sample code for model inference

## 📖 Citation
If you use this work, please cite:

```bibtex
@article{Gusenbauer.2025,
author = {Gusenbauer, Michael and Endermann, Jochen and Huber, Harald and Strasser, Simon and Granitzer, Andreas-Nizar and Ströhle, Thomas},
year = {2025},
title = {Fine-tuning SciBERT to enable ASJC-based assessments of the disciplinary orientation of research collections},
keywords = {All Science Journal Classification;Disciplinary coverage;Fine-tuning;multi-label classification;SciBERT;Transformer-based language models},
issn = {0138-9130},
journal = {Scientometrics},
doi = {10.1007/s11192-025-05490-0},
}
```