# ERC Panels Classifier
This model is a fine-tuned version of allenai/specter2_base for multilabel scientific domain classification aligned with the ERC panel taxonomy.
It achieves the following results on the held-out test set:
- Best validation loss: 0.0361
- Micro F1: 0.9386
- Micro ROC-AUC: 0.9718
- Subset accuracy: 0.7943
## Model description
This model is a fine-tuned variant of SPECTER2 (allenai/specter2_base) adapted for multilabel classification of scientific documents into ERC research panels.
The model takes as input the title and abstract of a scientific publication and predicts one or more research panels.
Since scientific outputs may legitimately span multiple domains, the model is trained using sigmoid activation with binary cross-entropy loss, allowing independent assignment of multiple labels.
### Key characteristics
- Base model: allenai/specter2_base
- Task: multilabel document classification
- Labels: 28 ERC scientific panels
- Activation: sigmoid (independent scores per label)
- Loss: BCEWithLogitsLoss
- Output: list of predicted panels with associated probabilities
- Decision threshold: 0.5 (tunable)
This model enables automatic research-domain tagging aligned with the ERC panel structure.
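The scoring logic described above (independent sigmoid scores per panel, cut at a tunable 0.5 threshold) can be sketched in plain Python. The logits and the three panel names below are purely illustrative:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_panels(logits, labels, threshold=0.5):
    """Return (panel, probability) pairs whose sigmoid score clears the threshold.

    Each label is scored independently, so a document can receive
    zero, one, or several panels.
    """
    scores = {label: sigmoid(z) for label, z in zip(labels, logits)}
    return sorted(
        ((label, p) for label, p in scores.items() if p >= threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )

# Hypothetical raw logits for three of the 28 panels
labels = [
    "Earth System Science",
    "Environmental Biology, Ecology and Evolution",
    "Mathematics",
]
print(predict_panels([2.1, 0.3, -3.0], labels))
```

Lowering the threshold trades precision for recall; 0.5 is the default used for the reported metrics.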
## Intended uses & limitations

### Intended uses
This model is designed for:
- Automatic assignment of ERC research panels
- Metadata enrichment for:
  - research project databases
  - institutional repositories
  - funding and grant analysis pipelines
- Large-scale analytics such as:
  - portfolio mapping
  - thematic analysis of research outputs
  - monitoring disciplinary coverage of funded projects
- Predicting subject areas for documents lacking structured domain metadata
The model supports:
- title only
- abstract only
- title + abstract (recommended)
### Limitations
- ERC panels are high-level categories and do not represent fine-grained subdisciplines
- Labels are derived from curated datasets and semi-automatically annotated data
- Class imbalance may affect recall for underrepresented panels
- The model does not encode explicit hierarchical relationships between panels
Not suited for:
- fine-grained subfield classification
- journal recommendation
- evaluation of research quality or impact
- clinical, legal, or regulatory decision-making
Predictions should be treated as supportive metadata, not authoritative classifications.
## How to use

```python
from transformers import pipeline

# Replace with your actual model repo name on Hugging Face
MODEL_NAME = "nicolauduran45/erc_classifier_demo"

# top_k=None returns a score for every panel instead of only the top label,
# which is what you want for multilabel thresholding
classifier = pipeline("text-classification", model=MODEL_NAME, tokenizer=MODEL_NAME, top_k=None)

text = ["Climate change impacts on Arctic ecosystems."]
classifier(text)
```
## Training and evaluation data

### Training data
- Scientific documents with ERC-style panel annotations
- Inputs:
- title
- abstract
- Task type: multilabel classification
### Dataset characteristics
| Property | Value |
|---|---|
| Documents | ~40k |
| Labels | 28 panels |
| Input fields | Title, Abstract |
| Task type | Multilabel |
| License | Dataset-dependent |
## Training procedure

### Preprocessing

- Input text constructed as: `title + ". " + abstract`
- Tokenization using the SPECTER2 tokenizer
- Maximum sequence length: 512 tokens
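A minimal sketch of the input-construction step (the helper name is illustrative; the resulting string is then passed to the SPECTER2 tokenizer with truncation to 512 tokens):

```python
def build_input(title, abstract=None):
    """Build the classifier input as title + ". " + abstract.

    Title-only and abstract-only inputs are also supported, matching
    the input modes listed under "Intended uses".
    """
    title = (title or "").strip()
    abstract = (abstract or "").strip()
    if title and abstract:
        return f"{title}. {abstract}"
    return title or abstract

print(build_input(
    "Climate change impacts on Arctic ecosystems",
    "We analyse long-term warming trends in tundra vegetation.",
))
```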
### Model

- Base model: allenai/specter2_base
- Classification head: linear → sigmoid
- Loss function: BCEWithLogitsLoss
- Predictions: independent probability per label
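The loss treats each of the 28 panels as an independent binary decision. A pure-Python sketch of the numerically stable formulation used by `BCEWithLogitsLoss` (the values here are illustrative, not from training):

```python
import math

def bce_with_logits(logit, target):
    # Numerically stable binary cross-entropy on a raw logit:
    # max(z, 0) - z*y + log(1 + exp(-|z|)),
    # equivalent to torch.nn.BCEWithLogitsLoss for a single element.
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def multilabel_loss(logits, targets):
    # One independent binary term per panel, averaged over labels
    # (the default 'mean' reduction).
    terms = [bce_with_logits(z, y) for z, y in zip(logits, targets)]
    return sum(terms) / len(terms)

# A confident correct label costs almost nothing; a confident wrong one is expensive
print(bce_with_logits(8.0, 1.0), bce_with_logits(8.0, 0.0))
```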
### Training hyperparameters
| Hyperparameter | Value |
|---|---|
| Learning rate | 2e-5 |
| Train batch size | 16 |
| Eval batch size | 16 |
| Epochs | 6 |
| Weight decay | 0.01 |
| Optimizer | AdamW |
| Metric for best model | Micro F1 |
### Training results
| Epoch | Training Loss | Validation Loss | Micro F1 | ROC-AUC | Accuracy |
|---|---|---|---|---|---|
| 1 | 0.2089 | 0.0968 | 0.7576 | 0.8347 | 0.4043 |
| 2 | 0.0961 | 0.0713 | 0.8231 | 0.8888 | 0.5171 |
| 3 | 0.0719 | 0.0578 | 0.8614 | 0.9209 | 0.5829 |
| 4 | 0.0579 | 0.0458 | 0.9072 | 0.9546 | 0.7029 |
| 5 | 0.0479 | 0.0390 | 0.9264 | 0.9620 | 0.7614 |
| 6 | 0.0407 | 0.0361 | 0.9386 | 0.9718 | 0.7943 |
## Evaluation results (multilabel test set)
| Panel | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Biotechnology and Biosystems Engineering | 0.88 | 0.70 | 0.78 | 30 |
| Cell Biology, Development, Stem Cells and Regeneration | 0.98 | 0.94 | 0.96 | 54 |
| Computer Science and Informatics | 0.96 | 0.98 | 0.97 | 95 |
| Condensed Matter Physics | 0.97 | 0.99 | 0.98 | 68 |
| Earth System Science | 0.94 | 0.98 | 0.96 | 64 |
| Environmental Biology, Ecology and Evolution | 0.91 | 0.96 | 0.94 | 54 |
| Fundamental Constituents of Matter | 0.97 | 0.94 | 0.95 | 32 |
| Human Mobility, Environment, and Space | 0.81 | 0.81 | 0.81 | 21 |
| Immunity, Infection and Immunotherapy | 1.00 | 0.97 | 0.99 | 40 |
| Individuals, Markets and Organisations | 0.94 | 0.98 | 0.96 | 48 |
| Institutions, Governance and Legal Systems | 0.89 | 0.92 | 0.91 | 26 |
| Integrative Biology: from Genes and Genomes to Systems | 0.91 | 0.98 | 0.94 | 49 |
| Materials Engineering | 0.81 | 0.93 | 0.87 | 75 |
| Mathematics | 1.00 | 1.00 | 1.00 | 36 |
| Molecules of Life: Biological Mechanisms, Structures and Functions | 0.94 | 0.98 | 0.96 | 111 |
| Neuroscience and Disorders of the Nervous System | 1.00 | 1.00 | 1.00 | 30 |
| Physical and Analytical Chemical Sciences | 0.89 | 0.93 | 0.91 | 94 |
| Physiology in Health, Disease and Ageing | 0.94 | 1.00 | 0.97 | 34 |
| Prevention, Diagnosis and Treatment of Human Diseases | 0.97 | 0.96 | 0.96 | 68 |
| Products and Processes Engineering | 0.90 | 0.97 | 0.93 | 109 |
| Studies of Cultures and Arts | 1.00 | 0.78 | 0.88 | 9 |
| Synthetic Chemistry and Materials | 0.82 | 0.77 | 0.79 | 47 |
| Systems and Communication Engineering | 0.94 | 0.97 | 0.95 | 87 |
| Texts and Concepts | 0.87 | 0.93 | 0.90 | 14 |
| The Human Mind and Its Complexity | 1.00 | 0.93 | 0.97 | 30 |
| The Social World and Its Interactions | 0.97 | 0.94 | 0.96 | 34 |
| The Study of the Human Past | 0.89 | 0.94 | 0.91 | 17 |
| Universe Sciences | 1.00 | 1.00 | 1.00 | 25 |
### Overall performance

| Average | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Micro avg | 0.93 | 0.95 | 0.94 | 1401 |
| Macro avg | 0.93 | 0.94 | 0.93 | 1401 |
| Weighted avg | 0.93 | 0.95 | 0.94 | 1401 |
| Samples avg | 0.93 | 0.94 | 0.93 | 1401 |
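Micro averaging pools true/false positives and negatives across all panels (so frequent panels dominate), while macro averaging weighs every panel equally. A small sketch of the difference, with made-up per-panel counts:

```python
def f1(tp, fp, fn):
    """F1 from raw counts, defined as 0 when undefined."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def micro_macro_f1(counts):
    """counts: one (tp, fp, fn) triple per panel.

    Micro pools counts over all panels before computing F1;
    macro averages the per-panel F1 scores.
    """
    micro = f1(
        sum(c[0] for c in counts),
        sum(c[1] for c in counts),
        sum(c[2] for c in counts),
    )
    macro = sum(f1(*c) for c in counts) / len(counts)
    return micro, macro

# A well-classified frequent panel and a poorly-classified rare one
print(micro_macro_f1([(90, 10, 10), (5, 5, 45)]))
```

When micro and macro scores are close, as in the table above, performance is fairly uniform across panels.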
## ERC-funded projects evaluation (multiclass recall)
This evaluation uses ERC-funded projects, where each project belongs to exactly one panel. Only recall is reported: since the model is multilabel, it may legitimately predict additional panels beyond the single assigned one, which makes precision less informative in this setting.
| Panel | Recall |
|---|---|
| Biotechnology and Biosystems Engineering | 0.26 |
| Cell Biology, Development, Stem Cells and Regeneration | 0.81 |
| Computer Science and Informatics | 1.00 |
| Condensed Matter Physics | 0.77 |
| Earth System Science | 0.92 |
| Environmental Biology, Ecology and Evolution | 0.85 |
| Fundamental Constituents of Matter | 0.84 |
| Human Mobility, Environment, and Space | 0.61 |
| Immunity, Infection and Immunotherapy | 0.83 |
| Individuals, Markets and Organisations | 0.96 |
| Institutions, Governance and Legal Systems | 0.58 |
| Integrative Biology: from Genes and Genomes to Systems | 0.73 |
| Materials Engineering | 0.75 |
| Mathematics | 0.96 |
| Molecules of Life: Biological Mechanisms, Structures and Functions | 0.95 |
| Neuroscience and Disorders of the Nervous System | 0.92 |
| Physical and Analytical Chemical Sciences | 0.83 |
| Physiology in Health, Disease and Ageing | 0.60 |
| Prevention, Diagnosis and Treatment of Human Diseases | 0.94 |
| Products and Processes Engineering | 0.58 |
| Studies of Cultures and Arts | 0.27 |
| Synthetic Chemistry and Materials | 0.67 |
| Systems and Communication Engineering | 0.75 |
| Texts and Concepts | 0.62 |
| The Human Mind and Its Complexity | 0.85 |
| The Social World and Its Interactions | 0.73 |
| The Study of the Human Past | 0.83 |
| Universe Sciences | 1.00 |
### Overall recall
- Micro recall: 0.77
- Macro recall: 0.76
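A sketch of how this recall can be computed, assuming a prediction counts as a hit when the project's single true panel appears among the predicted labels (function and sample data are illustrative):

```python
def panel_recall(true_panels, predicted_sets):
    """Fraction of projects whose assigned panel appears in the
    (possibly multi-label) set predicted by the classifier.
    """
    hits = sum(
        1 for true, preds in zip(true_panels, predicted_sets) if true in preds
    )
    return hits / len(true_panels)

# Three hypothetical projects: two hits, one miss
print(panel_recall(
    ["Mathematics", "Earth System Science", "Texts and Concepts"],
    [{"Mathematics"}, {"Earth System Science", "Universe Sciences"}, {"Mathematics"}],
))
```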
## Citation

```bibtex
@inproceedings{bovenzi2022mapping,
  title={Mapping STI ecosystems via Open Data: Overcoming the limitations of conflicting taxonomies. A case study for Climate Change Research in Denmark},
  author={Bovenzi, Nicandro and Duran-Silva, Nicolau and Massucci, Francesco Alessandro and Multari, Francesco and Parra-Rojas, C{\'e}sar and Pujol-Llatse, Josep},
  booktitle={International Conference on Theory and Practice of Digital Libraries (TPDL)},
  pages={495--499},
  year={2022},
  publisher={Springer International Publishing}
}
```
## Framework versions
- Transformers: 4.57.x
- PyTorch: 2.8.0
- Datasets: 3.x
- Tokenizers: 0.22.x