ERC Panels Classifier

This model is a fine-tuned version of allenai/specter2_base for multilabel scientific domain classification aligned with ERC panel taxonomy.
It achieves the following results on the evaluation set:

  • Best validation loss: 0.0361
  • Micro F1: 0.9386
  • Micro ROC-AUC: 0.9718
  • Subset accuracy: 0.7943

Model description

This model is a fine-tuned variant of SPECTER2 (allenai/specter2_base) adapted for multilabel classification of scientific documents into ERC research panels.

The model takes as input the title and abstract of a scientific publication and predicts one or more research panels.
Since scientific outputs may legitimately span multiple domains, the model is trained using sigmoid activation with binary cross-entropy loss, allowing independent assignment of multiple labels.

Key characteristics

  • Base model: allenai/specter2_base
  • Task: multilabel document classification
  • Labels: 28 ERC scientific panels
  • Activation: sigmoid (independent scores per label)
  • Loss: BCEWithLogitsLoss
  • Output: list of predicted panels with associated probabilities
  • Decision threshold: 0.5 (tunable)
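The decision rule above (independent sigmoid scores, 0.5 cut-off) can be sketched in a few lines. This is an illustration only; the logits and the three panel names are dummy values, not model output:

```python
import math

def sigmoid(logit):
    """Map a raw logit to an independent per-label probability."""
    return 1.0 / (1.0 + math.exp(-logit))

def decode_panels(logits, labels, threshold=0.5):
    """Return every panel whose sigmoid score clears the threshold."""
    return [(lab, round(sigmoid(x), 3))
            for lab, x in zip(labels, logits)
            if sigmoid(x) >= threshold]

labels = ["Mathematics", "Universe Sciences", "Earth System Science"]
logits = [2.1, -1.3, 0.4]   # dummy logits
print(decode_panels(logits, labels))
# sigmoid scores: 0.891, 0.214, 0.599 -> two panels survive the 0.5 cut
```

Because each label is scored independently, lowering the threshold trades precision for recall without retraining.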

This model enables automatic research-domain tagging aligned with the ERC panel structure.


Intended uses & limitations

Intended uses

This model is designed for:

  • Automatic assignment of ERC research panels
  • Metadata enrichment for:
    • research project databases
    • institutional repositories
    • funding and grant analysis pipelines
  • Large-scale analytics such as:
    • portfolio mapping
    • thematic analysis of research outputs
    • monitoring disciplinary coverage of funded projects
  • Predicting subject areas for documents lacking structured domain metadata

The model supports:

  • title only
  • abstract only
  • title + abstract (recommended)

Limitations

  • ERC panels are high-level categories and do not represent fine-grained subdisciplines
  • Labels are derived from curated and semi-automatically annotated datasets, which may introduce annotation noise
  • Class imbalance may affect recall for underrepresented panels
  • The model does not encode explicit hierarchical relationships between panels

Not suited for:

  • fine-grained subfield classification
  • journal recommendation
  • evaluation of research quality or impact
  • clinical, legal, or regulatory decision-making

Predictions should be treated as supportive metadata, not authoritative classifications.


How to use

from transformers import pipeline

# Replace with your actual model repo name on Hugging Face
MODEL_NAME = "nicolauduran45/erc_classifier_demo"

# top_k=None returns a score for every panel rather than only the single
# best one, which is what a multilabel classifier needs.
classifier = pipeline(
    task="text-classification",
    model=MODEL_NAME,
    tokenizer=MODEL_NAME,
    top_k=None,
)

texts = ["Climate change impacts on Arctic ecosystems."]
classifier(texts)
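With all scores returned, the multilabel prediction is simply every panel above the threshold. A minimal post-processing sketch over a mocked result (the shape matches what a text-classification pipeline with `top_k=None` returns; the scores below are made up):

```python
# Hypothetical pipeline output: one list per input text,
# each a list of {"label", "score"} dicts.
mock_output = [[
    {"label": "Earth System Science", "score": 0.94},
    {"label": "Environmental Biology, Ecology and Evolution", "score": 0.71},
    {"label": "Mathematics", "score": 0.02},
]]

def panels_above_threshold(result, threshold=0.5):
    """Keep every panel whose independent score clears the threshold."""
    return [[d["label"] for d in doc if d["score"] >= threshold]
            for doc in result]

print(panels_above_threshold(mock_output))
# [['Earth System Science', 'Environmental Biology, Ecology and Evolution']]
```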

Training and evaluation data

Training data

  • Scientific documents with ERC-style panel annotations
  • Inputs:
    • title
    • abstract
  • Task type: multilabel classification

Dataset characteristics

  • Documents: ~40k
  • Labels: 28 ERC panels
  • Input fields: title, abstract
  • Task type: multilabel
  • License: dataset-dependent

Training procedure

Preprocessing

  • Input text constructed as:

    title + ". " + abstract

  • Tokenization using the SPECTER2 tokenizer

  • Maximum sequence length: 512 tokens
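The input construction above can be sketched as a small helper (assumed behavior when the abstract is missing; the card only specifies the title + ". " + abstract concatenation):

```python
def build_input(title, abstract=None):
    """Concatenate title and abstract as 'title. abstract'.

    Falls back to the title alone when no abstract is available,
    and avoids a doubled period after the title.
    """
    title = title.strip().rstrip(".")
    return f"{title}. {abstract.strip()}" if abstract else title

print(build_input("Climate change impacts on Arctic ecosystems.",
                  "We study warming-driven shifts in tundra vegetation."))
```

The concatenated string is then tokenized with the SPECTER2 tokenizer and truncated to 512 tokens.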

Model

  • Base model: allenai/specter2_base
  • Classification head: linear → sigmoid
  • Loss function: BCEWithLogitsLoss
  • Predictions: independent probability per label
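BCEWithLogitsLoss treats each panel as an independent binary problem and operates directly on logits for numerical stability. A pure-Python sketch of the per-document loss (the stable form matches PyTorch's documented formulation; the logits and targets are dummy values):

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly on logits.

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)),
    which equals -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))].
    """
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

# One document, three panels: the first two are true labels.
loss = bce_with_logits([2.0, -0.5, -3.0], [1.0, 1.0, 0.0])
print(round(loss, 4))  # prints 0.3832
```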

Training hyperparameters

  • Learning rate: 2e-5
  • Train batch size: 16
  • Eval batch size: 16
  • Epochs: 6
  • Weight decay: 0.01
  • Optimizer: AdamW
  • Metric for best model: micro F1
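Assuming the standard Hugging Face Trainer was used (the card does not say), the hyperparameters above would map onto a TrainingArguments configuration roughly like this; the output directory and metric key are hypothetical:

```python
from transformers import TrainingArguments

# Sketch only: argument names from the Trainer API, values from the table above.
args = TrainingArguments(
    output_dir="erc-classifier",       # hypothetical path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=6,
    weight_decay=0.01,
    metric_for_best_model="micro_f1",  # assumed metric key
    load_best_model_at_end=True,
    eval_strategy="epoch",
    save_strategy="epoch",
)
```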

Training results

| Epoch | Training loss | Validation loss | Micro F1 | ROC-AUC | Accuracy |
|---|---|---|---|---|---|
| 1 | 0.2089 | 0.0968 | 0.7576 | 0.8347 | 0.4043 |
| 2 | 0.0961 | 0.0713 | 0.8231 | 0.8888 | 0.5171 |
| 3 | 0.0719 | 0.0578 | 0.8614 | 0.9209 | 0.5829 |
| 4 | 0.0579 | 0.0458 | 0.9072 | 0.9546 | 0.7029 |
| 5 | 0.0479 | 0.0390 | 0.9264 | 0.9620 | 0.7614 |
| 6 | 0.0407 | 0.0361 | 0.9386 | 0.9718 | 0.7943 |

Evaluation results (multilabel test set)

| Panel | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Biotechnology and Biosystems Engineering | 0.88 | 0.70 | 0.78 | 30 |
| Cell Biology, Development, Stem Cells and Regeneration | 0.98 | 0.94 | 0.96 | 54 |
| Computer Science and Informatics | 0.96 | 0.98 | 0.97 | 95 |
| Condensed Matter Physics | 0.97 | 0.99 | 0.98 | 68 |
| Earth System Science | 0.94 | 0.98 | 0.96 | 64 |
| Environmental Biology, Ecology and Evolution | 0.91 | 0.96 | 0.94 | 54 |
| Fundamental Constituents of Matter | 0.97 | 0.94 | 0.95 | 32 |
| Human Mobility, Environment, and Space | 0.81 | 0.81 | 0.81 | 21 |
| Immunity, Infection and Immunotherapy | 1.00 | 0.97 | 0.99 | 40 |
| Individuals, Markets and Organisations | 0.94 | 0.98 | 0.96 | 48 |
| Institutions, Governance and Legal Systems | 0.89 | 0.92 | 0.91 | 26 |
| Integrative Biology: from Genes and Genomes to Systems | 0.91 | 0.98 | 0.94 | 49 |
| Materials Engineering | 0.81 | 0.93 | 0.87 | 75 |
| Mathematics | 1.00 | 1.00 | 1.00 | 36 |
| Molecules of Life: Biological Mechanisms, Structures and Functions | 0.94 | 0.98 | 0.96 | 111 |
| Neuroscience and Disorders of the Nervous System | 1.00 | 1.00 | 1.00 | 30 |
| Physical and Analytical Chemical Sciences | 0.89 | 0.93 | 0.91 | 94 |
| Physiology in Health, Disease and Ageing | 0.94 | 1.00 | 0.97 | 34 |
| Prevention, Diagnosis and Treatment of Human Diseases | 0.97 | 0.96 | 0.96 | 68 |
| Products and Processes Engineering | 0.90 | 0.97 | 0.93 | 109 |
| Studies of Cultures and Arts | 1.00 | 0.78 | 0.88 | 9 |
| Synthetic Chemistry and Materials | 0.82 | 0.77 | 0.79 | 47 |
| Systems and Communication Engineering | 0.94 | 0.97 | 0.95 | 87 |
| Texts and Concepts | 0.87 | 0.93 | 0.90 | 14 |
| The Human Mind and Its Complexity | 1.00 | 0.93 | 0.97 | 30 |
| The Social World and Its Interactions | 0.97 | 0.94 | 0.96 | 34 |
| The Study of the Human Past | 0.89 | 0.94 | 0.91 | 17 |
| Universe Sciences | 1.00 | 1.00 | 1.00 | 25 |

Overall performance

| Average | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Micro avg | 0.93 | 0.95 | 0.94 | 1401 |
| Macro avg | 0.93 | 0.94 | 0.93 | 1401 |
| Weighted avg | 0.93 | 0.95 | 0.94 | 1401 |
| Samples avg | 0.93 | 0.94 | 0.93 | 1401 |

ERC-funded projects evaluation (multiclass recall)

This evaluation uses ERC-funded projects, where each project belongs to exactly one panel.
Only recall is reported.
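Since each project has exactly one true panel, a natural evaluation (assumed here; the card does not detail the procedure) is to take the top-scoring panel per project and compute per-panel recall as the fraction of that panel's projects ranked first. A minimal sketch with dummy scores:

```python
from collections import defaultdict

def per_class_recall(score_rows, true_labels):
    """Per-class recall for single-label items:
    hits / total, counting a hit when the argmax matches the truth."""
    hits, totals = defaultdict(int), defaultdict(int)
    for scores, truth in zip(score_rows, true_labels):
        pred = max(scores, key=scores.get)  # argmax over label->score dict
        totals[truth] += 1
        if pred == truth:
            hits[truth] += 1
    return {lab: hits[lab] / totals[lab] for lab in totals}

scores = [  # dummy per-panel scores for three projects
    {"Mathematics": 0.9, "Universe Sciences": 0.2},
    {"Mathematics": 0.4, "Universe Sciences": 0.7},
    {"Mathematics": 0.3, "Universe Sciences": 0.8},
]
truths = ["Mathematics", "Mathematics", "Universe Sciences"]
print(per_class_recall(scores, truths))
# Mathematics: 1 of 2 correct; Universe Sciences: 1 of 1 correct
```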

| Panel | Recall |
|---|---|
| Biotechnology and Biosystems Engineering | 0.26 |
| Cell Biology, Development, Stem Cells and Regeneration | 0.81 |
| Computer Science and Informatics | 1.00 |
| Condensed Matter Physics | 0.77 |
| Earth System Science | 0.92 |
| Environmental Biology, Ecology and Evolution | 0.85 |
| Fundamental Constituents of Matter | 0.84 |
| Human Mobility, Environment, and Space | 0.61 |
| Immunity, Infection and Immunotherapy | 0.83 |
| Individuals, Markets and Organisations | 0.96 |
| Institutions, Governance and Legal Systems | 0.58 |
| Integrative Biology: from Genes and Genomes to Systems | 0.73 |
| Materials Engineering | 0.75 |
| Mathematics | 0.96 |
| Molecules of Life: Biological Mechanisms, Structures and Functions | 0.95 |
| Neuroscience and Disorders of the Nervous System | 0.92 |
| Physical and Analytical Chemical Sciences | 0.83 |
| Physiology in Health, Disease and Ageing | 0.60 |
| Prevention, Diagnosis and Treatment of Human Diseases | 0.94 |
| Products and Processes Engineering | 0.58 |
| Studies of Cultures and Arts | 0.27 |
| Synthetic Chemistry and Materials | 0.67 |
| Systems and Communication Engineering | 0.75 |
| Texts and Concepts | 0.62 |
| The Human Mind and Its Complexity | 0.85 |
| The Social World and Its Interactions | 0.73 |
| The Study of the Human Past | 0.83 |
| Universe Sciences | 1.00 |

Overall recall

  • Micro recall: 0.77
  • Macro recall: 0.76

Citation

@inproceedings{bovenzi2022mapping,
  title={Mapping STI ecosystems via Open Data: Overcoming the limitations of conflicting taxonomies. A case study for Climate Change Research in Denmark},
  author={Bovenzi, Nicandro and Duran-Silva, Nicolau and Massucci, Francesco Alessandro and Multari, Francesco and Parra-Rojas, C{\'e}sar and Pujol-Llatse, Josep},
  booktitle={International Conference on Theory and Practice of Digital Libraries (TPDL)},
  pages={495--499},
  year={2022},
  publisher={Springer International Publishing}
}

Framework versions

  • Transformers: 4.57.x
  • PyTorch: 2.8.0
  • Datasets: 3.x
  • Tokenizers: 0.22.x
Model size: ~0.1B parameters (F32, safetensors)