---
library_name: transformers
license: apache-2.0
base_model:
- allenai/specter2_base
pipeline_tag: text-classification
datasets:
- nicolauduran45/horizon_clusters_annotated
---
# SPECTER2-base Multilabel Horizon Clusters Classifier
This model fine-tunes **[SPECTER2-base](https://huggingface.co/allenai/specter2_base)** for multilabel classification of scientific publications into the six Horizon Europe clusters.
---
## Model Description
- **Base model:** [allenai/specter2_base](https://huggingface.co/allenai/specter2_base)
- **Task:** Multilabel classification (assigns one or more clusters per document)
- **Labels:** 6 Horizon Europe clusters (see below)
- **Languages:** English
- **Input:** Title and abstract concatenated
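The input format above can be sketched as follows. This is a minimal illustration, not the author's preprocessing code: the helper name `build_input`, the whitespace normalization, and the `[SEP]` separator are assumptions (the card only says the two fields are concatenated; the upstream SPECTER2 examples join title and abstract with the tokenizer's separator token).

```python
from typing import Optional


def build_input(title: str, abstract: Optional[str] = None, sep_token: str = "[SEP]") -> str:
    """Join title and abstract into a single string for the tokenizer.

    Assumption: the separator is the BERT-style "[SEP]" token; in practice
    you would pass `tokenizer.sep_token` from the loaded tokenizer.
    """
    # Collapse newlines and repeated spaces so the text is one clean line
    title = " ".join(title.split())
    abstract = " ".join((abstract or "").split())
    # With no abstract available, fall back to the title alone
    if not abstract:
        return title
    return f"{title} {sep_token} {abstract}"


text = build_input(
    "Low-cost sensors for urban air quality",
    "We evaluate a sensor network\ndeployed across three cities.",
)
print(text)
```

The resulting string would then be tokenized with truncation, since BERT-style encoders such as SPECTER2-base accept at most 512 tokens.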
---
## Training Details
- **Training framework:** Hugging Face Transformers (`Trainer`)
- **Batch size:** 4
- **Learning rate:** 2e-5
- **Epochs:** 6
- **Optimizer:** AdamW with weight decay 0.01
- **Loss:** Binary Cross-Entropy with Logits
- **Best model selection:** F1-score on validation set
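For readers unfamiliar with the loss named above, here is a small pure-Python sketch of binary cross-entropy with logits for one document. It treats each of the six labels as an independent yes/no decision and averages the per-label losses, which mirrors the default mean reduction of PyTorch's `BCEWithLogitsLoss`. The function name and the example numbers are illustrative, not taken from the training run.

```python
import math
from typing import Sequence


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy over labels, computed directly from logits."""
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for x, y in zip(logits, targets):
        # Numerically stable form of
        #   -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# One document with six label logits; the document belongs to labels 1 and 5
logits = [-2.1, 3.4, -1.0, 0.2, -3.3, 2.8]
targets = [0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
print(round(bce_with_logits(logits, targets), 4))
```

Because every label has its own sigmoid, a document can score high on several clusters at once, which is what makes this loss suitable for multilabel targets.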
---
## Clusters (Labels)
- Civil Security for Society
- Climate, Energy and Mobility
- Culture, Creativity and Inclusive Society
- Digital, Industry and Space
- Food, Bioeconomy, Natural Resources, Agriculture and Environment
- Health
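Turning the model's six output logits into cluster names can be sketched as below. Two things here are assumptions: the label order (taken from the list above; the authoritative mapping is the `id2label` entry in the model's config) and the 0.5 decision threshold, which the card does not state.

```python
import math
from typing import List, Sequence, Tuple

# Assumed order; check `model.config.id2label` for the real index mapping
LABELS = [
    "Civil Security for Society",
    "Climate, Energy and Mobility",
    "Culture, Creativity and Inclusive Society",
    "Digital, Industry and Space",
    "Food, Bioeconomy, Natural Resources, Agriculture and Environment",
    "Health",
]


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode(
    logits: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not documented in the card
    labels: Sequence[str] = LABELS,
) -> List[Tuple[str, float]]:
    """Return (label, probability) pairs whose probability meets the threshold."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = [sigmoid(x) for x in logits]
    return [(name, p) for name, p in zip(labels, probs) if p >= threshold]


for name, prob in decode([2.0, -3.0, 0.1, -0.1, -5.0, 4.0]):
    print(f"{prob:.3f}  {name}")
```

Since each label is thresholded independently, the result may contain several clusters or none at all.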
---
## Evaluation Metrics
| Epoch | Training Loss | Validation Loss | F1 | ROC AUC | Accuracy |
|-------|--------------|----------------|----------|-----------|----------|
| 1 | No log | 0.1774 | 0.910 | 0.9368 | 0.766 |
| 2 | 0.0606 | 0.1849 | 0.921 | 0.9454 | 0.787 |
| 3 | 0.0351 | 0.2071 | 0.919 | 0.9434 | 0.787 |
| 4 | 0.0180 | 0.2191 | 0.921 | 0.9451 | 0.793 |
| 5 | 0.0093 | 0.2295 | 0.921 | 0.9451 | 0.793 |
| 6 | 0.0060 | 0.2307 | 0.921 | 0.9451 | 0.793 |
- **Best epoch:** 6 (highest F1 and accuracy, tied with epochs 4 and 5 at the reported precision; the last improvement was at epoch 4)
- **Final validation loss:** 0.2307
- **Final F1:** 0.9212
- **Final ROC AUC:** 0.9451
- **Final Accuracy:** 0.7927
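The card does not say how F1 and Accuracy are averaged. One reading that would explain accuracy sitting well below F1 is that F1 is micro-averaged over all label decisions while accuracy is exact-match (subset) accuracy, where a document counts as correct only if all six labels are right. The sketch below computes both under that assumption; the function name and the toy data are illustrative only.

```python
from typing import Sequence, Tuple


def micro_f1_and_subset_accuracy(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
) -> Tuple[float, float]:
    """Micro-averaged F1 and exact-match accuracy for multi-hot label rows."""
    tp = fp = fn = exact = 0
    for t_row, p_row in zip(y_true, y_pred):
        # A document is an exact match only if every label agrees
        exact += int(list(t_row) == list(p_row))
        for t, p in zip(t_row, p_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    accuracy = exact / len(y_true) if len(y_true) else 0.0
    return f1, accuracy


# Toy data: three documents, three labels; one label is missed in document 2
y_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
f1, acc = micro_f1_and_subset_accuracy(y_true, y_pred)
print(f"micro F1 = {f1:.3f}, subset accuracy = {acc:.3f}")
```

In the toy data a single missed label costs one whole document in subset accuracy but only one of five positive decisions in micro F1, so accuracy drops faster than F1.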
---
## Per-Category Classification Report
| Label | Precision | Recall | F1-score | Support |
|---------------------------------------------------------------------|-----------|--------|----------|---------|
| Civil Security for Society | 0.97 | 0.79 | 0.87 | 39 |
| Climate, Energy and Mobility | 0.94 | 0.91 | 0.93 | 91 |
| Culture, Creativity and Inclusive Society | 0.89 | 0.88 | 0.88 | 96 |
| Digital, Industry and Space | 0.93 | 0.92 | 0.93 | 214 |
| Food, Bioeconomy, Natural Resources, Agriculture and Environment | 0.89 | 0.97 | 0.93 | 75 |
| Health | 0.96 | 0.96 | 0.96 | 73 |
| **micro avg** | 0.93 | 0.91 | 0.92 | 588 |
| **macro avg** | 0.93 | 0.91 | 0.92 | 588 |
| **weighted avg** | 0.93 | 0.91 | 0.92 | 588 |
| **samples avg** | 0.91 | 0.92 | 0.90 | 588 |
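As a consistency check, the macro and weighted average rows can be approximately recomputed from the per-label rows. The sketch below uses the rounded values printed in the table, so its results can differ from the original report in the last digit; the variable and function names are illustrative.

```python
# (precision, recall, f1, support) copied from the per-label rows above
REPORT = {
    "Civil Security for Society": (0.97, 0.79, 0.87, 39),
    "Climate, Energy and Mobility": (0.94, 0.91, 0.93, 91),
    "Culture, Creativity and Inclusive Society": (0.89, 0.88, 0.88, 96),
    "Digital, Industry and Space": (0.93, 0.92, 0.93, 214),
    "Food, Bioeconomy, Natural Resources, Agriculture and Environment": (0.89, 0.97, 0.93, 75),
    "Health": (0.96, 0.96, 0.96, 73),
}

PRECISION, RECALL, F1 = 0, 1, 2  # column indices into each tuple


def macro_avg(col: int) -> float:
    """Unweighted mean over labels: every cluster counts equally."""
    return sum(row[col] for row in REPORT.values()) / len(REPORT)


def weighted_avg(col: int) -> float:
    """Support-weighted mean: frequent clusters count more."""
    total = sum(row[3] for row in REPORT.values())
    return sum(row[col] * row[3] for row in REPORT.values()) / total


total_support = sum(row[3] for row in REPORT.values())
print("total support:", total_support)
print("macro F1:     ", round(macro_avg(F1), 2))
print("weighted F1:  ", round(weighted_avg(F1), 2))
```

The macro average gives the smallest cluster (Civil Security for Society, 39 label instances) the same weight as the largest (Digital, Industry and Space, 214), so it is the number to watch when performance on rare clusters matters.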
---
## License
This model is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).