| --- |
| library_name: transformers |
| license: apache-2.0 |
| base_model: |
| - allenai/specter2_base |
| pipeline_tag: text-classification |
| datasets: |
| - nicolauduran45/horizon_clusters_annotated |
| --- |
| |
| # SPECTER2-base Multilabel Horizon Clusters Classifier |
|
|
| This model is based on **[SPECTER2-base](https://huggingface.co/allenai/specter2_base)**, fine-tuned for multilabel classification of scientific publications into Horizon Europe clusters. |
|
|
| --- |
|
|
| ## Model Description |
|
|
| - **Base model:** [allenai/specter2_base](https://huggingface.co/allenai/specter2_base) |
| - **Task:** Multilabel classification (assigns one or more clusters per document) |
| - **Labels:** 6 Horizon Europe clusters (see below) |
| - **Languages:** English |
| - **Input:** Title and abstract concatenated |
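
A minimal inference sketch, assuming the checkpoint loads with `AutoModelForSequenceClassification` and that title and abstract are joined with the tokenizer's separator token, as the SPECTER2 base model card does. This card does not state the repository id, the separator, or the decision threshold, so `MODEL_ID`, `build_input`, `decode`, `predict_clusters`, and the 0.5 threshold are all placeholders rather than the author's settings.

```python
import math

MODEL_ID = "your-namespace/your-model-id"  # placeholder: replace with this model's Hub id


def build_input(title, abstract, sep=" "):
    """Join title and abstract into one string (the separator is an assumption)."""
    return f"{title.strip()}{sep}{(abstract or '').strip()}".strip()


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode(logits, id2label, threshold=0.5):
    """Keep every label whose independent sigmoid probability meets the threshold."""
    probs = [sigmoid(x) for x in logits]
    return [(id2label[i], round(p, 3)) for i, p in enumerate(probs) if p >= threshold]


def predict_clusters(title, abstract, threshold=0.5):
    """Classify one paper; requires `torch` and `transformers` to be installed."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    text = build_input(title, abstract, sep=tokenizer.sep_token)
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    # Label names come from the checkpoint's own config, not from a hard-coded list.
    return decode(logits, model.config.id2label, threshold)
```

Calling `predict_clusters(title, abstract)` returns a list of `(label, probability)` pairs, which can be empty if no label reaches the threshold. Lowering the threshold trades precision for recall.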
|
|
| --- |
|
|
| ## Training Details |
|
|
| - **Training framework:** Hugging Face Transformers (`Trainer`) |
| - **Batch size:** 4 |
| - **Learning rate:** 2e-5 |
| - **Epochs:** 6 |
| - **Optimizer:** AdamW with weight decay 0.01 |
| - **Loss:** Binary Cross-Entropy with Logits |
| - **Best model selection:** F1-score on validation set |
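
The loss listed above treats each of the six labels as an independent binary decision. Below is a small pure-Python sketch of the numerically stable form that PyTorch's `BCEWithLogitsLoss` computes with mean reduction, assuming that is the implementation used during fine-tuning. The function name and the example values are illustrative only.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed directly from raw logits.

    Uses max(x, 0) - x*y + log(1 + exp(-|x|)), which avoids overflow for
    large positive or negative logits.
    """
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# One document, six clusters: raw model outputs and a multi-hot target (values made up).
example_logits = [-3.2, 2.4, -1.1, 1.8, -2.7, -4.0]
example_target = [0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
example_loss = bce_with_logits(example_logits, example_target)
```

Because each label contributes its own term, a document can be pushed toward several clusters at once, which a softmax cross-entropy loss would not allow.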
|
|
| --- |
|
|
| ## Clusters (Labels) |
|
|
| - Civil Security for Society |
| - Climate, Energy and Mobility |
| - Culture, Creativity and Inclusive Society |
| - Digital, Industry and Space |
| - Food, Bioeconomy, Natural Resources, Agriculture and Environment |
| - Health |
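
For training or evaluation, each document's clusters are typically encoded as a multi-hot vector. The sketch below assumes the label indices follow the order of the list above; the authoritative mapping is the `id2label` entry in the checkpoint's `config.json`, and `CLUSTERS` and `to_multi_hot` are illustrative names.

```python
CLUSTERS = [
    "Civil Security for Society",
    "Climate, Energy and Mobility",
    "Culture, Creativity and Inclusive Society",
    "Digital, Industry and Space",
    "Food, Bioeconomy, Natural Resources, Agriculture and Environment",
    "Health",
]


def to_multi_hot(labels):
    """Encode a collection of cluster names as a float vector of length 6."""
    unknown = set(labels) - set(CLUSTERS)
    if unknown:
        raise ValueError(f"Unknown cluster name(s): {sorted(unknown)}")
    # Floats rather than ints, since BCE-with-logits losses expect float targets.
    return [1.0 if name in labels else 0.0 for name in CLUSTERS]


vector = to_multi_hot(["Health", "Digital, Industry and Space"])
```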
|
|
| --- |
|
|
| ## Evaluation Metrics |
|
|
| | Epoch | Training Loss | Validation Loss | F1 | ROC AUC | Accuracy | |
| |-------|--------------|----------------|----------|-----------|----------| |
| | 1 | No log | 0.1774 | 0.910 | 0.9368 | 0.766 | |
| | 2 | 0.0606 | 0.1849 | 0.921 | 0.9454 | 0.787 | |
| | 3 | 0.0351 | 0.2071 | 0.919 | 0.9434 | 0.787 | |
| | 4 | 0.0180 | 0.2191 | 0.921 | 0.9451 | 0.793 | |
| | 5 | 0.0093 | 0.2295 | 0.921 | 0.9451 | 0.793 | |
| | 6 | 0.0060 | 0.2307 | 0.921 | 0.9451 | 0.793 | |
|
|
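| **Best epoch:** 6 (highest F1 and accuracy, tied with epochs 4 and 5 at the precision shown; no metric improves after epoch 4). Validation loss is lowest at epoch 1 (0.1774) and rises through epoch 6 while training loss keeps falling, so the later epochs may be overfitting. |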
|
|
| - **Final validation loss:** 0.2307 |
| - **Final F1:** 0.9212 |
| - **Final ROC AUC:** 0.9451 |
| - **Final Accuracy:** 0.7927 |
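
The card does not say how F1 and accuracy are computed. The sketch below assumes micro-averaged F1 and exact-match (subset) accuracy, which would be consistent with accuracy sitting well below F1, since a document only counts as correct when all six label decisions match. Function names are hypothetical.

```python
def micro_f1(y_true, y_pred):
    """F1 over all (document, label) decisions pooled together."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true, y_pred):
    """Fraction of documents whose full label vector is predicted exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


# Toy example: the second document misses one of its two true labels.
y_true = [[1, 0, 0], [0, 1, 1]]
y_pred = [[1, 0, 0], [0, 1, 0]]
print(micro_f1(y_true, y_pred))         # 0.8
print(subset_accuracy(y_true, y_pred))  # 0.5
```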
|
|
| --- |
|
|
| ## Per-Category Classification Report |
|
|
| | Label | Precision | Recall | F1-score | Support | |
| |---------------------------------------------------------------------|-----------|--------|----------|---------| |
| | Civil Security for Society | 0.97 | 0.79 | 0.87 | 39 | |
| | Climate, Energy and Mobility | 0.94 | 0.91 | 0.93 | 91 | |
| | Culture, Creativity and Inclusive Society | 0.89 | 0.88 | 0.88 | 96 | |
| | Digital, Industry and Space | 0.93 | 0.92 | 0.93 | 214 | |
| | Food, Bioeconomy, Natural Resources, Agriculture and Environment | 0.89 | 0.97 | 0.93 | 75 | |
| | Health | 0.96 | 0.96 | 0.96 | 73 | |
| | **micro avg** | 0.93 | 0.91 | 0.92 | 588 | |
| | **macro avg** | 0.93 | 0.91 | 0.92 | 588 | |
| | **weighted avg** | 0.93 | 0.91 | 0.92 | 588 | |
| | **samples avg** | 0.91 | 0.92 | 0.90 | 588 | |
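
The macro and weighted averages can be re-derived from the per-label rows as a sanity check. The sketch below uses only the rounded values from the table, so results agree with the reported averages to about two decimal places; the variable and function names are illustrative.

```python
# (precision, recall, f1, support) per label, copied from the table above.
REPORT = {
    "Civil Security for Society": (0.97, 0.79, 0.87, 39),
    "Climate, Energy and Mobility": (0.94, 0.91, 0.93, 91),
    "Culture, Creativity and Inclusive Society": (0.89, 0.88, 0.88, 96),
    "Digital, Industry and Space": (0.93, 0.92, 0.93, 214),
    "Food, Bioeconomy, Natural Resources, Agriculture and Environment": (0.89, 0.97, 0.93, 75),
    "Health": (0.96, 0.96, 0.96, 73),
}

TOTAL_SUPPORT = sum(row[3] for row in REPORT.values())


def macro_avg(col):
    """Unweighted mean of one metric column (0=precision, 1=recall, 2=f1)."""
    return sum(row[col] for row in REPORT.values()) / len(REPORT)


def weighted_avg(col):
    """Mean of one metric column weighted by each label's support."""
    return sum(row[col] * row[3] for row in REPORT.values()) / TOTAL_SUPPORT


for name, col in (("precision", 0), ("recall", 1), ("f1", 2)):
    print(f"{name}: macro={macro_avg(col):.3f} weighted={weighted_avg(col):.3f}")
```

Macro averaging gives the smallest class (Civil Security for Society, 39 positive labels) the same weight as the largest (Digital, Industry and Space, 214), so its lower recall of 0.79 pulls the macro recall down more than the weighted one.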
|
|
| --- |
|
|
| ## License |
|
|
| This model is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). |