---
library_name: transformers
license: apache-2.0
base_model: allenai/specter2_base
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: results
  results: []
---

# 📗 SPECTER2–MAG (Multiclass Classification on MAG Level-0 Fields of Study)

This model is a fine-tuned version of [allenai/specter2_base](https://huggingface.co/allenai/specter2_base) for multiclass bibliometric classification using MAG Fields of Study – Level 0 (SciDocs).

It achieves the following results on the evaluation set:

- Loss: 1.0598
- Accuracy: 0.8310
- Precision Micro: 0.8310
- Precision Macro: 0.8290
- Recall Micro: 0.8310
- Recall Macro: 0.8276
- F1 Micro: 0.8310
- F1 Macro: 0.8263

## Model description

This model is a fine-tuned version of SPECTER2 (`allenai/specter2_base`) adapted for multiclass classification across the 19 top-level Fields of Study (FoS) from the Microsoft Academic Graph (MAG). The model accepts the title, abstract, or title + abstract of a scientific publication and assigns it to exactly one of the MAG Level-0 domains (e.g., Biology, Chemistry, Computer Science, Engineering, Psychology).

Key characteristics:

* Base model: `allenai/specter2_base`
* Task: multiclass document classification
* Labels: 19 MAG Fields of Study Level-0 categories
* Activation: softmax
* Loss: CrossEntropyLoss
* Output: single best-matching FoS category

MAG Level-0 represents broad disciplinary domains designed for high-level categorization of scientific documents.

## Intended uses & limitations

### Intended uses

This multiclass MAG model is suitable for:

- Assigning publications to **top-level scientific disciplines**
- Enriching metadata in:
  - repositories
  - research output systems
  - funding and project datasets
  - bibliometric dashboards
- Supporting scientometric analyses such as:
  - broad-discipline portfolio mapping
  - domain-level clustering
  - modeling research diversification
- Classifying documents when only **title/abstract** is available

The model supports inputs such as:

- **title only**
- **abstract only**
- **title + abstract** (recommended)

See *Example usage* below for an inference sketch.

### Limitations

- MAG Level-0 categories are **very coarse** (e.g., *Biology*, *Medicine*, *Engineering*) and do not represent subfields.
- Documents spanning multiple fields are forced into a **single** label, an inherent limitation of multiclass classification.
- The training labels come from **MAG's automatic field assignment pipeline**, not manual expert annotation.
- Not suitable for:
  - fine-grained subdisciplines
  - downstream tasks requiring multilabel outputs
  - WoS Categories or ASJC Areas (use separate models)
  - clinical or regulatory decision-making

Predictions should be treated as **high-level disciplinary metadata**, not detailed field classification.
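### Example usage

A minimal inference sketch is shown below. The checkpoint id is a placeholder for this repository, and joining title and abstract with the tokenizer's separator token follows the usual SPECTER convention; both are assumptions rather than details stated elsewhere in this card.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder id: replace with the actual id of this repository.
model_id = "<this-repo>/specter2-mag-level0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

title = "Deep learning for protein structure prediction"
abstract = "We present a neural architecture that predicts protein structure from sequence ..."

# Title + abstract is recommended; abstract-only input (as used in training)
# also works. Joining with [SEP] follows the SPECTER convention (an assumption here).
text = title + tokenizer.sep_token + abstract

inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)[0]
pred_id = int(probs.argmax())

# Assumes id2label is populated in the model config; otherwise map the
# 19 FoS Level-0 names to indices yourself.
print(model.config.id2label[pred_id], float(probs[pred_id]))
```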
## Training and evaluation data

### Source dataset: SciDocs

Training data comes from the **SciDocs** dataset, introduced together with the original SPECTER paper:

> **SciDocs** provides citation graphs, titles, abstracts, and **MAG Fields of Study** for scientific documents derived from MAG.

For this model, we use **MAG Level-0 FoS**, the 19 top-level scientific domains.

Dataset characteristics:

| Property | Value |
|----------|-------|
| Documents | ~40k scientific papers |
| Labels | 19 FoS Level-0 categories |
| Input fields | Abstract |
| Task type | Multiclass |
| Source | SciDocs (SPECTER paper) |
| License | CC-BY |

## Training procedure

### Preprocessing

- Input text constructed as: `abstract`
- Tokenization using the SPECTER2 tokenizer
- Maximum sequence length: **512 tokens**

### Model

- Base model: `allenai/specter2_base`
- Classification head: linear layer → softmax
- Loss: **CrossEntropyLoss**

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10

An illustrative training sketch using these settings appears at the end of this card.

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision Micro | Precision Macro | Recall Micro | Recall Macro | F1 Micro | F1 Macro |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:---------------:|:---------------:|:------------:|:------------:|:--------:|:--------:|
| 0.2603 | 1.0 | 1094 | 0.6733 | 0.8243 | 0.8243 | 0.8315 | 0.8243 | 0.8198 | 0.8243 | 0.8222 |
| 0.1779 | 2.0 | 2188 | 0.6955 | 0.8240 | 0.8240 | 0.8198 | 0.8240 | 0.8203 | 0.8240 | 0.8176 |
| 0.1628 | 3.0 | 3282 | 0.8130 | 0.8315 | 0.8315 | 0.8296 | 0.8315 | 0.8265 | 0.8315 | 0.8269 |
| 0.1136 | 4.0 | 4376 | 0.9842 | 0.8227 | 0.8227 | 0.8254 | 0.8227 | 0.8192 | 0.8227 | 0.8205 |
| 0.0666 | 5.0 | 5470 | 1.0598 | 0.8310 | 0.8310 | 0.8290 | 0.8310 | 0.8276 | 0.8310 | 0.8263 |

### Evaluation results

Per-class results (values rounded to four decimals):

| Class | Precision | Recall | F1-score | Support |
|:----------------------|----------:|-------:|---------:|--------:|
| Art | 0.6549 | 0.8457 | 0.7382 | 175 |
| Biology | 0.9822 | 0.9736 | 0.9779 | 227 |
| Business | 0.9149 | 0.8776 | 0.8958 | 196 |
| Chemistry | 0.9745 | 0.9695 | 0.9720 | 197 |
| Computer science | 0.9605 | 0.8947 | 0.9264 | 190 |
| Economics | 0.8164 | 0.7824 | 0.7991 | 216 |
| Engineering | 0.9061 | 0.9279 | 0.9169 | 208 |
| Environmental science | 0.9754 | 0.9167 | 0.9451 | 216 |
| Geography | 0.7585 | 0.9128 | 0.8285 | 172 |
| Geology | 0.9673 | 0.9764 | 0.9718 | 212 |
| History | 0.6299 | 0.5187 | 0.5689 | 187 |
| Materials science | 0.9324 | 0.9583 | 0.9452 | 216 |
| Mathematics | 0.9388 | 0.9436 | 0.9412 | 195 |
| Medicine | 0.9826 | 0.9235 | 0.9521 | 183 |
| Philosophy | 0.7529 | 0.7486 | 0.7507 | 175 |
| Physics | 0.9648 | 0.9746 | 0.9697 | 197 |
| Political science | 0.6425 | 0.6617 | 0.6520 | 201 |
| Psychology | 0.8063 | 0.7586 | 0.7817 | 203 |
| Sociology | 0.4389 | 0.4270 | 0.4329 | 185 |
| accuracy | | | 0.8456 | 3751 |
| macro avg | 0.8421 | 0.8417 | 0.8403 | 3751 |
| weighted avg | 0.8478 | 0.8456 | 0.8453 | 3751 |

### Framework versions

- Transformers 4.57.1
- Pytorch 2.8.0+cu126
- Datasets 3.6.0
- Tokenizers 0.22.1
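### Training sketch (illustrative)

For orientation, a minimal fine-tuning sketch consistent with the hyperparameters listed above. The data files, column names (`abstract` as text, `label` as an integer id 0-18), and metric computation are assumptions for illustration; this is not the exact script used to train this model.

```python
import numpy as np
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Assumed: a SciDocs-derived CSV with "abstract" (text) and "label" (int 0-18) columns.
dataset = load_dataset("csv", data_files={"train": "train.csv", "validation": "val.csv"})

tokenizer = AutoTokenizer.from_pretrained("allenai/specter2_base")
model = AutoModelForSequenceClassification.from_pretrained(
    "allenai/specter2_base", num_labels=19
)

def tokenize(batch):
    # Abstract-only input, truncated to 512 tokens, as in the preprocessing above.
    return tokenizer(batch["abstract"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

args = TrainingArguments(
    output_dir="results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=10,
    lr_scheduler_type="linear",
    seed=42,
    optim="adamw_torch_fused",  # AdamW with betas=(0.9, 0.999), eps=1e-08 (defaults)
    eval_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    processing_class=tokenizer,  # enables dynamic padding via the default collator
    compute_metrics=compute_metrics,
)
trainer.train()
```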