Model Card: Verdict-Normaliser-RoBERTa

Model Description

Verdict-Normaliser-RoBERTa is a fine-tuned RoBERTa model designed to normalise fact-checking verdicts into a unified six-point rating scale. Fact-checking organisations express their conclusions using diverse and often organisation-specific verdict formats. This model helps standardise those heterogeneous verdicts to support large-scale automated analysis.

The model was trained as part of the FACTors dataset pipeline, where verdict normalisation could not be handled through simple keyword mapping due to the high variability and complexity of original verdict formulations.

Target labels (6 classes):

True
Partially true
False
Misleading
Unverifiable
Other

Intended Use

Primary Use

This model is intended for:

Normalising original fact-checking verdict texts into a common label space
Supporting research on misinformation, fact-checking, and credibility analysis
Preprocessing heterogeneous fact-checking datasets for downstream NLP tasks
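
A minimal usage sketch, assuming the checkpoint is available under the repository ID ealtuncu/verdict-normaliser-roberta and loads with the Hugging Face transformers text-classification pipeline. How the content and the original verdict are combined at the input is not specified in this card, so passing them as a text/text_pair sentence pair is an assumption. The helper names build_input and normalise_verdict are hypothetical, and the label strings returned depend on the checkpoint's configuration.

```python
from typing import Dict

MODEL_ID = "ealtuncu/verdict-normaliser-roberta"  # repository ID as shown on the model page


def build_input(content: str, original_verdict: str) -> Dict[str, str]:
    """Pair fact-check content with its original verdict text.

    Assumption: the pair is fed to the model as a sentence pair.
    """
    return {"text": content.strip(), "text_pair": original_verdict.strip()}


def normalise_verdict(content: str, original_verdict: str) -> Dict[str, object]:
    """Return the predicted label and its score for one fact-check."""
    # Imported lazily so build_input stays usable without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=MODEL_ID)
    result = classifier(build_input(content, original_verdict), truncation=True)
    # Depending on the library version, a single input may come back as a list.
    if isinstance(result, list):
        result = result[0]
    return {"label": result["label"], "score": float(result["score"])}
```

Calling normalise_verdict(content, "Pants on Fire") should return a dictionary with a label and a score. The score can then be compared against a review threshold before the label is accepted.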

Out of Scope

This model is not intended to:

Independently verify factual claims
Replace human fact-checkers
Be used as a real-time truth assessment system

It only predicts a normalised verdict category based on patterns learned from past fact-checking data.

Training Data

The training data comes from the FACTors dataset, which aggregates fact-checks from multiple organisations.

Data Preparation Process

A three-step methodology was followed:

1. Manual mapping of short verdicts
   All unique original verdict texts shorter than five words were manually reviewed. Verdicts that could be clearly aligned with one of the six predefined ratings were mapped directly.
   - 68 unique original verdict formats were mapped
   - Covering 72,309 fact-checks
   - From 33 fact-checking organisations
2. Model-based normalisation
   The content-verdict pairs from the manually mapped subset were used to fine-tune a base RoBERTa model. This model was then used to predict normalised labels for the remaining fact-checks.
3. Manual review of low-confidence predictions
   Predictions with model confidence below 0.5 were manually reviewed.
   - 1,564 predictions were inspected and corrected where necessary
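
The three steps above can be sketched as a small routing function. This is an illustration under stated assumptions, not the authors' code: the entries in MANUAL_MAP are hypothetical (the 68 mapped verdict formats are not listed in this card), and the names map_short_verdict, needs_review, and route are invented for the example. Only the five-word cut-off and the 0.5 confidence threshold come from the description.

```python
from typing import Dict, Optional, Tuple

# Hypothetical entries; the actual manually mapped verdict formats are not listed here.
MANUAL_MAP: Dict[str, str] = {
    "true": "True",
    "mostly true": "Partially true",
    "false": "False",
    "misleading": "Misleading",
}

MAX_WORDS = 5           # step 1 covers verdicts shorter than five words
REVIEW_THRESHOLD = 0.5  # step 3 reviews predictions below this confidence


def map_short_verdict(verdict: str) -> Optional[str]:
    """Step 1: direct mapping for short verdicts with a known label."""
    cleaned = " ".join(verdict.lower().split())
    if len(cleaned.split()) >= MAX_WORDS:
        return None
    return MANUAL_MAP.get(cleaned)


def needs_review(confidence: float) -> bool:
    """Step 3: flag a model prediction for manual inspection."""
    return confidence < REVIEW_THRESHOLD


def route(verdict: str, model_prediction: Tuple[str, float]) -> Tuple[str, str]:
    """Return (label, source), where source is 'manual', 'model', or 'review'."""
    mapped = map_short_verdict(verdict)
    if mapped is not None:
        return mapped, "manual"
    # Step 2: fall back to the output of the fine-tuned classifier.
    label, confidence = model_prediction
    return label, "review" if needs_review(confidence) else "model"
```

Here model_prediction stands in for the (label, confidence) pair produced by the fine-tuned classifier. Anything routed to 'review' would be checked by a person before the label is kept.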

Training Procedure

Base model: RoBERTa-base
Task: Multi-class text classification
Input: Fact-check content paired with its original verdict text
Learning rate: 3e-5
Epochs: 3
Train/test split: 90:10
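
A short sketch of the reported setup, assuming a seeded random shuffle for the 90:10 split. The seed, the HYPERPARAMS dictionary, and the train_test_split helper are illustrative choices. Settings not stated in this card (batch size, optimiser, maximum sequence length) are deliberately left out.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Values taken from the training procedure listed in this card.
HYPERPARAMS = {
    "base_model": "roberta-base",
    "num_labels": 6,
    "learning_rate": 3e-5,
    "num_epochs": 3,
    "test_fraction": 0.10,
}


def train_test_split(
    items: Sequence[T], test_fraction: float, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of items and hold out test_fraction of them for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


examples = [f"fact-check-{i}" for i in range(1000)]  # placeholder records
train, test = train_test_split(examples, HYPERPARAMS["test_fraction"])
print(len(train), len(test))  # 900 100
```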

Performance

Accuracy: 0.849 on the held-out test split

This performance is consistent with previously reported results for related verdict classification tasks in the literature.

Evaluation

Evaluation was performed using a random 90:10 train-test split on the manually mapped subset of the data. Accuracy was used as the primary evaluation metric.

Because verdict language varies substantially across organisations, real-world performance may differ when applied to new sources with unseen verdict styles.
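
One way to examine the point about unseen sources is to report accuracy per organisation alongside the overall figure. The sketch below assumes each evaluation record carries an organisation identifier. The Record layout, the function names, and the sample rows are made up for illustration and are not results from this model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, str]  # (organisation, gold label, predicted label)


def accuracy(records: Iterable[Record]) -> float:
    """Fraction of records whose predicted label equals the gold label."""
    rows = list(records)
    if not rows:
        raise ValueError("no records to evaluate")
    return sum(gold == pred for _, gold, pred in rows) / len(rows)


def accuracy_by_organisation(records: Iterable[Record]) -> Dict[str, float]:
    """Accuracy computed separately for each organisation."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[0]].append(record)
    return {org: accuracy(rows) for org, rows in grouped.items()}


# Made-up records for illustration only.
sample = [
    ("org_a", "False", "False"),
    ("org_a", "True", "True"),
    ("org_b", "Misleading", "False"),
    ("org_b", "Other", "Other"),
]
print(accuracy(sample))                  # 0.75
print(accuracy_by_organisation(sample))  # {'org_a': 1.0, 'org_b': 0.5}
```

A large gap between organisations would suggest that verdict styles from some sources are poorly covered by the training subset.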

Limitations

The model learns patterns from historical fact-checking language and may not generalise well to:
- New organisations with very different verdict phrasing
- Long narrative verdict explanations instead of short labels
The six-class scheme is a simplification and may not capture subtle distinctions used by some organisations
Model predictions reflect past human judgements and may inherit their biases and inconsistencies

Ethical Considerations

This model does not determine truth. It only maps existing verdict language into a standardised label space.
Using the model outside research or data normalisation contexts may lead to misinterpretation of its outputs as factual judgements.
Care should be taken when applying the model to politically or socially sensitive content.

Citation

If you use this model, please cite the FACTors dataset and the associated publication describing the verdict normalisation methodology as follows:

@inproceedings{FACTors2025,
  title={{FACTors}: A New Dataset for Studying Fact-checking Ecosystem},
  author={Altuncu, Enes and
          Ba\c{s}kent, Can and
          Bhattacherjee, Sanjay and
          Li, Shujun and
          Roy, Dwaipayan},
  year={2025},
  numpages={10},
  doi={10.1145/3726302.3730339},
  booktitle={Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '25), July 13--18, 2025, Padua, Italy},
  publisher={ACM},
}

Safetensors

Model size: 0.1B params
Tensor type: F32

Model tree for ealtuncu/verdict-normaliser-roberta

Base model: FacebookAI/roberta-base