Model Card: Verdict-Normaliser-RoBERTa

Model Description

Verdict-Normaliser-RoBERTa is a fine-tuned RoBERTa model designed to normalise fact-checking verdicts into a unified six-point rating scale. Fact-checking organisations express their conclusions using diverse and often organisation-specific verdict formats. This model helps standardise those heterogeneous verdicts to support large-scale automated analysis.

The model was trained as part of the FACTors dataset pipeline, where verdict normalisation could not be handled through simple keyword mapping due to the high variability and complexity of original verdict formulations.

Target labels (6 classes):

  • True
  • Partially true
  • False
  • Misleading
  • Unverifiable
  • Other

Intended Use

Primary Use

This model is intended for:

  • Normalising original fact-checking verdict texts into a common label space
  • Supporting research on misinformation, fact-checking, and credibility analysis
  • Preprocessing heterogeneous fact-checking datasets for downstream NLP tasks
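
A minimal sketch of the first use case, assuming the checkpoint is published on the Hugging Face Hub as `ealtuncu/verdict-normaliser-roberta` and that the fact-check content and original verdict are passed as a text pair. The card does not specify the exact input format used during fine-tuning, so the pairing in `build_input` is an assumption. The helper names are hypothetical.

```python
from functools import lru_cache
from typing import Any, Dict

# Assumed Hub id for this checkpoint.
MODEL_ID = "ealtuncu/verdict-normaliser-roberta"

# The six target labels listed in this card.
TARGET_LABELS = ("True", "Partially true", "False",
                 "Misleading", "Unverifiable", "Other")


def build_input(content: str, original_verdict: str) -> Dict[str, str]:
    """Pair fact-check content with its original verdict text.

    Assumption: the two fields are fed to the model as a sentence pair.
    """
    return {"text": content.strip(), "text_pair": original_verdict.strip()}


@lru_cache(maxsize=1)
def _load_classifier() -> Any:
    """Load the classification pipeline once and reuse it."""
    # Imported lazily so build_input works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def normalise_verdict(content: str, original_verdict: str) -> Dict[str, Any]:
    """Return the top predicted label and its score for one fact-check."""
    result = _load_classifier()(build_input(content, original_verdict),
                                truncation=True)
    # Depending on the transformers version, a single input may come back
    # as a dict or as a one-element list.
    return result[0] if isinstance(result, list) else result
```

Calling `normalise_verdict(content, verdict)` should return a dictionary with `label` and `score` keys. The label strings depend on the checkpoint's `id2label` configuration and may need mapping onto the six names above.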

Out of Scope

This model is not intended to:

  • Independently verify factual claims
  • Replace human fact-checkers
  • Be used as a real-time truth assessment system

It only predicts a normalised verdict category based on patterns learned from past fact-checking data.


Training Data

The training data comes from the FACTors dataset, which aggregates fact-checks from multiple organisations.

Data Preparation Process

A three-step methodology was followed:

  1. Manual mapping of short verdicts
    All unique original verdict texts shorter than five words were manually reviewed. Verdicts that could be clearly aligned with one of the six predefined ratings were mapped directly.

    • 68 unique original verdict formats were mapped
    • Covering 72,309 fact-checks
    • From 33 fact-checking organisations

  2. Model-based normalisation
    The content-verdict pairs from the manually mapped subset were used to fine-tune a base RoBERTa model. This model was then used to predict normalised labels for the remaining fact-checks.

  3. Manual review of low-confidence predictions
    Predictions with model confidence below 0.5 were manually reviewed.

    • 1,564 predictions were inspected and corrected where necessary
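
The routing logic of the three steps above can be sketched as follows. This is an illustration only: the entries in `MANUAL_MAP` are hypothetical stand-ins for the 68 manually mapped formats, "model confidence" is assumed to be the probability of the top predicted class, and the function names are not taken from the FACTors pipeline.

```python
from typing import Dict, Optional, Tuple

SHORT_VERDICT_MAX_WORDS = 5   # "shorter than five words" means fewer than 5
REVIEW_THRESHOLD = 0.5        # predictions below this go to manual review

# Hypothetical excerpt of the manual mapping table (the real one has 68 entries).
MANUAL_MAP: Dict[str, str] = {
    "true": "True",
    "false": "False",
    "misleading": "Misleading",
}


def manual_label(verdict: str) -> Optional[str]:
    """Step 1: map a short verdict through the lookup table, else None."""
    words = verdict.split()
    if len(words) >= SHORT_VERDICT_MAX_WORDS:
        return None
    return MANUAL_MAP.get(" ".join(words).lower())


def needs_review(confidence: float) -> bool:
    """Step 3: flag low-confidence model predictions for manual review."""
    return confidence < REVIEW_THRESHOLD


def route(verdict: str, model_prediction: Tuple[str, float]) -> Tuple[str, str]:
    """Return (label, source), where source is 'manual', 'model' or 'review'.

    model_prediction is the (label, confidence) pair produced in step 2.
    """
    label = manual_label(verdict)
    if label is not None:
        return label, "manual"
    predicted, confidence = model_prediction
    return predicted, ("review" if needs_review(confidence) else "model")
```

A verdict that cannot be mapped by hand falls through to the model, and only its low-confidence predictions are queued for a human reviewer.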

Training Procedure

  • Base model: RoBERTa-base
  • Task: Multi-class text classification
  • Input: Fact-check content paired with its original verdict text
  • Learning rate: 3e-5
  • Epochs: 3
  • Train/test split: 90:10
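
The settings above can be collected into a small configuration with a matching 90:10 split helper. This is a sketch under assumptions: the card does not state the training framework, batch size, maximum sequence length, or random seed, so those are either omitted or (for the seed) chosen arbitrarily. `roberta-base` is the Hub name of the RoBERTa-base checkpoint.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Hyperparameters stated in this card; unstated ones are deliberately left out.
HPARAMS = {
    "base_model": "roberta-base",
    "num_labels": 6,
    "learning_rate": 3e-5,
    "num_train_epochs": 3,
    "test_fraction": 0.10,
}


def train_test_split(examples: Sequence[T],
                     test_fraction: float,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly split examples into (train, test).

    The default seed is an arbitrary assumption, not a value from the card.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

If the Hugging Face `Trainer` API were used, `learning_rate` and `num_train_epochs` would correspond to the `TrainingArguments` parameters of the same names.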

Performance

  • Accuracy: 0.849 on the held-out test split

This performance is consistent with previously reported results for related verdict classification tasks in the literature.


Evaluation

Evaluation was performed using a random 90:10 train-test split on the manually mapped subset of the data. Accuracy was used as the primary evaluation metric.

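Because verdict language varies substantially across organisations, and because the test split is drawn from the manually mapped subset of short verdicts, real-world performance may differ when the model is applied to new sources with unseen verdict styles or to longer verdict formulations.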
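
To check how accuracy holds up on a new source, the overall metric can be complemented with a per-organisation breakdown. The sketch below is illustrative and not part of the original evaluation; the function names and the `(organisation, gold, predicted)` row layout are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def accuracy_by_source(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Accuracy per organisation for (organisation, gold, predicted) rows."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for organisation, gold, predicted in rows:
        totals[organisation] += 1
        hits[organisation] += int(gold == predicted)
    return {org: hits[org] / totals[org] for org in totals}
```

A large gap between organisations in the output of `accuracy_by_source` would suggest that some verdict styles are not well covered by the training data.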


Limitations

  • The model learns patterns from historical fact-checking language and may not generalise well to:
    • New organisations with very different verdict phrasing
    • Long narrative verdict explanations instead of short labels
  • The six-class scheme is a simplification and may not capture subtle distinctions used by some organisations
  • Model predictions reflect past human judgements and may inherit their biases and inconsistencies

Ethical Considerations

  • This model does not determine truth. It only maps existing verdict language into a standardised label space.
  • Using the model outside research or data normalisation contexts may lead to misinterpretation of its outputs as factual judgements.
  • Care should be taken when applying the model to politically or socially sensitive content.

Citation

If you use this model, please cite the FACTors dataset and the associated publication describing the verdict normalisation methodology as follows:

@inproceedings{FACTors2025,
  title={{FACTors}: A New Dataset for Studying Fact-checking Ecosystem},
  author={Altuncu, Enes and
          Ba\c{s}kent, Can and
          Bhattacherjee, Sanjay and
          Li, Shujun and
          Roy, Dwaipayan},
  year={2025},
  numpages={10},
  doi={10.1145/3726302.3730339},
  booktitle={Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '25), July 13--18, 2025, Padua, Italy},
  publisher={ACM},
}