Model Card for policlim
Model Description
This model detects climate change salience in (political) text. It fine-tunes the XLM-RoBERTa base model on 3,434 manually annotated quasi-sentences from political manifestos (retrieved from the Manifesto Project Database). The model achieves a validation F1 score of .935 and an accuracy of .957.
We have used the model to classify the climate change salience of political manifestos, the first step of which is detailed in the working paper below. The paper contains all relevant details of the training set, procedure, and evaluation of the model and final dataset.
Citation Information
@techreport{sanford2024policlim,
title={Policlim: A Dataset of Climate Change Discourse in the Political Manifestos of 45 Countries from 1990-2022},
author={Sanford, Mary and Pianta, Silvia and Schmid, Nicolas and Musto, Giorgio},
type={Working paper},
url={https://osf.io/preprints/osf/bq356_v4},
year={2025}
}
How to get started with the model
You can use the model for text classification, or use it as a base model to fine-tune for additional tasks. The simpletransformers package makes this process very straightforward.
import pandas as pd
from simpletransformers.classification import ClassificationModel, ClassificationArgs
## To use for climate change salience detection:
# Load target data; the text to classify goes in a 'text' column here.
data = pd.read_csv('your_data.csv')
model = ClassificationModel(
    model_type="xlmroberta", model_name="marysanford/policlim"
)
preds, outputs = model.predict(data['text'].tolist())
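`predict` returns the predicted class indices along with the raw model outputs. To turn the raw scores into probabilities you can apply a softmax; the sketch below uses hypothetical logit values, and the label mapping (1 = climate-salient) is an assumption to verify against the paper.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical raw outputs for two quasi-sentences
raw = np.array([[2.1, -1.3], [-0.4, 1.8]])
probs = softmax(raw)
preds = probs.argmax(axis=1)  # class index per quasi-sentence
print(preds)                  # e.g. array([0, 1])
print(probs.sum(axis=1))      # each row sums to 1
```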
## To use for further fine-tuning
from sklearn.metrics import f1_score, precision_score, recall_score, accuracy_score
# Load training data. Text must be in a 'text' column, with the corresponding labels in a 'labels' column.
new_train = pd.read_csv('your_new_train_data.csv')
new_test = pd.read_csv('your_new_test_data.csv')
new_eval = pd.read_csv('your_new_eval_data.csv')
# Initialize the model with the updated arguments
model = ClassificationModel(
    model_type="xlmroberta",
    model_name="marysanford/policlim",
    num_labels=2,  # number of labels for the new task
    # args=model_args,  # update arguments (labels, hyperparameters, processing details, evaluation preferences) as necessary
    # weight=weights,  # for class weights
    ignore_mismatched_sizes=True,  # required if the new task's number of labels differs from 2
    use_cuda=True,
)
# Train the model. Extra keyword arguments are treated as custom evaluation
# metrics: functions of the form f(true_labels, predictions).
model.train_model(new_train, eval_df=new_test,
                  f1=f1_score)
# Evaluate the model; metric functions are again passed as keyword arguments
result, model_outputs, wrong_predictions = model.eval_model(
    new_eval,
    f1=f1_score,
    precision=precision_score,
    recall=recall_score,
    acc=accuracy_score,
)
print('\n\nThese are the results when evaluating the model on the evaluation data set:\n')
print(result)
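The CSV files loaded above are expected to yield DataFrames with a 'text' column and an integer 'labels' column. A minimal sketch of that format follows; the rows and the label mapping (1 = mentions climate change) are illustrative assumptions.

```python
import pandas as pd

# Illustrative rows; 1 = mentions climate change, 0 = does not (assumed mapping)
new_train = pd.DataFrame({
    "text": [
        "We will cut greenhouse gas emissions by 2030.",
        "Taxes on small businesses will be reduced.",
    ],
    "labels": [1, 0],
})
print(new_train.columns.tolist())  # ['text', 'labels']
```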
Model Sources
- Repository: https://github.com/marysanford/policlim/tree/main
- Paper: https://osf.io/preprints/osf/bq356
- Data source: https://manifesto-project.wzb.eu/
Model Card Authors
Mary Sanford, mary.sanford@cmcc.it
Base model
- FacebookAI/xlm-roberta-base