Model Card for policlim
Model Description
This model detects climate change salience in (political) text. It fine-tunes the XLM-RoBERTa base model on 3,434 manually annotated quasi-sentences from political manifestos (retrieved from the Manifesto Project Database). The model achieves a validation F1 score of .935 and an accuracy of .957.
We have used the model to classify the climate change salience of political manifestos, the first step of which is detailed in the working paper below. The paper contains all relevant details of the training set, procedure, and evaluation of the model and final dataset.
Citation Information
@techreport{sanford2024policlim,
title={Policlim: A Dataset of Climate Change Discourse in the Political Manifestos of 45 Countries from 1990-2022},
author={Sanford, Mary and Pianta, Silvia and Schmid, Nicolas and Musto, Giorgio},
type={Working paper},
url={https://osf.io/preprints/osf/bq356_v4},
year={2025}
}
How to get started with the model
You can use the model for text classification, or use it as a base model to fine-tune for additional tasks. The simpletransformers package makes this process very straightforward.
import pandas as pd
from simpletransformers.classification import ClassificationModel, ClassificationArgs
## To use for climate change salience detection:
# Load target data; the text to classify goes in a 'text' column here.
data = pd.read_csv('your_data.csv')
model = ClassificationModel(
    model_type="xlmroberta", model_name="marysanford/policlim"
)
preds, outputs = model.predict(data['text'].tolist())
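`predict` returns the predicted class indices along with the raw model outputs. To turn the raw scores into probabilities you can apply a softmax; the sketch below uses hypothetical logit values, and the label mapping (1 = climate-salient) is an assumption to verify against the paper.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical raw outputs for two quasi-sentences
raw = np.array([[2.1, -1.3], [-0.4, 1.8]])
probs = softmax(raw)
preds = probs.argmax(axis=1)  # class index per quasi-sentence
print(preds)                  # e.g. array([0, 1])
print(probs.sum(axis=1))      # each row sums to 1
```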
## To use for further fine-tuning
from sklearn.metrics import f1_score, precision_score, recall_score, accuracy_score
# Load training data. Text must be in a 'text' column, with the corresponding labels in a 'labels' column.
new_train = pd.read_csv('your_new_train_data.csv')
new_test = pd.read_csv('your_new_test_data.csv')
new_eval = pd.read_csv('your_new_eval_data.csv')
# Initialize the model with the updated arguments
model = ClassificationModel(
    model_type="xlmroberta",
    model_name="marysanford/policlim",
    num_labels=2,  # number of labels for the new task
    # args=model_args,  # update arguments (labels, hyperparameters, processing details, evaluation preferences) as necessary
    # weight=weights,  # for class weights
    ignore_mismatched_sizes=True,  # required if the new task's number of labels differs from 2
    use_cuda=True,
)
# Train the model. Extra keyword arguments are treated as custom evaluation
# metrics: functions of the form f(true_labels, predictions).
model.train_model(new_train, eval_df=new_test,
                  f1=f1_score)
# Evaluate the model; metric functions are again passed as keyword arguments
result, model_outputs, wrong_predictions = model.eval_model(
    new_eval,
    f1=f1_score,
    precision=precision_score,
    recall=recall_score,
    acc=accuracy_score,
)
print('\n\nThese are the results when evaluating the model on the evaluation data set:\n')
print(result)
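The CSV files loaded above are expected to yield DataFrames with a 'text' column and an integer 'labels' column. A minimal sketch of that format follows; the rows and the label mapping (1 = mentions climate change) are illustrative assumptions.

```python
import pandas as pd

# Illustrative rows; 1 = mentions climate change, 0 = does not (assumed mapping)
new_train = pd.DataFrame({
    "text": [
        "We will cut greenhouse gas emissions by 2030.",
        "Taxes on small businesses will be reduced.",
    ],
    "labels": [1, 0],
})
print(new_train.columns.tolist())  # ['text', 'labels']
```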
Model Sources
- Repository: https://github.com/marysanford/policlim/tree/main
- Paper: https://osf.io/preprints/osf/bq356
- Data source: https://manifesto-project.wzb.eu/
Model Card Authors
Mary Sanford, mary.sanford@cmcc.it
Base model
- FacebookAI/xlm-roberta-base