Benchmark result for the paper "Towards Robust Sentiment Analysis of Temporally-Sensitive Policy-Related Online Text", accepted to the ACL Student Research Workshop (ACL-SRW).
Classification of sentiments towards climate change in tweets (Twitter is now known as X).
This model classifies the sentiment of a tweet towards climate change as "News" (2), "Neutral" (0), "Pro" (1) or "Anti" (-1).
Training Dataset
The model was fine-tuned on the Twitter Climate Change Sentiment Dataset, with microsoft/deberta-v3-large serving as the base model.
The dataset contains 43,943 annotated tweets about climate change, posted between Apr 27, 2015 and Feb 21, 2018.
Tweets are labeled with a Pro, Anti, Neutral or News stance towards climate change.
10,000 data points were sampled through continuous time-series clustering and used to fine-tune the model, in order to mimic the annotation and fine-tuning constraints of policy-related studies. (For the reasoning behind this, see the paper cited below.)
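Because the training tweets cover a fixed date range, it may be useful to flag inputs dated outside that range before interpreting their predictions. The following is a minimal sketch of such a check, not part of the released code: the function name `in_training_window` is hypothetical, and treating both end dates as inclusive is an assumption.

```python
from datetime import date

# Date range of the training tweets, as stated in the dataset description.
# Treating both ends as inclusive is an assumption.
TRAIN_START = date(2015, 4, 27)
TRAIN_END = date(2018, 2, 21)


def in_training_window(tweet_date: date) -> bool:
    """Return True if tweet_date falls inside the training data's date range."""
    return TRAIN_START <= tweet_date <= TRAIN_END
```

Tweets for which this returns False were posted outside the period the model saw during fine-tuning, so their predictions may deserve extra scrutiny.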
How to use the model
The code below is a starting point for classifying sentiments with the model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
from tqdm import tqdm
import pandas as pd
def classify_tweets(df, text_col, model, tokenizer):
    """Add a "predicted_label" column holding the model's prediction for each row."""
    df[text_col] = df[text_col].astype(str)
    device = 0 if torch.cuda.is_available() else -1  # Use GPU if available
    classifier = pipeline(
        "text-classification",
        model=model,
        tokenizer=tokenizer,
        device=device,
        truncation=True,  # Ensures inputs don't exceed max length
        max_length=512,  # Manually set to avoid exceeding model's limit
        padding="max_length",  # Ensures all inputs have the same length
    )
    pred_labels = []
    for text in tqdm(df[text_col]):
        # top_k=None asks for a score for every label
        # (replaces the deprecated return_all_scores=True)
        preds = classifier(text, top_k=None)
        # Accept both a flat list of dicts and a list nested one level deeper
        scores = preds[0] if isinstance(preds[0], list) else preds
        # Pick the label with the highest score
        label_scores = {entry["label"]: entry["score"] for entry in scores}
        pred_labels.append(max(label_scores, key=label_scores.get))
    df["predicted_label"] = pred_labels
    return df
model_name = "cja5553/climate-change-sentiments-deBERTa-large"
id2label = {2: "News", 1: "Pro", 0: "Neutral", -1: "Anti"}
label2id = {v: k for k, v in id2label.items()}  # invert id2label
# No manual .to("cuda") here: the pipeline in classify_tweets places the model
# on the GPU when one is available, so this also runs on CPU-only machines
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=len(label2id),
    label2id=label2id,
    id2label=id2label,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
text_col = "text"  # change text column accordingly
# df must be a pandas DataFrame holding your tweets, loaded beforehand
df_with_classification = classify_tweets(df, text_col, model, tokenizer)
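Once tweets are classified, a common next step is to look at how the predicted labels are distributed. The sketch below shows one way to do that with the standard library only. The helper name `label_shares` is hypothetical, and the example labels are made up for illustration, not real model output.

```python
from collections import Counter


def label_shares(labels):
    """Return the fraction of items assigned to each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Made-up predictions for illustration only
example_labels = ["Pro", "Pro", "News", "Anti"]
shares = label_shares(example_labels)
```

With the DataFrame returned by classify_tweets, the same helper could be called as label_shares(df_with_classification["predicted_label"]).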
Citation
If you find this model useful, please cite the following paper:
@inproceedings{alba2025towards,
  author={Charles Alba and Benjamin C Warner and Akshar Saxena and Jiaxin Huang and Ruopeng An},
  title={Towards Robust Sentiment Analysis of Temporally-Sensitive Policy-Related Online Text},
  year={2025},
  booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL), Volume 4: Student Research Workshop},
  url={https://aclanthology.org/2025.acl-srw.70/}
}
Code
The code used to train these models is available on GitHub at github.com/cja5553/ctscams
Questions?
Contact me at alba@wustl.edu