---
language: en
license: mit
tags:
- text-classification
- multi-label-classification
- emotion-analysis
- political-text
- tweets
- distilbert
datasets:
- thomasrenault/us_tweet_speech_congress
metrics:
- rmse
- mae
base_model: distilbert-base-uncased
pipeline_tag: text-classification
---

# thomasrenault/emotion

A multi-label emotion intensity classifier fine-tuned on US tweets, campaign speeches and congressional speeches. It is built on `distilbert-base-uncased`, and its training labels were annotated by GPT-4o-mini via the OpenAI Batch API.

## Labels

The model predicts **8 independent emotion intensities** (sigmoid, range 0–1):

| Label |
|---|
| `anger` |
| `sadness` |
| `fear` |
| `disgust` |
| `pride` |
| `joy` |
| `gratitude` |
| `hope` |

Scores are **independent**: multiple emotions can be high simultaneously.

## Training

| Setting | Value |
|---|---|
| Base model | `distilbert-base-uncased` |
| Architecture | `DistilBertForSequenceClassification` (multi-label) |
| Problem type | `multi_label_classification` |
| Training data | ~200,000 labeled documents |
| Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
| Epochs | 4 |
| Learning rate | 2e-5 |
| Batch size | 16 |
| Max length | 512 tokens |
| Domain | US tweets about policy, campaign speeches and congressional floor speeches |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "thomasrenault/emotion"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Label order matches the model's output logits.
EMOTIONS = ["anger", "sadness", "fear", "disgust", "pride", "joy", "gratitude", "hope"]
THRESHOLD = 0.5


def predict(text):
    """Return the emotions whose intensity is at or above THRESHOLD."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        # Independent sigmoid per label (multi-label), not a softmax.
        probs = torch.sigmoid(model(**enc).logits).squeeze(0).tolist()
    matched = [emotion for emotion, p in zip(EMOTIONS, probs) if p >= THRESHOLD]
    return matched or ["no emotion"]


sentences = [
    "Enough lies, enough hypocrisy",
    "I'm so proud of our government",
    "Climate change is a risk to our planet",
    "Trump is the president of the US",
]

for sentence in sentences:
    print(sentence, predict(sentence))

# Enough lies, enough hypocrisy ['anger']
# I'm so proud of our government ['pride']
# Climate change is a risk to our planet ['fear']
# Trump is the president of the US ['no emotion']
```

## Intended Use

- Academic research on emotion in political communication
- Analysis of congressional speeches and social media
- Temporal trend analysis of emotional rhetoric

## Limitations

- Trained exclusively on **US English political text**, so performance may degrade on other domains
- Emotions are subjective, so intensity scores are inherently noisy and annotators may disagree
- Labels are silver-standard (LLM-generated), not human-verified gold labels

## Citation

If you use this model, please cite https://socialeconomicslab.org/research/working-papers/emotions-and-policy/:

```
@article{algan2026emotions,
  title={Emotions and policy views},
  author={Algan, Y. and Davoine, E. and Renault, T. and Stantcheva, S.},
  year={2026}
}
```