---
language: en
license: mit
tags:
- text-classification
- multi-label-classification
- emotion-analysis
- political-text
- tweets
- distilbert
datasets:
- thomasrenault/us_tweet_speech_congress
metrics:
- rmse
- mae
base_model: distilbert-base-uncased
pipeline_tag: text-classification
---
# thomasrenault/emotion
A multi-label emotion intensity classifier fine-tuned on US tweets, campaign speeches, and congressional speeches. It is built on `distilbert-base-uncased`, with training labels annotated by GPT-4o-mini through the OpenAI Batch API.
## Labels
The model predicts **8 independent emotion intensities** (sigmoid, range 0–1):
| Label |
|---|
| `anger` |
| `sadness` |
| `fear` |
| `disgust` |
| `pride` |
| `joy` |
| `gratitude` |
| `hope` |
Scores are **independent** — multiple emotions can be high simultaneously.
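To make the independence concrete, here is a minimal sketch contrasting per-label sigmoid scores with a softmax over the same logits. The logit values are made up for illustration and do not come from the model; softmax is shown only for contrast and is not what this model uses.

```python
import math

EMOTIONS = ["anger", "sadness", "fear", "disgust", "pride", "joy", "gratitude", "hope"]


def sigmoid(z):
    """Map one logit to a score in (0, 1), independently of the other logits."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """For contrast only: softmax couples the labels so the scores sum to 1."""
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a single document (illustrative values only).
logits = [2.0, -3.0, 1.5, 2.2, -4.0, -4.0, -3.5, -2.0]

sigmoid_scores = {name: sigmoid(z) for name, z in zip(EMOTIONS, logits)}
softmax_scores = dict(zip(EMOTIONS, softmax(logits)))

# With sigmoid, several emotions can clear 0.5 at once.
high = [name for name, score in sigmoid_scores.items() if score >= 0.5]
print(high)  # ['anger', 'fear', 'disgust']
```

With softmax, the same logits would have to share a total mass of 1, so at most one label could exceed 0.5.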
## Training
| Setting | Value |
|---|---|
| Base model | `distilbert-base-uncased` |
| Architecture | `DistilBertForSequenceClassification` (multi-label) |
| Problem type | `multi_label_classification` |
| Training data | ~200,000 labeled documents |
| Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
| Epochs | 4 |
| Learning rate | 2e-5 |
| Batch size | 16 |
| Max length | 512 tokens |
| Domain | US tweets about policy, campaign speeches and congressional floor speeches |
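In the `transformers` library, setting the problem type to `multi_label_classification` selects a binary cross-entropy loss on the raw logits (`BCEWithLogitsLoss`), which also accepts fractional targets such as intensity scores. The sketch below reimplements that loss in plain Python for one document, assuming the default loss was used during fine-tuning; the logits and targets are hypothetical.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits.

    Uses the numerically stable form: max(z, 0) - z * y + log(1 + exp(-|z|)).
    Targets may be any value in [0, 1], not only 0 or 1.
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    losses = [
        max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        for z, y in zip(logits, targets)
    ]
    return sum(losses) / len(losses)


# Hypothetical logits and annotated intensities for the 8 labels.
logits = [2.0, -3.0, 1.5, 2.2, -4.0, -4.0, -3.5, -2.0]
targets = [0.9, 0.0, 0.8, 0.9, 0.0, 0.0, 0.0, 0.1]

loss = bce_with_logits(logits, targets)
print(f"loss = {loss:.4f}")
```

Each label contributes its own term to the loss, which is why the eight scores are learned independently.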
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "thomasrenault/emotion"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # inference mode: disables dropout

# The order of this list must match the order of the model's output logits.
EMOTIONS = ["anger", "sadness", "fear", "disgust", "pride", "joy", "gratitude", "hope"]
THRESHOLD = 0.5  # minimum score for an emotion to be reported


def predict(text):
    """Return the emotions whose sigmoid score is at least THRESHOLD."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        # Drop the batch dimension to get one score per emotion.
        probs = torch.sigmoid(model(**enc).logits).squeeze(0).tolist()
    matched = [t for t, p in zip(EMOTIONS, probs) if p >= THRESHOLD]
    return matched or ["no emotion"]


sentences = [
    "Enough lies, enough hypocrisy",
    "I'm so proud of our govenrment",
    "Climate change is a risk to our planet",
    "Trump is the president of the US",
]
for sentence in sentences:
    print(sentence, predict(sentence))
# Enough lies, enough hypocrisy ['anger']
# I'm so proud of our govenrment ['pride']
# Climate change is a risk to our planet ['fear']
# Trump is the president of the US ['no emotion']
```
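The `predict` function above discards the scores once the threshold is applied. If the intensities themselves are of interest, a small helper can rank them or apply a different cutoff per emotion. This is a sketch only: `rank_emotions` and `select` are hypothetical helper names, the scores are invented, and the per-label threshold is an illustration rather than a recommended value.

```python
EMOTIONS = ["anger", "sadness", "fear", "disgust", "pride", "joy", "gratitude", "hope"]


def rank_emotions(probs, labels=EMOTIONS):
    """Pair each label with its score and sort from strongest to weakest."""
    if len(probs) != len(labels):
        raise ValueError("expected one score per label")
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


def select(probs, thresholds, labels=EMOTIONS, default=0.5):
    """Keep labels whose score reaches their own threshold (or the default)."""
    return [
        label
        for label, p in zip(labels, probs)
        if p >= thresholds.get(label, default)
    ]


# Hypothetical sigmoid scores, in the same order as EMOTIONS.
probs = [0.91, 0.05, 0.42, 0.77, 0.02, 0.01, 0.03, 0.10]

ranked = rank_emotions(probs)
selected = select(probs, {"fear": 0.4})  # looser cutoff for fear only

print(ranked[:2])  # [('anger', 0.91), ('disgust', 0.77)]
print(selected)    # ['anger', 'fear', 'disgust']
```

In practice `probs` would be the list produced inside `predict` before thresholding.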
## Intended Use
- Academic research on emotion in political communication
- Analysis of congressional speeches and social media
- Temporal trend analysis of emotional rhetoric
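For the temporal trend use case, one simple approach is to average an emotion's score per calendar month. The sketch below assumes documents have already been scored and are available as `(date, scores)` pairs; the `monthly_mean` helper and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_mean(records, emotion):
    """Average one emotion's score per calendar month.

    records: iterable of (date, {emotion: score}) pairs.
    Returns a dict mapping "YYYY-MM" to the mean score, in chronological order.
    """
    buckets = defaultdict(list)
    for day, scores in records:
        buckets[day.strftime("%Y-%m")].append(scores[emotion])
    return {month: mean(values) for month, values in sorted(buckets.items())}


# Hypothetical scored documents (illustrative values only).
records = [
    (date(2020, 1, 5), {"anger": 0.2}),
    (date(2020, 1, 20), {"anger": 0.4}),
    (date(2020, 2, 3), {"anger": 0.9}),
]

trend = monthly_mean(records, "anger")
for month, value in trend.items():
    print(month, round(value, 2))
```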
## Limitations
- Trained exclusively on **US English political text** — performance may degrade on other domains
- Emotions are subjective, so intensity scores are inherently noisy and annotators can reasonably disagree
- Labels are silver-standard (LLM-generated), not human-verified gold labels
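The card's metadata lists RMSE and MAE as metrics. Because the training labels are silver-standard, one way to gauge their reliability on your own data is to compute the same two metrics against a small hand-annotated sample. The sketch below assumes such a sample exists; the values shown are invented for illustration.

```python
import math


def rmse(pred, gold):
    """Root mean squared error between predicted and reference scores."""
    if len(pred) != len(gold) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(pred))


def mae(pred, gold):
    """Mean absolute error between predicted and reference scores."""
    if len(pred) != len(gold) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(pred)


# Hypothetical model scores and human ratings for one emotion on four documents.
model_scores = [0.80, 0.10, 0.55, 0.30]
human_scores = [0.90, 0.00, 0.40, 0.35]

print(f"RMSE = {rmse(model_scores, human_scores):.3f}")
print(f"MAE  = {mae(model_scores, human_scores):.3f}")
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a sense of whether errors are uniform or concentrated in a few documents.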
## Citation
If you use this model, please cite the working paper at https://socialeconomicslab.org/research/working-papers/emotions-and-policy/:
```
@article{algan2026emotions,
  title={Emotions and policy views},
  author={Algan, Y. and Davoine, E. and Renault, T. and Stantcheva, S.},
  year={2026}
}
```