---
language: en
license: mit
tags:
- text-classification
- multi-label-classification
- emotion-analysis
- political-text
- tweets
- distilbert
datasets:
- thomasrenault/us_tweet_speech_congress
metrics:
- rmse
- mae
base_model: distilbert-base-uncased
pipeline_tag: text-classification
---
# thomasrenault/emotion
A multi-label emotion intensity classifier fine-tuned on US tweets, campaign speeches and congressional speeches. It is built on `distilbert-base-uncased` and trained on labels produced by GPT-4o-mini through the OpenAI Batch API.
## Labels
The model predicts **8 independent emotion intensities** (sigmoid, range 0–1):
| Label |
|---|
| `anger` |
| `sadness` |
| `fear` |
| `disgust` |
| `pride` |
| `joy` |
| `gratitude` |
| `hope` |
Scores are **independent** — multiple emotions can be high simultaneously.
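The difference between independent sigmoid scores and a softmax distribution can be illustrated with a small sketch in plain Python. The logits below are made-up values for illustration, not actual model output.

```python
import math

EMOTIONS = ["anger", "sadness", "fear", "disgust", "pride", "joy", "gratitude", "hope"]

def sigmoid(x):
    """Squash a single logit into (0, 1), independently of the other labels."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    """Normalise all logits into one distribution that sums to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one text: anger and disgust are both strongly positive
logits = [2.0, -3.0, -1.0, 1.5, -4.0, -4.0, -5.0, -2.0]

sigmoid_scores = [sigmoid(z) for z in logits]
softmax_scores = softmax(logits)

for label, s, m in zip(EMOTIONS, sigmoid_scores, softmax_scores):
    print(f"{label:<10} sigmoid={s:.2f}  softmax={m:.2f}")
```

With sigmoid outputs, both `anger` and `disgust` can exceed 0.5 at once, and the eight scores need not sum to 1. A softmax would force the labels to compete for a single unit of probability.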
## Training
| Setting | Value |
|---|---|
| Base model | `distilbert-base-uncased` |
| Architecture | `DistilBertForSequenceClassification` (multi-label) |
| Problem type | `multi_label_classification` |
| Training data | ~200,000 labeled documents |
| Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
| Epochs | 4 |
| Learning rate | 2e-5 |
| Batch size | 16 |
| Max length | 512 tokens |
| Domain | US tweets about policy, campaign speeches and congressional floor speeches |
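In the `transformers` library, `problem_type="multi_label_classification"` makes the sequence classification head use `torch.nn.BCEWithLogitsLoss`. The sketch below reproduces that loss in plain Python for one document. It assumes the library's default loss was used with the GPT-4o-mini intensities as soft targets, which this card does not state explicitly. The example values are hypothetical.

```python
import math

def bce_with_logits(logit, target):
    """Binary cross-entropy on a raw logit, in the numerically stable form
    max(z, 0) - z * t + log(1 + exp(-|z|))."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def multilabel_loss(logits, targets):
    """Mean of the per-label losses, as with reduction='mean'."""
    return sum(bce_with_logits(z, t) for z, t in zip(logits, targets)) / len(logits)

# Hypothetical logits and annotated intensities for the 8 emotions of one document
logits = [1.4, -2.0, -1.0, 0.5, -3.0, -3.0, -3.0, -2.0]
targets = [0.8, 0.1, 0.2, 0.6, 0.0, 0.0, 0.0, 0.1]

print(f"loss = {multilabel_loss(logits, targets):.4f}")
```

Because each label has its own binary term, a soft target such as 0.8 is best matched by a sigmoid output of 0.8, so the model learns intensities rather than hard 0/1 decisions.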
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "thomasrenault/emotion"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Emotion labels, in the order of the model's outputs
EMOTIONS = ["anger", "sadness", "fear", "disgust", "pride", "joy", "gratitude", "hope"]
THRESHOLD = 0.5  # minimum intensity for an emotion to be reported

def predict(text):
    """Return the emotions whose predicted intensity is at least THRESHOLD."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        # One independent sigmoid per label, not a softmax across labels
        probs = torch.sigmoid(model(**enc).logits).squeeze(0).tolist()
    matched = [t for t, p in zip(EMOTIONS, probs) if p >= THRESHOLD]
    return matched or ["no emotion"]

sentences = [
    "Enough lies, enough hypocrisy",
    "I'm so proud of our govenrment",
    "Climate change is a risk to our planet",
    "Trump is the president of the US",
]
for sentence in sentences:
    print(sentence, predict(sentence))
# Enough lies, enough hypocrisy ['anger']
# I'm so proud of our govenrment ['pride']
# Climate change is a risk to our planet ['fear']
# Trump is the president of the US ['no emotion']
```
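The snippet above returns only the labels that cross the threshold. To keep the raw intensities, for example to score a batch of texts, a variant along the following lines may be useful. It is a sketch: `to_scores` and `predict_scores` are hypothetical helper names, and `model` and `tokenizer` are expected to be the objects loaded as in the usage example.

```python
EMOTIONS = ["anger", "sadness", "fear", "disgust", "pride", "joy", "gratitude", "hope"]

def to_scores(probs_row):
    """Map one row of 8 probabilities to a {label: intensity} dict."""
    return {label: round(float(p), 3) for label, p in zip(EMOTIONS, probs_row)}

def predict_scores(texts, model, tokenizer, max_length=512):
    """Score a list of texts in one padded batch and return one dict per text."""
    import torch  # imported here so that to_scores can be used without torch

    enc = tokenizer(
        texts,
        return_tensors="pt",
        padding=True,        # pad to the longest text in the batch
        truncation=True,
        max_length=max_length,
    )
    with torch.no_grad():
        probs = torch.sigmoid(model(**enc).logits)  # shape: (batch, 8)
    return [to_scores(row) for row in probs.tolist()]

# Example call, assuming `model` and `tokenizer` are already loaded:
# predict_scores(["Enough lies, enough hypocrisy"], model, tokenizer)
```

Returning the full score dictionary leaves the choice of threshold to the downstream analysis instead of fixing it at 0.5.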
## Intended Use
- Academic research on emotion in political communication
- Analysis of congressional speeches and social media
- Temporal trend analysis of emotional rhetoric
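For temporal trend analysis, one simple approach is to average an emotion's intensity per month. The sketch below uses plain Python and hypothetical records. In practice the scores would come from the model, and the dates from the source corpus.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical scored documents: (ISO date, {emotion: intensity})
records = [
    ("2020-01-15", {"anger": 0.80, "hope": 0.10}),
    ("2020-01-28", {"anger": 0.60, "hope": 0.30}),
    ("2020-02-03", {"anger": 0.20, "hope": 0.70}),
]

def monthly_mean(records, emotion):
    """Average the intensity of one emotion per calendar month (YYYY-MM)."""
    by_month = defaultdict(list)
    for date, scores in records:
        by_month[date[:7]].append(scores[emotion])
    return {month: mean(values) for month, values in sorted(by_month.items())}

print(monthly_mean(records, "anger"))
```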
## Limitations
- Trained exclusively on **US English political text** — performance may degrade on other domains
- Emotions are subjective, so intensity scores are inherently noisy and annotators can reasonably disagree on them
- Labels are silver-standard (LLM-generated), not human-verified gold labels
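Since the labels are silver-standard, it may be worth checking the model against a small hand-labelled sample from the target corpus before relying on it. The sketch below computes RMSE and MAE, the two metrics listed in this card's metadata, for a single emotion. All values are hypothetical.

```python
import math

def rmse(pred, gold):
    """Root mean squared error between predicted and reference intensities."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(pred))

def mae(pred, gold):
    """Mean absolute error between predicted and reference intensities."""
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(pred)

# Hypothetical anger intensities for four documents
model_scores = [0.9, 0.1, 0.4, 0.7]   # model output
human_scores = [1.0, 0.0, 0.5, 0.5]   # hand-labelled reference

print(f"RMSE = {rmse(model_scores, human_scores):.3f}")
print(f"MAE  = {mae(model_scores, human_scores):.3f}")
```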
## Citation
If you use this model, please cite the working paper (https://socialeconomicslab.org/research/working-papers/emotions-and-policy/):
```
@article{algan2026emotions,
  title={Emotions and policy views},
  author={Algan, Y. and Davoine, E. and Renault, T. and Stantcheva, S.},
  year={2026}
}
```