File size: 3,190 Bytes

f2e0b5c
f35ab69
 
 
 
 
 
 
 
 
f2e0b5c
f35ab69
 
 
 
 
9b9204b
f2e0b5c
 
 
 
f35ab69
f2e0b5c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
f35ab69
f2e0b5c
f35ab69
f2e0b5c
 
 
f35ab69
f2e0b5c
 
 
 
 
 
 
 
 
 
 
 
 
d07457b
f2e0b5c
 
 
 
 
d07457b
 
f2e0b5c
d07457b
 
 
 
 
8b7f353
 
 
 
f2e0b5c
c3a2a4a
 
f2e0b5c
 
 
f35ab69
f2e0b5c
 
 
 
 
 
 
 
 
 
d07457b
f2e0b5c
 
d07457b
 
 
 
f2e0b5c
f35ab69

---
language: en
license: mit
tags:
  - text-classification
  - multi-label-classification
  - emotion-analysis
  - political-text
  - tweets
  - distilbert
datasets:
  - thomasrenault/us_tweet_speech_congress
metrics:
  - rmse
  - mae
base_model: distilbert-base-uncased
pipeline_tag: text-classification
---

# thomasrenault/emotion

A multi-label emotion intensity classifier fine-tuned on US tweets, campaign speeches and congressional speeches.  Built on `distilbert-base-uncased` with GPT-4o-mini annotation via the OpenAI Batch API.

## Labels

The model predicts **8 independent emotion intensities** (sigmoid, range 0–1):

| Label | 
|---|---|
| `anger` | 
| `sadness` | 
| `fear` | 
| `disgust` | 
| `pride` | 
| `joy` | 
| `gratitude` | 
| `hope` | 

Scores are **independent** — multiple emotions can be high simultaneously.

## Training

| Setting | Value |
|---|---|
| Base model | `distilbert-base-uncased` |
| Architecture | `DistilBertForSequenceClassification` (multi-label) |
| Problem type | `multi_label_classification` |
| Training data | ~200,000 labeled documents |
| Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
| Epochs | 4 |
| Learning rate | 2e-5 |
| Batch size | 16 |
| Max length | 512 tokens |
| Domain | US tweets about policy, campaign speeches and congressional floor speeches |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "thomasrenault/emotion"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model     = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

EMOTIONS = ["anger", "sadness", "fear", "disgust", "pride", "joy", "gratitude", "hope"]
THRESHOLD = 0.5

def predict(text):
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        probs = torch.sigmoid(model(**enc).logits).squeeze().tolist()
    matched = [t for t, p in zip(EMOTIONS, probs) if p >= THRESHOLD]
    return matched or ["no emotion"]


sentences = ["Enough lies, enough hypocrisy", "I'm so proud of our govenrment", "Climate change is a risk to our planet","Trump is the president of the US"]
for sentence in sentences:
    print(sentence, predict(sentence))

# Enough lies, enough hypocrisy ['anger']
# I'm so proud of our govenrment ['pride']
# Climate change is a risk to our planet ['fear']
# Trump is the president of the US ['no emotion']

```

## Intended Use

- Academic research on emotion in political communication
- Analysis of congressional speeches and social media
- Temporal trend analysis of emotional rhetoric

## Limitations

- Trained exclusively on **US English political text** — performance may degrade on other domains
- Emotions are subjective; inter-annotator agreement on intensity scores is inherently noisy
- Labels are silver-standard (LLM-generated), not human-verified gold labels

## Citation

If you use this model, please cite https://socialeconomicslab.org/research/working-papers/emotions-and-policy/ :

```
@article{algan2026emotions,
  title={Emotions and policy views},
  author={Algan, Y., Davoine E., Renault, T. and Stantcheva, S.},
  year={2026}
}
```