---
language: en
license: mit
tags:
- text-classification
- multi-label-classification
- emotion-analysis
- political-text
- tweets
- distilbert
datasets:
- thomasrenault/us_tweet_speech_congress
metrics:
- rmse
- mae
base_model: distilbert-base-uncased
pipeline_tag: text-classification
---

# thomasrenault/emotion

A multi-label emotion intensity classifier for US political text, fine-tuned on tweets, campaign speeches, and congressional speeches. It is built on `distilbert-base-uncased`, and its training labels were annotated by GPT-4o-mini through the OpenAI Batch API.

## Labels

The model predicts **8 independent emotion intensities**, each obtained by applying a sigmoid to the corresponding logit (range 0–1):

| Label |
|---|
| `anger` |
| `sadness` |
| `fear` |
| `disgust` |
| `pride` |
| `joy` |
| `gratitude` |
| `hope` |

Scores are **independent**: several emotions can score high for the same text.

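
To illustrate what independent scores mean in practice, the sketch below contrasts a per-label sigmoid with a softmax over the same logits. The logits are made up for illustration and are not real model output; `sigmoid` and `softmax` are small helper functions written for this example.

```python
import math

def sigmoid(x: float) -> float:
    """Squash a single logit into (0, 1), independently of other labels."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    """Normalize logits so they compete and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for (anger, disgust, hope); not real model output
logits = [2.0, 1.5, -3.0]

independent = [sigmoid(x) for x in logits]  # what a multi-label head reports
competing = softmax(logits)                 # what a single-label head would report

print("sigmoid:", [round(p, 3) for p in independent])
print("softmax:", [round(p, 3) for p in competing])
```

With the sigmoid, both `anger` and `disgust` exceed 0.5 at the same time, whereas the softmax forces the labels to share a total probability of 1.
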
## Training

| Setting | Value |
|---|---|
| Base model | `distilbert-base-uncased` |
| Architecture | `DistilBertForSequenceClassification` (multi-label) |
| Problem type | `multi_label_classification` |
| Training data | ~200,000 labeled documents |
| Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
| Epochs | 4 |
| Learning rate | 2e-5 |
| Batch size | 16 |
| Max length | 512 tokens |
| Domain | US tweets about policy, campaign speeches and congressional floor speeches |

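
In the `transformers` library, a sequence classification model configured with `problem_type="multi_label_classification"` computes its loss with binary cross-entropy on the logits, which also accepts fractional targets such as intensity scores. The card does not state the exact training loop, so the following is only a plain-Python sketch of that loss under this assumption; `bce_with_logits` and all the numbers are illustrative.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits.

    Targets may be fractional (e.g. an intensity of 0.8), not only 0 or 1.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(z)
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

# Hypothetical intensities for three labels; values are illustrative only
targets = [0.8, 0.1, 0.0]
good_logits = [1.4, -2.2, -6.0]   # roughly matches the targets after a sigmoid
bad_logits = [-1.4, 2.2, 6.0]     # points the wrong way on every label

loss_good = bce_with_logits(good_logits, targets)
loss_bad = bce_with_logits(bad_logits, targets)
print(f"good: {loss_good:.3f}  bad: {loss_bad:.3f}")
```

Because each label contributes its own term, the loss treats the eight emotions as separate binary problems rather than as competing classes.
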
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "thomasrenault/emotion"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # inference mode: disables dropout

# Label names, in the order of the model's outputs
EMOTIONS = ["anger", "sadness", "fear", "disgust", "pride", "joy", "gratitude", "hope"]
THRESHOLD = 0.5  # minimum intensity for an emotion to be reported


def predict(text):
    """Return the emotions whose predicted intensity reaches THRESHOLD."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        # Independent sigmoid per label, not a softmax across labels
        probs = torch.sigmoid(model(**enc).logits).squeeze(0).tolist()
    matched = [t for t, p in zip(EMOTIONS, probs) if p >= THRESHOLD]
    return matched or ["no emotion"]


sentences = [
    "Enough lies, enough hypocrisy",
    "I'm so proud of our govenrment",
    "Climate change is a risk to our planet",
    "Trump is the president of the US",
]
for sentence in sentences:
    print(sentence, predict(sentence))

# Enough lies, enough hypocrisy ['anger']
# I'm so proud of our govenrment ['pride']
# Climate change is a risk to our planet ['fear']
# Trump is the president of the US ['no emotion']
```

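
Since the model outputs intensities, it can be useful to keep the full score per emotion instead of thresholding. The sketch below shows only the post-processing step on rows of logits, such as those obtained by calling `.tolist()` on the model's `logits` for a batch of texts. The helper names `sigmoid` and `intensities` and the logits in `fake_logits` are hypothetical and chosen for illustration; they are not real model output.

```python
import math

EMOTIONS = ["anger", "sadness", "fear", "disgust", "pride", "joy", "gratitude", "hope"]

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def intensities(logit_rows):
    """Map each row of 8 logits to an {emotion: intensity} dict."""
    return [
        {name: sigmoid(z) for name, z in zip(EMOTIONS, row)}
        for row in logit_rows
    ]

# Hypothetical logits for two texts; not real model output
fake_logits = [
    [2.0, -1.0, -3.0, 0.5, -4.0, -4.0, -5.0, -2.0],
    [-3.0, -3.0, -2.0, -4.0, 1.0, 0.0, -1.0, 2.5],
]

for scores in intensities(fake_logits):
    top = max(scores, key=scores.get)  # emotion with the highest intensity
    print(top, round(scores[top], 2))
```

Keeping the raw intensities lets downstream analyses choose their own threshold or work with continuous scores.
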
## Intended Use

- Academic research on emotion in political communication
- Analysis of congressional speeches and social media
- Temporal trend analysis of emotional rhetoric

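
As a rough illustration of temporal trend analysis, the sketch below averages one emotion's intensity by calendar month. It assumes the texts have already been scored and paired with ISO-formatted dates; the `scored` records and the `monthly_mean` helper are hypothetical and not taken from the training data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (ISO date, anger intensity) pairs; illustrative values only
scored = [
    ("2020-01-05", 0.20),
    ("2020-01-19", 0.40),
    ("2020-02-02", 0.70),
    ("2020-02-20", 0.50),
    ("2020-02-28", 0.90),
]

def monthly_mean(records):
    """Average intensity per month, keyed by "YYYY-MM" in chronological order."""
    buckets = defaultdict(list)
    for date, score in records:
        buckets[date[:7]].append(score)  # first 7 characters are "YYYY-MM"
    return {month: mean(values) for month, values in sorted(buckets.items())}

trend = monthly_mean(scored)
for month, value in trend.items():
    print(month, round(value, 2))
```

The same grouping can be applied per emotion, per speaker, or per source type once scores are stored alongside metadata.
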
## Limitations

- Trained exclusively on **US English political text**; performance may degrade on other domains
- Emotion is subjective, so intensity scores are inherently noisy and annotators can disagree on them
- Labels are silver-standard (LLM-generated), not human-verified gold labels

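
One way to gauge how far the model's scores sit from human judgment is to compare them with a small hand-annotated sample using RMSE and MAE, the two metrics listed in the card's metadata. No such sample is provided here, so the values below are invented purely to show the computation; `rmse` and `mae` are helper functions written for this sketch.

```python
import math

def rmse(pred, gold):
    """Root mean squared error between two equal-length score lists."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(pred))

def mae(pred, gold):
    """Mean absolute error between two equal-length score lists."""
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(pred)

# Hypothetical intensities for one emotion on four texts; not real data
model_scores = [0.9, 0.2, 0.6, 0.1]
human_scores = [1.0, 0.0, 0.5, 0.3]

print(f"RMSE: {rmse(model_scores, human_scores):.3f}")
print(f"MAE:  {mae(model_scores, human_scores):.3f}")
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of disagreement.
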
## Citation

If you use this model, please cite the [working paper](https://socialeconomicslab.org/research/working-papers/emotions-and-policy/):

```
@article{algan2026emotions,
  title={Emotions and policy views},
  author={Algan, Y. and Davoine, E. and Renault, T. and Stantcheva, S.},
  year={2026}
}
```