	---
	language: en
	license: mit
	tags:
	- text-classification
	- multi-label-classification
	- emotion-analysis
	- political-text
	- tweets
	- distilbert
	datasets:
	- thomasrenault/us_tweet_speech_congress
	metrics:
	- rmse
	- mae
	base_model: distilbert-base-uncased
	pipeline_tag: text-classification
	---

	# thomasrenault/emotion

	A multi-label emotion intensity classifier fine-tuned on US tweets, campaign speeches and congressional speeches. It is built on `distilbert-base-uncased` and trained on labels annotated by GPT-4o-mini through the OpenAI Batch API.

	## Labels

	The model predicts 8 independent emotion intensities (sigmoid, range 0–1):

	| Label |
	|---|
	| `anger` |
	| `sadness` |
	| `fear` |
	| `disgust` |
	| `pride` |
	| `joy` |
	| `gratitude` |
	| `hope` |

	Scores are independent — multiple emotions can be high simultaneously.
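
To illustrate what independent scores mean in practice, the sketch below applies a sigmoid to each logit separately and contrasts it with a softmax over the same values. The logits are made up for illustration and are not outputs of this model.

```python
import math

EMOTIONS = ["anger", "sadness", "fear", "disgust", "pride", "joy", "gratitude", "hope"]


def sigmoid(x):
    """Map a single logit to (0, 1) without reference to the other labels."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Map logits to a distribution that sums to 1 (shown only for contrast)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits, chosen so that anger and disgust are both strongly positive.
logits = [2.0, -1.0, -0.5, 2.0, -3.0, -3.0, -3.0, -2.0]

independent = dict(zip(EMOTIONS, (sigmoid(x) for x in logits)))
competing = dict(zip(EMOTIONS, softmax(logits)))

# With sigmoid, anger and disgust can both exceed 0.5.
# With softmax, two tied labels can never both exceed 0.5.
print(round(independent["anger"], 3), round(independent["disgust"], 3))
print(round(competing["anger"], 3), round(competing["disgust"], 3))
```

Because each score is computed on its own, the eight values need not sum to 1.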

	## Training

	| Setting | Value |
	|---|---|
	| Base model | `distilbert-base-uncased` |
	| Architecture | `DistilBertForSequenceClassification` (multi-label) |
	| Problem type | `multi_label_classification` |
	| Training data | ~200,000 labeled documents |
	| Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
	| Epochs | 4 |
	| Learning rate | 2e-5 |
	| Batch size | 16 |
	| Max length | 512 tokens |
	| Domain | US tweets about policy, campaign speeches and congressional floor speeches |
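
The card does not spell out the training loss. In the `transformers` library, setting `problem_type` to `multi_label_classification` makes sequence-classification heads use binary cross-entropy on the raw logits (`BCEWithLogitsLoss`), which accepts float targets such as intensities in [0, 1]. Assuming that default was used here, the sketch below reimplements the loss in plain Python to show how a soft target enters the computation; the logits and targets are invented.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits.

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)).
    """
    losses = [
        max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
        for x, y in zip(logits, targets)
    ]
    return sum(losses) / len(losses)


# Hypothetical example: 8 logits and 8 soft intensity targets in [0, 1].
logits = [1.5, -2.0, -1.0, 0.5, -3.0, -3.0, -2.5, -1.5]
targets = [0.8, 0.0, 0.2, 0.6, 0.0, 0.0, 0.0, 0.1]

print(f"loss = {bce_with_logits(logits, targets):.4f}")
```

Each label contributes its own term, so a document can be pushed toward high anger and high disgust at the same time.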

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	model_id = "thomasrenault/emotion"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForSequenceClassification.from_pretrained(model_id)
	model.eval()

	EMOTIONS = ["anger", "sadness", "fear", "disgust", "pride", "joy", "gratitude", "hope"]
	THRESHOLD = 0.5

	def predict(text):
	    # Tokenize a single text, truncating to the 512-token training length
	    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
	    with torch.no_grad():
	        # Independent sigmoid per label, not a softmax over labels
	        probs = torch.sigmoid(model(**enc).logits).squeeze().tolist()
	    matched = [t for t, p in zip(EMOTIONS, probs) if p >= THRESHOLD]
	    return matched or ["no emotion"]


	sentences = [
	    "Enough lies, enough hypocrisy",
	    "I'm so proud of our govenrment",
	    "Climate change is a risk to our planet",
	    "Trump is the president of the US",
	]
	for sentence in sentences:
	    print(sentence, predict(sentence))

	# Enough lies, enough hypocrisy ['anger']
	# I'm so proud of our govenrment ['pride']
	# Climate change is a risk to our planet ['fear']
	# Trump is the president of the US ['no emotion']

	```
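
The example above truncates input at 512 tokens, so the tail of a long floor speech is ignored. One possible workaround, not described in this card, is to split the token ids into overlapping windows, score each window, and average the per-label scores. The helper names, the window size and stride, and the averaging rule below are all assumptions for illustration.

```python
def split_windows(token_ids, size=510, stride=255):
    """Split a list of token ids into overlapping windows of at most `size` ids.

    510 leaves room for the two special tokens ([CLS], [SEP]) within a 512 limit.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + size])
        if start + size >= len(token_ids):
            break
        start += stride
    return windows


def average_scores(per_window_scores):
    """Average per-label scores across windows (simple unweighted mean)."""
    n = len(per_window_scores)
    return [sum(col) / n for col in zip(*per_window_scores)]


# Hypothetical document of 1,200 token ids.
ids = list(range(1200))
windows = split_windows(ids)
print(len(windows), [len(w) for w in windows])
```

Averaging is only one choice; taking the per-label maximum across windows would instead flag a speech if any passage is intense.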

	## Intended Use

	- Academic research on emotion in political communication
	- Analysis of congressional speeches and social media
	- Temporal trend analysis of emotional rhetoric
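
For the trend-analysis use case, per-document scores can be aggregated by period. The sketch below averages one emotion by year; the record layout and the numbers are placeholders, not data from the training corpus.

```python
from collections import defaultdict
from statistics import mean


def yearly_mean(records, emotion):
    """Return {year: mean score} for one emotion from (year, scores) records."""
    by_year = defaultdict(list)
    for year, scores in records:
        by_year[year].append(scores[emotion])
    return {year: mean(values) for year, values in sorted(by_year.items())}


# Placeholder records: (year, {emotion: score}); the numbers are invented.
records = [
    (2019, {"anger": 0.25, "hope": 0.5}),
    (2019, {"anger": 0.75, "hope": 0.25}),
    (2020, {"anger": 0.5, "hope": 0.125}),
]
print(yearly_mean(records, "anger"))
```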

	## Limitations

	- Trained exclusively on US English political text — performance may degrade on other domains
	- Emotions are subjective; intensity scores are inherently noisy, and annotators often disagree on them
	- Labels are silver-standard (LLM-generated), not human-verified gold labels
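
Because the training labels are LLM-generated, it can be useful to check predictions against a small human-annotated sample before relying on them. The card's metadata lists RMSE and MAE as metrics; the sketch below computes both for a single emotion. The card does not mention any human-labeled sample, and the numbers here are invented.

```python
import math


def rmse(pred, gold):
    """Root mean squared error between predicted and reference intensities."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(pred))


def mae(pred, gold):
    """Mean absolute error between predicted and reference intensities."""
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(pred)


# Hypothetical model scores and human ratings for one emotion on four texts.
pred = [0.9, 0.1, 0.4, 0.7]
gold = [1.0, 0.0, 0.5, 0.5]

print(f"RMSE = {rmse(pred, gold):.3f}, MAE = {mae(pred, gold):.3f}")
```

RMSE penalizes a few large misses more heavily than MAE, so reporting both gives a rough sense of whether errors are uniform or concentrated in a few documents.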

	## Citation

	If you use this model, please cite https://socialeconomicslab.org/research/working-papers/emotions-and-policy/ :

	```
	@article{algan2026emotions,
	  title={Emotions and policy views},
	  author={Algan, Y. and Davoine, E. and Renault, T. and Stantcheva, S.},
	  year={2026}
	}
	```