# Media Framing Detector with Longformer
A multi-label transformer model for detecting media frames in news articles and political text. The model identifies 15 distinct framing strategies used in communication, based on the Media Frames Corpus taxonomy (Card et al., 2015).
## Model Description
This Longformer-based classifier detects media frames in texts of up to ~1,600 words, addressing key limitations of existing approaches:
- No hallucination: Unlike GenAI prompt-engineering approaches, this discriminative model cannot hallucinate non-existent frames
- Long context: Handles full articles (2048 tokens) using sparse attention, unlike sentence-level classifiers
- Human-aligned: Fine-tuned on gold-standard human annotations from 19 trained annotators, combining the best available data for the task
Performance: Weighted F1 of 0.686 on a held-out test set, outperforming GenAI approaches while being 47x smaller than mm-framing's Mistral-7B. Trained on 385,000 articles spanning the political spectrum from 24 independent news providers.
Read the full write-up: https://ry-rousseau.github.io/2026/02/09/longformer-framing-classifier.html
## Intended Uses
- Automated detection of argumentation strategies in news media
- Analysis of political communication and polarization
- Social media content analysis (posts, threads, articles)
- Bias detection
## Frame Taxonomy

| Frame | Description (Card et al., 2015 / Boydstun et al., 2014) |
|---|---|
| Economic | The costs, benefits, or other financial implications of the issue. |
| Capacity and Resources | The availability of physical, human, or financial resources, and the capacity of current systems. |
| Morality | Any perspective or policy objective/action compelled by religious doctrine, ethics, or social responsibility. |
| Fairness and Equality | The balance or distribution of rights, responsibilities, and resources; equality or inequality in the application of laws. |
| Legality, Constitutionality and Jurisprudence | The rights, freedoms, and authority of individuals, corporations, and government (focuses on the constraints/freedoms granted via the Constitution or laws). |
| Policy Prescription and Evaluation | Discussion of specific policies aimed at addressing problems, and the evaluation of whether certain policies will work or are working. |
| Crime and Punishment | The effectiveness and implications of laws and their enforcement (includes breaking laws, loopholes, sentencing, and punishment). |
| Security and Defense | Threats to the welfare of the individual, community, or nation (includes protection from not-yet-manifested threats). |
| Health and Safety | Health care access/effectiveness, sanitation, disease, mental health, and public safety (e.g., infrastructure safety). |
| Quality of Life | Threats and opportunities for the individual's wealth, happiness, and well-being (effects on mobility, ease of routine, community life). |
| Cultural Identity | The traditions, customs, or values of a social group in relation to a specific policy issue. |
| Public Opinion | Attitudes and opinions of the general public, including polling and demographics. |
| Political | Considerations related to politics and politicians, including lobbying, elections, and attempts to sway voters (explicit mentions of partisan maneuvering). |
| External Regulation and Reputation | The international reputation or foreign policy of the U.S. (relations with other nations, trade agreements). |
| Other | Any coherent group of frames not covered by the above categories. |
## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "ry-rousseau/media-framing-longformer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Frame labels (order matches the model's output logits)
frames = [
    "Economic", "Capacity and Resources", "Morality", "Fairness and Equality",
    "Legality, Constitutionality and Jurisprudence", "Policy Prescription and Evaluation",
    "Crime and Punishment", "Security and Defense", "Health and Safety",
    "Quality of Life", "Cultural Identity", "Public Opinion", "Political",
    "External Regulation and Reputation", "Other"
]

# Prepare input with topic injection (highly recommended)
topic = "immigration"  # or use the topic classifier below
text = "Your article text here..."
input_text = f"TOPIC:{topic}\n{text}"

# Tokenize and predict
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=2048)
with torch.no_grad():
    outputs = model(**inputs)

# Multi-label task: apply a sigmoid per class, then threshold
probs = torch.sigmoid(outputs.logits)
predictions = (probs > 0.5).int()  # use class-specific thresholds for better performance

detected_frames = [frames[i] for i, pred in enumerate(predictions[0]) if pred == 1]
print("Detected frames:", detected_frames)
```
The topic classifier is a RoBERTa model with a head-tail truncation strategy for long articles, trained on 64,000 examples (see the sketch after the link below):
- Topic Classifier: Use this Model
## Training Data
### Gold Standard (Human Annotations)
- Media Frames Corpus (MFC): 2,224 NYT articles on immigration, same-sex marriage, and smoking (1990-2012)
- SemEval 2023 Task 3: 516 articles on Ukraine-Russia war, COVID-19, migration, abortion, climate change (2020-2022)
- Total: 2,740 articles with annotations from 19 trained human annotators
- Annotation strategy: Union of annotations (mean 4.14 frames per article)
### Silver Training Data
- Source: copenlu/mm-framing dataset
- Size: ~380,000 articles labeled by Mistral-7B
- Purpose: Pre-training to learn frame patterns before gold fine-tuning
## Training Procedure

### Phase 1: Silver Training
Trained on machine-labeled data to learn broad frame patterns:
- Model: allenai/longformer-base-4096
- Loss: Binary cross-entropy with class weights
- Epochs: 4
- Batch size: 16 (effective 32 with gradient accumulation)
- Learning rate: 2e-5
- Duration: 72 hours on NVIDIA A40 (48GB VRAM)
### Phase 2: Gold Fine-Tuning
Fine-tuned on human annotations to align with expert reasoning:
- Loss: Focal loss (gamma=2.0) to focus on difficult examples
- Epochs: 10
- Batch size: 2 (effective 16 with gradient accumulation)
- Learning rate: 2e-5 with scheduler
- Validation: 90/10 split
## Key Features

- Topic injection: Prepending the article topic improves Micro F1 by +2.7%
- Global attention: Applied to the [CLS] token and topic tokens for a mixture-of-experts effect (see the first sketch below)
- Threshold optimization: Post-training per-class threshold tuning to handle class imbalance (see the second sketch below)
## Evaluation Results

### Overall Performance (Test Set)
| Metric | Score |
|---|---|
| Weighted F1 | 0.686 |
| Micro F1 | 0.685 |
| Macro F1 | 0.645 |
### Per-Frame Performance
| Frame | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Economic | 0.78 | 0.70 | 0.74 | 87 |
| Capacity & Resources | 0.44 | 0.48 | 0.46 | 25 |
| Morality | 0.53 | 0.64 | 0.58 | 61 |
| Fairness & Equality | 0.60 | 0.52 | 0.56 | 63 |
| Legality | 0.83 | 0.79 | 0.81 | 164 |
| Policy Prescription | 0.58 | 0.85 | 0.69 | 110 |
| Crime & Punishment | 0.63 | 0.86 | 0.73 | 87 |
| Security & Defense | 0.49 | 0.72 | 0.58 | 29 |
| Health & Safety | 0.61 | 0.77 | 0.68 | 69 |
| Quality of Life | 0.57 | 0.74 | 0.64 | 96 |
| Cultural Identity | 0.46 | 0.74 | 0.57 | 72 |
| Public Opinion | 0.51 | 0.77 | 0.61 | 81 |
| Political | 0.70 | 0.94 | 0.81 | 134 |
| External Regulation | 0.92 | 0.41 | 0.57 | 29 |
Note: "Other" frame excluded from evaluation metrics, as per SemEval method.
## More Info
- GitHub Repository: https://github.com/Ry-Rousseau/frame-delta
- Project Write-Up: https://ry-rousseau.github.io/2026/02/09/longformer-framing-classifier.html
The model will be iterated on in the future. It should run locally on most mid-range GPUs, using gradient accumulation for training.
## Citation

```bibtex
@misc{rousseau2026framing,
  title={15-Class Media Framing Detector with Longformer},
  author={Rousseau, Ryan},
  year={2026},
  howpublished={\url{https://huggingface.co/ry-rousseau/media-framing-longformer}}
}
```