Media Framing Detector with Longformer

A multi-label transformer model for detecting media frames in news articles and political text. This model identifies up to 15 distinct framing strategies used in communication, based on the Media Frames Corpus taxonomy (Card et al., 2015).

Model Description

This Longformer-based classifier detects media frames in text up to ~1,600 words, addressing key limitations in existing approaches:

  • No hallucination: Unlike GenAI prompt-engineering approaches, this discriminative model cannot hallucinate non-existent frames
  • Long context: Handles full articles (2048 tokens) using sparse attention, unlike sentence-level classifiers
  • Human-aligned: Fine-tuned on gold-standard human annotations from 19 trained annotators, combining the best available data for the task

Performance: Weighted F1 of 0.686 on the held-out test set, outperforming GenAI approaches while being 47x smaller than mm-framing's Mistral-7B. Trained on 385,000 articles spanning the political spectrum from 24 independent news providers.

Read the full write-up: https://ry-rousseau.github.io/2026/02/09/longformer-framing-classifier.html

Intended Uses

  • Automated detection of argumentation strategies in news media
  • Analysis of political communication and polarization
  • Social media content analysis (posts, threads, articles)
  • Bias detection

Frame Taxonomy (Card et al., 2015 / Boydstun et al., 2014)

  • Economic: The costs, benefits, or other financial implications of the issue.
  • Capacity and Resources: The availability of physical, human, or financial resources, and the capacity of current systems.
  • Morality: Any perspective or policy objective/action compelled by religious doctrine, ethics, or social responsibility.
  • Fairness and Equality: The balance or distribution of rights, responsibilities, and resources; equality or inequality in the application of laws.
  • Legality, Constitutionality and Jurisprudence: The rights, freedoms, and authority of individuals, corporations, and government (focuses on the constraints/freedoms granted via the Constitution or laws).
  • Policy Prescription and Evaluation: Discussion of specific policies aimed at addressing problems, and the evaluation of whether certain policies will work or are working.
  • Crime and Punishment: The effectiveness and implications of laws and their enforcement (includes breaking laws, loopholes, sentencing, and punishment).
  • Security and Defense: Threats to the welfare of the individual, community, or nation (includes protection from not-yet-manifested threats).
  • Health and Safety: Health care access/effectiveness, sanitation, disease, mental health, and public safety (e.g., infrastructure safety).
  • Quality of Life: Threats and opportunities for the individual's wealth, happiness, and well-being (effects on mobility, ease of routine, community life).
  • Cultural Identity: The traditions, customs, or values of a social group in relation to a specific policy issue.
  • Public Opinion: Attitudes and opinions of the general public, including polling and demographics.
  • Political: Considerations related to politics and politicians, including lobbying, elections, and attempts to sway voters (explicit mentions of partisan maneuvering).
  • External Regulation and Reputation: The international reputation or foreign policy of the U.S. (relations with other nations, trade agreements).
  • Other: Any coherent group of frames not covered by the above categories.

How to Use

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "ry-rousseau/media-framing-longformer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input with topic injection (highly recommended)
topic = "immigration" # or use topic classifier
text = "Your article text here..."
input_text = f"TOPIC:{topic}\n{text}"

# Tokenize and predict (inference only, so disable gradient tracking)
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=2048)
model.eval()
with torch.no_grad():
    outputs = model(**inputs)

# Get predictions (apply optimized thresholds per class)
probs = torch.sigmoid(outputs.logits)
predictions = (probs > 0.5).int() # Use class-specific thresholds for better performance

# Frame labels
frames = [
"Economic", "Capacity and Resources", "Morality", "Fairness and Equality",
"Legality, Constitutionality and Jurisprudence", "Policy Prescription and Evaluation",
"Crime and Punishment", "Security and Defense", "Health and Safety",
"Quality of Life", "Cultural Identity", "Public Opinion", "Political",
"External Regulation and Reputation", "Other"
]

detected_frames = [frames[i] for i, pred in enumerate(predictions[0]) if pred == 1]
print("Detected frames:", detected_frames)
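
The 0.5 cutoff above is only a default; the model was tuned with per-class thresholds (see Key Features below). A minimal sketch of applying class-specific cutoffs — the threshold values here are placeholders for illustration, not the tuned ones:

```python
import torch

# Placeholder per-class thresholds, one per frame in label order.
# The actual tuned values are not published here; these are illustrative only.
thresholds = torch.full((15,), 0.5)
thresholds[1] = 0.35   # e.g. lower cutoff for a rare frame (Capacity and Resources)
thresholds[12] = 0.60  # e.g. higher cutoff for a frequent frame (Political)

# Dummy sigmoid probabilities for one article (batch of 1, 15 frames)
probs = torch.tensor([[0.40, 0.36, 0.10, 0.55, 0.70, 0.20, 0.10, 0.05,
                       0.30, 0.62, 0.15, 0.48, 0.58, 0.25, 0.05]])

# The (15,) threshold vector broadcasts across the batch dimension
predictions = (probs > thresholds).int()
```

Lowering a rare class's threshold trades precision for recall on that class, which is how per-class tuning counters label imbalance.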

The topic classifier used for topic injection is a RoBERTa model with a head-tail truncation strategy for long articles, trained on 64,000 examples.
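
Head-tail truncation can be sketched as follows. The exact head/tail split used by the topic classifier is not specified; the 128/384 split for RoBERTa's 512-token window is an assumption:

```python
def head_tail_truncate(token_ids, max_len=512, head=128):
    """Keep the first `head` tokens and the last `max_len - head` tokens.

    Long articles often state their topic in the lead and the conclusion,
    so keeping both ends preserves more signal than head-only truncation.
    """
    if len(token_ids) <= max_len:
        return token_ids
    tail = max_len - head
    return token_ids[:head] + token_ids[-tail:]

# Toy example: integer ids stand in for real token ids
truncated = head_tail_truncate(list(range(1000)))
```

In practice the split should leave room for special tokens added by the tokenizer; this sketch ignores that detail.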

Training Data

Gold Standard (Human Annotations)

  • Media Frames Corpus (MFC): 2,224 NYT articles on immigration, same-sex marriage, and smoking (1990-2012)
  • SemEval 2023 Task 3: 516 articles on Ukraine-Russia war, COVID-19, migration, abortion, climate change (2020-2022)
  • Total: 2,740 articles with annotations from 19 trained human annotators
  • Annotation strategy: Union of annotations (mean 4.14 frames per article)
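
The union strategy can be sketched as below; the input format (one set of frame indices per annotator) is hypothetical, chosen for illustration:

```python
def union_labels(annotations, num_frames=15):
    """Combine per-annotator frame index sets into one multi-hot vector.

    A frame is marked positive if ANY annotator selected it (union),
    which produces the ~4.14 frames-per-article average reported above.
    """
    labels = [0] * num_frames
    for annotator_frames in annotations:
        for idx in annotator_frames:
            labels[idx] = 1
    return labels

# Three annotators with partially overlapping frame choices
labels = union_labels([{0, 5}, {0, 12}, {5, 9, 12}])
```

Union labeling favors recall over inter-annotator agreement: a frame one expert saw is treated as present even if others missed it.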

Silver Training Data

  • Source: copenlu/mm-framing dataset
  • Size: ~380,000 articles labeled by Mistral-7B
  • Purpose: Pre-training to learn frame patterns before gold fine-tuning

Training Procedure

Phase 1: Silver Training

Trained on machine-labeled data to learn broad frame patterns:

  • Model: allenai/longformer-base-4096
  • Loss: Binary cross-entropy with class weights
  • Epochs: 4
  • Batch size: 16 (effective 32 with gradient accumulation)
  • Learning rate: 2e-5
  • Duration: 72 hours on NVIDIA A40 (48GB VRAM)
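
Class-weighted binary cross-entropy maps directly onto PyTorch's `pos_weight` argument; the weight values below are illustrative, not the ones used in training:

```python
import torch
import torch.nn as nn

# Illustrative positive-class weights: rarer frames get larger weights so
# their (scarce) positive examples contribute more to the loss.
pos_weight = torch.tensor([1.0, 4.0, 2.0, 2.0, 1.0, 1.5, 1.5, 3.0,
                           2.0, 1.5, 2.0, 2.0, 1.0, 3.0, 5.0])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.zeros(2, 15)                      # dummy model outputs (batch of 2)
targets = torch.randint(0, 2, (2, 15)).float()   # dummy multi-hot labels
loss = criterion(logits, targets)                # scalar loss, ready for .backward()
```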

Phase 2: Gold Fine-Tuning

Fine-tuned on human annotations to align with expert reasoning:

  • Loss: Focal loss (gamma=2.0) to focus on difficult examples
  • Epochs: 10
  • Batch size: 2 (effective 16 with gradient accumulation)
  • Learning rate: 2e-5 with scheduler
  • Validation: 90/10 split
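
A minimal sketch of binary focal loss with gamma=2.0 for the multi-label setting (one common formulation; the training code may differ in details such as alpha weighting):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Binary focal loss for multi-label classification (sketch).

    The (1 - p_t)^gamma factor shrinks the loss where the model is already
    confident and correct, so training focuses on hard, ambiguous frames.
    """
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)  # probability of the true class
    return ((1 - p_t) ** gamma * bce).mean()

# At zero logits, p_t = 0.5, so focal loss = 0.25 * BCE
loss = focal_loss(torch.zeros(2, 15), torch.ones(2, 15))
```

With gamma=0 this reduces to plain BCE; gamma=2.0 is the standard choice from the original focal loss paper.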

Key Features

  • Topic injection: Prepending article topic improves Micro F1 by +2.7%
  • Global attention: Applied to [CLS] token and topic token for mixture-of-experts effect
  • Threshold optimization: Post-training per-class threshold tuning for class imbalance
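
Longformer's global attention is opted into per token via a mask passed alongside the input ids. A sketch of the setup described above, assuming the injected "TOPIC:..." prefix occupies the first few positions (the prefix length of 8 is an assumption):

```python
import torch

# Stand-in for inputs["input_ids"] from the tokenizer call in "How to Use"
input_ids = torch.randint(100, 1000, (1, 512))

# Longformer uses sparse local attention by default; tokens flagged with 1
# here additionally attend globally. Position 0 is the <s>/[CLS] token,
# positions 1-7 cover the injected topic prefix.
global_attention_mask = torch.zeros_like(input_ids)
global_attention_mask[:, :8] = 1

# With a loaded model, pass the mask alongside the ids:
# outputs = model(input_ids=input_ids, global_attention_mask=global_attention_mask)
```

Giving the topic tokens global reach lets every article token condition on the topic, which is the mixture-of-experts effect noted above.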

Evaluation Results

Overall Performance (Test Set)

Metric       Score
Weighted F1  0.686
Micro F1     0.685
Macro F1     0.645

Per-Frame Performance

Frame                 Precision  Recall  F1    Support
Economic              0.78       0.70    0.74  87
Capacity & Resources  0.44       0.48    0.46  25
Morality              0.53       0.64    0.58  61
Fairness & Equality   0.60       0.52    0.56  63
Legality              0.83       0.79    0.81  164
Policy Prescription   0.58       0.85    0.69  110
Crime & Punishment    0.63       0.86    0.73  87
Security & Defense    0.49       0.72    0.58  29
Health & Safety       0.61       0.77    0.68  69
Quality of Life       0.57       0.74    0.64  96
Cultural Identity     0.46       0.74    0.57  72
Public Opinion        0.51       0.77    0.61  81
Political             0.70       0.94    0.81  134
External Regulation   0.92       0.41    0.57  29

Note: "Other" frame excluded from evaluation metrics, as per SemEval method.

More Info

The model will be iterated on in future releases. Inference runs locally on most mid-range GPUs; for fine-tuning on limited VRAM, use gradient accumulation.

@misc{rousseau2026framing,
  title={15-Class Media Framing Detector with Longformer},
  author={Rousseau, Ryan},
  year={2026},
  howpublished={\url{https://huggingface.co/ry-rousseau/media-framing-longformer}}
}
