---
license: mit
datasets:
- copenlu/mm-framing
---
A RoBERTa topic classifier used for topic injection into the Longformer Framing Classifier. It classifies input text into one of 19 discrete topics:
- Business & Economy
- Crime & Safety
- Disaster & Accidents
- Education
- Entertainment
- Environment & Nature
- Health
- Immigration
- Infrastructure & Transport
- Legal
- Lifestyle & Culture
- Media
- Other/Unknown
- Politics
- Science & Technology
- Social Issues
- Sports
- War & Conflict
- Weather
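
As a minimal sketch of how a 19-way output could be turned into one of the labels above, the snippet below applies a softmax to a logit vector and returns the top label. The label order is an assumption (it simply follows the order of the list above); the model's actual `id2label` mapping may differ and should be checked in its config. `decode_topic` is a hypothetical helper, not part of this model's code.

```python
import math

# ASSUMPTION: index order follows the list in this card, not a confirmed id2label.
TOPICS = [
    "Business & Economy", "Crime & Safety", "Disaster & Accidents",
    "Education", "Entertainment", "Environment & Nature", "Health",
    "Immigration", "Infrastructure & Transport", "Legal",
    "Lifestyle & Culture", "Media", "Other/Unknown", "Politics",
    "Science & Technology", "Social Issues", "Sports",
    "War & Conflict", "Weather",
]


def decode_topic(logits):
    """Return (label, softmax probability) for the highest-scoring class."""
    if len(logits) != len(TOPICS):
        raise ValueError(f"expected {len(TOPICS)} logits, got {len(logits)}")
    # Subtract the max before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    best = max(range(len(logits)), key=lambda i: logits[i])
    return TOPICS[best], exps[best] / sum(exps)


# Usage: a logit vector whose largest entry sits at index 6.
example_logits = [0.0] * len(TOPICS)
example_logits[6] = 5.0
label, prob = decode_topic(example_logits)
print(label, round(prob, 3))
```
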

These topics were derived empirically by grouping similar values of the unstructured `gpt_topic` field in the mm_framing silver dataset into discrete categories.
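
The card does not specify which similarity measure was used for the consolidation. The sketch below shows one possible approach, assuming plain string similarity from Python's `difflib`: each free-form topic string is mapped to the closest category name, falling back to `Other/Unknown` when nothing is close enough. The `consolidate` helper and its `threshold` value are hypothetical and are not the authors' procedure.

```python
from difflib import SequenceMatcher

CATEGORIES = [
    "Business & Economy", "Crime & Safety", "Disaster & Accidents",
    "Education", "Entertainment", "Environment & Nature", "Health",
    "Immigration", "Infrastructure & Transport", "Legal",
    "Lifestyle & Culture", "Media", "Other/Unknown", "Politics",
    "Science & Technology", "Social Issues", "Sports",
    "War & Conflict", "Weather",
]


def consolidate(raw_topic, threshold=0.5):
    """Map a free-form topic string to the most similar category name.

    ASSUMPTION: character-level similarity stands in for whatever notion
    of similarity was actually used to build the label set.
    """
    cleaned = raw_topic.strip().lower()
    best_label, best_score = "Other/Unknown", 0.0
    for category in CATEGORIES:
        score = SequenceMatcher(None, cleaned, category.lower()).ratio()
        if score > best_score:
            best_label, best_score = category, score
    # Anything below the threshold is treated as unclassifiable.
    return best_label if best_score >= threshold else "Other/Unknown"


for raw in ["health", " WEATHER ", ""]:
    print(repr(raw), "->", consolidate(raw))
```

In practice an embedding-based or manually curated mapping would likely handle synonyms better than surface string matching.
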

The classifier achieved 76.4% validation accuracy on 64,000 examples, which was deemed sufficient for assisting domain-specific reasoning in the downstream model.
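
How the predicted topic is injected into the downstream model is not described in this card. One simple possibility, shown here purely as an illustration, is to prepend the topic as a text tag before the input reaches the framing classifier. The `inject_topic` helper, the `[TOPIC: ...]` tag format, and the confidence fallback are all assumptions.

```python
def inject_topic(text, topic, confidence=None, min_confidence=0.0):
    """Prefix the input text with a topic tag (hypothetical format).

    If a confidence is supplied and falls below min_confidence, the tag
    falls back to "Other/Unknown" rather than passing on a likely error.
    """
    if confidence is not None and confidence < min_confidence:
        topic = "Other/Unknown"
    return f"[TOPIC: {topic}] {text}"


# Usage: a confident prediction and a low-confidence one.
print(inject_topic("Floods hit the coast overnight.", "Weather"))
print(inject_topic("The match ended late.", "Sports",
                   confidence=0.3, min_confidence=0.5))
```

Since roughly one in four validation predictions is incorrect at the reported accuracy, a fallback of this kind is one way to limit how often a wrong topic reaches the downstream model.
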