---
language:
- en
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: text-classification
base_model: FacebookAI/roberta-large
tags:
- roberta
- text-classification
- multi-label-classification
- disinformation
- narrative-detection
- propaganda
- media-analysis
metrics:
- f1
- precision
- recall
---
# Narrative Classifier (RoBERTa-large, multi-label)
A multi-label text classifier that detects **disinformation / propaganda narratives** in news and
social-media text. Given a piece of text, the model predicts which of **41 predefined narratives**
(spanning topics such as the war in Ukraine, migration, climate change, COVID-19 / vaccines,
gender & LGBT+, anti-establishment / anti-EU / anti-NATO framings, etc.) are present.
The model was developed at the **Polish-Japanese Academy of Information Technology (PJAIT / PJATK)**.
- **Architecture:** `RobertaNarrativeModel` — a `roberta-large` encoder + a single linear
classification head (`narrative_head`, 1024 → 41) applied to the `<s>` (CLS) token.
- **Task:** multi-label classification (one input can carry several narratives at once).
- **Base model:** [`FacebookAI/roberta-large`](https://huggingface.co/FacebookAI/roberta-large)
- **Parameters:** ~0.4B · **Precision:** FP32 · **Format:** safetensors
- **Language:** English
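
Let $\mathbf{h} \in \mathbb{R}^{1024}$ be the final-layer hidden state of the `<s>` token. In symbols, the head and the decision rule used in the How to use example are as follows, where $\tau$ plays the role of `THRESHOLD`:

$$
\mathbf{z} = W\mathbf{h} + \mathbf{b}, \quad W \in \mathbb{R}^{41 \times 1024},\ \mathbf{b} \in \mathbb{R}^{41}, \qquad p_k = \sigma(z_k) = \frac{1}{1 + e^{-z_k}}, \qquad \hat{y}_k = \mathbb{1}\left[p_k \ge \tau\right], \quad k = 0, \dots, 40
$$

Each narrative gets an independent sigmoid rather than a shared softmax, so any number of labels can be active at once. The head itself adds only $41 \times 1024 + 41 = 42{,}025$ parameters, so nearly all of the ~0.4B parameters are in the `roberta-large` encoder.
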
> **Note on the architecture.** This repository uses a *custom* model class
> (`RobertaNarrativeModel`) whose weights are stored under the `transformer.*` and
> `narrative_head.*` prefixes. It therefore does **not** load directly with
> `AutoModelForSequenceClassification`. Use the self-contained loading code in the
> [How to use](#how-to-use) section below.
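
You can see why the stock class does not apply by counting the checkpoint's tensors by top-level prefix. The following is a minimal sketch. The names in `example_keys` are illustrative placeholders and were not read from the actual file. For comparison, `RobertaForSequenceClassification` expects `roberta.*` and `classifier.*` keys. Its head is also a two-layer module (`classifier.dense` followed by `classifier.out_proj`), so renaming keys alone would not reproduce the single-linear `narrative_head`.

```python
from collections import Counter
from typing import Iterable


def top_level_prefixes(keys: Iterable[str]) -> Counter:
    """Count checkpoint tensors by their first dotted component (e.g. 'transformer')."""
    return Counter(key.split(".", 1)[0] for key in keys)


# Prefixes a stock RobertaForSequenceClassification checkpoint would use instead.
SEQ_CLS_PREFIXES = {"roberta", "classifier"}

# Illustrative tensor names only (hypothetical subset, not read from the real file).
example_keys = [
    "transformer.embeddings.word_embeddings.weight",
    "transformer.encoder.layer.0.attention.self.query.weight",
    "narrative_head.weight",
    "narrative_head.bias",
]

prefixes = top_level_prefixes(example_keys)
print(dict(prefixes))                    # {'transformer': 2, 'narrative_head': 2}
print(SEQ_CLS_PREFIXES & set(prefixes))  # set() -> no overlap with the stock layout
```

To inspect the real file, download `model.safetensors` and pass `f.keys()` from `safetensors.safe_open(path, framework="pt")` to `top_level_prefixes`.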
## Labels
The model outputs 41 labels. The full mapping is in
[`narrative_labels.json`](./narrative_labels.json) / [`label_config.json`](./label_config.json).
<details>
<summary>Show all 41 narratives</summary>
| ID | Narrative |
|----|-----------|
| 0 | Abortion is evil/immoral/dangerous |
| 1 | Alternative treatments are more effective than conventional ones |
| 2 | Climate change is a hoax |
| 3 | Collapse of Western civilization is imminent |
| 4 | Conflict is a staged event prepared by outside forces |
| 5 | Contraception is against nature/dangerous/immoral |
| 6 | Conventional medicine is ineffective and corrupt |
| 7 | Conventional medicine is wrong about the causes of diseases |
| 8 | Elites manipulate elections |
| 9 | Elites want to take over the world |
| 10 | European Union is authoritarian |
| 11 | Feminism is a tool to destroy the natural order and traditional values |
| 12 | Global elites deliberately cause pandemics and diseases |
| 13 | Global warming does not exist/is not a serious threat |
| 14 | Governments fail to take proper action on migration crisis |
| 15 | Homosexuals are a threat |
| 16 | Humanity is not responsible for global warming |
| 17 | LGBT+ is a tool to destroy the natural order and traditional values |
| 18 | LGBT+ people are mentally ill |
| 19 | LGBT+ people are privileged |
| 20 | Media deliberately spreads lies |
| 21 | Migrants are a burden on the economy |
| 22 | Migrants are dangerous |
| 23 | Migrants are destroying local culture and breaking up local communities |
| 24 | Migration is a conspiracy of global elites |
| 25 | Most European countries are puppets of the West |
| 26 | NATO is authoritarian/warmongering |
| 27 | Official information is a tool to deceive citizens |
| 28 | Other |
| 29 | Russia is strong and winning the war |
| 30 | Sex education is a threat to children |
| 31 | Solutions to reduce human impact on environment and climate are a conspiracy |
| 32 | State and international institutions only serve to oppress citizens. |
| 33 | The West and their allies are immoral/hostile/ineffective |
| 34 | The energy crisis is artificially created |
| 35 | Transgender people are a threat |
| 36 | Ukraine is an evil, aggressive and dangerous country |
| 37 | Ukrainian refugees are a danger/burden |
| 38 | Vaccines are dangerous/ineffective/immoral |
| 39 | Western elites want to destroy the natural order and traditional values |
| 40 | other |
</details>
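
If you read the label files programmatically, validate the mapping before relying on the logit order. The following is a minimal sketch. It assumes `narrative_labels.json` has the `id2narrative` field (with string keys) and the `num_labels` field that the How to use code reads. The three-label toy mapping is hypothetical. The reverse map is kept case-sensitive because the table above contains both `Other` (ID 28) and `other` (ID 40).

```python
import json


def build_label_maps(labels: dict) -> tuple[dict[int, str], dict[str, int]]:
    """Return (id2narrative, narrative2id) after checking the mapping is complete."""
    # JSON object keys are always strings, so convert them back to ints.
    id2narrative = {int(k): v for k, v in labels["id2narrative"].items()}
    n = labels["num_labels"]
    # IDs must be exactly 0..n-1 so that ID i matches logit column i.
    if sorted(id2narrative) != list(range(n)):
        raise ValueError(f"expected IDs 0..{n - 1}, got {sorted(id2narrative)}")
    # Case-sensitive reverse map: lower-casing names would merge 'Other' and 'other'.
    narrative2id = {v: k for k, v in id2narrative.items()}
    if len(narrative2id) != n:
        raise ValueError("two IDs share the same narrative name")
    return id2narrative, narrative2id


# Hypothetical three-label file in the same shape the How to use code reads.
toy = json.loads('{"num_labels": 3, "id2narrative": {"0": "A", "1": "Other", "2": "other"}}')
id2n, n2id = build_label_maps(toy)
print(n2id["Other"], n2id["other"])  # 1 2
```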
## How to use
```python
import json

import torch
from torch import nn
from transformers import AutoTokenizer, AutoConfig, RobertaModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

REPO_ID = "pjait/narrative_classifier"


class RobertaNarrativeModel(nn.Module):
    """roberta-large encoder + a linear head over the <s> (CLS) token."""

    def __init__(self, config, num_labels):
        super().__init__()
        # Attribute names must match the checkpoint prefixes: transformer.*, narrative_head.*
        self.transformer = RobertaModel(config, add_pooling_layer=False)
        self.narrative_head = nn.Linear(config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        out = self.transformer(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # <s> token representation
        return self.narrative_head(cls)  # raw logits (multi-label)


# --- load config, labels and weights ---------------------------------------
config = AutoConfig.from_pretrained(REPO_ID)
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)

with open(hf_hub_download(REPO_ID, "narrative_labels.json")) as f:
    labels = json.load(f)
id2narrative = {int(k): v for k, v in labels["id2narrative"].items()}  # JSON keys are strings
num_labels = labels["num_labels"]

model = RobertaNarrativeModel(config, num_labels)
state_dict = load_file(hf_hub_download(REPO_ID, "model.safetensors"))
model.load_state_dict(state_dict)
model.eval()

# --- inference --------------------------------------------------------------
text = "The vaccines were rushed and are far more dangerous than the virus itself."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    # Pass only the arguments forward() accepts.
    logits = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
probs = torch.sigmoid(logits)[0]  # multi-label -> independent sigmoid per narrative

THRESHOLD = 0.5
predicted = [(id2narrative[i], float(p)) for i, p in enumerate(probs) if p >= THRESHOLD]
print(sorted(predicted, key=lambda x: -x[1]))
```
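`THRESHOLD` controls the precision/recall trade-off. Raising it yields fewer, more confident predictions (higher precision). Lowering it yields more predictions (higher recall). Tune it on your own validation data.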
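
The example above uses a single global `THRESHOLD`. A common alternative, which this card does not prescribe, is to choose a separate threshold for each narrative by grid search on held-out data. The following is a minimal pure-Python sketch. It assumes `probs` holds sigmoid outputs (for example, rows of the `probs` tensor above converted with `.tolist()`) and `targets` holds 0/1 gold labels in the same label order. The toy validation data are invented.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_thresholds(probs, targets, candidates=None, default=0.5):
    """Choose one threshold per label by maximizing that label's F1 on validation data.

    probs:   rows of sigmoid probabilities, one column per label
    targets: rows of 0/1 gold labels with the same shape as probs
    """
    if candidates is None:
        candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    thresholds = []
    for k in range(len(probs[0])):
        best_t, best_f1 = default, 0.0
        for t in candidates:
            tp = fp = fn = 0
            for row_p, row_y in zip(probs, targets):
                pred, gold = row_p[k] >= t, row_y[k] == 1
                tp += pred and gold
                fp += pred and not gold
                fn += gold and not pred
            score = f1_from_counts(tp, fp, fn)
            if score > best_f1:  # strict '>' keeps the first (lowest) threshold among ties
                best_t, best_f1 = t, score
        # best_t is still `default` if no candidate scored above 0 (e.g. no positives).
        thresholds.append(best_t)
    return thresholds


val_probs = [[0.9, 0.2], [0.3, 0.1], [0.6, 0.4]]
val_targets = [[1, 0], [0, 0], [1, 1]]
print(tune_thresholds(val_probs, val_targets))  # [0.35, 0.25]
```

Apply the result with `[id2narrative[k] for k, p in enumerate(row) if p >= thresholds[k]]`. With 41 labels and several rare narratives, per-label thresholds can overfit a small validation set. For that reason, the sketch falls back to the default for any label whose best validation F1 is zero.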
## Evaluation
Metrics from [`metrics.txt`](./metrics.txt) (evaluation split, epoch 3):
| Metric | Value |
|--------|-------|
| Micro F1 | 0.494 |
| Macro F1 | 0.185 |
| Precision | 0.700 |
| Recall | 0.382 |
| Subset accuracy | 0.787 |
| Eval loss | 0.023 |
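
The Precision and Recall rows appear to be micro-averaged, because their harmonic mean reproduces the reported micro F1. This is a consistency check on the table, not an additional result:

$$
F_1^{\text{micro}} = \frac{2PR}{P + R} = \frac{2 \times 0.700 \times 0.382}{0.700 + 0.382} = \frac{0.5348}{1.082} \approx 0.494
$$
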
The gap between micro and macro F1, together with high precision but lower recall, indicates the
model is conservative and performs unevenly across narratives — likely better on
well-represented narratives and weaker on rare ones. Treat predictions as a **decision-support
signal**, not ground truth, and calibrate the threshold for your use case.
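
To make the micro/macro distinction concrete, the following sketch computes these metrics from 0/1 prediction rows in pure Python. It illustrates the standard definitions and is not the script that produced `metrics.txt`. The toy data are invented to mimic the pattern above, with one frequent label predicted well and one rare label missed.

```python
def multilabel_metrics(y_true, y_pred):
    """Micro/macro F1, micro precision/recall and subset accuracy for 0/1 label rows."""
    num_labels = len(y_true[0])
    tp, fp, fn = [0] * num_labels, [0] * num_labels, [0] * num_labels
    exact = 0
    for t_row, p_row in zip(y_true, y_pred):
        exact += list(t_row) == list(p_row)  # subset accuracy: the whole row must match
        for k, (t, p) in enumerate(zip(t_row, p_row)):
            tp[k] += t == 1 and p == 1
            fp[k] += t == 0 and p == 1
            fn[k] += t == 1 and p == 0

    def f1(a, b, c):
        d = 2 * a + b + c
        return 2 * a / d if d else 0.0

    TP, FP, FN = sum(tp), sum(fp), sum(fn)
    return {
        "micro_f1": f1(TP, FP, FN),  # pools counts over all labels
        "macro_f1": sum(f1(tp[k], fp[k], fn[k]) for k in range(num_labels)) / num_labels,
        "precision": TP / (TP + FP) if TP + FP else 0.0,
        "recall": TP / (TP + FN) if TP + FN else 0.0,
        "subset_accuracy": exact / len(y_true),
    }


gold = [[1, 0], [1, 0], [1, 1], [0, 0]]  # label 1 is rare
pred = [[1, 0], [1, 0], [1, 0], [0, 0]]  # model never predicts label 1
m = multilabel_metrics(gold, pred)
print({k: round(v, 3) for k, v in m.items()})
# {'micro_f1': 0.857, 'macro_f1': 0.5, 'precision': 1.0, 'recall': 0.75, 'subset_accuracy': 0.75}
```

Micro F1 pools counts across labels, so frequent narratives dominate it. Macro F1 weights every label equally, so a rare narrative the model never detects costs as much as a frequent one. Labels with no gold positives and no predicted positives score 0 here, and libraries differ in how they treat this case.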
## Intended use & limitations
**Intended use.** Research and analysis of disinformation/propaganda narratives in English-language
media; content moderation triage; media-monitoring dashboards; academic studies of narrative spread.
**Out of scope / cautions.**
- The model identifies whether text *expresses or discusses* a narrative; it does **not** establish
truth, intent, or that the author endorses the narrative (quotation, debunking and reporting can
trigger labels).
- Trained on English; performance on other languages is not guaranteed.
- Macro F1 is low — rare narratives are unreliable. Do not use for automated, consequential
decisions about individuals without human review.
- Sensitive topics (health, politics, gender, migration). Outputs can reflect biases in the
training data. Human oversight is required for any deployment.
## Training
- **Base model:** `FacebookAI/roberta-large` fine-tuned for multi-label narrative classification.
- **Epochs:** 3 (see `training_args.bin` for the full `TrainingArguments`).
- **Objective:** multi-label classification (sigmoid + binary cross-entropy over 41 narratives).
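
The objective can be written out as a minimal pure-Python sketch of mean binary cross-entropy on raw logits. It matches what `torch.nn.BCEWithLogitsLoss` computes with its default `reduction="mean"` and no `pos_weight`. This card does not state whether training used class weighting or other loss options, so treat the unweighted form as an assumption.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over every (example, label) pair, from raw logits.

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)), which equals
    -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))] without overflowing.
    """
    total, count = 0.0, 0
    for row_x, row_y in zip(logits, targets):
        for x, y in zip(row_x, row_y):
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
            count += 1
    return total / count


# One example, two labels; targets are multi-hot (several narratives may be 1).
print(bce_with_logits([[0.0, 0.0]], [[1, 0]]))  # 0.693... (= log 2)
# A very confident correct logit gives ~0 loss instead of overflowing in exp().
print(bce_with_logits([[1000.0]], [[1]]))  # 0.0
```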
## Citation
If you use this model, please cite the Polish-Japanese Academy of Information Technology (PJAIT)
and the author. *(Add the relevant paper / BibTeX here.)*
```bibtex
@misc{narrative_classifier_pjait,
title = {Narrative Classifier (RoBERTa-large, multi-label)},
author = {Sosnowski, Witold},
howpublished = {\url{https://huggingface.co/pjait/narrative_classifier}},
note = {Polish-Japanese Academy of Information Technology (PJAIT)},
year = {2025}
}
```