---
language:
- en
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: text-classification
base_model: FacebookAI/roberta-large
tags:
- roberta
- text-classification
- multi-label-classification
- disinformation
- narrative-detection
- propaganda
- media-analysis
metrics:
- f1
- precision
- recall
---

# Narrative Classifier (RoBERTa-large, multi-label)

A multi-label text classifier that detects **disinformation / propaganda narratives** in news and social-media text. Given a piece of text, the model predicts which of **41 predefined narratives** are present. The narratives span topics such as the war in Ukraine, migration, climate change, COVID-19 / vaccines, gender & LGBT+, and anti-establishment / anti-EU / anti-NATO framings.

The model was developed at the **Polish-Japanese Academy of Information Technology (PJAIT / PJATK)**.

- **Architecture:** `RobertaNarrativeModel`, a `roberta-large` encoder plus a single linear classification head (`narrative_head`, 1024 → 41) applied to the `<s>` (CLS) token.
- **Task:** multi-label classification (one input can carry several narratives at once).
- **Base model:** [`FacebookAI/roberta-large`](https://huggingface.co/FacebookAI/roberta-large)
- **Parameters:** ~0.4B · **Precision:** FP32 · **Format:** safetensors
- **Language:** English

> **Note on the architecture.** This repository uses a *custom* model class
> (`RobertaNarrativeModel`) whose weights are stored under the `transformer.*` and
> `narrative_head.*` prefixes. It therefore does **not** load directly with
> `AutoModelForSequenceClassification`. Use the self-contained loading code in the
> [How to use](#how-to-use) section below.

## Labels

The model outputs 41 labels. The full mapping is in [`narrative_labels.json`](./narrative_labels.json) / [`label_config.json`](./label_config.json).

| ID | Narrative |
|----|-----------|
| 0 | Abortion is evil/immoral/dangerous |
| 1 | Alternative treatments are more effective than conventional ones |
| 2 | Climate change is a hoax |
| 3 | Collapse of Western civilization is imminent |
| 4 | Conflict is a staged event prepared by outside forces |
| 5 | Contraception is against nature/dangerous/immoral |
| 6 | Conventional medicine is ineffective and corrupt |
| 7 | Conventional medicine is wrong about the causes of diseases |
| 8 | Elites manipulate elections |
| 9 | Elites want to take over the world |
| 10 | European Union is authoritarian |
| 11 | Feminism is a tool to destroy the natural order and traditional values |
| 12 | Global elites deliberately cause pandemics and diseases |
| 13 | Global warming does not exist/is not a serious threat |
| 14 | Governments fail to take proper action on migration crisis |
| 15 | Homosexuals are a threat |
| 16 | Humanity is not responsible for global warming |
| 17 | LGBT+ is a tool to destroy the natural order and traditional values |
| 18 | LGBT+ people are mentally ill |
| 19 | LGBT+ people are privileged |
| 20 | Media deliberately spreads lies |
| 21 | Migrants are a burden on the economy |
| 22 | Migrants are dangerous |
| 23 | Migrants are destroying local culture and breaking up local communities |
| 24 | Migration is a conspiracy of global elites |
| 25 | Most European countries are puppets of the West |
| 26 | NATO is authoritarian/warmongering |
| 27 | Official information is a tool to deceive citizens |
| 28 | Other |
| 29 | Russia is strong and winning the war |
| 30 | Sex education is a threat to children |
| 31 | Solutions to reduce human impact on environment and climate are a conspiracy |
| 32 | State and international institutions only serve to oppress citizens. |
| 33 | The West and their allies are immoral/hostile/ineffective |
| 34 | The energy crisis is artificially created |
| 35 | Transgender people are a threat |
| 36 | Ukraine is an evil, aggressive and dangerous country |
| 37 | Ukrainian refugees are a danger/burden |
| 38 | Vaccines are dangerous/ineffective/immoral |
| 39 | Western elites want to destroy the natural order and traditional values |
| 40 | other |

## How to use

```python
import json

import torch
from torch import nn
from transformers import AutoTokenizer, AutoConfig, RobertaModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

REPO_ID = "pjait/narrative_classifier"


class RobertaNarrativeModel(nn.Module):
    """roberta-large encoder + a linear head over the <s> (CLS) token."""

    def __init__(self, config, num_labels):
        super().__init__()
        self.transformer = RobertaModel(config, add_pooling_layer=False)
        self.narrative_head = nn.Linear(config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        out = self.transformer(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # <s> token representation
        return self.narrative_head(cls)    # raw logits (multi-label)


# --- load config, labels and weights ---------------------------------------
config = AutoConfig.from_pretrained(REPO_ID)
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)

with open(hf_hub_download(REPO_ID, "narrative_labels.json")) as f:
    labels = json.load(f)
id2narrative = {int(k): v for k, v in labels["id2narrative"].items()}
num_labels = labels["num_labels"]

model = RobertaNarrativeModel(config, num_labels)
state_dict = load_file(hf_hub_download(REPO_ID, "model.safetensors"))
model.load_state_dict(state_dict)
model.eval()

# --- inference --------------------------------------------------------------
text = "The vaccines were rushed and are far more dangerous than the virus itself."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs)
probs = torch.sigmoid(logits)[0]  # multi-label -> sigmoid

THRESHOLD = 0.5
predicted = [(id2narrative[i], float(p)) for i, p in enumerate(probs) if p >= THRESHOLD]
print(sorted(predicted, key=lambda x: -x[1]))
```

`THRESHOLD` controls the precision/recall trade-off; tune it on your own validation data.
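
One way to tune the threshold is to sweep a grid of candidate values over a labelled validation set and keep the one with the best score. The sketch below is illustrative only: it assumes you have already collected gold labels and sigmoid probabilities as nested lists, it uses micro F1 as the selection criterion (an assumption, not necessarily what was used for this model), and the helper names `micro_prf` and `best_threshold` as well as the toy numbers are hypothetical.

```python
def micro_prf(y_true, y_prob, threshold):
    """Micro-averaged precision, recall and F1 at a given threshold."""
    tp = fp = fn = 0
    for true_row, prob_row in zip(y_true, y_prob):
        for t, p in zip(true_row, prob_row):
            pred = p >= threshold
            if pred and t:
                tp += 1
            elif pred and not t:
                fp += 1
            elif t:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_threshold(y_true, y_prob, candidates):
    """Return (threshold, f1) maximizing micro F1; ties keep the lower threshold."""
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        f1 = micro_prf(y_true, y_prob, t)[2]
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy validation set: 4 texts, 3 labels (hypothetical numbers, not model output).
val_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
val_prob = [[0.9, 0.2, 0.1], [0.32, 0.45, 0.05], [0.6, 0.35, 0.2], [0.1, 0.1, 0.42]]

candidates = [round(0.1 * k, 1) for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
threshold, f1 = best_threshold(val_true, val_prob, candidates)
print(f"best threshold={threshold}, micro F1={f1:.3f}")
# best threshold=0.3, micro F1=0.909
```

In practice the probabilities would come from running the model over your validation texts. If rare narratives matter for your use case, selecting on macro F1 or tuning a separate threshold per label may be a better fit than a single global value.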

## Evaluation

Metrics from [`metrics.txt`](./metrics.txt) (evaluation split, epoch 3):

| Metric | Value |
|--------|-------|
| Micro F1 | 0.494 |
| Macro F1 | 0.185 |
| Precision | 0.700 |
| Recall | 0.382 |
| Subset accuracy | 0.787 |
| Eval loss | 0.023 |

The gap between micro and macro F1, together with high precision but lower recall, indicates that the model is conservative and performs unevenly across narratives: it is likely better on well-represented narratives and weaker on rare ones. Treat predictions as a **decision-support signal**, not ground truth, and calibrate the threshold for your use case.

## Intended use & limitations

**Intended use.** Research and analysis of disinformation/propaganda narratives in English-language media; content moderation triage; media-monitoring dashboards; academic studies of narrative spread.

**Out of scope / cautions.**

- The model identifies whether text *expresses or discusses* a narrative; it does **not** establish truth, intent, or that the author endorses the narrative (quotation, debunking and reporting can trigger labels).
- Trained on English; performance on other languages is not guaranteed.
- Macro F1 is low, so predictions for rare narratives are unreliable. Do not use the model for automated, consequential decisions about individuals without human review.
- The narratives cover sensitive topics (health, politics, gender, migration), and outputs can reflect biases in the training data. Human oversight is required for any deployment.

## Training

- **Base model:** `FacebookAI/roberta-large` fine-tuned for multi-label narrative classification.
- **Epochs:** 3 (see `training_args.bin` for the full `TrainingArguments`).
- **Objective:** multi-label classification (sigmoid + binary cross-entropy over 41 narratives).

## Citation

If you use this model, please cite the Polish-Japanese Academy of Information Technology (PJAIT) and the author.

*(Add the relevant paper / BibTeX here.)*

```bibtex
@misc{narrative_classifier_pjait,
  title        = {Narrative Classifier (RoBERTa-large, multi-label)},
  author       = {Sosnowski, Witold},
  howpublished = {\url{https://huggingface.co/pjait/narrative_classifier}},
  note         = {Polish-Japanese Academy of Information Technology (PJAIT)},
  year         = {2025}
}
```