---
language:
- en
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: text-classification
base_model: FacebookAI/roberta-large
tags:
- roberta
- text-classification
- multi-label-classification
- disinformation
- narrative-detection
- propaganda
- media-analysis
metrics:
- f1
- precision
- recall
---

# Narrative Classifier (RoBERTa-large, multi-label)

A multi-label text classifier that detects **disinformation / propaganda narratives** in news and
social-media text. Given a piece of text, the model predicts which of **41 predefined narratives**
(spanning topics such as the war in Ukraine, migration, climate change, COVID-19 / vaccines,
gender & LGBT+, anti-establishment / anti-EU / anti-NATO framings, etc.) are present.

The model was developed at the **Polish-Japanese Academy of Information Technology (PJAIT / PJATK)**.

- **Architecture:** `RobertaNarrativeModel`: a `roberta-large` encoder plus a single linear
  classification head (`narrative_head`, 1024 → 41) applied to the final hidden state of the
  `<s>` (CLS) token.
- **Task:** multi-label classification (one input can carry several narratives at once).
- **Base model:** [`FacebookAI/roberta-large`](https://huggingface.co/FacebookAI/roberta-large)
- **Parameters:** ~0.4B · **Precision:** FP32 · **Format:** safetensors
- **Language:** English

> **Note on the architecture.** This repository uses a *custom* model class
> (`RobertaNarrativeModel`) whose weights are stored under the `transformer.*` and
> `narrative_head.*` prefixes. It therefore does **not** load directly with
> `AutoModelForSequenceClassification`. Use the self-contained loading code in the
> [How to use](#how-to-use) section below.
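
If `load_state_dict` reports missing or unexpected keys, a quick look at the checkpoint's top-level
prefixes usually shows why. The helper below is a minimal sketch: `summarize_prefixes` is a
hypothetical name, and the key list is a small illustrative sample rather than the full checkpoint.
With the real file you would pass `load_file(...).keys()` instead.

```python
from collections import Counter


def summarize_prefixes(keys):
    """Count state-dict keys by top-level prefix (the text before the first dot)."""
    return Counter(key.split(".", 1)[0] for key in keys)


# Illustrative sample only: a few keys of the shape this model class produces.
example_keys = [
    "transformer.embeddings.word_embeddings.weight",
    "transformer.encoder.layer.0.attention.self.query.weight",
    "narrative_head.weight",
    "narrative_head.bias",
]

summary = summarize_prefixes(example_keys)
# Any prefix other than the two documented ones points to a mismatched checkpoint.
unexpected = set(summary) - {"transformer", "narrative_head"}
print(dict(summary), unexpected)
```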

## Labels

The model outputs 41 labels. The full mapping is in
[`narrative_labels.json`](./narrative_labels.json) / [`label_config.json`](./label_config.json).

<details>
<summary>Show all 41 narratives</summary>

| ID | Narrative |
|----|-----------|
| 0  | Abortion is evil/immoral/dangerous |
| 1  | Alternative treatments are more effective than conventional ones |
| 2  | Climate change is a hoax |
| 3  | Collapse of Western civilization is imminent |
| 4  | Conflict is a staged event prepared by outside forces |
| 5  | Contraception is against nature/dangerous/immoral |
| 6  | Conventional medicine is ineffective and corrupt |
| 7  | Conventional medicine is wrong about the causes of diseases |
| 8  | Elites manipulate elections |
| 9  | Elites want to take over the world |
| 10 | European Union is authoritarian |
| 11 | Feminism is a tool to destroy the natural order and traditional values |
| 12 | Global elites deliberately cause pandemics and diseases |
| 13 | Global warming does not exist/is not a serious threat |
| 14 | Governments fail to take proper action on migration crisis |
| 15 | Homosexuals are a threat |
| 16 | Humanity is not responsible for global warming |
| 17 | LGBT+ is a tool to destroy the natural order and traditional values |
| 18 | LGBT+ people are mentally ill |
| 19 | LGBT+ people are privileged |
| 20 | Media deliberately spreads lies |
| 21 | Migrants are a burden on the economy |
| 22 | Migrants are dangerous |
| 23 | Migrants are destroying local culture and breaking up local communities |
| 24 | Migration is a conspiracy of global elites |
| 25 | Most European countries are puppets of the West |
| 26 | NATO is authoritarian/warmongering |
| 27 | Official information is a tool to deceive citizens |
| 28 | Other |
| 29 | Russia is strong and winning the war |
| 30 | Sex education is a threat to children |
| 31 | Solutions to reduce human impact on environment and climate are a conspiracy |
| 32 | State and international institutions only serve to oppress citizens. |
| 33 | The West and their allies are immoral/hostile/ineffective |
| 34 | The energy crisis is artificially created |
| 35 | Transgender people are a threat |
| 36 | Ukraine is an evil, aggressive and dangerous country |
| 37 | Ukrainian refugees are a danger/burden |
| 38 | Vaccines are dangerous/ineffective/immoral |
| 39 | Western elites want to destroy the natural order and traditional values |
| 40 | other |

</details>

## How to use

```python
import json
import torch
from torch import nn
from transformers import AutoTokenizer, AutoConfig, RobertaModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

REPO_ID = "pjait/narrative_classifier"


class RobertaNarrativeModel(nn.Module):
    """roberta-large encoder + a linear head over the <s> (CLS) token."""

    def __init__(self, config, num_labels):
        super().__init__()
        self.transformer = RobertaModel(config, add_pooling_layer=False)
        self.narrative_head = nn.Linear(config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        out = self.transformer(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]          # <s> token representation
        return self.narrative_head(cls)            # raw logits (multi-label)


# --- load config, labels and weights ---------------------------------------
config = AutoConfig.from_pretrained(REPO_ID)
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)

with open(hf_hub_download(REPO_ID, "narrative_labels.json")) as f:
    labels = json.load(f)
id2narrative = {int(k): v for k, v in labels["id2narrative"].items()}
num_labels = labels["num_labels"]

model = RobertaNarrativeModel(config, num_labels)
state_dict = load_file(hf_hub_download(REPO_ID, "model.safetensors"))
model.load_state_dict(state_dict)
model.eval()

# --- inference --------------------------------------------------------------
text = "The vaccines were rushed and are far more dangerous than the virus itself."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs)
    probs = torch.sigmoid(logits)[0]              # multi-label -> sigmoid

THRESHOLD = 0.5
predicted = [(id2narrative[i], float(p)) for i, p in enumerate(probs) if p >= THRESHOLD]
print(sorted(predicted, key=lambda x: -x[1]))
```

`THRESHOLD` controls the precision/recall trade-off; tune it on your own validation data.
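
One simple way to tune the threshold is a grid search that maximises micro F1 on held-out data. The
sketch below is pure Python and uses made-up probabilities and targets; `micro_f1` and
`best_threshold` are hypothetical helper names, not part of this repository. In practice
`val_probs` would hold the sigmoid outputs of the model on your validation set.

```python
def micro_f1(probs, targets, threshold):
    """Micro-averaged F1 over all (text, label) pairs at one global threshold."""
    tp = fp = fn = 0
    for row_probs, row_targets in zip(probs, targets):
        for p, t in zip(row_probs, row_targets):
            predicted = p >= threshold
            if predicted and t:
                tp += 1
            elif predicted:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(probs, targets, candidates=None):
    """Return the candidate threshold with the highest micro F1 (lowest one on ties)."""
    if candidates is None:
        candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    return max(candidates, key=lambda t: micro_f1(probs, targets, t))


# Toy validation set (assumed values): 3 texts x 3 labels.
val_probs = [[0.90, 0.40, 0.10], [0.36, 0.80, 0.20], [0.10, 0.28, 0.60]]
val_targets = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]

threshold = best_threshold(val_probs, val_targets)
print(threshold, micro_f1(val_probs, val_targets, threshold))
```

On this toy data a threshold below 0.5 recovers the two true labels that the default misses. A
single global threshold is the simplest choice; per-label thresholds are an alternative when
enough validation examples exist for each narrative.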

## Evaluation

Metrics from [`metrics.txt`](./metrics.txt) (evaluation split, epoch 3):

| Metric | Value |
|--------|-------|
| Micro F1 | 0.494 |
| Macro F1 | 0.185 |
| Precision | 0.700 |
| Recall | 0.382 |
| Subset accuracy | 0.787 |
| Eval loss | 0.023 |

Micro F1 (0.494) is far above macro F1 (0.185), and precision (0.700) is far above recall (0.382).
Together these indicate that the model is conservative and performs unevenly across narratives:
likely better on well-represented narratives and weaker on rare ones. Treat predictions as a
**decision-support signal**, not ground truth, and calibrate the threshold for your use case.
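
To see why uneven per-label performance pulls macro F1 below micro F1, consider the toy calculation
below. The per-label counts are invented for illustration and are not taken from this model's
evaluation; they describe one frequent label that is handled well and two rare labels that are
mostly missed.

```python
def f1(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical (tp, fp, fn) counts per label.
counts = {
    "frequent": (80, 20, 20),
    "rare_a": (1, 0, 9),
    "rare_b": (0, 0, 10),
}

# Macro F1: average the per-label F1 scores, so every label weighs the same.
macro = sum(f1(*c) for c in counts.values()) / len(counts)

# Micro F1: pool the counts first, so frequent labels dominate.
tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro = f1(tp, fp, fn)

print(f"micro={micro:.3f} macro={macro:.3f}")
```

Here the pooled (micro) score stays high because the frequent label contributes most of the counts,
while the two rare labels drag the macro average down.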

## Intended use & limitations

**Intended use.** Research and analysis of disinformation/propaganda narratives in English-language
media; content moderation triage; media-monitoring dashboards; academic studies of narrative spread.

**Out of scope / cautions.**
- The model identifies whether text *expresses or discusses* a narrative; it does **not** establish
  truth, intent, or that the author endorses the narrative (quotation, debunking and reporting can
  trigger labels).
- Trained on English; performance on other languages is not guaranteed.
- Macro F1 is low (0.185), so predictions for rare narratives are unreliable. Do not use the model
  for automated, consequential decisions about individuals without human review.
- Sensitive topics (health, politics, gender, migration). Outputs can reflect biases in the
  training data. Human oversight is required for any deployment.

## Training

- **Base model:** `FacebookAI/roberta-large` fine-tuned for multi-label narrative classification.
- **Epochs:** 3 (see `training_args.bin` for the full `TrainingArguments`).
- **Objective:** multi-label classification (sigmoid + binary cross-entropy over 41 narratives).
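
Written out, the sigmoid + binary cross-entropy objective for one input takes the form below. This
is the standard unweighted version averaged over labels; whether training used per-label weights or
a different reduction is not stated here, so treat those details as assumptions. $z_k$ is the logit
for narrative $k$, $y_k \in \{0, 1\}$ is its target, and $\sigma$ is the sigmoid function.

$$
\mathcal{L}(z, y) = -\frac{1}{K} \sum_{k=0}^{K-1}
\Big[\, y_k \log \sigma(z_k) + (1 - y_k) \log\big(1 - \sigma(z_k)\big) \Big],
\qquad K = 41
$$

Because each label has its own independent sigmoid term, several narratives can be active for the
same input, which is what distinguishes this setup from softmax-based single-label classification.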

## Citation

If you use this model, please cite the Polish-Japanese Academy of Information Technology (PJAIT)
and the author:

```bibtex
@misc{narrative_classifier_pjait,
  title  = {Narrative Classifier (RoBERTa-large, multi-label)},
  author = {Sosnowski, Witold},
  howpublished = {\url{https://huggingface.co/pjait/narrative_classifier}},
  note   = {Polish-Japanese Academy of Information Technology (PJAIT)},
  year   = {2025}
}
```