---
language:
- en
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: text-classification
base_model: FacebookAI/roberta-large
tags:
- roberta
- text-classification
- multi-label-classification
- disinformation
- narrative-detection
- propaganda
- media-analysis
metrics:
- f1
- precision
- recall
---

# Narrative Classifier (RoBERTa-large, multi-label)

A multi-label text classifier that detects **disinformation / propaganda narratives** in news and
social-media text. Given a piece of text, the model predicts which of **41 predefined narratives**
(spanning topics such as the war in Ukraine, migration, climate change, COVID-19 / vaccines,
gender & LGBT+, and anti-establishment / anti-EU / anti-NATO framings) are present.

The model was developed at the **Polish-Japanese Academy of Information Technology (PJAIT / PJATK)**.

- **Architecture:** `RobertaNarrativeModel`, a `roberta-large` encoder plus a single linear
  classification head (`narrative_head`, 1024 → 41) applied to the `<s>` (CLS) token.
- **Task:** multi-label classification (one input can carry several narratives at once).
- **Base model:** [`FacebookAI/roberta-large`](https://huggingface.co/FacebookAI/roberta-large)
- **Parameters:** ~0.4B · **Precision:** FP32 · **Format:** safetensors
- **Language:** English

> **Note on the architecture.** This repository uses a *custom* model class
> (`RobertaNarrativeModel`) whose weights are stored under the `transformer.*` and
> `narrative_head.*` prefixes. It therefore does **not** load directly with
> `AutoModelForSequenceClassification`. Use the self-contained loading code in the
> [How to use](#how-to-use) section below.
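
To make the prefix mismatch concrete, here is a minimal sketch that groups checkpoint key names by their top-level prefix and compares them with the prefixes used by the stock `RobertaForSequenceClassification` class (`roberta.*` and `classifier.*`). The helper `prefix_counts` and the four key names are illustrative assumptions; the real checkpoint contains many more entries.

```python
from collections import Counter


def prefix_counts(keys):
    """Count state-dict keys by the part before the first dot."""
    return Counter(key.split(".", 1)[0] for key in keys)


# Illustrative key names only (assumed from the prefixes named in the note above).
custom_keys = [
    "transformer.embeddings.word_embeddings.weight",
    "transformer.encoder.layer.0.attention.self.query.weight",
    "narrative_head.weight",
    "narrative_head.bias",
]

# Top-level prefixes used by the stock RobertaForSequenceClassification class.
stock_prefixes = {"roberta", "classifier"}

counts = prefix_counts(custom_keys)
# Prefixes in the checkpoint that the stock class has no parameters for.
unmatched = sorted(set(counts) - stock_prefixes)
print(unmatched)  # ['narrative_head', 'transformer']
```

Against the downloaded `model.safetensors`, the key names could be read with `safetensors.safe_open(path, framework="pt")` and its `keys()` method instead of the hand-written list.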

## Labels

The model outputs 41 labels. The full mapping is in
[`narrative_labels.json`](./narrative_labels.json) / [`label_config.json`](./label_config.json).

<details>
<summary>Show all 41 narratives</summary>

| ID | Narrative |
|----|-----------|
| 0 | Abortion is evil/immoral/dangerous |
| 1 | Alternative treatments are more effective than conventional ones |
| 2 | Climate change is a hoax |
| 3 | Collapse of Western civilization is imminent |
| 4 | Conflict is a staged event prepared by outside forces |
| 5 | Contraception is against nature/dangerous/immoral |
| 6 | Conventional medicine is ineffective and corrupt |
| 7 | Conventional medicine is wrong about the causes of diseases |
| 8 | Elites manipulate elections |
| 9 | Elites want to take over the world |
| 10 | European Union is authoritarian |
| 11 | Feminism is a tool to destroy the natural order and traditional values |
| 12 | Global elites deliberately cause pandemics and diseases |
| 13 | Global warming does not exist/is not a serious threat |
| 14 | Governments fail to take proper action on migration crisis |
| 15 | Homosexuals are a threat |
| 16 | Humanity is not responsible for global warming |
| 17 | LGBT+ is a tool to destroy the natural order and traditional values |
| 18 | LGBT+ people are mentally ill |
| 19 | LGBT+ people are privileged |
| 20 | Media deliberately spreads lies |
| 21 | Migrants are a burden on the economy |
| 22 | Migrants are dangerous |
| 23 | Migrants are destroying local culture and breaking up local communities |
| 24 | Migration is a conspiracy of global elites |
| 25 | Most European countries are puppets of the West |
| 26 | NATO is authoritarian/warmongering |
| 27 | Official information is a tool to deceive citizens |
| 28 | Other |
| 29 | Russia is strong and winning the war |
| 30 | Sex education is a threat to children |
| 31 | Solutions to reduce human impact on environment and climate are a conspiracy |
| 32 | State and international institutions only serve to oppress citizens. |
| 33 | The West and their allies are immoral/hostile/ineffective |
| 34 | The energy crisis is artificially created |
| 35 | Transgender people are a threat |
| 36 | Ukraine is an evil, aggressive and dangerous country |
| 37 | Ukrainian refugees are a danger/burden |
| 38 | Vaccines are dangerous/ineffective/immoral |
| 39 | Western elites want to destroy the natural order and traditional values |
| 40 | other |

</details>
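
The table lists two catch-all entries that differ only in case (ID 28 `Other` and ID 40 `other`), so any name-to-ID lookup that normalises case would silently merge them. The sketch below shows one way to detect such collisions. It uses a hand-copied excerpt of the table and assumes the string-keyed `id2narrative` layout that the loading code in this card reads; it is not a dump of the actual JSON file.

```python
from collections import defaultdict

# Hand-copied excerpt of the label table (assumed JSON layout: string keys).
id2narrative_raw = {
    "2": "Climate change is a hoax",
    "28": "Other",
    "38": "Vaccines are dangerous/ineffective/immoral",
    "40": "other",
}

# JSON object keys are strings, so convert them to integer label IDs.
id2narrative = {int(k): v for k, v in id2narrative_raw.items()}

# Group IDs by case-folded name to expose labels that would collide.
ids_by_name = defaultdict(list)
for label_id, name in sorted(id2narrative.items()):
    ids_by_name[name.casefold()].append(label_id)

collisions = {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}
print(collisions)  # {'other': [28, 40]}
```

Keying downstream logic on the integer ID rather than the display name avoids the ambiguity.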

## How to use

```python
import json

import torch
from torch import nn
from transformers import AutoTokenizer, AutoConfig, RobertaModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

REPO_ID = "pjait/narrative_classifier"


class RobertaNarrativeModel(nn.Module):
    """roberta-large encoder + a linear head over the <s> (CLS) token."""

    def __init__(self, config, num_labels):
        super().__init__()
        # Attribute names must match the checkpoint prefixes (transformer.*, narrative_head.*).
        self.transformer = RobertaModel(config, add_pooling_layer=False)
        self.narrative_head = nn.Linear(config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        out = self.transformer(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # <s> token representation
        return self.narrative_head(cls)    # raw logits (multi-label)


# --- load config, labels and weights ---------------------------------------
config = AutoConfig.from_pretrained(REPO_ID)
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)

with open(hf_hub_download(REPO_ID, "narrative_labels.json")) as f:
    labels = json.load(f)
id2narrative = {int(k): v for k, v in labels["id2narrative"].items()}
num_labels = labels["num_labels"]

model = RobertaNarrativeModel(config, num_labels)
state_dict = load_file(hf_hub_download(REPO_ID, "model.safetensors"))
model.load_state_dict(state_dict)
model.eval()

# --- inference --------------------------------------------------------------
text = "The vaccines were rushed and are far more dangerous than the virus itself."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs)
probs = torch.sigmoid(logits)[0]  # multi-label -> independent sigmoid per narrative

THRESHOLD = 0.5
predicted = [(id2narrative[i], float(p)) for i, p in enumerate(probs) if p >= THRESHOLD]
print(sorted(predicted, key=lambda x: -x[1]))
```

`THRESHOLD` controls the precision/recall trade-off: raising it yields fewer but more confident
predictions, lowering it recovers more narratives at the cost of false positives. Tune it on your
own validation data.
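
One simple way to tune the threshold is to sweep a few candidate values over held-out predictions and keep the one with the best micro F1. The sketch below does this in plain Python. The helper names (`micro_f1`, `best_threshold`), the candidate grid and the toy data are all illustrative assumptions; in practice `y_prob` would hold the sigmoid outputs of the model on a labelled validation set.

```python
def micro_f1(y_true, y_prob, threshold):
    """Micro-averaged F1 over all (example, label) pairs at one threshold."""
    tp = fp = fn = 0
    for true_row, prob_row in zip(y_true, y_prob):
        for t, p in zip(true_row, prob_row):
            pred = p >= threshold
            if pred and t:
                tp += 1
            elif pred and not t:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, y_prob, candidates):
    """Return the candidate with the highest micro F1 (ties go to the lowest value)."""
    return max(candidates, key=lambda th: (micro_f1(y_true, y_prob, th), -th))


# Toy validation data: 4 examples x 3 labels (made-up numbers).
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
y_prob = [[0.9, 0.2, 0.1], [0.3, 0.45, 0.05], [0.6, 0.35, 0.2], [0.1, 0.3, 0.05]]

candidates = [0.2, 0.3, 0.4, 0.5]
threshold = best_threshold(y_true, y_prob, candidates)
print(threshold)  # 0.4
```

A single global threshold is the simplest choice. Given how unevenly narratives are represented, per-label thresholds may work better, but they need enough validation examples per narrative to be meaningful.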

## Evaluation

Metrics from [`metrics.txt`](./metrics.txt) (evaluation split, epoch 3):

| Metric | Value |
|--------|-------|
| Micro F1 | 0.494 |
| Macro F1 | 0.185 |
| Precision | 0.700 |
| Recall | 0.382 |
| Subset accuracy | 0.787 |
| Eval loss | 0.023 |

The gap between micro F1 (0.494) and macro F1 (0.185), together with precision (0.700) well above
recall (0.382), indicates that the model is conservative and performs unevenly across narratives:
likely better on well-represented narratives and weaker on rare ones. Treat predictions as a
**decision-support signal**, not ground truth, and calibrate the threshold for your use case.
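
To see why micro and macro F1 can diverge this much, the following sketch computes both from per-label confusion counts. The counts are made up for illustration (one frequent label handled well, three rare labels never predicted) and are not taken from `metrics.txt`.

```python
def f1(tp, fp, fn):
    """F1 from confusion counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up per-label counts as (tp, fp, fn).
per_label = {
    "frequent": (80, 20, 40),
    "rare_a": (0, 0, 5),
    "rare_b": (0, 0, 4),
    "rare_c": (0, 0, 6),
}

# Micro: pool the counts across labels, then compute a single F1.
tp, fp, fn = (sum(c[i] for c in per_label.values()) for i in range(3))
micro = f1(tp, fp, fn)

# Macro: compute F1 per label, then take the unweighted mean.
macro = sum(f1(*c) for c in per_label.values()) / len(per_label)

print(round(micro, 3), round(macro, 3))  # 0.681 0.182
```

Micro F1 is dominated by the frequent label, while macro F1 weights every label equally, so each rare label with zero hits pulls the average down.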

## Intended use & limitations

**Intended use.** Research and analysis of disinformation/propaganda narratives in English-language
media; content moderation triage; media-monitoring dashboards; academic studies of narrative spread.

**Out of scope / cautions.**

- The model identifies whether text *expresses or discusses* a narrative; it does **not** establish
  truth, intent, or that the author endorses the narrative (quotation, debunking and reporting can
  trigger labels).
- Trained on English; performance on other languages is not guaranteed.
- Macro F1 is low (0.185), so predictions for rare narratives are unreliable. Do not use the model
  for automated, consequential decisions about individuals without human review.
- The narratives cover sensitive topics (health, politics, gender, migration), and outputs can
  reflect biases in the training data. Human oversight is required for any deployment.

## Training

- **Base model:** `FacebookAI/roberta-large` fine-tuned for multi-label narrative classification.
- **Epochs:** 3 (see `training_args.bin` for the full `TrainingArguments`).
- **Objective:** multi-label classification (sigmoid + binary cross-entropy over 41 narratives).
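
The objective above can be written out for a single example as the mean binary cross-entropy over labels, computed directly from logits. This is a plain-Python sketch for illustration only; the card does not state which loss implementation was used in training (in PyTorch, `torch.nn.BCEWithLogitsLoss` is the usual choice), and the three-label example is made up.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits.

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)),
    which equals -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))].
    """
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# Toy example with 3 labels instead of 41; only the second label is present.
logits = [0.0, 2.0, -3.0]
targets = [0.0, 1.0, 0.0]
loss = bce_with_logits(logits, targets)
print(round(loss, 4))  # 0.2896
```

Because each label gets its own sigmoid, the per-label terms are independent, which is what allows several narratives to be active for the same input.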

## Citation

If you use this model, please cite the Polish-Japanese Academy of Information Technology (PJAIT)
and the author. *(Add the relevant paper / BibTeX here.)*

```bibtex
@misc{narrative_classifier_pjait,
  title        = {Narrative Classifier (RoBERTa-large, multi-label)},
  author       = {Sosnowski, Witold},
  howpublished = {\url{https://huggingface.co/pjait/narrative_classifier}},
  note         = {Polish-Japanese Academy of Information Technology (PJAIT)},
  year         = {2025}
}
```