NCI Technique Classifier v2

Multi-label propaganda technique classifier for the NCI (News Content Intelligence) Protocol.

Model Description

This model classifies text into 18 propaganda techniques as part of a two-stage pipeline:

  • Stage 1: Binary detection (synapti/nci-binary-detector-v2) determines if propaganda exists
  • Stage 2: This model identifies which specific techniques are used

Techniques Detected

ID Technique Description
0 Loaded_Language Using words with strong emotional implications
1 Appeal_to_fear-prejudice Seeking to build support by instilling fear
2 Exaggeration,Minimisation Overstating or understating aspects of issues
3 Repetition Repeating the same message multiple times
4 Flag-Waving Appeals to patriotism or group identity
5 Name_Calling,Labeling Giving a subject a name with negative connotations
6 Reductio_ad_hitlerum Comparing to Hitler or Nazis to discredit
7 Black-and-White_Fallacy Presenting only two options when more exist
8 Causal_Oversimplification Assuming a single cause for complex issues
9 Whataboutism,Straw_Men,Red_Herring Deflection and misrepresentation tactics
10 Straw_Man Misrepresenting someone's argument
11 Red_Herring Introducing irrelevant information
12 Doubt Questioning credibility of sources
13 Appeal_to_Authority Citing authorities to support claims
14 Thought-terminating_Cliches Using clichés to end discussion
15 Bandwagon Appeal to popularity
16 Slogans Brief, striking phrases
17 Obfuscation,Intentional_Vagueness,Confusion Being deliberately unclear

Training

  • Base Model: answerdotai/ModernBERT-base
  • Dataset: synapti/nci-propaganda-production (19,581 train, 1,727 val, 1,729 test)
  • Loss: Focal Loss (gamma=2.0) with class weights for imbalanced techniques
  • Epochs: 5
  • Batch Size: 16
  • Learning Rate: 2e-5
  • Hardware: NVIDIA A10G GPU
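The focal-loss objective listed above can be sketched as follows. This is a minimal illustration of multi-label focal loss (gamma=2.0) built on BCE-with-logits, not the exact training code; the optional `pos_weight` stands in for the per-class weights mentioned above.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, pos_weight=None):
    # Per-element BCE, optionally weighted per class for imbalance
    bce = F.binary_cross_entropy_with_logits(
        logits, targets, pos_weight=pos_weight, reduction="none"
    )
    p = torch.sigmoid(logits)
    # p_t is the model's probability for the true label of each entry
    p_t = p * targets + (1 - p) * (1 - targets)
    # (1 - p_t)^gamma down-weights easy examples so rare, hard
    # techniques contribute more to the gradient
    return ((1 - p_t) ** gamma * bce).mean()

# Toy batch: 2 examples x 18 technique labels
logits = torch.randn(2, 18)
targets = torch.randint(0, 2, (2, 18)).float()
loss = focal_loss(logits, targets, gamma=2.0)
```

Because the modulating factor is at most 1, focal loss never exceeds the plain BCE loss on the same batch.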

Performance

Metric Score
Micro F1 80.2%
Macro F1 63.9%
Micro Precision 83.4%
Micro Recall 77.4%
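The gap between micro and macro F1 reflects class imbalance: micro F1 pools true/false positives across all labels, so frequent techniques dominate, while macro F1 averages per-label F1 so rare techniques count equally. A small numpy illustration (toy data, not the model's actual predictions):

```python
import numpy as np

def f1_scores(y_true, y_pred):
    # Per-label TP/FP/FN over a (n_samples, n_labels) binary matrix
    tp = ((y_true == 1) & (y_pred == 1)).sum(axis=0)
    fp = ((y_true == 0) & (y_pred == 1)).sum(axis=0)
    fn = ((y_true == 1) & (y_pred == 0)).sum(axis=0)
    # Micro: pool counts across labels before computing F1
    micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
    # Macro: compute F1 per label, then average
    per_label = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    macro = per_label.mean()
    return micro, macro

# A frequent label predicted perfectly, two rare labels missed entirely
y_true = np.array([[1, 0, 1], [1, 0, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [1, 0, 0], [1, 0, 0]])
micro, macro = f1_scores(y_true, y_pred)  # micro = 0.75, macro ≈ 0.33
```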

Per-Technique Performance (selected)

Technique F1 Score
Loaded_Language 97.0%
Appeal_to_fear-prejudice 89.7%
Name_Calling,Labeling 84.3%
Flag-Waving 82.1%

Usage

With Transformers

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("synapti/nci-technique-classifier-v2")
tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier-v2")
model.eval()  # disable dropout for inference

text = "The radical left is DESTROYING our great nation!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    # Multi-label: apply sigmoid per logit, not softmax across labels
    probs = torch.sigmoid(outputs.logits)[0]

# Keep techniques whose probability exceeds the threshold
threshold = 0.5
id2label = model.config.id2label
detected = [(id2label[i], probs[i].item()) for i in range(len(id2label)) if probs[i].item() > threshold]
print(detected)

With NCI Protocol

from nci.transformers.two_stage_pipeline import TwoStagePipeline

pipeline = TwoStagePipeline.from_pretrained(
    binary_model="synapti/nci-binary-detector-v2",
    technique_model="synapti/nci-technique-classifier-v2",
)

result = pipeline.analyze("The radical left is DESTROYING our great nation!")
print(f"Has propaganda: {result.has_propaganda}")
print(f"Techniques: {[t.name for t in result.techniques if t.above_threshold]}")

ONNX Inference

An ONNX export is available at onnx/model.onnx for faster inference (~1.25x speedup).

import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier-v2")
session = ort.InferenceSession("onnx/model.onnx")

text = "WAKE UP AMERICA!"
inputs = tokenizer(text, return_tensors="np", truncation=True, max_length=512)

outputs = session.run(None, {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"]
})
probs = 1 / (1 + np.exp(-outputs[0]))  # sigmoid

Limitations

  • Trained primarily on English news articles
  • May not generalize well to social media or other domains
  • Threshold of 0.5 may need adjustment for specific use cases
  • Outputs are independent per-technique probabilities (multi-label), so scores do not sum to 1 and several techniques may be detected in the same text
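Per the threshold note above, one common way to adapt the 0.5 default is a per-label sweep on held-out data. This is a hypothetical sketch, not part of the NCI toolkit; `val_probs` and `val_labels` are assumed to be arrays of shape (n_samples, n_labels) of sigmoid outputs and gold labels.

```python
import numpy as np

def tune_thresholds(val_probs, val_labels, candidates=None):
    """Pick, per label, the candidate threshold that maximises validation F1."""
    if candidates is None:
        candidates = np.arange(0.1, 0.9, 0.05)
    n_labels = val_labels.shape[1]
    thresholds = np.full(n_labels, 0.5)
    for j in range(n_labels):
        best_f1 = -1.0
        for t in candidates:
            pred = (val_probs[:, j] >= t).astype(int)
            tp = int(((val_labels[:, j] == 1) & (pred == 1)).sum())
            fp = int(((val_labels[:, j] == 0) & (pred == 1)).sum())
            fn = int(((val_labels[:, j] == 1) & (pred == 0)).sum())
            f1 = 2 * tp / max(2 * tp + fp + fn, 1)
            if f1 > best_f1:
                best_f1, thresholds[j] = f1, t
    return thresholds

# Toy validation set: 3 samples, 2 labels
val_probs = np.array([[0.9, 0.4], [0.8, 0.6], [0.2, 0.3]])
val_labels = np.array([[1, 0], [1, 1], [0, 0]])
thresholds = tune_thresholds(val_probs, val_labels)
```

Tuning F1 directly can overfit small validation sets; a coarse candidate grid, as above, is a reasonable hedge.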

Citation

@misc{nci-technique-classifier-v2,
  author = {Synapti},
  title = {NCI Technique Classifier v2},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/synapti/nci-technique-classifier-v2}
}

License

MIT License

Model size: 0.1B parameters (F32, Safetensors)