# NCI Technique Classifier v2

Multi-label propaganda technique classifier for the NCI (News Content Intelligence) Protocol.

## Model Description
This model classifies text into 18 propaganda techniques as part of a two-stage pipeline:

- Stage 1: Binary detection (`synapti/nci-binary-detector-v2`) determines if propaganda exists
- Stage 2: This model identifies which specific techniques are used
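The gating logic of the two-stage design can be sketched as follows. This is a minimal illustration with hypothetical model callables and thresholds, not the actual NCI Protocol API:

```python
def two_stage(text, binary_model, technique_model, gate=0.5, tech_threshold=0.5):
    """Stage 1 gates Stage 2: technique classification only runs when the
    binary detector fires. Models are callables returning probabilities
    (a hypothetical interface for illustration)."""
    p_propaganda = binary_model(text)
    if p_propaganda < gate:
        # Early exit: no propaganda detected, skip the technique classifier
        return {"has_propaganda": False, "techniques": []}
    probs = technique_model(text)  # dict: technique name -> probability
    detected = [name for name, p in probs.items() if p >= tech_threshold]
    return {"has_propaganda": True, "techniques": detected}
```

Running the cheap binary detector first avoids paying for the multi-label pass on the (typically majority of) benign texts.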
## Techniques Detected
| ID | Technique | Description |
|---|---|---|
| 0 | Loaded_Language | Using words with strong emotional implications |
| 1 | Appeal_to_fear-prejudice | Seeking to build support by instilling fear |
| 2 | Exaggeration,Minimisation | Overstating or understating aspects of issues |
| 3 | Repetition | Repeating the same message multiple times |
| 4 | Flag-Waving | Appeals to patriotism or group identity |
| 5 | Name_Calling,Labeling | Giving a subject a name with negative connotations |
| 6 | Reductio_ad_hitlerum | Comparing to Hitler or Nazis to discredit |
| 7 | Black-and-White_Fallacy | Presenting only two options when more exist |
| 8 | Causal_Oversimplification | Assuming a single cause for complex issues |
| 9 | Whataboutism,Straw_Men,Red_Herring | Deflection and misrepresentation tactics |
| 10 | Straw_Man | Misrepresenting someone's argument |
| 11 | Red_Herring | Introducing irrelevant information |
| 12 | Doubt | Questioning credibility of sources |
| 13 | Appeal_to_Authority | Citing authorities to support claims |
| 14 | Thought-terminating_Cliches | Using clichés to end discussion |
| 15 | Bandwagon | Appeal to popularity |
| 16 | Slogans | Brief, striking phrases |
| 17 | Obfuscation,Intentional_Vagueness,Confusion | Being deliberately unclear |
## Training

- Base Model: `answerdotai/ModernBERT-base`
- Dataset: `synapti/nci-propaganda-production` (19,581 train, 1,727 val, 1,729 test)
- Loss: Focal Loss (gamma=2.0) with class weights for imbalanced techniques
- Epochs: 5
- Batch Size: 16
- Learning Rate: 2e-5
- Hardware: NVIDIA A10G GPU
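The focal loss above down-weights easy examples so training focuses on hard, rare techniques. A minimal NumPy sketch of the idea (illustrative only; the actual training code uses PyTorch and is not shown here):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def focal_loss(logits, targets, gamma=2.0, class_weights=None):
    """Multi-label focal loss: scales binary cross-entropy by (1 - p_t)^gamma,
    so confidently-correct (easy) labels contribute little to the loss.
    logits, targets: shape (batch, num_labels), targets in {0, 1}.
    class_weights: optional per-label weights for imbalanced techniques."""
    p = sigmoid(logits)
    p_t = np.where(targets == 1, p, 1.0 - p)      # probability of the true class
    bce = -np.log(np.clip(p_t, 1e-8, 1.0))        # per-label binary cross-entropy
    loss = ((1.0 - p_t) ** gamma) * bce           # focal modulation term
    if class_weights is not None:
        loss = loss * class_weights               # up-weight rare techniques
    return loss.mean()
```

With gamma=0 and no weights this reduces to plain binary cross-entropy; gamma=2.0 (as used here) sharply suppresses the contribution of well-classified labels.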
## Performance
| Metric | Score |
|---|---|
| Micro F1 | 80.2% |
| Macro F1 | 63.9% |
| Micro Precision | 83.4% |
| Micro Recall | 77.4% |
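The gap between micro and macro F1 reflects class imbalance: micro F1 pools true/false positives across all labels (so frequent techniques like Loaded_Language dominate), while macro F1 averages per-technique scores equally (so rare techniques pull it down). A small sketch of both metrics:

```python
import numpy as np

def micro_macro_f1(y_true, y_pred):
    """y_true, y_pred: binary arrays of shape (n_samples, n_labels)."""
    tp = (y_true * y_pred).sum(axis=0)
    fp = ((1 - y_true) * y_pred).sum(axis=0)
    fn = (y_true * (1 - y_pred)).sum(axis=0)
    # Micro: pool counts over all labels before computing F1
    micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
    # Macro: compute F1 per label, then average with equal weight
    per_label = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    macro = per_label.mean()
    return micro, macro
```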
### Per-Technique Performance (selected)
| Technique | F1 Score |
|---|---|
| Loaded_Language | 97.0% |
| Appeal_to_fear-prejudice | 89.7% |
| Name_Calling,Labeling | 84.3% |
| Flag-Waving | 82.1% |
## Usage

### With Transformers
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("synapti/nci-technique-classifier-v2")
tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier-v2")

text = "The radical left is DESTROYING our great nation!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
probs = torch.sigmoid(outputs.logits)[0]

# Get techniques above threshold
threshold = 0.5
techniques = list(model.config.id2label.values())
detected = [(techniques[i], probs[i].item()) for i in range(len(techniques)) if probs[i] > threshold]
print(detected)
```
### With NCI Protocol
```python
from nci.transformers.two_stage_pipeline import TwoStagePipeline

pipeline = TwoStagePipeline.from_pretrained(
    binary_model="synapti/nci-binary-detector-v2",
    technique_model="synapti/nci-technique-classifier-v2",
)

result = pipeline.analyze("The radical left is DESTROYING our great nation!")
print(f"Has propaganda: {result.has_propaganda}")
print(f"Techniques: {[t.name for t in result.techniques if t.above_threshold]}")
```
### ONNX Inference

An ONNX export is available at `onnx/model.onnx` for faster inference (~1.25x speedup).
```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier-v2")
session = ort.InferenceSession("onnx/model.onnx")

text = "WAKE UP AMERICA!"
inputs = tokenizer(text, return_tensors="np", truncation=True, max_length=512)
outputs = session.run(None, {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
})
probs = 1 / (1 + np.exp(-outputs[0]))  # sigmoid over logits
```
## Limitations
- Trained primarily on English news articles
- May not generalize well to social media or other domains
- Threshold of 0.5 may need adjustment for specific use cases
- Multi-label classification means multiple techniques can be detected per text
## Citation

```bibtex
@misc{nci-technique-classifier-v2,
  author = {Synapti},
  title = {NCI Technique Classifier v2},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/synapti/nci-technique-classifier-v2}
}
```
## License
MIT License