NCI Technique Classifier v5.2

Multi-label propaganda technique classifier based on ModernBERT, trained to identify 18 propaganda techniques from the SemEval-2020 Task 11 taxonomy.

Model Description

This model is part of the NCI (Narrative Coordination Index) Protocol for detecting coordinated influence operations. It classifies text into 18 propaganda techniques with well-calibrated probability outputs.

Key Improvements in v5.2

  • Reduced False Positives: Scientific/factual content false positive rate reduced from 35% (v4) to 8.8%
  • Better Calibration: ASL loss with clip=0.02 provides more discriminative probability outputs
  • Hard Negatives Training: Trained on v5 dataset with 1000+ hard negative examples (scientific, business, factual content)
  • Document-Level Analysis: Works well on full documents; no sentence-level splitting required

Training Details

  • Base Model: answerdotai/ModernBERT-base
  • Dataset: synapti/nci-propaganda-v5 (24,037 samples)
  • Loss Function: Asymmetric Loss (ASL)
    • gamma_neg: 4.0
    • gamma_pos: 1.0
    • clip: 0.02 (reduced from 0.05 to minimize probability shifting)
  • Training: 3 epochs, lr=2e-5, batch_size=16
  • Validation: 4/7 tests passed (57%)
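As a rough sketch of how the ASL hyperparameters above interact, here is a generic Asymmetric Loss implementation in the style of Ridnik et al. (this is an illustrative reimplementation, not the exact training code; the function name and exact clamping details are assumptions):

```python
import torch

def asymmetric_loss(logits, targets, gamma_neg=4.0, gamma_pos=1.0, clip=0.02):
    """Generic Asymmetric Loss (ASL) sketch for multi-label classification.

    gamma_neg down-weights easy negatives more aggressively than gamma_pos
    down-weights easy positives; `clip` is a probability margin that shifts
    negative probabilities toward zero before weighting (a smaller clip, as
    in v5.2, shifts probabilities less).
    """
    probs = torch.sigmoid(logits)
    # Positive term: focal-style weighting by (1 - p)^gamma_pos
    loss_pos = targets * torch.log(probs.clamp(min=1e-8)) * (1 - probs) ** gamma_pos
    # Negative term: shift p down by `clip`, then weight by p_m^gamma_neg
    probs_neg = (probs - clip).clamp(min=0)
    loss_neg = (1 - targets) * torch.log((1 - probs_neg).clamp(min=1e-8)) * probs_neg ** gamma_neg
    return -(loss_pos + loss_neg).sum(dim=-1).mean()
```

With gamma_neg=4.0, a confident negative (p near 0) contributes almost nothing, which is what keeps the many all-negative hard-negative documents from dominating the gradient.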

Techniques Detected

ID  Technique                                    Description
0   Loaded_Language                              Words with strong emotional implications
1   Appeal_to_fear-prejudice                     Building support through fear or prejudice
2   Exaggeration,Minimisation                    Overstating or understating facts
3   Repetition                                   Repeating messages for reinforcement
4   Flag-Waving                                  Appealing to patriotism/national identity
5   Name_Calling,Labeling                        Using labels to evoke prejudice
6   Reductio_ad_hitlerum                         Comparing to Hitler/Nazis
7   Black-and-White_Fallacy                      Presenting only two choices
8   Causal_Oversimplification                    Assuming a single cause for complex issues
9   Whataboutism,Straw_Men,Red_Herring           Deflection techniques
10  Straw_Man                                    Misrepresenting an opponent's position
11  Red_Herring                                  Introducing irrelevant topics
12  Doubt                                        Questioning credibility
13  Appeal_to_Authority                          Citing authority figures to support claims
14  Thought-terminating_Cliches                  Phrases that shut down rational thought
15  Bandwagon                                    "Everyone is doing it" appeals
16  Slogans                                      Catchy phrases for memorability
17  Obfuscation,Intentional_Vagueness,Confusion  Deliberately confusing language

Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_id = "synapti/nci-technique-classifier-v5.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "This is OUTRAGEOUS! They are LYING to you. WAKE UP!"

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.sigmoid(outputs.logits)[0]

# Get techniques with probability > 0.5
LABELS = [
    "Loaded_Language", "Appeal_to_fear-prejudice", "Exaggeration,Minimisation",
    "Repetition", "Flag-Waving", "Name_Calling,Labeling", "Reductio_ad_hitlerum",
    "Black-and-White_Fallacy", "Causal_Oversimplification",
    "Whataboutism,Straw_Men,Red_Herring", "Straw_Man", "Red_Herring", "Doubt",
    "Appeal_to_Authority", "Thought-terminating_Cliches", "Bandwagon", "Slogans",
    "Obfuscation,Intentional_Vagueness,Confusion"
]

for label, prob in zip(LABELS, probs):
    if prob > 0.5:
        print(f"{label}: {prob:.1%}")
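If the checkpoint isn't on hand, the post-processing step (sigmoid, then a 0.5 threshold) can be exercised on synthetic logits. The logit values below are made up for illustration, and only a three-label subset of LABELS is used:

```python
import torch

# Fabricated logits standing in for outputs.logits (shape: batch x num_labels)
logits = torch.tensor([[3.0, -2.0, 0.5]])
labels = ["Loaded_Language", "Appeal_to_fear-prejudice", "Exaggeration,Minimisation"]

probs = torch.sigmoid(logits)[0]
detected = {label: round(p.item(), 3) for label, p in zip(labels, probs) if p > 0.5}
print(detected)
```

Because the head is multi-label, sigmoid is applied per technique (not softmax across techniques), so any number of techniques can fire on the same document.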

Performance

Validation Results

Test Case               v5.2    v4      Status
Pure Propaganda         66.8%   70.8%   ✓ Detected
Neutral News            6.9%    5.5%    ✓ Clean
SpaceX Factual          3.7%    -       ✓ Clean
Multi-Label Propaganda  76.5%   -       ✓ Detected
Mixed Content           7.3%    -       -
Fear Appeal             69.9%   -       ✓ Detected
Scientific Report       8.8%    35.4%   ✓ Clean

Key Metrics

  • Scientific Report FPR: 8.8% (vs 35% in v4) - 75% reduction
  • Factual News FPR: 4.6% (vs 29% in v4) - 84% reduction
  • Propaganda Detection: Maintained (73.7% max confidence on propaganda)
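The headline reduction figures follow directly from the false positive rates in the table above:

```python
def pct_reduction(old, new):
    """Relative reduction from an old rate to a new rate, in percent."""
    return 100 * (old - new) / old

# Scientific Report FPR: 35.4% (v4) -> 8.8% (v5.2)
print(f"{pct_reduction(35.4, 8.8):.0f}%")   # ~75%
# Factual News FPR: 29% (v4) -> 4.6% (v5.2)
print(f"{pct_reduction(29.0, 4.6):.0f}%")   # ~84%
```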

Citation

@inproceedings{da-san-martino-etal-2020-semeval,
    title = "{S}em{E}val-2020 Task 11: Detection of Propaganda Techniques in News Articles",
    author = "Da San Martino, Giovanni and others",
    booktitle = "Proceedings of the 14th International Workshop on Semantic Evaluation",
    year = "2020",
}

License

Apache 2.0

Model size: 0.1B parameters (Safetensors, F32)