NCI Technique Classifier

Multi-label classifier that identifies specific propaganda techniques in text.

Model Description

This model is Stage 2 of the NCI (Narrative Credibility Index) two-stage propaganda detection pipeline:

  • Stage 1: Fast binary detection - "Does this text contain propaganda?"
  • Stage 2 (this model): Multi-label technique classification - "Which specific techniques are used?"

The classifier identifies 18 propaganda techniques from the SemEval-2020 Task 11 taxonomy.

Propaganda Techniques

# | Technique | F1 Score | Optimal Threshold
0 | Loaded_Language | 95.3% | 0.3
1 | Appeal_to_fear-prejudice | 85.1% | 0.3
2 | Exaggeration,Minimisation | 49.0% | 0.4
3 | Repetition | 55.9% | 0.4
4 | Flag-Waving | 50.9% | 0.4
5 | Name_Calling,Labeling | 79.0% | 0.1
6 | Reductio_ad_hitlerum | 82.4% | 0.3
7 | Black-and-White_Fallacy | 68.8% | 0.5
8 | Causal_Oversimplification | 67.9% | 0.4
9 | Whataboutism,Straw_Men,Red_Herring | 47.7% | 0.3
10 | Straw_Man | 60.3% | 0.5
11 | Red_Herring | 86.3% | 0.5
12 | Doubt | 63.4% | 0.3
13 | Appeal_to_Authority | 50.0% | 0.3
14 | Thought-terminating_Cliches | 71.2% | 0.5
15 | Bandwagon | 46.7% | 0.5
16 | Slogans | 46.0% | 0.3
17 | Obfuscation,Intentional_Vagueness,Confusion | 86.3% | 0.5

Performance

Test Set Results (1,729 samples):

Metric | Default (0.5) | Optimized Thresholds
Micro F1 | 72.7% | 80.3%
Macro F1 | 62.5% | 68.3%
ECE (Calibration Error) | - | 0.0096
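
The gap between micro and macro F1 comes from how the two averages treat rare techniques. The sketch below uses toy data and is not the evaluation code behind the numbers above. It assumes the usual definitions: micro F1 pools true and false positives over all labels, while macro F1 averages the per-label F1 scores. The function names are illustrative.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """y_true / y_pred: lists of multi-hot rows (one 0/1 entry per label)."""
    n_labels = len(y_true[0])
    counts = [[0, 0, 0] for _ in range(n_labels)]  # tp, fp, fn per label
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                counts[j][0] += 1
            elif p:
                counts[j][1] += 1
            elif t:
                counts[j][2] += 1
    # Micro: pool the counts over labels, then compute one F1.
    micro = f1(*[sum(c[k] for c in counts) for k in range(3)])
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts) / n_labels
    return micro, macro


# Toy example: label 0 is frequent and well predicted, label 1 is rare and missed.
y_true = [[1, 0], [1, 0], [1, 0], [0, 1]]
y_pred = [[1, 0], [1, 0], [1, 0], [0, 0]]
micro, macro = micro_macro_f1(y_true, y_pred)
```

In this toy case the missed rare label pulls macro F1 down much further than micro F1, the same direction as the gap reported in the table.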

Usage

Basic Usage

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="synapti/nci-technique-classifier",
    top_k=None  # Return all labels
)

text = "The radical left is DESTROYING our country!"
results = classifier(text)[0]

# Get detected techniques (using default 0.5 threshold)
detected = [r for r in results if r["score"] > 0.5]
for d in detected:
    print(f"{d['label']}: {d['score']:.2%}")

With Calibration Config (Recommended)

The model repository includes a calibration_config.json file with optimized per-technique thresholds and a temperature (0.75) for temperature scaling, which gives better-calibrated confidence scores.

import json
from transformers import pipeline
from huggingface_hub import hf_hub_download

# Load calibration config
config_path = hf_hub_download(
    repo_id="synapti/nci-technique-classifier",
    filename="calibration_config.json"
)
with open(config_path) as f:
    config = json.load(f)

temperature = config["temperature"]  # 0.75; loaded for reference, not applied in this example
thresholds = config["thresholds"]
labels = config["technique_labels"]

classifier = pipeline(
    "text-classification",
    model="synapti/nci-technique-classifier",
    top_k=None
)

text = "Your text here..."
results = classifier(text)[0]

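# Apply per-technique thresholds.
# Assumption: the pipeline returns either generic labels ("LABEL_<index>")
# or the technique names themselves; both cases are handled.
detected = []
for r in results:
    label = r["label"]
    if label.startswith("LABEL_"):
        technique = labels[int(label.split("_")[1])]
    else:
        technique = label
    threshold = thresholds.get(technique, 0.5)
    if r["score"] > threshold:
        detected.append((technique, r["score"]))

for technique, score in detected:
    print(f"{technique}: {score:.2%}")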
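
The example above reads the temperature but only applies the thresholds. The following is a minimal sketch of how temperature scaling could be applied to a pipeline score, assuming the standard formulation sigmoid(logit / T). The model card does not say whether the pipeline scores are already calibrated or whether the thresholds were tuned on raw or scaled scores, so treat this as illustrative. `apply_temperature` is a hypothetical helper, not part of the repository.

```python
import math


def apply_temperature(prob, temperature):
    """Rescale a sigmoid probability as sigmoid(logit / temperature)."""
    eps = 1e-7
    prob = min(max(prob, eps), 1.0 - eps)   # avoid log(0) at the extremes
    logit = math.log(prob / (1.0 - prob))   # invert the sigmoid
    return 1.0 / (1.0 + math.exp(-logit / temperature))


# With T = 0.75 (< 1), scores move away from 0.5; 0.5 itself is unchanged.
calibrated = apply_temperature(0.60, 0.75)
```

Because the transform is monotonic, it does not change the ranking of techniques for a given text. It only changes where each score falls relative to a fixed threshold.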

ONNX Inference (Faster)

The model is also available in ONNX format for optimized inference:

import onnxruntime as ort
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
import numpy as np

# Download ONNX model
onnx_path = hf_hub_download(
    repo_id="synapti/nci-technique-classifier",
    filename="onnx/model.onnx"
)

# Load tokenizer and ONNX session
tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier")
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

# Inference
text = "Your text here..."
inputs = tokenizer(text, padding="max_length", truncation=True, max_length=512, return_tensors="np")
onnx_inputs = {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
}
logits = session.run(None, onnx_inputs)[0]
probs = 1 / (1 + np.exp(-logits))  # Sigmoid for multi-label
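
The ONNX snippet stops at the probability array. The sketch below shows one way to turn a row of probabilities into detected techniques. It assumes the output columns follow the same order as the label list in calibration_config.json, which the model card does not state. `select_techniques` is a hypothetical helper.

```python
def select_techniques(probs_row, labels, thresholds, default_threshold=0.5):
    """Return (technique, probability) pairs whose probability exceeds its threshold."""
    detected = []
    for technique, prob in zip(labels, probs_row):
        if prob > thresholds.get(technique, default_threshold):
            detected.append((technique, float(prob)))
    # Highest-scoring techniques first.
    return sorted(detected, key=lambda pair: pair[1], reverse=True)


# Illustrative values only; in practice pass probs[0].tolist() together with the
# label list and thresholds loaded from calibration_config.json.
labels = ["Loaded_Language", "Name_Calling,Labeling", "Red_Herring"]
thresholds = {"Loaded_Language": 0.3, "Name_Calling,Labeling": 0.1, "Red_Herring": 0.5}
detected = select_techniques([0.42, 0.15, 0.45], labels, thresholds)
```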

Two-Stage Pipeline

For best results, run the binary detector first and classify techniques only for texts it flags:

from transformers import pipeline

# Stage 1: Binary detection (fast filter)
detector = pipeline("text-classification", model="synapti/nci-binary-detector")

# Stage 2: Technique classification
classifier = pipeline("text-classification", model="synapti/nci-technique-classifier", top_k=None)

text = "Your text to analyze..."

# Quick check first
detection = detector(text)[0]
if detection["label"] == "has_propaganda" and detection["score"] > 0.5:
    # Detailed technique analysis
    techniques = classifier(text)[0]
    detected = [t for t in techniques if t["score"] > 0.3]
    for t in detected:
        print(f"{t['label']}: {t['score']:.2%}")
else:
    print("No propaganda detected")
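
For several texts, the same gating logic can be put in a small function. This sketch uses stand-in callables so the control flow can be run without downloading either model. `analyze_batch`, `fake_detect`, `fake_classify` and the "no_propaganda" label are hypothetical; only "has_propaganda" appears in the example above.

```python
def analyze_batch(texts, detect, classify, gate_threshold=0.5):
    """Run Stage 2 only on texts that Stage 1 flags as propaganda."""
    results = {}
    for text in texts:
        label, score = detect(text)
        flagged = label == "has_propaganda" and score > gate_threshold
        results[text] = classify(text) if flagged else []
    return results


# Stand-ins for the two pipelines (assumed return shapes, for illustration only).
def fake_detect(text):
    return ("has_propaganda", 0.9) if "DESTROYING" in text else ("no_propaganda", 0.8)


def fake_classify(text):
    return [("Loaded_Language", 0.7)]


out = analyze_batch(
    ["They are DESTROYING us!", "The meeting is at noon."],
    fake_detect,
    fake_classify,
)
```

With the real pipelines, `detect` could wrap `detector(text)[0]` and return its label and score, and `classify` could wrap the thresholding step.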

Calibration Config

The calibration_config.json file contains the following (abridged; the usage example above also reads a technique_labels list from it):

{
  "temperature": 0.75,
  "thresholds": {
    "Loaded_Language": 0.3,
    "Appeal_to_fear-prejudice": 0.3,
    "Name_Calling,Labeling": 0.1,
    ...
  },
  "metrics": {
    "ece": 0.0096,
    "micro_f1_optimized": 0.803,
    "macro_f1_optimized": 0.683
  }
}
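
The ece value is an expected calibration error. The model card does not say how it was computed. A common binned formulation is sketched here with 10 equal-width bins (an assumption), where each (text, technique) probability is compared against whether that technique was actually present.

```python
def expected_calibration_error(probs, targets, n_bins=10):
    """Binned ECE: weighted mean of |avg predicted prob - observed positive rate|."""
    bins = [[] for _ in range(n_bins)]
    for p, t in zip(probs, targets):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, t))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_prob = sum(p for p, _ in members) / len(members)
        pos_rate = sum(t for _, t in members) / len(members)
        ece += (len(members) / total) * abs(avg_prob - pos_rate)
    return ece


# Toy data: four predictions of 0.8, three of which are actually positive.
ece = expected_calibration_error([0.8, 0.8, 0.8, 0.8], [1, 1, 1, 0])
```

A value near zero means predicted probabilities closely match observed frequencies.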

Training Data

Trained on synapti/nci-propaganda-production:

  • 23,000+ examples with multi-hot technique labels
  • Augmented data for minority techniques (MLSMOTE)
  • Hard negatives from LIAR2 and Qbias datasets
  • Class-weighted Focal Loss to handle imbalance

Model Architecture

  • Base Model: answerdotai/ModernBERT-base
  • Parameters: 149.6M
  • Max Sequence Length: 512 tokens
  • Output: 18 labels (multi-label sigmoid)
  • Calibration Temperature: 0.75

Available Files

File | Description
model.safetensors | PyTorch model weights
calibration_config.json | Optimized thresholds & temperature
onnx/model.onnx | ONNX model for fast inference
config.json | Model configuration

Training Details

  • Loss Function: Class-weighted Focal Loss (gamma=2.0)
  • Class Weights: Inverse frequency weighting
  • Optimizer: AdamW
  • Learning Rate: 2e-5
  • Batch Size: 8 (effective 32 with gradient accumulation)
  • Epochs: 5 with early stopping (patience=3)
  • Hardware: NVIDIA A10G GPU
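
As a rough illustration of the loss named above, here is a plain-Python sketch of a class-weighted binary focal loss with gamma=2.0 and inverse-frequency weights. It is not the training code. How the weights were normalised and whether they applied to both positive and negative targets are assumptions, and the label counts are made up.

```python
import math


def focal_loss(probs, targets, class_weights, gamma=2.0):
    """Mean class-weighted binary focal loss over the labels of one example."""
    eps = 1e-7
    total = 0.0
    for p, y, w in zip(probs, targets, class_weights):
        p_t = p if y == 1 else 1.0 - p   # probability assigned to the true outcome
        p_t = min(max(p_t, eps), 1.0)    # guard against log(0)
        # (1 - p_t) ** gamma down-weights examples the model already gets right.
        total += -w * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def inverse_frequency_weights(label_counts):
    """Weights proportional to 1 / count, normalised to average 1.0."""
    inv = [1.0 / c for c in label_counts]
    scale = len(inv) / sum(inv)
    return [v * scale for v in inv]


weights = inverse_frequency_weights([900, 100])    # hypothetical label counts
easy = focal_loss([0.95, 0.05], [1, 0], weights)   # confident and correct
hard = focal_loss([0.30, 0.60], [1, 0], weights)   # uncertain or wrong
```

The rare label gets the larger weight, and the confident, correct example contributes far less loss than the uncertain one.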

Limitations

  • Trained primarily on English text
  • Performance varies by technique (see table above)
  • Some techniques overlap semantically
  • Should be used with the binary detector for best results
  • Threshold optimization is recommended for specific use cases
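
One simple way to re-tune a threshold for a specific use case is a grid search over held-out validation scores, one technique at a time. This is a sketch under assumptions: the 0.1-step grid and the F1 objective are choices made here, not the documented procedure behind the thresholds in the table. `best_threshold` is a hypothetical helper.

```python
def best_threshold(probs, targets, grid=None):
    """Pick the threshold in `grid` that maximises F1 for a single technique."""
    if grid is None:
        grid = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
    best_t, best_f1 = grid[0], -1.0
    for t in grid:
        tp = sum(1 for p, y in zip(probs, targets) if p > t and y == 1)
        fp = sum(1 for p, y in zip(probs, targets) if p > t and y == 0)
        fn = sum(1 for p, y in zip(probs, targets) if p <= t and y == 1)
        denom = 2 * tp + fp + fn
        score = 2 * tp / denom if denom else 0.0
        if score > best_f1:  # ties keep the lower threshold
            best_t, best_f1 = t, score
    return best_t, best_f1


# Toy validation scores for one technique (illustrative values only).
probs = [0.15, 0.25, 0.35, 0.45, 0.80]
targets = [0, 0, 1, 1, 1]
threshold, f1_score = best_threshold(probs, targets)
```

If precision matters more than recall for the application, or the reverse, the F1 objective can be swapped for a different one.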

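Related Models

  • synapti/nci-binary-detector: Stage 1 binary propaganda detector used in the two-stage pipeline above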

Citation

@inproceedings{da-san-martino-etal-2020-semeval,
    title = "{S}em{E}val-2020 Task 11: Detection of Propaganda Techniques in News Articles",
    author = "Da San Martino, Giovanni and others",
    booktitle = "Proceedings of SemEval-2020",
    year = "2020",
}

@misc{nci-technique-classifier,
  author = {NCI Protocol Team},
  title = {NCI Technique Classifier},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/synapti/nci-technique-classifier}
}

License

Apache 2.0
