NCI Technique Classifier

Multi-label classifier that identifies specific propaganda techniques in text.

Model Description

This model is Stage 2 of the NCI (Narrative Credibility Index) two-stage propaganda detection pipeline:

Stage 1: Fast binary detection - "Does this text contain propaganda?"
Stage 2 (this model): Multi-label technique classification - "Which specific techniques are used?"

The classifier identifies 18 propaganda techniques from the SemEval-2020 Task 11 taxonomy.

Propaganda Techniques

#	Technique	F1 Score	Optimal Threshold
0	Loaded_Language	95.3%	0.3
1	Appeal_to_fear-prejudice	85.1%	0.3
2	Exaggeration,Minimisation	49.0%	0.4
3	Repetition	55.9%	0.4
4	Flag-Waving	50.9%	0.4
5	Name_Calling,Labeling	79.0%	0.1
6	Reductio_ad_hitlerum	82.4%	0.3
7	Black-and-White_Fallacy	68.8%	0.5
8	Causal_Oversimplification	67.9%	0.4
9	Whataboutism,Straw_Men,Red_Herring	47.7%	0.3
10	Straw_Man	60.3%	0.5
11	Red_Herring	86.3%	0.5
12	Doubt	63.4%	0.3
13	Appeal_to_Authority	50.0%	0.3
14	Thought-terminating_Cliches	71.2%	0.5
15	Bandwagon	46.7%	0.5
16	Slogans	46.0%	0.3
17	Obfuscation,Intentional_Vagueness,Confusion	86.3%	0.5

Performance

Test Set Results (1,729 samples):

Metric	Default (0.5)	Optimized Thresholds
Micro F1	72.7%	80.3%
Macro F1	62.5%	68.3%
ECE (Expected Calibration Error)	-	0.0096
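
The ECE figure above summarizes how closely predicted probabilities match observed frequencies. This card does not say how it was computed. The following is a minimal sketch of the standard equal-width binned ECE, assuming each (sample, technique) pair is treated as one binary prediction. The function name `expected_calibration_error` and the 10-bin default are illustrative choices, not taken from the training code.

```python
def expected_calibration_error(probs, targets, n_bins=10):
    """Binned ECE over flattened (sample, label) probabilities.

    probs:   list of predicted probabilities in [0, 1]
    targets: list of matching 0/1 ground-truth labels
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    if len(probs) == 0:
        return 0.0

    # Accumulate probability mass, positives, and counts per equal-width bin
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, targets):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        sums[b] += p
        hits[b] += y
        counts[b] += 1

    total = len(probs)
    ece = 0.0
    for s, h, c in zip(sums, hits, counts):
        if c:
            # |mean predicted probability - observed positive rate|, weighted by bin size
            ece += (c / total) * abs(s / c - h / c)
    return ece
```

A value near zero means that, within each bin, the average predicted probability is close to the fraction of labels that were actually positive.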

Usage

Basic Usage

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="synapti/nci-technique-classifier",
    top_k=None  # Return all labels
)

text = "The radical left is DESTROYING our country!"
results = classifier(text)[0]

# Get detected techniques (using default 0.5 threshold)
detected = [r for r in results if r["score"] > 0.5]
for d in detected:
    print(f"{d['label']}: {d['score']:.2%}")

With Calibration Config (Recommended)

The model includes a calibration_config.json file with optimized per-technique thresholds and a temperature-scaling parameter for better-calibrated confidence scores.

import json
from transformers import pipeline
from huggingface_hub import hf_hub_download

# Load calibration config
config_path = hf_hub_download(
    repo_id="synapti/nci-technique-classifier",
    filename="calibration_config.json"
)
with open(config_path) as f:
    config = json.load(f)

temperature = config["temperature"]  # 0.75 (loaded for reference; not applied in this snippet)
thresholds = config["thresholds"]
labels = config["technique_labels"]

classifier = pipeline(
    "text-classification",
    model="synapti/nci-technique-classifier",
    top_k=None
)

text = "Your text here..."
results = classifier(text)[0]

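# Apply per-technique thresholds
detected = []
for r in results:
    # Assumes pipeline labels have the form "LABEL_<index>"; adjust the
    # parsing if the model config maps ids to technique names directly
    idx = int(r["label"].split("_")[1])
    technique = labels[idx]
    threshold = thresholds.get(technique, 0.5)  # Fall back to 0.5 if no entry exists
    if r["score"] > threshold:
        detected.append((technique, r["score"]))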
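
The snippet above loads `temperature` but applies only the thresholds. This card does not state how the temperature is meant to be applied, or whether the thresholds were tuned on scaled scores. The sketch below assumes the common convention of dividing logits by the temperature before the sigmoid, and assumes the pipeline score is a sigmoid probability from which the logit can be recovered. The helper name `apply_temperature` is hypothetical.

```python
import math

def apply_temperature(prob, temperature, eps=1e-7):
    """Rescale a sigmoid probability as sigmoid(logit / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    p = min(max(prob, eps), 1.0 - eps)  # Clamp to avoid log(0)
    z = math.log(p / (1.0 - p)) / temperature  # Invert the sigmoid, then scale
    # Numerically stable sigmoid
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

With a temperature below 1, such as 0.75, scores move away from 0.5: a raw score of 0.8 becomes roughly 0.86. A score of exactly 0.5 is unchanged.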

ONNX Inference (Faster)

The model is also available in ONNX format for optimized inference:

import onnxruntime as ort
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
import numpy as np

# Download ONNX model
onnx_path = hf_hub_download(
    repo_id="synapti/nci-technique-classifier",
    filename="onnx/model.onnx"
)

# Load tokenizer and ONNX session
tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier")
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

# Inference
text = "Your text here..."
inputs = tokenizer(text, padding="max_length", truncation=True, max_length=512, return_tensors="np")
onnx_inputs = {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
}
logits = session.run(None, onnx_inputs)[0]
probs = 1 / (1 + np.exp(-logits))  # Sigmoid for multi-label
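
The ONNX snippet stops at raw probabilities. A minimal sketch of the remaining step, turning one row of `probs` into named techniques with per-technique thresholds, might look like this. It assumes `labels` and `thresholds` are loaded from calibration_config.json as in the calibration example, and that output column i corresponds to `labels[i]`. The helper `decode_row` is hypothetical.

```python
def decode_row(row, labels, thresholds, default=0.5):
    """Return (technique, score) pairs whose score exceeds its threshold."""
    if len(row) != len(labels):
        raise ValueError("expected one score per label")
    detected = [
        (name, float(score))
        for name, score in zip(labels, row)
        if score > thresholds.get(name, default)  # default when no entry exists
    ]
    # Highest-scoring techniques first
    return sorted(detected, key=lambda pair: pair[1], reverse=True)
```

For a single input, this would be called as `decode_row(probs[0], labels, thresholds)`.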

Two-Stage Pipeline

For best results, run the binary detector first and classify techniques only for text it flags:

from transformers import pipeline

# Stage 1: Binary detection (fast filter)
detector = pipeline("text-classification", model="synapti/nci-binary-detector")

# Stage 2: Technique classification
classifier = pipeline("text-classification", model="synapti/nci-technique-classifier", top_k=None)

text = "Your text to analyze..."

# Quick check first
detection = detector(text)[0]
if detection["label"] == "has_propaganda" and detection["score"] > 0.5:
    # Detailed technique analysis
    techniques = classifier(text)[0]
    # Flat 0.3 cutoff for simplicity; per-technique thresholds are in calibration_config.json
    detected = [t for t in techniques if t["score"] > 0.3]
    for t in detected:
        print(f"{t['label']}: {t['score']:.2%}")
else:
    print("No propaganda detected")

Calibration Config

The calibration_config.json file contains:

{
  "temperature": 0.75,
  "thresholds": {
    "Loaded_Language": 0.3,
    "Appeal_to_fear-prejudice": 0.3,
    "Name_Calling,Labeling": 0.1,
    ...
  },
  "metrics": {
    "ece": 0.0096,
    "micro_f1_optimized": 0.803,
    "macro_f1_optimized": 0.683
  }
}

Training Data

Trained on synapti/nci-propaganda-production:

23,000+ examples with multi-hot technique labels
Augmented data for minority techniques (MLSMOTE)
Hard negatives from LIAR2 and Qbias datasets
Class-weighted Focal Loss to handle imbalance

Model Architecture

Base Model: answerdotai/ModernBERT-base
Parameters: 149.6M
Max Sequence Length: 512 tokens
Output: 18 labels (multi-label sigmoid)
Calibration Temperature: 0.75

Available Files

File	Description
`model.safetensors`	PyTorch model weights
`calibration_config.json`	Optimized thresholds & temperature
`onnx/model.onnx`	ONNX model for fast inference
`config.json`	Model configuration

Training Details

Loss Function: Class-weighted Focal Loss (gamma=2.0)
Class Weights: Inverse frequency weighting
Optimizer: AdamW
Learning Rate: 2e-5
Batch Size: 8 (effective 32 with gradient accumulation)
Epochs: 5 with early stopping (patience=3)
Hardware: NVIDIA A10G GPU
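
The exact loss implementation is not given in this card. As a rough illustration of the loss and weighting listed above, a per-label binary focal loss with gamma = 2.0 and a class weight could be written as below. The function names `weighted_focal_loss` and `inverse_frequency_weights`, and the normalization of weights to mean 1, are assumptions, not the training code.

```python
import math

def weighted_focal_loss(prob, target, weight=1.0, gamma=2.0, eps=1e-7):
    """Focal loss for one label: -w * (1 - p_t)^gamma * log(p_t)."""
    p = min(max(prob, eps), 1.0 - eps)   # Clamp to avoid log(0)
    p_t = p if target == 1 else 1.0 - p  # Probability assigned to the true class
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)

def inverse_frequency_weights(positive_counts):
    """Per-label weights proportional to 1 / count, normalized to mean 1."""
    raw = [1.0 / max(c, 1) for c in positive_counts]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]
```

The `(1 - p_t)^gamma` factor shrinks the loss on examples the model already classifies confidently, so training focuses on hard examples. The weights give rarer techniques a larger share of the loss.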

Limitations

Trained primarily on English text
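Performance varies widely by technique, from 46.0% F1 (Slogans) to 95.3% F1 (Loaded_Language); see the table above
Some techniques overlap semantically (for example, Whataboutism,Straw_Men,Red_Herring sits alongside separate Straw_Man and Red_Herring labels)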
Should be used with binary detector for best results
Threshold optimization recommended for specific use cases

Related Models

synapti/nci-binary-detector - Stage 1 binary detector

Citation

@inproceedings{da-san-martino-etal-2020-semeval,
    title = "{S}em{E}val-2020 Task 11: Detection of Propaganda Techniques in News Articles",
    author = "Da San Martino, Giovanni and others",
    booktitle = "Proceedings of SemEval-2020",
    year = "2020",
}

@misc{nci-technique-classifier,
  author = {NCI Protocol Team},
  title = {NCI Technique Classifier},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/synapti/nci-technique-classifier}
}

License

Apache 2.0
