| | --- |
| | license: apache-2.0 |
| | datasets: |
| | - synapti/nci-propaganda-production |
| | base_model: answerdotai/ModernBERT-base |
| | tags: |
| | - transformers |
| | - modernbert |
| | - text-classification |
| | - propaganda-detection |
| | - multi-label-classification |
| | - nci-protocol |
| | - semeval-2020 |
| | - onnx |
| | library_name: transformers |
| | pipeline_tag: text-classification |
| | --- |
| | |
| | # NCI Technique Classifier |
| |
|
| | Multi-label classifier that identifies specific propaganda techniques in text. |
| |
|
## Model Description

This model is **Stage 2** of the NCI (Narrative Credibility Index) two-stage propaganda detection pipeline:

- **Stage 1**: Fast binary detection: "Does this text contain propaganda?"
- **Stage 2 (this model)**: Multi-label technique classification: "Which specific techniques are used?"

The classifier assigns labels from a set of **18 propaganda techniques** based on the SemEval-2020 Task 11 taxonomy. Because the task is multi-label, each technique is scored independently, so a single text can receive several technique labels or none.

## Propaganda Techniques

| # | Technique | F1 Score | Optimal Threshold |
|---|-----------|----------|-------------------|
| 0 | Loaded_Language | 95.3% | 0.3 |
| 1 | Appeal_to_fear-prejudice | 85.1% | 0.3 |
| 2 | Exaggeration,Minimisation | 49.0% | 0.4 |
| 3 | Repetition | 55.9% | 0.4 |
| 4 | Flag-Waving | 50.9% | 0.4 |
| 5 | Name_Calling,Labeling | 79.0% | 0.1 |
| 6 | Reductio_ad_hitlerum | 82.4% | 0.3 |
| 7 | Black-and-White_Fallacy | 68.8% | 0.5 |
| 8 | Causal_Oversimplification | 67.9% | 0.4 |
| 9 | Whataboutism,Straw_Men,Red_Herring | 47.7% | 0.3 |
| 10 | Straw_Man | 60.3% | 0.5 |
| 11 | Red_Herring | 86.3% | 0.5 |
| 12 | Doubt | 63.4% | 0.3 |
| 13 | Appeal_to_Authority | 50.0% | 0.3 |
| 14 | Thought-terminating_Cliches | 71.2% | 0.5 |
| 15 | Bandwagon | 46.7% | 0.5 |
| 16 | Slogans | 46.0% | 0.3 |
| 17 | Obfuscation,Intentional_Vagueness,Confusion | 86.3% | 0.5 |

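You can use the table as a lookup to turn the model's 18 output probabilities into technique names. The sketch below hard-codes the names and thresholds from the table. It assumes that output position *i* corresponds to row *i*, which is an assumption; in practice, read both from `calibration_config.json`. `decode_techniques` is a hypothetical helper and is not part of the released code.

```python
# Technique names and per-technique thresholds copied from the table above.
# Assumption: output index i of the model corresponds to row i of the table.
TECHNIQUES = [
    ("Loaded_Language", 0.3),
    ("Appeal_to_fear-prejudice", 0.3),
    ("Exaggeration,Minimisation", 0.4),
    ("Repetition", 0.4),
    ("Flag-Waving", 0.4),
    ("Name_Calling,Labeling", 0.1),
    ("Reductio_ad_hitlerum", 0.3),
    ("Black-and-White_Fallacy", 0.5),
    ("Causal_Oversimplification", 0.4),
    ("Whataboutism,Straw_Men,Red_Herring", 0.3),
    ("Straw_Man", 0.5),
    ("Red_Herring", 0.5),
    ("Doubt", 0.3),
    ("Appeal_to_Authority", 0.3),
    ("Thought-terminating_Cliches", 0.5),
    ("Bandwagon", 0.5),
    ("Slogans", 0.3),
    ("Obfuscation,Intentional_Vagueness,Confusion", 0.5),
]


def decode_techniques(probs: list[float]) -> list[tuple[str, float]]:
    """Return (technique, probability) pairs whose probability exceeds its threshold.

    Hypothetical helper: `probs` is one row of 18 sigmoid probabilities.
    """
    if len(probs) != len(TECHNIQUES):
        raise ValueError(f"expected {len(TECHNIQUES)} probabilities, got {len(probs)}")
    hits = [
        (name, p)
        for (name, threshold), p in zip(TECHNIQUES, probs)
        if p > threshold
    ]
    # Most confident techniques first
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Example with made-up probabilities.
example = [0.0] * 18
example[0] = 0.92  # Loaded_Language (threshold 0.3): detected
example[5] = 0.15  # Name_Calling,Labeling (threshold 0.1): detected
example[7] = 0.45  # Black-and-White_Fallacy (threshold 0.5): not detected
print(decode_techniques(example))
```

The strict `>` comparison matches the thresholding in the card's own snippets.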
## Performance

**Test set results (1,729 samples):**

| Metric | Default threshold (0.5) | Optimized thresholds |
|--------|-------------------------|----------------------|
| Micro F1 | 72.7% | **80.3%** |
| Macro F1 | 62.5% | **68.3%** |
| ECE (Expected Calibration Error) | - | **0.0096** |

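Micro and macro F1 aggregate the labels differently:

- **Micro F1** pools true positives, false positives, and false negatives over all labels before computing a single F1. Frequent techniques therefore carry most of the weight.
- **Macro F1** computes F1 per technique and averages the results, so rare techniques count as much as common ones.

This is a plausible reason why micro F1 sits well above macro F1 here. The sketch below illustrates the gap using hypothetical counts for three labels, not the real test-set counts.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when all three counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts: dict[str, tuple[int, int, int]]) -> tuple[float, float]:
    """counts maps label -> (TP, FP, FN). Returns (micro F1, macro F1)."""
    total_tp = sum(tp for tp, _, _ in counts.values())
    total_fp = sum(fp for _, fp, _ in counts.values())
    total_fn = sum(fn for _, _, fn in counts.values())
    micro = f1(total_tp, total_fp, total_fn)  # pool counts first, then one F1
    macro = sum(f1(*c) for c in counts.values()) / len(counts)  # per-label F1, then average
    return micro, macro


# Hypothetical counts: one frequent, well-predicted label and two rare, harder ones.
counts = {
    "Loaded_Language": (900, 50, 40),
    "Slogans": (10, 12, 12),
    "Bandwagon": (5, 6, 6),
}
micro, macro = micro_macro_f1(counts)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```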
## Usage

### Basic Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="synapti/nci-technique-classifier",
    top_k=None,                   # return scores for all labels
    function_to_apply="sigmoid",  # independent per-label probabilities (multi-label)
)

text = "The radical left is DESTROYING our country!"
results = classifier(text)[0]

# Keep techniques above the default 0.5 threshold.
# Labels may be returned as "LABEL_<i>"; map them to technique names
# with `technique_labels` from calibration_config.json (see below).
detected = [r for r in results if r["score"] > 0.5]
for d in detected:
    print(f"{d['label']}: {d['score']:.2%}")
```

### With Calibration Config (Recommended)

The repository includes a `calibration_config.json` file that contains optimized per-technique thresholds and a temperature for scaling the logits into better-calibrated confidence scores.

```python
import json
from transformers import pipeline
from huggingface_hub import hf_hub_download

# Load calibration config
config_path = hf_hub_download(
    repo_id="synapti/nci-technique-classifier",
    filename="calibration_config.json",
)
with open(config_path) as f:
    config = json.load(f)

temperature = config["temperature"]  # 0.75 (not applied in this snippet)
thresholds = config["thresholds"]    # technique name -> decision threshold
labels = config["technique_labels"]  # label index -> technique name

classifier = pipeline(
    "text-classification",
    model="synapti/nci-technique-classifier",
    top_k=None,
    function_to_apply="sigmoid",
)

text = "Your text here..."
results = classifier(text)[0]

# Apply per-technique thresholds
detected = []
for r in results:
    idx = int(r["label"].split("_")[-1])  # e.g. "LABEL_5" -> 5
    technique = labels[idx]
    threshold = thresholds.get(technique, 0.5)
    if r["score"] > threshold:
        detected.append((technique, r["score"]))
```

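The snippet above loads `temperature` but does not use it. Standard temperature scaling divides each logit by T before the sigmoid: p = sigmoid(z / T). With T = 0.75 (below 1), this pushes probabilities away from 0.5 and sharpens the scores.

The sketch below assumes the stored temperature is meant to be applied this way. The card does not say whether the per-technique thresholds were tuned on scaled or unscaled probabilities, so check that before combining the two. To get raw logits from the `transformers` pipeline, pass `function_to_apply="none"`. `calibrate` is a hypothetical helper.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def calibrate(logits: list[float], temperature: float = 0.75) -> list[float]:
    """Temperature-scaled sigmoid: p_i = sigmoid(z_i / T), applied to each label independently."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return [sigmoid(z / temperature) for z in logits]


raw_logits = [2.0, -1.0, 0.0]  # hypothetical logits for three labels
print([round(sigmoid(z), 3) for z in raw_logits])    # unscaled probabilities
print([round(p, 3) for p in calibrate(raw_logits)])  # scaled with T = 0.75
```

With T = 1 this reduces to the plain sigmoid used elsewhere on this card.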
### ONNX Inference (Faster)

The model is also available in ONNX format for optimized inference:

```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# Download the ONNX model
onnx_path = hf_hub_download(
    repo_id="synapti/nci-technique-classifier",
    filename="onnx/model.onnx",
)

# Load tokenizer and ONNX session
tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier")
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

# Inference
text = "Your text here..."
inputs = tokenizer(text, padding="max_length", truncation=True, max_length=512, return_tensors="np")
onnx_inputs = {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
}
logits = session.run(None, onnx_inputs)[0]  # one row of 18 logits per input text
probs = 1 / (1 + np.exp(-logits))           # sigmoid: independent per-label probabilities
```

### Two-Stage Pipeline

For best results, run the Stage 1 binary detector first and only run the technique classifier on texts that it flags:

```python
from transformers import pipeline

# Stage 1: binary detection (fast filter)
detector = pipeline("text-classification", model="synapti/nci-binary-detector")

# Stage 2: technique classification
classifier = pipeline(
    "text-classification",
    model="synapti/nci-technique-classifier",
    top_k=None,
    function_to_apply="sigmoid",
)

text = "Your text to analyze..."

# Quick check first
detection = detector(text)[0]
if detection["label"] == "has_propaganda" and detection["score"] > 0.5:
    # Detailed technique analysis (single global threshold of 0.3)
    techniques = classifier(text)[0]
    detected = [t for t in techniques if t["score"] > 0.3]
    for t in detected:
        print(f"{t['label']}: {t['score']:.2%}")
else:
    print("No propaganda detected")
```

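The example above applies a single global 0.3 threshold in Stage 2. The sketch below shows the same gating logic with per-technique thresholds instead. It is written as a plain function, so both stages are passed in as callables, for instance thin wrappers around the two `pipeline` objects.

- The function name `analyze` is hypothetical.
- The `"has_propaganda"` label check and the 0.5 gate mirror the example above.
- The stub models and the `"no_propaganda"` label in the usage example are made up.

```python
from typing import Callable

# A detector maps text -> (label, score); a scorer maps text -> {technique: probability}.
Detector = Callable[[str], tuple[str, float]]
Scorer = Callable[[str], dict[str, float]]


def analyze(
    text: str,
    detect: Detector,
    score_techniques: Scorer,
    thresholds: dict[str, float],
    gate: float = 0.5,
    default_threshold: float = 0.5,
) -> list[tuple[str, float]]:
    """Hypothetical two-stage helper: run Stage 2 only if Stage 1 flags the text."""
    label, score = detect(text)
    if label != "has_propaganda" or score <= gate:
        return []  # Stage 1 found no propaganda: skip the more expensive Stage 2
    probs = score_techniques(text)
    hits = [
        (name, p)
        for name, p in probs.items()
        if p > thresholds.get(name, default_threshold)
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Stub stages with made-up outputs, standing in for the real models.
def fake_detector(text: str) -> tuple[str, float]:
    return ("has_propaganda", 0.9) if "!" in text else ("no_propaganda", 0.8)


def fake_scorer(text: str) -> dict[str, float]:
    return {"Loaded_Language": 0.7, "Name_Calling,Labeling": 0.12, "Slogans": 0.2}


thresholds = {"Loaded_Language": 0.3, "Name_Calling,Labeling": 0.1, "Slogans": 0.3}
print(analyze("They are DESTROYING everything!", fake_detector, fake_scorer, thresholds))
print(analyze("The meeting is at noon.", fake_detector, fake_scorer, thresholds))
```

To use the real models:

- `detect` could return `(d["label"], d["score"])` for `d = detector(text)[0]`.
- `score_techniques` could build a dict from `classifier(text)[0]` after mapping labels to technique names.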
## Calibration Config

The `calibration_config.json` file contains the following fields (abridged; `...` marks omitted entries):

```json
{
  "temperature": 0.75,
  "thresholds": {
    "Loaded_Language": 0.3,
    "Appeal_to_fear-prejudice": 0.3,
    "Name_Calling,Labeling": 0.1,
    ...
  },
  "metrics": {
    "ece": 0.0096,
    "micro_f1_optimized": 0.803,
    "macro_f1_optimized": 0.683
  }
}
```

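The `ece` value under `metrics` is the Expected Calibration Error. The card does not document how it was computed, for example the number of bins or whether all 18 labels were pooled.

One common binned definition works as follows:

1. Sort predicted probabilities into equal-width bins.
2. In each bin, take the absolute gap between the mean predicted probability and the observed fraction of positives.
3. Average those gaps, weighting each bin by the fraction of predictions it holds.

For multi-label output, a simple variant treats every (text, label) probability as a separate binary prediction. The sketch below implements that variant with 10 bins; both choices are assumptions.

```python
def expected_calibration_error(
    probs: list[float], targets: list[int], n_bins: int = 10
) -> float:
    """Binned ECE for binary predictions (one entry per (text, label) pair).

    ECE = sum over bins of (bin_size / n) * |fraction_positive - mean_prob|.
    """
    if len(probs) != len(targets) or not probs:
        raise ValueError("probs and targets must be non-empty and the same length")
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, targets):
        # p == 1.0 goes into the last bin rather than an out-of-range one
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_prob = sum(p for p, _ in bucket) / len(bucket)
        frac_pos = sum(y for _, y in bucket) / len(bucket)
        ece += len(bucket) / n * abs(frac_pos - mean_prob)
    return ece


# Toy data: the 0.25 bin is perfectly calibrated (1 of 4 positive);
# the 0.9 bin is under-confident (2 of 2 positive).
print(expected_calibration_error([0.25] * 4 + [0.9] * 2, [1, 0, 0, 0, 1, 1]))
```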
## Training Data

Trained on [synapti/nci-propaganda-production](https://huggingface.co/datasets/synapti/nci-propaganda-production):

- **23,000+ examples** with multi-hot technique labels
- **Augmented data** for minority techniques (MLSMOTE)
- **Hard negatives** from LIAR2 and Qbias datasets
- **Class-weighted Focal Loss** to handle imbalance

## Model Architecture

- **Base Model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **Parameters**: 149.6M
- **Max Sequence Length**: 512 tokens
- **Output**: 18 labels (multi-label sigmoid)
- **Calibration Temperature**: 0.75

## Available Files

| File | Description |
|------|-------------|
| `model.safetensors` | PyTorch model weights |
| `calibration_config.json` | Optimized thresholds & temperature |
| `onnx/model.onnx` | ONNX model for fast inference |
| `config.json` | Model configuration |

## Training Details

- **Loss Function**: Class-weighted Focal Loss (gamma=2.0)
- **Class Weights**: Inverse frequency weighting
- **Optimizer**: AdamW
- **Learning Rate**: 2e-5
- **Batch Size**: 8 per step (effective 32 via 4 gradient-accumulation steps)
- **Epochs**: Up to 5, with early stopping (patience=3)
- **Hardware**: NVIDIA A10G GPU

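Below is a minimal sketch of a class-weighted binary focal loss for multi-label outputs. It rests on two assumptions, because the card does not specify how the weights enter the loss or how they are normalized:

- Each label's weight multiplies that label's whole loss term.
- Weights are proportional to 1 / (positive count) and are rescaled to average 1.0.

With gamma = 2.0, the factor (1 - p_t)^gamma down-weights labels the model already predicts confidently and correctly. The sketch uses plain Python for readability; a training implementation would use tensor operations (for example, in PyTorch).

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def inverse_frequency_weights(positive_counts: list[int]) -> list[float]:
    """Weights proportional to 1 / (positive count per label), rescaled to average 1.0."""
    raw = [1.0 / max(c, 1) for c in positive_counts]  # guard against zero counts
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def focal_loss(
    logits: list[float],
    targets: list[int],
    weights: list[float],
    gamma: float = 2.0,
) -> float:
    """Class-weighted binary focal loss for one example, averaged over labels.

    For each label c: p_t = p if y == 1 else 1 - p, and
    loss_c = -w_c * (1 - p_t) ** gamma * log(p_t).
    """
    total = 0.0
    for z, y, w in zip(logits, targets, weights):
        p = _sigmoid(z)
        p_t = p if y == 1 else 1.0 - p
        p_t = min(max(p_t, 1e-12), 1.0)  # avoid log(0)
        total += -w * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(logits)


weights = inverse_frequency_weights([5000, 200, 50])  # hypothetical positive counts per label
easy = focal_loss([4.0, -4.0, -4.0], [1, 0, 0], weights)  # confident and correct
hard = focal_loss([-1.0, 1.0, -1.0], [1, 0, 1], weights)  # wrong on every label
print(f"easy={easy:.5f}  hard={hard:.5f}")
```

Setting `gamma=0.0` reduces each term to weighted binary cross-entropy, which is a useful sanity check.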
## Limitations

- Trained primarily on English text
- Performance varies widely by technique (F1 from 46.0% to 95.3%; see the table above)
- Some labels overlap semantically (e.g., the combined Whataboutism,Straw_Men,Red_Herring label versus the separate Straw_Man and Red_Herring labels)
- Should be used with the binary detector for best results
- Thresholds should be re-tuned for specific use cases (e.g., to trade precision for recall)

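One common way to re-tune thresholds is a per-technique grid search on your own labeled validation data. For each technique independently, try every candidate threshold and keep the one with the highest F1. The sketch below follows that approach. The candidate grid (0.1 to 0.9), the tie-breaking rule, and the helper names are assumptions, not the authors' documented procedure.

```python
def f1_at(probs: list[float], targets: list[int], threshold: float) -> float:
    """F1 for one label when predicting positive iff prob > threshold."""
    tp = sum(1 for p, y in zip(probs, targets) if p > threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, targets) if p > threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, targets) if p <= threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_thresholds(
    probs_by_label: dict[str, list[float]],
    targets_by_label: dict[str, list[int]],
    grid: tuple[float, ...] = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9),
) -> dict[str, float]:
    """Pick, for each label independently, the grid threshold with the best validation F1.

    Ties go to the lowest threshold, which favors recall.
    """
    best = {}
    for label, probs in probs_by_label.items():
        targets = targets_by_label[label]
        best[label] = max(grid, key=lambda t: (f1_at(probs, targets, t), -t))
    return best


# Hypothetical validation scores for two techniques.
val_probs = {
    "Slogans": [0.05, 0.25, 0.35, 0.6, 0.15],
    "Doubt": [0.9, 0.7, 0.2, 0.55, 0.1],
}
val_targets = {
    "Slogans": [0, 1, 1, 1, 0],
    "Doubt": [1, 1, 0, 0, 0],
}
print(tune_thresholds(val_probs, val_targets))
```

F1 is only one possible objective. To favor precision or recall for a given use case, swap `f1_at` for a different per-label score.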
## Related Models

- [synapti/nci-binary-detector](https://huggingface.co/synapti/nci-binary-detector) - Stage 1 binary detector

## Citation

```bibtex
@inproceedings{da-san-martino-etal-2020-semeval,
  title = "{S}em{E}val-2020 Task 11: Detection of Propaganda Techniques in News Articles",
  author = "Da San Martino, Giovanni and others",
  booktitle = "Proceedings of SemEval-2020",
  year = "2020",
}

@misc{nci-technique-classifier,
  author = {NCI Protocol Team},
  title = {NCI Technique Classifier},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/synapti/nci-technique-classifier}
}
```

## License

Apache 2.0