---
language:
- en
license: mit
library_name: transformers
tags:
- propaganda-detection
- multi-label-classification
- modernbert
- nci-protocol
base_model: answerdotai/ModernBERT-base
datasets:
- synapti/nci-propaganda-production
metrics:
- f1
- precision
- recall
pipeline_tag: text-classification
---
# NCI Technique Classifier v2
Multi-label propaganda technique classifier for the NCI (News Content Intelligence) Protocol.
## Model Description
This model classifies text into 18 propaganda techniques as part of a two-stage pipeline:
- **Stage 1**: Binary detection (`synapti/nci-binary-detector-v2`) determines if propaganda exists
- **Stage 2**: This model identifies which specific techniques are used
### Techniques Detected
| ID | Technique | Description |
|----|-----------|-------------|
| 0 | Loaded_Language | Using words with strong emotional implications |
| 1 | Appeal_to_fear-prejudice | Seeking to build support by instilling fear |
| 2 | Exaggeration,Minimisation | Overstating or understating aspects of issues |
| 3 | Repetition | Repeating the same message multiple times |
| 4 | Flag-Waving | Appeals to patriotism or group identity |
| 5 | Name_Calling,Labeling | Giving a subject a name with negative connotations |
| 6 | Reductio_ad_hitlerum | Comparing to Hitler or Nazis to discredit |
| 7 | Black-and-White_Fallacy | Presenting only two options when more exist |
| 8 | Causal_Oversimplification | Assuming a single cause for complex issues |
| 9 | Whataboutism,Straw_Men,Red_Herring | Deflection and misrepresentation tactics |
| 10 | Straw_Man | Misrepresenting someone's argument |
| 11 | Red_Herring | Introducing irrelevant information |
| 12 | Doubt | Questioning credibility of sources |
| 13 | Appeal_to_Authority | Citing authorities to support claims |
| 14 | Thought-terminating_Cliches | Using clichés to end discussion |
| 15 | Bandwagon | Appeal to popularity |
| 16 | Slogans | Brief, striking phrases |
| 17 | Obfuscation,Intentional_Vagueness,Confusion | Being deliberately unclear |
## Training
- **Base Model**: `answerdotai/ModernBERT-base`
- **Dataset**: `synapti/nci-propaganda-production` (19,581 train, 1,727 val, 1,729 test)
- **Loss**: Focal Loss (gamma=2.0) with class weights for imbalanced techniques
- **Epochs**: 5
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Hardware**: NVIDIA A10G GPU
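
The focal loss named above can be illustrated with a short sketch. It is written in NumPy only to show the formula; it is not the training code, and the way class weights are applied (one multiplicative weight per technique) is an assumption, since the card does not say how they were derived or used. The function name `multilabel_focal_loss` is hypothetical.

```python
import numpy as np


def multilabel_focal_loss(logits, targets, gamma=2.0, class_weights=None):
    """Mean binary focal loss over a (batch, num_labels) array of logits."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)

    # Numerically stable binary cross-entropy with logits, per element.
    bce = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))

    # p_t is the probability assigned to the true class; for hard 0/1
    # targets it equals exp(-bce).
    p_t = np.exp(-bce)

    # The (1 - p_t)^gamma factor shrinks the loss of easy, confident examples.
    loss = (1.0 - p_t) ** gamma * bce

    if class_weights is not None:
        # Assumption: one weight per label, broadcast across the batch.
        loss = loss * np.asarray(class_weights, dtype=np.float64)

    return float(loss.mean())
```

With `gamma=0` and no class weights this reduces to plain binary cross-entropy, so `gamma=2.0` can be read as a knob that shifts training effort toward hard or rare examples.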
## Performance
| Metric | Score |
|--------|-------|
| Micro F1 | 80.2% |
| Macro F1 | 63.9% |
| Micro Precision | 83.4% |
| Micro Recall | 77.4% |
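
The gap between Micro F1 and Macro F1 is consistent with rarer techniques scoring lower than frequent ones: micro averaging pools all label decisions, while macro averaging gives every technique equal weight. The sketch below shows the difference on toy data. The helper name `micro_macro_f1` is hypothetical, and the card does not state which tooling produced the reported scores.

```python
import numpy as np


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for binary indicator arrays of shape (n, labels)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    # Per-label counts of true positives, false positives and false negatives.
    tp = (y_true & y_pred).sum(axis=0).astype(float)
    fp = (~y_true & y_pred).sum(axis=0).astype(float)
    fn = (y_true & ~y_pred).sum(axis=0).astype(float)

    def f1(tp, fp, fn):
        denom = 2 * tp + fp + fn
        # A label with no positives and no predictions scores 0 here.
        return np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)

    micro = float(f1(tp.sum(), fp.sum(), fn.sum()))  # pool counts, then score
    macro = float(f1(tp, fp, fn).mean())             # score per label, then average
    return micro, macro


# Toy data: label 0 is frequent and always correct, label 1 is rare and missed.
y_true = [[1, 0], [1, 0], [1, 0], [1, 1]]
y_pred = [[1, 0], [1, 0], [1, 0], [1, 0]]
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro={micro:.3f} macro={macro:.3f}")  # micro=0.889 macro=0.500
```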
### Per-Technique Performance (selected)
| Technique | F1 Score |
|-----------|----------|
| Loaded_Language | 97.0% |
| Appeal_to_fear-prejudice | 89.7% |
| Name_Calling,Labeling | 84.3% |
| Flag-Waving | 82.1% |
## Usage
### With Transformers
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_id = "synapti/nci-technique-classifier-v2"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

text = "The radical left is DESTROYING our great nation!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

# Multi-label head: apply a sigmoid to each logit independently
probs = torch.sigmoid(outputs.logits)[0]

# Keep techniques whose probability exceeds the threshold
threshold = 0.5
id2label = model.config.id2label  # look up names by index, not by dict order
detected = [(id2label[i], p.item()) for i, p in enumerate(probs) if p > threshold]
print(detected)
```
### With NCI Protocol
```python
from nci.transformers.two_stage_pipeline import TwoStagePipeline
pipeline = TwoStagePipeline.from_pretrained(
    binary_model="synapti/nci-binary-detector-v2",
    technique_model="synapti/nci-technique-classifier-v2",
)
result = pipeline.analyze("The radical left is DESTROYING our great nation!")
print(f"Has propaganda: {result.has_propaganda}")
print(f"Techniques: {[t.name for t in result.techniques if t.above_threshold]}")
```
### ONNX Inference
An ONNX export is available at `onnx/model.onnx` in the model repository for faster inference (approximately 1.25x speedup).
```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier-v2")
# Path to a local copy of the ONNX file from the model repository
session = ort.InferenceSession("onnx/model.onnx")

text = "WAKE UP AMERICA!"
inputs = tokenizer(text, return_tensors="np", truncation=True, max_length=512)

# outputs[0] holds the logits, one per technique
outputs = session.run(None, {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
})
probs = 1 / (1 + np.exp(-outputs[0]))  # sigmoid
```
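
The ONNX example above stops at raw probabilities. A minimal sketch of turning them into technique names follows; the helper `decode_predictions` is hypothetical and expects an `id2label` mapping such as the one exposed by the model config in the Transformers example.

```python
import numpy as np


def decode_predictions(probs, id2label, threshold=0.5):
    """Map a (batch, num_labels) probability array to lists of (label, score)."""
    probs = np.asarray(probs)
    if probs.ndim == 1:
        probs = probs[None, :]  # treat a single vector as a batch of one

    results = []
    for row in probs:
        hits = [(id2label[i], float(p)) for i, p in enumerate(row) if p > threshold]
        # Highest-scoring techniques first
        results.append(sorted(hits, key=lambda item: item[1], reverse=True))
    return results


# Toy mapping using the first three technique names from the table above
id2label = {0: "Loaded_Language", 1: "Appeal_to_fear-prejudice", 2: "Exaggeration,Minimisation"}
print(decode_predictions([[0.9, 0.2, 0.6]], id2label))
```

For the real model, the mapping could be loaded with `AutoConfig.from_pretrained("synapti/nci-technique-classifier-v2").id2label`, assuming the repository's config defines it.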
## Limitations
- Trained primarily on English news articles
- May not generalize well to social media or other domains
- Threshold of 0.5 may need adjustment for specific use cases
- Because classification is multi-label, a single text can trigger several techniques at once
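
One way to adjust the 0.5 threshold is to pick a separate cut-off per technique on held-out validation data. The sketch below does a simple grid search that maximizes per-label F1. The function name `tune_thresholds` and the candidate grid are assumptions for illustration, not part of the NCI tooling, and thresholds should be tuned on the validation split rather than the test split.

```python
import numpy as np


def tune_thresholds(probs, labels, candidates=None):
    """Pick, per label, the candidate threshold with the best validation F1."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    if candidates is None:
        candidates = np.linspace(0.1, 0.9, 17)  # grid from 0.10 to 0.90 in steps of 0.05

    thresholds = np.full(probs.shape[1], 0.5)  # fall back to 0.5
    for j in range(probs.shape[1]):
        best_f1 = -1.0
        for t in candidates:
            pred = probs[:, j] > t
            tp = np.sum(pred & labels[:, j])
            fp = np.sum(pred & ~labels[:, j])
            fn = np.sum(~pred & labels[:, j])
            denom = 2 * tp + fp + fn
            f1 = 2 * tp / denom if denom > 0 else 0.0
            if f1 > best_f1:  # ties keep the lowest threshold
                best_f1, thresholds[j] = f1, float(t)
    return thresholds


# Toy validation set: label 1 is only separable with a threshold below 0.5
val_probs = np.array([[0.92, 0.31], [0.83, 0.36], [0.22, 0.12], [0.13, 0.04]])
val_labels = np.array([[1, 1], [1, 1], [0, 0], [0, 0]])
tuned = tune_thresholds(val_probs, val_labels)
predictions = val_probs > tuned  # thresholds broadcast across rows
```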
## Citation
```bibtex
@misc{nci-technique-classifier-v2,
  author = {Synapti},
  title = {NCI Technique Classifier v2},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/synapti/nci-technique-classifier-v2}
}
```
## License
MIT License