---
language:
- en
license: mit
library_name: transformers
tags:
- propaganda-detection
- multi-label-classification
- modernbert
- nci-protocol
base_model: answerdotai/ModernBERT-base
datasets:
- synapti/nci-propaganda-production
metrics:
- f1
- precision
- recall
pipeline_tag: text-classification
---

# NCI Technique Classifier v2

Multi-label propaganda technique classifier for the NCI (News Content Intelligence) Protocol.

## Model Description

This model classifies text into 18 propaganda techniques as part of a two-stage pipeline:

- **Stage 1**: Binary detection (`synapti/nci-binary-detector-v2`) determines if propaganda exists
- **Stage 2**: This model identifies which specific techniques are used

### Techniques Detected
| ID | Technique | Description |
|----|-----------|-------------|
| 0 | Loaded_Language | Using words with strong emotional implications |
| 1 | Appeal_to_fear-prejudice | Seeking to build support by instilling fear |
| 2 | Exaggeration,Minimisation | Overstating or understating aspects of issues |
| 3 | Repetition | Repeating the same message multiple times |
| 4 | Flag-Waving | Appeals to patriotism or group identity |
| 5 | Name_Calling,Labeling | Giving a subject a name with negative connotations |
| 6 | Reductio_ad_hitlerum | Comparing to Hitler or Nazis to discredit |
| 7 | Black-and-White_Fallacy | Presenting only two options when more exist |
| 8 | Causal_Oversimplification | Assuming a single cause for complex issues |
| 9 | Whataboutism,Straw_Men,Red_Herring | Deflection and misrepresentation tactics |
| 10 | Straw_Man | Misrepresenting someone's argument |
| 11 | Red_Herring | Introducing irrelevant information |
| 12 | Doubt | Questioning credibility of sources |
| 13 | Appeal_to_Authority | Citing authorities to support claims |
| 14 | Thought-terminating_Cliches | Using clichés to end discussion |
| 15 | Bandwagon | Appeal to popularity |
| 16 | Slogans | Brief, striking phrases |
| 17 | Obfuscation,Intentional_Vagueness,Confusion | Being deliberately unclear |

## Training

- **Base Model**: `answerdotai/ModernBERT-base`
- **Dataset**: `synapti/nci-propaganda-production` (19,581 train, 1,727 val, 1,729 test)
- **Loss**: Focal Loss (gamma=2.0) with class weights for imbalanced techniques
- **Epochs**: 5
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Hardware**: NVIDIA A10G GPU
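
The focal-loss objective above can be sketched as follows. This is a minimal multi-label variant built on `binary_cross_entropy_with_logits`; the exact class-weighting scheme used in training is not documented here, so `pos_weight` is shown only as an optional hook:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, pos_weight=None):
    """Multi-label focal loss: down-weights easy examples via (1 - p_t)^gamma."""
    # Per-element BCE; pos_weight optionally re-weights positive labels.
    bce = F.binary_cross_entropy_with_logits(
        logits, targets, pos_weight=pos_weight, reduction="none"
    )
    p_t = torch.exp(-bce)  # model's probability for the true label
    return ((1.0 - p_t) ** gamma * bce).mean()

logits = torch.tensor([[2.0, -1.0, 0.5]])
targets = torch.tensor([[1.0, 0.0, 1.0]])
loss = focal_loss(logits, targets, gamma=2.0)
```

With `gamma=0` this reduces to ordinary (optionally weighted) BCE; `gamma=2.0` shrinks the contribution of examples the model already classifies confidently.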

## Performance

| Metric | Score |
|--------|-------|
| Micro F1 | 80.2% |
| Macro F1 | 63.9% |
| Micro Precision | 83.4% |
| Micro Recall | 77.4% |

### Per-Technique Performance (selected)

| Technique | F1 Score |
|-----------|----------|
| Loaded_Language | 97.0% |
| Appeal_to_fear-prejudice | 89.7% |
| Name_Calling,Labeling | 84.3% |
| Flag-Waving | 82.1% |
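
Micro F1 pools every label decision across all techniques, while macro F1 averages per-technique scores equally, so rarer, harder techniques drag the macro number down. A minimal sketch of the two aggregates (an illustrative helper, not the evaluation code used for the tables above):

```python
import numpy as np

def _f1(tp, fp, fn):
    """Plain F1 from raw counts; returns 0.0 when undefined."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def micro_macro_f1(y_true, y_pred):
    """y_true, y_pred: 0/1 arrays of shape (n_samples, n_labels)."""
    yt, yp = y_true.astype(bool), y_pred.astype(bool)
    tp = (yt & yp).sum(axis=0)
    fp = (~yt & yp).sum(axis=0)
    fn = (yt & ~yp).sum(axis=0)
    micro = _f1(tp.sum(), fp.sum(), fn.sum())  # pool all label decisions
    macro = float(np.mean([_f1(t, f, n) for t, f, n in zip(tp, fp, fn)]))  # equal weight per label
    return micro, macro
```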

## Usage

### With Transformers

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("synapti/nci-technique-classifier-v2")
tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier-v2")

text = "The radical left is DESTROYING our great nation!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
probs = torch.sigmoid(outputs.logits)[0]

# Get techniques above threshold
threshold = 0.5
techniques = list(model.config.id2label.values())
detected = [(techniques[i], probs[i].item()) for i in range(len(techniques)) if probs[i] > threshold]
print(detected)
```

### With NCI Protocol

```python
from nci.transformers.two_stage_pipeline import TwoStagePipeline

pipeline = TwoStagePipeline.from_pretrained(
    binary_model="synapti/nci-binary-detector-v2",
    technique_model="synapti/nci-technique-classifier-v2",
)

result = pipeline.analyze("The radical left is DESTROYING our great nation!")
print(f"Has propaganda: {result.has_propaganda}")
print(f"Techniques: {[t.name for t in result.techniques if t.above_threshold]}")
```

### ONNX Inference

An ONNX export is available at `onnx/model.onnx` for faster inference (~1.25x speedup).

```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier-v2")
session = ort.InferenceSession("onnx/model.onnx")

text = "WAKE UP AMERICA!"
inputs = tokenizer(text, return_tensors="np", truncation=True, max_length=512)

outputs = session.run(None, {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
})
probs = 1 / (1 + np.exp(-outputs[0]))  # sigmoid
```

## Limitations

- Trained primarily on English news articles
- May not generalize well to social media or other domains
- The default threshold of 0.5 may need adjustment for specific use cases
- Multi-label classification means multiple techniques can be detected per text
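
The 0.5 cutoff is only a starting point; per-technique thresholds tuned on a validation split usually trade precision and recall more sensibly than one global value. A minimal sketch (the per-class values below are purely illustrative, not tuned):

```python
import numpy as np

NUM_TECHNIQUES = 18

def detect(probs, thresholds):
    """Indices of techniques whose sigmoid probability clears its own cutoff."""
    return np.flatnonzero(np.asarray(probs) >= np.asarray(thresholds))

# Start from the global 0.5 default, then adjust individual classes.
thresholds = np.full(NUM_TECHNIQUES, 0.5)
thresholds[3] = 0.35   # hypothetical: favour recall on a rarer class (Repetition)
thresholds[0] = 0.60   # hypothetical: tighten precision on Loaded_Language

probs = np.zeros(NUM_TECHNIQUES)
probs[0], probs[3] = 0.55, 0.40
detected = detect(probs, thresholds)  # only index 3: 0.40 >= 0.35, but 0.55 < 0.60
```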

## Citation

```bibtex
@misc{nci-technique-classifier-v2,
  author = {Synapti},
  title = {NCI Technique Classifier v2},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/synapti/nci-technique-classifier-v2}
}
```

## License

MIT License