---
language:
- en
license: mit
library_name: transformers
tags:
- propaganda-detection
- multi-label-classification
- modernbert
- nci-protocol
base_model: answerdotai/ModernBERT-base
datasets:
- synapti/nci-propaganda-production
metrics:
- f1
- precision
- recall
pipeline_tag: text-classification
---

# NCI Technique Classifier v2

Multi-label propaganda technique classifier for the NCI (News Content Intelligence) Protocol.

## Model Description

This model classifies text into 18 propaganda techniques as part of a two-stage pipeline:
- **Stage 1**: Binary detection (`synapti/nci-binary-detector-v2`) determines whether a text contains propaganda
- **Stage 2**: This model identifies which specific techniques are used
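One way to connect the two stages is to run the technique classifier only when the binary detector fires. The sketch below assumes Stage 1 yields a single propaganda probability and that both stages use a simple probability threshold. `two_stage_classify`, `TwoStageResult`, and the threshold parameters are hypothetical names for illustration. They are not the NCI package's API, and the real pipeline may combine the stages differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class TwoStageResult:
    has_propaganda: bool
    propaganda_score: float
    # (technique name, probability) pairs that cleared the technique threshold
    techniques: List[Tuple[str, float]] = field(default_factory=list)


def two_stage_classify(
    text: str,
    binary_score: Callable[[str], float],
    technique_probs: Callable[[str], Sequence[float]],
    labels: Sequence[str],
    binary_threshold: float = 0.5,
    technique_threshold: float = 0.5,
) -> TwoStageResult:
    """Hypothetical gating helper: Stage 2 runs only if Stage 1 fires."""
    score = binary_score(text)
    if score <= binary_threshold:
        # Stage 1 says "no propaganda", so skip the Stage 2 model call entirely.
        return TwoStageResult(has_propaganda=False, propaganda_score=score)
    probs = technique_probs(text)
    detected = [(label, p) for label, p in zip(labels, probs) if p > technique_threshold]
    return TwoStageResult(has_propaganda=True, propaganda_score=score, techniques=detected)


# Usage with stand-in scorers (replace the lambdas with real model calls):
result = two_stage_classify(
    "WAKE UP AMERICA!",
    binary_score=lambda t: 0.92,
    technique_probs=lambda t: [0.81, 0.10],
    labels=["Loaded_Language", "Flag-Waving"],
)
print(result.techniques)  # [('Loaded_Language', 0.81)]
```

Gating saves a second forward pass on texts the binary detector rejects. The trade-off is that any propaganda Stage 1 misses is never passed to the technique classifier.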

### Techniques Detected

| ID | Technique | Description |
|----|-----------|-------------|
| 0 | Loaded_Language | Using words with strong emotional implications |
| 1 | Appeal_to_fear-prejudice | Seeking to build support by instilling fear or prejudice |
| 2 | Exaggeration,Minimisation | Overstating or understating aspects of issues |
| 3 | Repetition | Repeating the same message multiple times |
| 4 | Flag-Waving | Appeals to patriotism or group identity |
| 5 | Name_Calling,Labeling | Giving a subject a name with negative connotations |
| 6 | Reductio_ad_hitlerum | Comparing to Hitler or Nazis to discredit |
| 7 | Black-and-White_Fallacy | Presenting only two options when more exist |
| 8 | Causal_Oversimplification | Assuming a single cause for complex issues |
| 9 | Whataboutism,Straw_Men,Red_Herring | Deflection and misrepresentation tactics |
| 10 | Straw_Man | Misrepresenting someone's argument |
| 11 | Red_Herring | Introducing irrelevant information |
| 12 | Doubt | Questioning credibility of sources |
| 13 | Appeal_to_Authority | Citing authorities to support claims |
| 14 | Thought-terminating_Cliches | Using clichés to end discussion |
| 15 | Bandwagon | Appeal to popularity |
| 16 | Slogans | Brief, striking phrases |
| 17 | Obfuscation,Intentional_Vagueness,Confusion | Being deliberately unclear |
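Each ID corresponds to one output of the multi-label head. The table can therefore be mirrored as a Python list, both to decode a multi-hot prediction vector and to check the label order against a loaded model's `config.id2label`. This assumes the table's ID order matches the published config. `TECHNIQUES`, `decode_techniques`, and `mismatched_ids` are illustrative names.

```python
from typing import Dict, List, Sequence

# Technique names in ID order, copied from the table above.
# Assumption: this matches model.config.id2label; prefer the config at runtime.
TECHNIQUES = [
    "Loaded_Language",
    "Appeal_to_fear-prejudice",
    "Exaggeration,Minimisation",
    "Repetition",
    "Flag-Waving",
    "Name_Calling,Labeling",
    "Reductio_ad_hitlerum",
    "Black-and-White_Fallacy",
    "Causal_Oversimplification",
    "Whataboutism,Straw_Men,Red_Herring",
    "Straw_Man",
    "Red_Herring",
    "Doubt",
    "Appeal_to_Authority",
    "Thought-terminating_Cliches",
    "Bandwagon",
    "Slogans",
    "Obfuscation,Intentional_Vagueness,Confusion",
]


def decode_techniques(multi_hot: Sequence[int]) -> List[str]:
    """Map a 0/1 vector of length 18 to the names of the active techniques."""
    if len(multi_hot) != len(TECHNIQUES):
        raise ValueError(f"expected {len(TECHNIQUES)} values, got {len(multi_hot)}")
    return [TECHNIQUES[i] for i, active in enumerate(multi_hot) if active]


def mismatched_ids(id2label: Dict[int, str]) -> List[int]:
    """Return the IDs whose name in a model config differs from the table."""
    return [i for i, name in enumerate(TECHNIQUES) if id2label.get(i) != name]
```

Calling `mismatched_ids(model.config.id2label)` on the loaded model should return an empty list if the table and the config agree.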

## Training

- **Base Model**: `answerdotai/ModernBERT-base`
- **Dataset**: `synapti/nci-propaganda-production` (19,581 train / 1,727 validation / 1,729 test examples)
- **Loss**: Focal Loss (gamma=2.0) with class weights for imbalanced techniques
- **Epochs**: 5
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Hardware**: NVIDIA A10G GPU
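For a multi-label head, focal loss is usually applied per label to the sigmoid output. With p = sigmoid(logit) and p_t = p for a positive label or 1 - p for a negative one, each label contributes -w_c (1 - p_t)^γ log(p_t). The factor (1 - p_t)^γ shrinks the loss on labels the model already predicts confidently, which shifts training effort toward rare and hard techniques. The sketch below uses NumPy and this standard formulation. The card does not say how the class weights were derived or how the loss was reduced, so `multilabel_focal_loss`, the per-class `class_weights` vector, and mean reduction are all assumptions.

```python
import numpy as np


def multilabel_focal_loss(logits, targets, class_weights, gamma=2.0):
    """Mean sigmoid focal loss over a batch (illustrative, not the training code).

    logits, targets: shape (batch, num_labels); targets are 0/1.
    class_weights: shape (num_labels,), one assumed weight per technique.
    """
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    probs = 1.0 / (1.0 + np.exp(-logits))
    # p_t: probability assigned to the correct outcome for each label
    p_t = targets * probs + (1.0 - targets) * (1.0 - probs)
    # (1 - p_t)^gamma down-weights labels that are already predicted well
    modulating = (1.0 - p_t) ** gamma
    # Clip to avoid log(0) when a prediction saturates
    log_p_t = np.log(np.clip(p_t, 1e-12, 1.0))
    loss = -np.asarray(class_weights, dtype=np.float64) * modulating * log_p_t
    return float(loss.mean())
```

With `gamma=0.0` and all weights equal to 1, this reduces to plain binary cross-entropy. Raising gamma to 2.0 mostly affects confident, correct predictions.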

## Performance

| Metric | Score |
|--------|-------|
| Micro F1 | 80.2% |
| Macro F1 | 63.9% |
| Micro Precision | 83.4% |
| Micro Recall | 77.4% |
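Micro F1 pools true positives, false positives, and false negatives across all 18 techniques, so frequent techniques dominate it. Macro F1 averages the per-technique scores with equal weight, so the gap between 80.2% micro and 63.9% macro suggests weaker results on less frequent techniques. The micro figures are consistent with each other: 2 · 0.834 · 0.774 / (0.834 + 0.774) ≈ 0.803, which matches the reported 80.2% once rounding of precision and recall is accounted for. A minimal NumPy sketch of both averages follows. `micro_macro_f1` is an illustrative name, and techniques with no positives and no predictions are scored 0 by assumption.

```python
import numpy as np


def micro_macro_f1(y_true, y_pred):
    """Micro and macro F1 for multi-label 0/1 matrices of shape (n, num_labels)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = (y_true & y_pred).sum(axis=0)
    fp = (~y_true & y_pred).sum(axis=0)
    fn = (y_true & ~y_pred).sum(axis=0)
    # Micro: pool counts over all labels, then F1 = 2TP / (2TP + FP + FN)
    micro = 2 * tp.sum() / max(2 * tp.sum() + fp.sum() + fn.sum(), 1)
    # Macro: F1 per label (0 where a label has no support and no predictions),
    # then an unweighted mean, so rare labels count as much as common ones
    denom = 2 * tp + fp + fn
    per_label = np.divide(2 * tp, denom, out=np.zeros(len(tp)), where=denom > 0)
    return float(micro), float(per_label.mean())


# One common label predicted perfectly, one rare label always missed:
micro, macro = micro_macro_f1(
    [[1, 0], [1, 0], [1, 1], [0, 0]],
    [[1, 0], [1, 0], [1, 0], [0, 0]],
)
print(micro, macro)  # micro = 6/7 ≈ 0.857, macro = 0.5
```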

### Per-Technique Performance (selected)

| Technique | F1 Score |
|-----------|----------|
| Loaded_Language | 97.0% |
| Appeal_to_fear-prejudice | 89.7% |
| Name_Calling,Labeling | 84.3% |
| Flag-Waving | 82.1% |

## Usage

### With Transformers

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "synapti/nci-technique-classifier-v2"
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

text = "The radical left is DESTROYING our great nation!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label head: an independent sigmoid per technique (not a softmax)
probs = torch.sigmoid(logits)[0]

# Keep every technique whose probability exceeds the threshold,
# looking names up by ID rather than relying on dict ordering
threshold = 0.5
detected = [
    (model.config.id2label[i], p.item())
    for i, p in enumerate(probs)
    if p > threshold
]
print(detected)
```

### With NCI Protocol

```python
from nci.transformers.two_stage_pipeline import TwoStagePipeline

pipeline = TwoStagePipeline.from_pretrained(
    binary_model="synapti/nci-binary-detector-v2",
    technique_model="synapti/nci-technique-classifier-v2",
)

result = pipeline.analyze("The radical left is DESTROYING our great nation!")
print(f"Has propaganda: {result.has_propaganda}")
print(f"Techniques: {[t.name for t in result.techniques if t.above_threshold]}")
```

### ONNX Inference

An ONNX export of the model is available at `onnx/model.onnx` for faster inference (~1.25x speedup).

```python
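import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoTokenizer

repo_id = "synapti/nci-technique-classifier-v2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
config = AutoConfig.from_pretrained(repo_id)  # provides id2label

# Fetch the ONNX export from the Hub (or pass a local path instead)
onnx_path = hf_hub_download(repo_id=repo_id, filename="onnx/model.onnx")
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

text = "WAKE UP AMERICA!"
inputs = tokenizer(text, return_tensors="np", truncation=True, max_length=512)

logits = session.run(None, {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
})[0][0]
probs = 1 / (1 + np.exp(-logits))  # sigmoid: one probability per technique

threshold = 0.5
detected = [(config.id2label[i], float(p)) for i, p in enumerate(probs) if p > threshold]
print(detected)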
```

## Limitations

- Trained primarily on English news articles
- May not generalize well to social media or other domains
- The default decision threshold of 0.5 may need adjustment for specific use cases
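One common way to adjust the threshold, which this card does not describe, is to choose a separate threshold per technique that maximizes F1 on the validation split. Tuning on the validation split rather than the test split keeps the reported scores honest. `tune_thresholds` and the candidate grid are hypothetical choices for this sketch.

```python
import numpy as np


def tune_thresholds(probs, y_true, candidates=None):
    """Hypothetical helper: per-label threshold that maximizes F1 on held-out data.

    probs: (n, num_labels) sigmoid outputs; y_true: (n, num_labels) 0/1 labels.
    """
    probs = np.asarray(probs, dtype=float)
    y_true = np.asarray(y_true, dtype=bool)
    if candidates is None:
        candidates = np.linspace(0.05, 0.95, 19)  # assumed search grid
    best = np.full(probs.shape[1], 0.5)
    for j in range(probs.shape[1]):
        best_f1 = -1.0
        for t in candidates:
            pred = probs[:, j] > t
            tp = np.sum(pred & y_true[:, j])
            fp = np.sum(pred & ~y_true[:, j])
            fn = np.sum(~pred & y_true[:, j])
            denom = 2 * tp + fp + fn
            f1 = 2 * tp / denom if denom else 0.0
            if f1 > best_f1:  # keep the first threshold reaching the best F1
                best_f1, best[j] = f1, t
    return best
```

At inference time, compare each technique's probability with its own entry in the returned array (`probs > thresholds`) instead of a single global 0.5.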
- Multi-label classification means multiple techniques can be detected per text

## Citation

```bibtex
@misc{nci-technique-classifier-v2,
  author = {Synapti},
  title = {NCI Technique Classifier v2},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/synapti/nci-technique-classifier-v2}
}
```

## License

MIT License