---
language:
- en
license: mit
library_name: transformers
tags:
- propaganda-detection
- multi-label-classification
- modernbert
- nci-protocol
base_model: answerdotai/ModernBERT-base
datasets:
- synapti/nci-propaganda-production
metrics:
- f1
- precision
- recall
pipeline_tag: text-classification
---
# NCI Technique Classifier v2
Multi-label propaganda technique classifier for the NCI (News Content Intelligence) Protocol.
## Model Description
This model classifies text into 18 propaganda techniques as part of a two-stage pipeline:
- **Stage 1**: Binary detection (`synapti/nci-binary-detector-v2`) determines if propaganda exists
- **Stage 2**: This model identifies which specific techniques are used
### Techniques Detected
| ID | Technique | Description |
|----|-----------|-------------|
| 0 | Loaded_Language | Using words with strong emotional implications |
| 1 | Appeal_to_fear-prejudice | Seeking to build support by instilling fear |
| 2 | Exaggeration,Minimisation | Overstating or understating aspects of issues |
| 3 | Repetition | Repeating the same message multiple times |
| 4 | Flag-Waving | Appeals to patriotism or group identity |
| 5 | Name_Calling,Labeling | Giving a subject a name with negative connotations |
| 6 | Reductio_ad_hitlerum | Comparing to Hitler or Nazis to discredit |
| 7 | Black-and-White_Fallacy | Presenting only two options when more exist |
| 8 | Causal_Oversimplification | Assuming a single cause for complex issues |
| 9 | Whataboutism,Straw_Men,Red_Herring | Deflection and misrepresentation tactics |
| 10 | Straw_Man | Misrepresenting someone's argument |
| 11 | Red_Herring | Introducing irrelevant information |
| 12 | Doubt | Questioning credibility of sources |
| 13 | Appeal_to_Authority | Citing authorities to support claims |
| 14 | Thought-terminating_Cliches | Using clichés to end discussion |
| 15 | Bandwagon | Appeal to popularity |
| 16 | Slogans | Brief, striking phrases |
| 17 | Obfuscation,Intentional_Vagueness,Confusion | Being deliberately unclear |
## Training
- **Base Model**: `answerdotai/ModernBERT-base`
- **Dataset**: `synapti/nci-propaganda-production` (19,581 train, 1,727 val, 1,729 test)
- **Loss**: Focal Loss (gamma=2.0) with class weights for imbalanced techniques
- **Epochs**: 5
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Hardware**: NVIDIA A10G GPU
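The focal loss above down-weights examples the model already classifies confidently, so rare techniques contribute more to the gradient. A minimal sketch for the multi-label case, assuming the standard formulation with `gamma=2.0` and optional per-class weights (the exact training implementation may differ):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=None, gamma=2.0):
    """Multi-label focal loss: scales per-element BCE by (1 - p_t)^gamma,
    where p_t is the model's probability for the correct label."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)               # probability assigned to the true label
    loss = (1 - p_t) ** gamma * bce     # easy examples (p_t near 1) shrink
    if alpha is not None:               # optional per-class weights for imbalance
        loss = loss * alpha
    return loss.mean()
```

With `gamma=0` and no weights this reduces to plain binary cross-entropy.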
## Performance
| Metric | Score |
|--------|-------|
| Micro F1 | 80.2% |
| Macro F1 | 63.9% |
| Micro Precision | 83.4% |
| Micro Recall | 77.4% |
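Micro averaging pools true/false positives across all 18 labels (so frequent techniques like Loaded_Language dominate), while macro averaging takes an unweighted mean of per-label F1 scores, which is why the macro figure is lower. The distinction, sketched on toy data with scikit-learn:

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy multi-label matrices: rows = samples, columns = techniques
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0]])

micro = f1_score(y_true, y_pred, average="micro")  # pooled over all labels
macro = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per-label F1
```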
### Per-Technique Performance (selected)
| Technique | F1 Score |
|-----------|----------|
| Loaded_Language | 97.0% |
| Appeal_to_fear-prejudice | 89.7% |
| Name_Calling,Labeling | 84.3% |
| Flag-Waving | 82.1% |
## Usage
### With Transformers
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
model = AutoModelForSequenceClassification.from_pretrained("synapti/nci-technique-classifier-v2")
tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier-v2")
text = "The radical left is DESTROYING our great nation!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.sigmoid(outputs.logits)[0]
# Get techniques above threshold
threshold = 0.5
techniques = list(model.config.id2label.values())
detected = [(techniques[i], probs[i].item()) for i in range(len(techniques)) if probs[i] > threshold]
print(detected)
```
### With NCI Protocol
```python
from nci.transformers.two_stage_pipeline import TwoStagePipeline
pipeline = TwoStagePipeline.from_pretrained(
    binary_model="synapti/nci-binary-detector-v2",
    technique_model="synapti/nci-technique-classifier-v2",
)
result = pipeline.analyze("The radical left is DESTROYING our great nation!")
print(f"Has propaganda: {result.has_propaganda}")
print(f"Techniques: {[t.name for t in result.techniques if t.above_threshold]}")
```
### ONNX Inference
An ONNX export is available at `onnx/model.onnx` for faster inference (~1.25x speedup).
```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier-v2")
session = ort.InferenceSession("onnx/model.onnx")
text = "WAKE UP AMERICA!"
inputs = tokenizer(text, return_tensors="np", truncation=True, max_length=512)
outputs = session.run(None, {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
})
probs = 1 / (1 + np.exp(-outputs[0])) # sigmoid
```
## Limitations
- Trained primarily on English news articles
- May not generalize well to social media or other domains
- Threshold of 0.5 may need adjustment for specific use cases
- Outputs are independent per-label probabilities, so several techniques are often flagged for the same text; downstream code must handle multi-label results
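Since a single 0.5 cut-off may be suboptimal for rare techniques, per-technique thresholds can be tuned on a validation split. An illustrative sketch (the search grid and the choice of F1 as the objective are assumptions, not the model's shipped configuration):

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(probs, labels, grid=None):
    """For each label, pick the threshold in `grid` that maximizes F1.

    probs:  (n_samples, n_labels) array of sigmoid scores
    labels: (n_samples, n_labels) array of 0/1 targets
    """
    if grid is None:
        grid = np.linspace(0.1, 0.9, 17)
    thresholds = []
    for j in range(probs.shape[1]):
        f1s = [f1_score(labels[:, j], probs[:, j] >= t, zero_division=0)
               for t in grid]
        thresholds.append(grid[int(np.argmax(f1s))])
    return np.array(thresholds)
```

The returned vector can then replace the scalar `threshold = 0.5` in the usage examples above.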
## Citation
```bibtex
@misc{nci-technique-classifier-v2,
author = {Synapti},
title = {NCI Technique Classifier v2},
year = {2024},
publisher = {Hugging Face},
url = {https://huggingface.co/synapti/nci-technique-classifier-v2}
}
```
## License
MIT License