---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- propaganda-detection
- binary-classification
- modernbert
- nci-protocol
- text-classification
pipeline_tag: text-classification
metrics:
- accuracy
- f1
- precision
- recall
datasets:
- synapti/nci-binary-classification
base_model: answerdotai/ModernBERT-base
model-index:
- name: nci-binary-detector-v2
  results:
  - task:
      type: text-classification
      name: Binary Propaganda Detection
    dataset:
      name: NCI Binary Classification
      type: synapti/nci-binary-classification
      split: test
    metrics:
    - type: accuracy
      value: 0.994
      name: Accuracy
    - type: f1
      value: 0.994
      name: F1
    - type: precision
      value: 0.989
      name: Precision
    - type: recall
      value: 1.000
      name: Recall
---
# NCI Binary Propaganda Detector v2
This model is Stage 1 of the NCI (Narrative Control Index) two-stage propaganda detection pipeline. It performs binary classification to detect whether a text contains *any* propaganda techniques.
## Model Description
- **Model Type:** Binary text classifier
- **Base Model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **Training Data:** [synapti/nci-binary-classification](https://huggingface.co/datasets/synapti/nci-binary-classification) (24,517 train, 1,727 validation, 1,729 test)
- **Language:** English
- **License:** Apache 2.0
## Performance
| Metric | Value |
|--------|-------|
| **Accuracy** | 99.4% |
| **Precision** | 98.9% |
| **Recall** | 100.0% |
| **F1 Score** | 99.4% |
| **False Positive Rate** | 1.47% |
| **False Negative Rate** | 0.00% |
### Confusion Matrix (Test Set, n=1,729)
```
                       Predicted
                  No Prop | Has Prop
Actual No Prop:     736   |    11
Actual Has Prop:      0   |   982
```
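The headline metrics above follow directly from these four cell counts; a quick sanity check:

```python
# Recompute the reported metrics from the confusion-matrix cells above.
tn, fp = 736, 11   # actual no-propaganda row
fn, tp = 0, 982    # actual has-propaganda row

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
fpr = fp / (fp + tn)

print(f"accuracy={accuracy:.1%} precision={precision:.1%} "
      f"recall={recall:.1%} f1={f1:.1%} fpr={fpr:.2%}")
# accuracy=99.4% precision=98.9% recall=100.0% f1=99.4% fpr=1.47%
```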
### Threshold Analysis
| Threshold | Accuracy | Precision | Recall | F1 |
|-----------|----------|-----------|--------|-----|
| 0.3 | 99.2% | 98.6% | 100% | 99.3% |
| 0.4 | 99.2% | 98.7% | 100% | 99.3% |
| **0.5** | **99.4%** | **98.9%** | **100%** | **99.4%** |
| 0.6 | 99.7% | 99.4% | 100% | 99.7% |
| 0.7 | 99.7% | 99.5% | 100% | 99.7% |
**Recommended threshold:** 0.5 (default) or 0.6 for reduced false positives
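The `text-classification` pipeline only reports the winning label's score, so to use a non-default cutoff, compare the model's P(has_propaganda) against your chosen threshold directly. A minimal sketch (the `classify` helper is illustrative, not part of the model's API):

```python
THRESHOLD = 0.6  # stricter than the 0.5 default, per the table above

def classify(propaganda_prob: float, threshold: float = THRESHOLD) -> str:
    """Map the model's P(has_propaganda) to a label at a chosen cutoff."""
    return "has_propaganda" if propaganda_prob >= threshold else "no_propaganda"

print(classify(0.55))       # below the stricter 0.6 cutoff -> no_propaganda
print(classify(0.55, 0.5))  # at the default 0.5 cutoff -> has_propaganda
```

Since the test-set recall stays at 100% across the 0.3 to 0.7 range, raising the threshold trades away only false positives here.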
## Training Details
- **Loss Function:** Focal Loss (gamma=2.0, alpha=0.25) for class imbalance
- **Optimizer:** AdamW with weight decay 0.01
- **Learning Rate:** 2e-5 with warmup ratio 0.1
- **Batch Size:** 16 (effective 32 with gradient accumulation)
- **Epochs:** 5 with early stopping (patience=3)
- **Best Model Selection:** Based on F1 score on validation set
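Focal loss down-weights easy, already-correct examples so gradient updates focus on hard ones. The per-example binary form with the parameters listed above looks roughly like this (a sketch of the loss formula on probabilities, not the actual training code, which operates on batched logits):

```python
import math

def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Per-example focal loss, given p = predicted P(has_propaganda) and label y."""
    pt = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha # class-balance weight
    return -alpha_t * (1.0 - pt) ** gamma * math.log(pt)

# An easy example (pt = 0.95) contributes far less than a hard one (pt = 0.55),
# because of the (1 - pt)^gamma modulating factor:
print(binary_focal_loss(0.95, 1), binary_focal_loss(0.55, 1))
```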
## Usage
### With Transformers Pipeline
```python
from transformers import pipeline
detector = pipeline(
    "text-classification",
    model="synapti/nci-binary-detector-v2"
)
result = detector("The radical left is DESTROYING our country!")
# [{"label": "has_propaganda", "score": 0.99}]
result = detector("The Federal Reserve announced a 0.25% rate increase.")
# [{"label": "no_propaganda", "score": 0.98}]
```
### With AutoModel
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
model = AutoModelForSequenceClassification.from_pretrained("synapti/nci-binary-detector-v2")
tokenizer = AutoTokenizer.from_pretrained("synapti/nci-binary-detector-v2")
text = "Wake up, people! They are hiding the truth from you!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=1)
propaganda_prob = probs[0, 1].item()
print(f"Propaganda probability: {propaganda_prob:.2%}")
```
### Two-Stage Pipeline (Recommended)
For full propaganda analysis with technique identification:
```python
from transformers import pipeline
# Stage 1: Binary detection
binary_detector = pipeline(
    "text-classification",
    model="synapti/nci-binary-detector-v2"
)
# Stage 2: Technique classification
technique_classifier = pipeline(
    "text-classification",
    model="synapti/nci-technique-classifier-v2",
    top_k=None
)
text = "Some text to analyze..."
# Run Stage 1
binary_result = binary_detector(text)[0]
if binary_result["label"] == "has_propaganda" and binary_result["score"] >= 0.5:
    # Run Stage 2 only if propaganda detected
    techniques = technique_classifier(text)[0]
    detected = [t for t in techniques if t["score"] >= 0.3]
    print(f"Detected techniques: {[t['label'] for t in detected]}")
else:
    print("No propaganda detected")
```
## Labels
| Label ID | Label Name | Description |
|----------|------------|-------------|
| 0 | no_propaganda | Text does not contain propaganda techniques |
| 1 | has_propaganda | Text contains one or more propaganda techniques |
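Hugging Face model configs typically expose this mapping as `config.id2label`; the dict below mirrors the table offline so you can turn an argmax index from the AutoModel example into a label name:

```python
ID2LABEL = {0: "no_propaganda", 1: "has_propaganda"}

probs = [0.02, 0.98]  # example softmax output: [P(no_propaganda), P(has_propaganda)]
pred_id = max(range(len(probs)), key=probs.__getitem__)
print(ID2LABEL[pred_id])  # has_propaganda
```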
## Intended Use
### Primary Use Cases
- Media literacy tools and browser extensions
- Content moderation assistance
- Research on information manipulation
- Educational platforms for critical thinking
### Out of Scope
- Censorship or automated content removal
- Political targeting or surveillance
- Single-source truth determination
## Limitations
- Optimized for English text
- May have reduced performance on very short texts (<10 words)
- Trained primarily on political/news content; domain shift may affect performance
- Should be used as one signal among many, not as sole arbiter
## Related Models
- **Stage 2:** [synapti/nci-technique-classifier-v2](https://huggingface.co/synapti/nci-technique-classifier-v2) - Multi-label technique classification
- **Dataset:** [synapti/nci-binary-classification](https://huggingface.co/datasets/synapti/nci-binary-classification)
## Citation
If you use this model, please cite:
```bibtex
@misc{nci-binary-detector-v2,
  author    = {Synapti},
  title     = {NCI Binary Propaganda Detector v2},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/synapti/nci-binary-detector-v2}
}
```