---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- propaganda-detection
- binary-classification
- modernbert
- nci-protocol
- text-classification
pipeline_tag: text-classification
metrics:
- accuracy
- f1
- precision
- recall
datasets:
- synapti/nci-binary-classification
base_model: answerdotai/ModernBERT-base
model-index:
- name: nci-binary-detector-v2
  results:
  - task:
      type: text-classification
      name: Binary Propaganda Detection
    dataset:
      name: NCI Binary Classification
      type: synapti/nci-binary-classification
      split: test
    metrics:
    - type: accuracy
      value: 0.994
      name: Accuracy
    - type: f1
      value: 0.994
      name: F1
    - type: precision
      value: 0.989
      name: Precision
    - type: recall
      value: 1.000
      name: Recall
---

# NCI Binary Propaganda Detector v2

This model is Stage 1 of the NCI (Narrative Control Index) two-stage propaganda detection pipeline. It performs binary classification to detect whether text contains ANY propaganda techniques.

## Model Description

- **Model Type:** Binary text classifier
- **Base Model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **Training Data:** [synapti/nci-binary-classification](https://huggingface.co/datasets/synapti/nci-binary-classification) (24,517 train, 1,727 validation, 1,729 test)
- **Language:** English
- **License:** Apache 2.0

## Performance

| Metric | Value |
|--------|-------|
| **Accuracy** | 99.4% |
| **Precision** | 98.9% |
| **Recall** | 100.0% |
| **F1 Score** | 99.4% |
| **False Positive Rate** | 1.47% |
| **False Negative Rate** | 0.00% |

### Confusion Matrix (Test Set, n=1,729)
```
                  Predicted
                  No Prop | Has Prop
Actual No Prop:      736  |     11
Actual Has Prop:       0  |    982
```
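The headline metrics follow directly from this matrix; the arithmetic below is a quick sanity check (illustrative only, not part of the released code):

```python
# Counts from the confusion matrix above.
tn, fp = 736, 11   # actual no_propaganda
fn, tp = 0, 982    # actual has_propaganda
total = tn + fp + fn + tp  # 1,729 test examples

accuracy = (tp + tn) / total                         # ~0.994
precision = tp / (tp + fp)                           # ~0.989
recall = tp / (tp + fn)                              # 1.000
f1 = 2 * precision * recall / (precision + recall)   # ~0.994
fpr = fp / (fp + tn)                                 # ~0.0147 (false positive rate)
fnr = fn / (fn + tp)                                 # 0.0 (false negative rate)
```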

### Threshold Analysis

| Threshold | Accuracy | Precision | Recall | F1 |
|-----------|----------|-----------|--------|-----|
| 0.3 | 99.2% | 98.6% | 100% | 99.3% |
| 0.4 | 99.2% | 98.7% | 100% | 99.3% |
| **0.5** | **99.4%** | **98.9%** | **100%** | **99.4%** |
| 0.6 | 99.7% | 99.4% | 100% | 99.7% |
| 0.7 | 99.7% | 99.5% | 100% | 99.7% |

**Recommended threshold:** 0.5 (the pipeline default); raising it to 0.6 further reduces false positives with no loss of recall on the test set.
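The `text-classification` pipeline returns only the winning label and its score, so a non-default threshold has to be applied to the positive-class probability. A minimal helper sketch, assuming the single-label output format shown in the Usage section (`apply_threshold` is a hypothetical name, not part of the released code):

```python
def apply_threshold(result: dict, threshold: float = 0.6) -> str:
    """Re-decide the label at a custom threshold.

    `result` is one pipeline output dict, e.g.
    {"label": "has_propaganda", "score": 0.97}.
    Because the two softmax probabilities sum to 1, the positive-class
    probability can be recovered even when "no_propaganda" wins.
    """
    prob = result["score"] if result["label"] == "has_propaganda" else 1.0 - result["score"]
    return "has_propaganda" if prob >= threshold else "no_propaganda"
```

At the stricter 0.6 threshold, a borderline positive such as `{"label": "has_propaganda", "score": 0.55}` is suppressed to `no_propaganda`.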

## Training Details

- **Loss Function:** Focal Loss (gamma=2.0, alpha=0.25) for class imbalance
- **Optimizer:** AdamW with weight decay 0.01
- **Learning Rate:** 2e-5 with warmup ratio 0.1
- **Batch Size:** 16 (effective 32 with gradient accumulation)
- **Epochs:** 5 with early stopping (patience=3)
- **Best Model Selection:** Based on F1 score on validation set
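The gamma/alpha values above correspond to the standard focal-loss formulation of Lin et al. (2017), FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The per-example sketch below is illustrative only and may differ in detail from the actual training code:

```python
import math

def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one example.

    p: predicted probability of the positive class; y: true label (0 or 1).
    Easy examples (p_t near 1) are down-weighted by (1 - p_t)**gamma,
    so training focuses on the hard, misclassified cases.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma=2.0, a confident correct prediction (p=0.9 on a positive) contributes roughly three orders of magnitude less loss than a badly misclassified one (p=0.1), which is what keeps the abundant easy negatives from dominating training.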

## Usage

### With Transformers Pipeline

```python
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="synapti/nci-binary-detector-v2"
)

result = detector("The radical left is DESTROYING our country!")
# [{"label": "has_propaganda", "score": 0.99}]

result = detector("The Federal Reserve announced a 0.25% rate increase.")
# [{"label": "no_propaganda", "score": 0.98}]
```

### With AutoModel

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("synapti/nci-binary-detector-v2")
tokenizer = AutoTokenizer.from_pretrained("synapti/nci-binary-detector-v2")
model.eval()  # disable dropout for deterministic inference

text = "Wake up, people! They are hiding the truth from you!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=1)
    propaganda_prob = probs[0, 1].item()

print(f"Propaganda probability: {propaganda_prob:.2%}")
```

### Two-Stage Pipeline (Recommended)

For full propaganda analysis with technique identification:

```python
from transformers import pipeline

# Stage 1: Binary detection
binary_detector = pipeline(
    "text-classification",
    model="synapti/nci-binary-detector-v2"
)

# Stage 2: Technique classification
technique_classifier = pipeline(
    "text-classification",
    model="synapti/nci-technique-classifier-v2",
    top_k=None
)

text = "Some text to analyze..."

# Run Stage 1
binary_result = binary_detector(text)[0]
if binary_result["label"] == "has_propaganda" and binary_result["score"] >= 0.5:
    # Run Stage 2 only if propaganda detected
    techniques = technique_classifier(text)[0]
    detected = [t for t in techniques if t["score"] >= 0.3]
    print(f"Detected techniques: {[t['label'] for t in detected]}")
else:
    print("No propaganda detected")
```

## Labels

| Label ID | Label Name | Description |
|----------|------------|-------------|
| 0 | no_propaganda | Text does not contain propaganda techniques |
| 1 | has_propaganda | Text contains one or more propaganda techniques |

## Intended Use

### Primary Use Cases
- Media literacy tools and browser extensions
- Content moderation assistance
- Research on information manipulation
- Educational platforms for critical thinking

### Out of Scope
- Censorship or automated content removal
- Political targeting or surveillance
- Single-source truth determination

## Limitations

- Optimized for English text
- May have reduced performance on very short texts (<10 words)
- Trained primarily on political/news content; domain shift may affect performance
- Should be used as one signal among many, not as sole arbiter

## Related Models

- **Stage 2:** [synapti/nci-technique-classifier-v2](https://huggingface.co/synapti/nci-technique-classifier-v2) - Multi-label technique classification
- **Dataset:** [synapti/nci-binary-classification](https://huggingface.co/datasets/synapti/nci-binary-classification)

## Citation

If you use this model, please cite:

```bibtex
@misc{nci-binary-detector-v2,
  author = {Synapti},
  title = {NCI Binary Propaganda Detector v2},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/synapti/nci-binary-detector-v2}
}
```