synapti/nci-binary-detector-v2

@@ -1,78 +1,214 @@
 ---
-library_name: transformers
 license: apache-2.0
-base_model: answerdotai/ModernBERT-base
 tags:
-- generated_from_trainer
 metrics:
 - accuracy
 - f1
 - precision
 - recall
 model-index:
 - name: nci-binary-detector-v2
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# nci-binary-detector-v2
-This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.0026
-- Accuracy: 0.9936
-- F1: 0.9944
-- Precision: 0.9889
-- Recall: 1.0
-- Roc Auc: 0.9989
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 2e-05
-- train_batch_size: 16
-- eval_batch_size: 32
-- seed: 42
-- gradient_accumulation_steps: 2
-- total_train_batch_size: 32
-- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 5
-- mixed_precision_training: Native AMP
-### Training results
-| Training Loss | Epoch  | Step | Validation Loss | Accuracy | F1     | Precision | Recall | Roc Auc |
-|:-------------:|:------:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|:-------:|
-| 0.0232        | 0.1305 | 100  | 0.0114          | 0.9496   | 0.9575 | 0.9272    | 0.9899 | 0.9948  |
-| 0.0144        | 0.2609 | 200  | 0.0025          | 0.9925   | 0.9935 | 0.9890    | 0.9980 | 0.9997  |
-| 0.0074        | 0.3914 | 300  | 0.0037          | 0.9948   | 0.9955 | 0.9960    | 0.9949 | 0.9996  |
-| 0.0028        | 0.5219 | 400  | 0.0022          | 0.9971   | 0.9975 | 0.9960    | 0.9990 | 0.9995  |
-| 0.002         | 0.6523 | 500  | 0.0038          | 0.9942   | 0.9950 | 0.9910    | 0.9990 | 0.9983  |
-| 0.0004        | 0.7828 | 600  | 0.0023          | 0.9971   | 0.9975 | 0.9970    | 0.9980 | 0.9987  |
-| 0.0052        | 0.9132 | 700  | 0.0008          | 0.9959   | 0.9965 | 0.9930    | 1.0    | 1.0000  |
-### Framework versions
-- Transformers 4.57.3
-- Pytorch 2.9.1+cu128
-- Datasets 4.4.1
-- Tokenizers 0.22.1

 ---
 license: apache-2.0
+language:
+- en
+library_name: transformers
 tags:
+- propaganda-detection
+- binary-classification
+- modernbert
+- nci-protocol
+- text-classification
+pipeline_tag: text-classification
 metrics:
 - accuracy
 - f1
 - precision
 - recall
+datasets:
+- synapti/nci-binary-classification
+base_model: answerdotai/ModernBERT-base
 model-index:
 - name: nci-binary-detector-v2
+  results:
+  - task:
+      type: text-classification
+      name: Binary Propaganda Detection
+    dataset:
+      name: NCI Binary Classification
+      type: synapti/nci-binary-classification
+      split: test
+    metrics:
+    - type: accuracy
+      value: 0.994
+      name: Accuracy
+    - type: f1
+      value: 0.994
+      name: F1
+    - type: precision
+      value: 0.989
+      name: Precision
+    - type: recall
+      value: 1.000
+      name: Recall
 ---
+# NCI Binary Propaganda Detector v2
+This model is Stage 1 of the NCI (Narrative Control Index) two-stage propaganda detection pipeline. It performs binary classification to decide whether a text contains at least one propaganda technique; identifying which techniques are present is left to the Stage 2 technique classifier.
+## Model Description
+- **Model Type:** Binary text classifier
+- **Base Model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
+- **Training Data:** [synapti/nci-binary-classification](https://huggingface.co/datasets/synapti/nci-binary-classification) (24,517 train, 1,727 validation, 1,729 test)
+- **Language:** English
+- **License:** Apache 2.0
+## Performance
+Results on the test split (n=1,729) at the default decision threshold of 0.5:
+| Metric | Value |
+|--------|-------|
+| **Accuracy** | 99.4% |
+| **Precision** | 98.9% |
+| **Recall** | 100.0% |
+| **F1 Score** | 99.4% |
+| **False Positive Rate** | 1.47% |
+| **False Negative Rate** | 0.00% |
+### Confusion Matrix (Test Set, n=1,729)
+```
+                  Predicted
+                  No Prop | Has Prop
+Actual No Prop:      736  |     11
+Actual Has Prop:       0  |    982
+```
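These four counts determine every figure in the Performance table. A minimal stdlib-only sketch that recomputes them; `metrics_from_confusion` is a hypothetical helper, not part of any library.

```python
def metrics_from_confusion(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Recompute binary-classification metrics from confusion-matrix counts.

    Positive class = has_propaganda, negative class = no_propaganda.
    """
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # False positive rate: share of actual negatives flagged as propaganda.
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        # False negative rate: share of actual positives that were missed.
        "fnr": fn / (fn + tp) if fn + tp else 0.0,
    }


# Counts from the test-set confusion matrix.
m = metrics_from_confusion(tn=736, fp=11, fn=0, tp=982)
for name, value in m.items():
    print(f"{name}: {value:.2%}")
```

Rounded, these reproduce the reported 99.4% accuracy, 98.9% precision, 100% recall, 99.4% F1 and 1.47% false positive rate (11 of 747 actual negatives).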
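+### Threshold Analysis
+Each row applies a different cut-off to the predicted probability of `has_propaganda` on the same test split.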
+| Threshold | Accuracy | Precision | Recall | F1 |
+|-----------|----------|-----------|--------|-----|
+| 0.3 | 99.2% | 98.6% | 100% | 99.3% |
+| 0.4 | 99.2% | 98.7% | 100% | 99.3% |
+| **0.5** | **99.4%** | **98.9%** | **100%** | **99.4%** |
+| 0.6 | 99.7% | 99.4% | 100% | 99.7% |
+| 0.7 | 99.7% | 99.5% | 100% | 99.7% |
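+**Recommended threshold:** 0.5 (default), or 0.6 to reduce false positives. On this test set, raising the threshold to 0.6 increased precision from 98.9% to 99.4% with no loss of recall.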
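The same kind of table can be produced for your own labelled data by sweeping a cut-off over the model's P(has_propaganda) scores. A minimal stdlib-only sketch; `sweep_thresholds` and the toy scores are hypothetical, and in practice the probabilities would come from the model's softmax output.

```python
def sweep_thresholds(probs, labels, thresholds=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Return (threshold, accuracy, precision, recall, f1) for each cut-off.

    probs  -- predicted P(has_propaganda) per example
    labels -- gold labels, 1 = has_propaganda, 0 = no_propaganda
    """
    rows = []
    for t in thresholds:
        preds = [1 if p >= t else 0 for p in probs]
        tp = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 1)
        fp = sum(1 for y, yhat in zip(labels, preds) if y == 0 and yhat == 1)
        fn = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 0)
        tn = len(labels) - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append((t, (tp + tn) / len(labels), precision, recall, f1))
    return rows


# Toy scores for illustration only (not model outputs).
probs = [0.95, 0.85, 0.65, 0.55, 0.45, 0.10]
labels = [1, 1, 1, 0, 0, 0]
for t, acc, p, r, f1 in sweep_thresholds(probs, labels):
    print(f"{t:.1f}  acc={acc:.2f}  P={p:.2f}  R={r:.2f}  F1={f1:.2f}")
```

On the toy data, moving from 0.5 to 0.6 removes a false positive without losing recall, while 0.7 starts to drop a true positive; sweeping on held-out data shows where that trade-off sets in for a given domain.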
+## Training Details
+- **Loss Function:** Focal Loss (gamma=2.0, alpha=0.25) to address class imbalance
+- **Optimizer:** AdamW (fused PyTorch implementation, betas=(0.9, 0.999), epsilon=1e-08) with weight decay 0.01
+- **Learning Rate:** 2e-5, linear schedule with a warmup ratio of 0.1
+- **Batch Size:** 16 (effective 32 with 2 gradient accumulation steps)
+- **Epochs:** 5 with early stopping (patience=3)
+- **Mixed Precision:** native AMP; seed 42
+- **Best Model Selection:** highest F1 on the validation set
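The card does not include the training code. For orientation, here is a minimal stdlib-only sketch of the standard binary focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), using the card's γ = 2.0 and α = 0.25. Applying α to the positive class (`has_propaganda`) and 1 - α to the negative class is an assumption about how the classes were weighted; `binary_focal_loss` is a hypothetical helper.

```python
import math


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for a single example.

    p     -- predicted probability of the positive class (has_propaganda)
    y     -- gold label, 1 = has_propaganda, 0 = no_propaganda
    alpha -- weight for the positive class; the negative class gets 1 - alpha
             (assumed weighting, not confirmed by the card)
    """
    eps = 1e-12  # guard against log(0)
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of confident, correct predictions.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


# A confident correct prediction contributes far less than a hard one.
easy = binary_focal_loss(0.95, 1)
hard = binary_focal_loss(0.30, 1)
print(f"easy: {easy:.6f}, hard: {hard:.4f}")
```

With γ = 0 and α = 0.5 the expression reduces to half the ordinary cross-entropy, which is a quick sanity check for any implementation.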
+## Usage
+### With Transformers Pipeline
+```python
+from transformers import pipeline
+detector = pipeline(
+    "text-classification",
+    model="synapti/nci-binary-detector-v2"
+)
+result = detector("The radical left is DESTROYING our country!")
+# e.g. [{"label": "has_propaganda", "score": 0.99}]  (illustrative score)
+result = detector("The Federal Reserve announced a 0.25% rate increase.")
+# e.g. [{"label": "no_propaganda", "score": 0.98}]  (illustrative score)
+```
+### With AutoModel
+```python
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+import torch
+model = AutoModelForSequenceClassification.from_pretrained("synapti/nci-binary-detector-v2")
+tokenizer = AutoTokenizer.from_pretrained("synapti/nci-binary-detector-v2")
+text = "Wake up, people! They are hiding the truth from you!"
+inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+with torch.no_grad():
+    outputs = model(**inputs)
+    probs = torch.softmax(outputs.logits, dim=1)
+    propaganda_prob = probs[0, 1].item()  # label id 1 = has_propaganda
+print(f"Propaganda probability: {propaganda_prob:.2%}")
+```
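For a two-class head, the softmax probability of label 1 equals the logistic sigmoid of the logit difference, p = σ(z₁ − z₀), so the output reduces to a single number that can be compared against a cut-off such as 0.5 or 0.6. A stdlib-only sketch with made-up logits; `has_propaganda_prob` is a hypothetical helper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def has_propaganda_prob(logits):
    """P(label 1 = has_propaganda) from [no_propaganda, has_propaganda] logits,
    computed as the sigmoid of the logit difference."""
    diff = logits[1] - logits[0]
    # Branch on the sign so math.exp never overflows for large |diff|.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


# Illustrative logits, not real model output.
logits = [-1.2, 2.3]
print(softmax(logits)[1], has_propaganda_prob(logits))
```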
+### Two-Stage Pipeline (Recommended)
+For full propaganda analysis with technique identification:
+```python
+from transformers import pipeline
+# Stage 1 cut-off: 0.5 is the default (0.6 reduces false positives).
+# Stage 2 cut-off: per-technique score threshold.
+BINARY_THRESHOLD = 0.5
+TECHNIQUE_THRESHOLD = 0.3
+# Stage 1: binary detection (top_k=None returns scores for both labels)
+binary_detector = pipeline(
+    "text-classification",
+    model="synapti/nci-binary-detector-v2",
+    top_k=None
+)
+# Stage 2: technique classification
+technique_classifier = pipeline(
+    "text-classification",
+    model="synapti/nci-technique-classifier-v2",
+    top_k=None
+)
+text = "Some text to analyze..."
+# Run Stage 1. With top_k set at construction, a single string input
+# returns a nested list, hence the [0].
+binary_scores = {r["label"]: r["score"] for r in binary_detector(text)[0]}
+if binary_scores["has_propaganda"] >= BINARY_THRESHOLD:
+    # Run Stage 2 only if propaganda detected
+    techniques = technique_classifier(text)[0]
+    detected = [t for t in techniques if t["score"] >= TECHNIQUE_THRESHOLD]
+    print(f"Detected techniques: {[t['label'] for t in detected]}")
+else:
+    print("No propaganda detected")
+```
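The gating logic of the two-stage pipeline can be tested in isolation once the model calls are separated from the decision. A stdlib-only sketch, assuming the Stage 1 result has already been reduced to P(has_propaganda) and the Stage 2 result to a label-to-score dict; `route` and the `technique_*` labels are hypothetical placeholders, not the technique classifier's real label names.

```python
from typing import Callable, Dict, List


def route(
    p_propaganda: float,
    classify_techniques: Callable[[], Dict[str, float]],
    binary_threshold: float = 0.5,
    technique_threshold: float = 0.3,
) -> List[str]:
    """Return detected technique labels, calling Stage 2 only when Stage 1 fires.

    classify_techniques is invoked lazily, so the Stage 2 model never runs
    on text that Stage 1 rejects.
    """
    if p_propaganda < binary_threshold:
        return []
    scores = classify_techniques()
    return sorted(label for label, s in scores.items() if s >= technique_threshold)


def fake_stage2() -> Dict[str, float]:
    # Hypothetical Stage 2 output for illustration.
    return {"technique_a": 0.81, "technique_b": 0.12, "technique_c": 0.34}


print(route(0.97, fake_stage2))  # ['technique_a', 'technique_c']
print(route(0.20, fake_stage2))  # []
```

Passing Stage 2 as a callable keeps the short-circuit explicit, and the thresholds can be tuned without touching either model call.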
+## Labels
+| Label ID | Label Name | Description |
+|----------|------------|-------------|
+| 0 | no_propaganda | Text does not contain propaganda techniques |
+| 1 | has_propaganda | Text contains one or more propaganda techniques |
+## Intended Use
+### Primary Use Cases
+- Media literacy tools and browser extensions
+- Content moderation assistance
+- Research on information manipulation
+- Educational platforms for critical thinking
+### Out of Scope
+- Censorship or automated content removal
+- Political targeting or surveillance
+- Single-source truth determination
+## Limitations
+- Optimized for English text
+- May have reduced performance on very short texts (<10 words)
+- Trained primarily on political/news content; performance may degrade on other domains
+- Should be used as one signal among many, not as the sole arbiter
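Because performance may be reduced on texts under 10 words, a caller may want to flag such inputs before relying on a score. A minimal sketch; the whitespace-based word count and the `needs_review` helper are assumptions, not part of the model or its tokenizer.

```python
MIN_WORDS = 10  # length below which the card reports reduced performance


def needs_review(text: str, min_words: int = MIN_WORDS) -> bool:
    """Return True when the text is shorter than the length flagged as less
    reliable. Words are approximated by whitespace-separated tokens."""
    return len(text.split()) < min_words


samples = [
    "Wake up, people!",
    "The central bank raised its benchmark rate by a quarter point after its meeting on Wednesday.",
]
for sample in samples:
    print(needs_review(sample), "|", sample)
```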
+## Related Models
+- **Stage 2:** [synapti/nci-technique-classifier-v2](https://huggingface.co/synapti/nci-technique-classifier-v2) - Multi-label technique classification
+- **Dataset:** [synapti/nci-binary-classification](https://huggingface.co/datasets/synapti/nci-binary-classification)
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{nci-binary-detector-v2,
+  author = {Synapti},
+  title = {NCI Binary Propaganda Detector v2},
+  year = {2025},
+  publisher = {HuggingFace},
+  url = {https://huggingface.co/synapti/nci-binary-detector-v2}
+}
+```