synapti committed on
Commit 4ff9e0f · verified · 1 Parent(s): 27d58bb

Update model card with v2 evaluation metrics

Files changed (1)
  1. README.md +186 -50
README.md CHANGED
@@ -1,78 +1,214 @@
  ---
- library_name: transformers
  license: apache-2.0
- base_model: answerdotai/ModernBERT-base
  tags:
- - generated_from_trainer
  metrics:
  - accuracy
  - f1
  - precision
  - recall
  model-index:
  - name: nci-binary-detector-v2
- results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # nci-binary-detector-v2

- This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.0026
- - Accuracy: 0.9936
- - F1: 0.9944
- - Precision: 0.9889
- - Recall: 1.0
- - Roc Auc: 0.9989

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 16
- - eval_batch_size: 32
- - seed: 42
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 32
- - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 5
- - mixed_precision_training: Native AMP

- ### Training results

- | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall | Roc Auc |
- |:-------------:|:------:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|:-------:|
- | 0.0232 | 0.1305 | 100 | 0.0114 | 0.9496 | 0.9575 | 0.9272 | 0.9899 | 0.9948 |
- | 0.0144 | 0.2609 | 200 | 0.0025 | 0.9925 | 0.9935 | 0.9890 | 0.9980 | 0.9997 |
- | 0.0074 | 0.3914 | 300 | 0.0037 | 0.9948 | 0.9955 | 0.9960 | 0.9949 | 0.9996 |
- | 0.0028 | 0.5219 | 400 | 0.0022 | 0.9971 | 0.9975 | 0.9960 | 0.9990 | 0.9995 |
- | 0.002 | 0.6523 | 500 | 0.0038 | 0.9942 | 0.9950 | 0.9910 | 0.9990 | 0.9983 |
- | 0.0004 | 0.7828 | 600 | 0.0023 | 0.9971 | 0.9975 | 0.9970 | 0.9980 | 0.9987 |
- | 0.0052 | 0.9132 | 700 | 0.0008 | 0.9959 | 0.9965 | 0.9930 | 1.0 | 1.0000 |

- ### Framework versions

- - Transformers 4.57.3
- - Pytorch 2.9.1+cu128
- - Datasets 4.4.1
- - Tokenizers 0.22.1

  ---
  license: apache-2.0
+ language:
+ - en
+ library_name: transformers
  tags:
+ - propaganda-detection
+ - binary-classification
+ - modernbert
+ - nci-protocol
+ - text-classification
+ pipeline_tag: text-classification
  metrics:
  - accuracy
  - f1
  - precision
  - recall
+ datasets:
+ - synapti/nci-binary-classification
+ base_model: answerdotai/ModernBERT-base
  model-index:
  - name: nci-binary-detector-v2
+   results:
+   - task:
+       type: text-classification
+       name: Binary Propaganda Detection
+     dataset:
+       name: NCI Binary Classification
+       type: synapti/nci-binary-classification
+       split: test
+     metrics:
+     - type: accuracy
+       value: 0.994
+       name: Accuracy
+     - type: f1
+       value: 0.994
+       name: F1
+     - type: precision
+       value: 0.989
+       name: Precision
+     - type: recall
+       value: 1.000
+       name: Recall
  ---

+ # NCI Binary Propaganda Detector v2
+
+ This model is Stage 1 of the NCI (Narrative Control Index) two-stage propaganda detection pipeline. It performs binary classification to detect whether text contains ANY propaganda techniques.
+
+ ## Model Description
+
+ - **Model Type:** Binary text classifier
+ - **Base Model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
+ - **Training Data:** [synapti/nci-binary-classification](https://huggingface.co/datasets/synapti/nci-binary-classification) (24,517 train, 1,727 validation, 1,729 test)
+ - **Language:** English
+ - **License:** Apache 2.0
+
+ ## Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Accuracy** | 99.4% |
+ | **Precision** | 98.9% |
+ | **Recall** | 100.0% |
+ | **F1 Score** | 99.4% |
+ | **False Positive Rate** | 1.47% |
+ | **False Negative Rate** | 0.00% |
+
+ ### Confusion Matrix (Test Set, n=1,729)
+ ```
+                        Predicted
+                   No Prop | Has Prop
+ Actual No Prop:       736 |       11
+ Actual Has Prop:        0 |      982
+ ```
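
As a sanity check, the headline metrics can be re-derived from these counts (plain Python, nothing model-specific):

```python
# Confusion-matrix counts from the test set (n = 1,729)
tn, fp = 736, 11   # actual no_propaganda
fn, tp = 0, 982    # actual has_propaganda

accuracy  = (tp + tn) / (tp + tn + fp + fn)                 # 0.9936 -> 99.4%
precision = tp / (tp + fp)                                  # 0.9889 -> 98.9%
recall    = tp / (tp + fn)                                  # 1.0000 -> 100.0%
f1        = 2 * precision * recall / (precision + recall)   # 0.9944 -> 99.4%
fpr       = fp / (fp + tn)                                  # 0.0147 -> 1.47%
fnr       = fn / (fn + tp)                                  # 0.0000 -> 0.00%

print(f"acc={accuracy:.4f}  p={precision:.4f}  r={recall:.4f}  f1={f1:.4f}  fpr={fpr:.4f}")
```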
+
+ ### Threshold Analysis
+
+ | Threshold | Accuracy | Precision | Recall | F1 |
+ |-----------|----------|-----------|--------|-----|
+ | 0.3 | 99.2% | 98.6% | 100% | 99.3% |
+ | 0.4 | 99.2% | 98.7% | 100% | 99.3% |
+ | **0.5** | **99.4%** | **98.9%** | **100%** | **99.4%** |
+ | 0.6 | 99.7% | 99.4% | 100% | 99.7% |
+ | 0.7 | 99.7% | 99.5% | 100% | 99.7% |
+
+ **Recommended threshold:** 0.5 (default) or 0.6 for reduced false positives
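
The pipeline only reports the winning label by default; to operate at a non-default threshold such as 0.6, request both label scores and apply the cut-off yourself. A minimal sketch (label names as listed under Labels below):

```python
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="synapti/nci-binary-detector-v2",
    top_k=None,  # return scores for both labels, not just the argmax
)

THRESHOLD = 0.6  # stricter cut-off: fewer false positives at the same recall, per the table above

def flag_propaganda(text: str) -> bool:
    scores = {r["label"]: r["score"] for r in detector(text)[0]}
    return scores["has_propaganda"] >= THRESHOLD

print(flag_propaganda("The radical left is DESTROYING our country!"))
```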
+
+ ## Training Details
+
+ - **Loss Function:** Focal Loss (gamma=2.0, alpha=0.25) for class imbalance
+ - **Optimizer:** AdamW with weight decay 0.01
+ - **Learning Rate:** 2e-5 with warmup ratio 0.1
+ - **Batch Size:** 16 (effective 32 with gradient accumulation)
+ - **Epochs:** 5 with early stopping (patience=3)
+ - **Best Model Selection:** Based on F1 score on validation set
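
The training script is not included here; one way to reproduce the focal-loss objective described above is a `Trainer` subclass along these lines (a sketch with the stated gamma and alpha, not the exact implementation used for this checkpoint):

```python
import torch
import torch.nn.functional as F
from transformers import Trainer

class FocalLossTrainer(Trainer):
    """Replaces the default cross-entropy with binary focal loss (gamma=2.0, alpha=0.25)."""

    def __init__(self, *args, gamma: float = 2.0, alpha: float = 0.25, **kwargs):
        super().__init__(*args, **kwargs)
        self.gamma = gamma
        self.alpha = alpha

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        ce = F.cross_entropy(outputs.logits, labels, reduction="none")
        pt = torch.exp(-ce)  # model probability assigned to the true class
        # one common convention: alpha weights the positive (has_propaganda) class
        alpha_t = self.alpha * labels + (1.0 - self.alpha) * (1 - labels)
        loss = (alpha_t * (1.0 - pt) ** self.gamma * ce).mean()
        return (loss, outputs) if return_outputs else loss
```

Pairing this with `TrainingArguments(load_best_model_at_end=True, metric_for_best_model="f1")` and `EarlyStoppingCallback(early_stopping_patience=3)` would correspond to the model-selection and early-stopping settings listed above.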
+
+ ## Usage
+
+ ### With Transformers Pipeline
+
+ ```python
+ from transformers import pipeline
+
+ detector = pipeline(
+     "text-classification",
+     model="synapti/nci-binary-detector-v2"
+ )
+
+ result = detector("The radical left is DESTROYING our country!")
+ # [{"label": "has_propaganda", "score": 0.99}]
+
+ result = detector("The Federal Reserve announced a 0.25% rate increase.")
+ # [{"label": "no_propaganda", "score": 0.98}]
+ ```
+
+ ### With AutoModel
+
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+ import torch
+
+ model = AutoModelForSequenceClassification.from_pretrained("synapti/nci-binary-detector-v2")
+ tokenizer = AutoTokenizer.from_pretrained("synapti/nci-binary-detector-v2")
+
+ text = "Wake up, people! They are hiding the truth from you!"
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+     probs = torch.softmax(outputs.logits, dim=1)
+     propaganda_prob = probs[0, 1].item()
+
+ print(f"Propaganda probability: {propaganda_prob:.2%}")
+ ```
+
+ ### Two-Stage Pipeline (Recommended)
+
+ For full propaganda analysis with technique identification:
+
+ ```python
+ from transformers import pipeline
+
+ # Stage 1: Binary detection
+ binary_detector = pipeline(
+     "text-classification",
+     model="synapti/nci-binary-detector-v2"
+ )
+
+ # Stage 2: Technique classification
+ technique_classifier = pipeline(
+     "text-classification",
+     model="synapti/nci-technique-classifier-v2",
+     top_k=None
+ )
+
+ text = "Some text to analyze..."
+
+ # Run Stage 1
+ binary_result = binary_detector(text)[0]
+ if binary_result["label"] == "has_propaganda" and binary_result["score"] >= 0.5:
+     # Run Stage 2 only if propaganda detected
+     techniques = technique_classifier(text)[0]
+     detected = [t for t in techniques if t["score"] >= 0.3]
+     print(f"Detected techniques: {[t['label'] for t in detected]}")
+ else:
+     print("No propaganda detected")
+ ```
+
+ ## Labels
+
+ | Label ID | Label Name | Description |
+ |----------|------------|-------------|
+ | 0 | no_propaganda | Text does not contain propaganda techniques |
+ | 1 | has_propaganda | Text contains one or more propaganda techniques |
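
When working from raw logits rather than the pipeline, the same mapping should be recoverable from the checkpoint config (assuming the standard `id2label` entry is present):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("synapti/nci-binary-detector-v2")
print(config.id2label)  # expected: {0: "no_propaganda", 1: "has_propaganda"}

# e.g. converting an argmax over logits into a label name:
# label = config.id2label[logits.argmax(dim=-1).item()]
```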
 
+ ## Intended Use
+
+ ### Primary Use Cases
+ - Media literacy tools and browser extensions
+ - Content moderation assistance
+ - Research on information manipulation
+ - Educational platforms for critical thinking
+
+ ### Out of Scope
+ - Censorship or automated content removal
+ - Political targeting or surveillance
+ - Single-source truth determination
+
+ ## Limitations
+
+ - Optimized for English text
+ - May have reduced performance on very short texts (<10 words)
+ - Trained primarily on political/news content; domain shift may affect performance
+ - Should be used as one signal among many, not as sole arbiter
+
+ ## Related Models
+
+ - **Stage 2:** [synapti/nci-technique-classifier-v2](https://huggingface.co/synapti/nci-technique-classifier-v2) - Multi-label technique classification
+ - **Dataset:** [synapti/nci-binary-classification](https://huggingface.co/datasets/synapti/nci-binary-classification)
+
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @misc{nci-binary-detector-v2,
+   author    = {Synapti},
+   title     = {NCI Binary Propaganda Detector v2},
+   year      = {2024},
+   publisher = {HuggingFace},
+   url       = {https://huggingface.co/synapti/nci-binary-detector-v2}
+ }
+ ```